Intelligent Data Analysis and Probabilistic Inference
Data Mining Tutorial 3: Clustering and Association Rules

1. i. Explain the operation of the k-means clustering algorithm using pseudo code.

   ii. Given the following eight points, and assuming initial cluster centroids given by A, B and C, and that a Euclidean distance function is used for measuring distance between points, use k-means to show the three clusters and calculate their new centroids after the second round of execution.

      ID   X   Y
      A    2   10
      B    2   5
      C    8   4
      D    5   8
      E    7   5
      F    6   4
      G    1   2
      H    4   9

2. i. Explain the meaning of support and confidence in the context of association rule discovery algorithms, and explain how the a priori heuristic can be used to improve the efficiency of such algorithms.

   ii. Given the transactions described below, find all rules between single items that have support >= 60%. For each rule report both support and confidence.

      1:  (Beer)
      2:  (Cola, Beer)
      3:  (Cola, Beer)
      4:  (Nuts, Beer)
      5:  (Nuts, Cola, Beer)
      6:  (Nuts, Cola, Beer)
      7:  (Crisps, Nuts, Cola)
      8:  (Crisps, Nuts, Cola, Beer)
      9:  (Crisps, Nuts, Cola, Beer)
      10: (Crisps, Nuts, Cola, Beer)

yg@doc.ic.ac.uk, mmg@doc.ic.ac.uk, 16th Dec 2003

3. a. Explain how hierarchical clustering algorithms work; make sure your answer describes what is meant by a linkage method and how it is used.

   b. Explain the advantages and disadvantages of hierarchical clustering compared to k-means clustering.

4. The following table shows the distance matrix between five genes:

          G1   G2   G3   G4   G5
      G1   0
      G2   9    0
      G3   3    7    0
      G4   6    5    9    0
      G5  11   10    2    8    0

   i. Based on a complete linkage method, show the distance matrix between the first formed cluster and the other data points.

   ii. Draw a dendrogram showing the full hierarchical clustering tree for the five points based on complete linkage.

   iii. Draw a dendrogram showing the full hierarchical tree for the five points based on single linkage.
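The complete-linkage merges asked about in question 4 can be traced programmatically. The following is a minimal sketch (an assumed illustration, not the course's model answer) of an agglomerative loop in plain Python: at each step it finds the closest pair of clusters under complete linkage (the maximum pairwise distance) and merges them.

```python
# Sketch of agglomerative clustering with complete linkage on the
# gene distance matrix from question 4 (assumed helper, not model answer).

def dist(a, b, D):
    """Complete linkage: maximum pairwise distance between two clusters."""
    return max(D[min(x, y), max(x, y)] for x in a for y in b)

# Lower triangle of the distance matrix, keyed by sorted gene pairs.
D = {("G1", "G2"): 9, ("G1", "G3"): 3, ("G1", "G4"): 6, ("G1", "G5"): 11,
     ("G2", "G3"): 7, ("G2", "G4"): 5, ("G2", "G5"): 10,
     ("G3", "G4"): 9, ("G4", "G5"): 8, ("G3", "G5"): 2}

clusters = [("G1",), ("G2",), ("G3",), ("G4",), ("G5",)]
merges = []
while len(clusters) > 1:
    # Find the closest pair of clusters under the linkage and merge them.
    pairs = [(dist(a, b, D), a, b) for i, a in enumerate(clusters)
             for b in clusters[i + 1:]]
    d, a, b = min(pairs)
    clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    merges.append((a, b, d))

for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d}")
```

G3 and G5 merge first (distance 2), which is the "first formed cluster" of part i; switching `max` to `min` in `dist` gives the single-linkage tree for part iii.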

Answers

1. Clusters after 1st iteration:
   Cluster 1: A (2,10), D (5,8), H (4,9)
   Cluster 2: B (2,5), G (1,2)
   Cluster 3: C (8,4), E (7,5), F (6,4)

   Centroids after 1st iteration:
   Cluster 1: (3.67, 9)
   Cluster 2: (1.5, 3.5)
   Cluster 3: (7, 4.33)

   Clusters after 2nd iteration (no change):
   Cluster 1: A (2,10), D (5,8), H (4,9)
   Cluster 2: B (2,5), G (1,2)
   Cluster 3: C (8,4), E (7,5), F (6,4)

   Centroids after 2nd iteration (no change):
   Cluster 1: (3.67, 9)
   Cluster 2: (1.5, 3.5)
   Cluster 3: (7, 4.33)
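The two k-means rounds above can be checked with a short sketch. This is an assumed implementation for illustration (the question only asks for pseudo code): assignment by Euclidean distance, then centroid update as the cluster mean, starting from A, B and C.

```python
# Minimal k-means sketch (assumed implementation) on the tutorial's eight points,
# with initial centroids A, B, C and Euclidean distance.

def kmeans(points, centroids, rounds):
    """Run k-means for a fixed number of rounds; return (clusters, centroids)."""
    for _ in range(rounds):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for name, (x, y) in points.items():
            dists = [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
                     for cx, cy in centroids]
            clusters[dists.index(min(dists))].append(name)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [(sum(points[n][0] for n in c) / len(c),
                      sum(points[n][1] for n in c) / len(c))
                     for c in clusters]
    return clusters, centroids

points = {"A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
          "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9)}
clusters, centroids = kmeans(points, [points["A"], points["B"], points["C"]], 2)
print(clusters)  # [['A', 'D', 'H'], ['B', 'G'], ['C', 'E', 'F']]
print([(round(x, 2), round(y, 2)) for x, y in centroids])
# [(3.67, 9.0), (1.5, 3.5), (7.0, 4.33)]
```

Running with `rounds=2` reproduces the answer: the assignments do not change in the second round, so the centroids are unchanged too.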
2. Initial supports:
   Beer:   9/10
   Cola:   8/10
   Nuts:   7/10
   Crisps: 4/10  (below 60%, so Crisps is dropped)

   Pair supports (pairs containing Crisps are pruned by the a priori heuristic):
   Beer, Cola: 7/10
   Beer, Nuts: 6/10
   Cola, Nuts: 6/10

   Rules with support >= 60%:
   Beer -> Cola  (support = 70%, confidence = 7/9 ≈ 77.8%)
   Cola -> Beer  (support = 70%, confidence = 7/8 = 87.5%)
   Beer -> Nuts  (support = 60%, confidence = 6/9 ≈ 66.7%)
   Nuts -> Beer  (support = 60%, confidence = 6/7 ≈ 85.7%)
   Cola -> Nuts  (support = 60%, confidence = 6/8 = 75%)
   Nuts -> Cola  (support = 60%, confidence = 6/7 ≈ 85.7%)
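These supports and confidences can be recomputed with a small sketch (an assumed helper, not the official answer script) that also shows the a priori pruning step: any candidate pair containing the infrequent item Crisps is never generated.

```python
# Single-item association rules with min support 60% on the ten transactions,
# using a priori pruning: candidate pairs are built from frequent items only.
from itertools import combinations

transactions = [
    {"Beer"}, {"Cola", "Beer"}, {"Cola", "Beer"}, {"Nuts", "Beer"},
    {"Nuts", "Cola", "Beer"}, {"Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola"}, {"Crisps", "Nuts", "Cola", "Beer"},
    {"Crisps", "Nuts", "Cola", "Beer"}, {"Crisps", "Nuts", "Cola", "Beer"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / n

min_sup = 0.6
items = {i for t in transactions for i in t}
frequent = {i for i in items if support({i}) >= min_sup}  # Crisps pruned here

rules = []
for a, b in combinations(sorted(frequent), 2):
    pair_sup = support({a, b})
    if pair_sup >= min_sup:
        # confidence(X -> Y) = support(X, Y) / support(X)
        rules.append((a, b, pair_sup, pair_sup / support({a})))
        rules.append((b, a, pair_sup, pair_sup / support({b})))

for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support {sup:.0%}, confidence {conf:.1%}")
```

This yields exactly the six rules listed above; pruning Crisps means only three of the six possible item pairs ever have their support counted.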