Outline
• DT
• KNN
• Unsupervised algorithm
• Clustering
DT
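[Figure: decision tree. Root: Outlook. Sunny branch tests Humidity (Normal: Yes, otherwise No); Overcast branch is a Yes leaf; Rain branch tests Wind (Weak: Yes, otherwise No).]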
(Outlook = Sunny and Humidity = Normal)
or
(Outlook = Overcast)
or
(Outlook = Rain and Wind = Weak)
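The disjunction above can be read directly as a predicate. The following is a minimal sketch; the function name `play_tennis` and the string encodings of the attribute values are assumptions made for illustration, not part of the slides.

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    """Return True when the tree predicts Yes (hypothetical helper)."""
    return (
        (outlook == "Sunny" and humidity == "Normal")
        or outlook == "Overcast"
        or (outlook == "Rain" and wind == "Weak")
    )


# Each root-to-leaf path ending in Yes is one conjunction in the rule.
print(play_tennis("Sunny", "High", "Weak"))       # False
print(play_tennis("Overcast", "High", "Strong"))  # True
```

Every path that ends in a Yes leaf contributes one conjunction, so the whole tree is a disjunction of conjunctions.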
Tree Algorithms
ID3: Iterative Dichotomiser 3
C4.5, C5.0
CART: Classification & Regression Trees
Impurity Function
Yes - 9
No - 5
http://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/mitchell-dectrees.pdf
Information Gain
• Entropy: 9+, 5-
$$E = \frac{9}{14}\log_2\frac{14}{9} + \frac{5}{14}\log_2\frac{14}{5} = 0.940$$
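As a quick check of the entropy value, here is a small sketch that computes the entropy from class counts. The helper name `entropy` is an assumption; it uses the same form as the slide, summing p * log2(1/p) over the classes.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # p * log2(1/p), written as (c/total) * log2(total/c); empty classes add 0.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


e = entropy([9, 5])  # 9 Yes, 5 No
print(round(e, 3))   # 0.94
```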
Information Gain
• Outlook:
$$G(S, \text{Outlook}) = 0.940 - \frac{5}{14}\times 0.9709 - \frac{4}{14}\times 0 - \frac{5}{14}\times 0.9709$$
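The gain for Outlook can be reproduced with a short sketch. The per-branch class counts (Sunny 2+/3-, Overcast 4+/0-, Rain 3+/2-) are an assumption taken from the standard Mitchell example linked earlier; they match the branch sizes 5, 4, 5 and the branch entropies 0.9709, 0, 0.9709 used in the formula. The function names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def information_gain(parent_counts, branch_counts):
    """Parent entropy minus the size-weighted entropy of each branch."""
    n = sum(parent_counts)
    weighted = sum(sum(b) / n * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - weighted


# Assumed (Yes, No) counts per Outlook value: Sunny, Overcast, Rain.
outlook = [(2, 3), (4, 0), (3, 2)]
gain = information_gain((9, 5), outlook)
print(round(gain, 3))  # 0.247
```

ID3 would evaluate this quantity for every attribute and split on the one with the largest gain.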
• Two methods:
Temperature > C: Yes / No
Candidate thresholds C: (48 + 60)/2 = 54 and (80 + 90)/2 = 85
With these new features, construct the DT by maximising the information gain
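A sketch of how the candidate thresholds and the best split could be computed. The six temperature values and their labels are an assumption taken from the Mitchell chapter linked earlier (the slides show only the two midpoints); function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values whose labels differ."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2 for (a, la), (b, lb) in zip(pairs, pairs[1:]) if la != lb]


def gain_for_threshold(values, labels, c):
    """Information gain of the boolean feature `value > c`."""
    left = [l for v, l in zip(values, labels) if v <= c]
    right = [l for v, l in zip(values, labels) if v > c]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


# Assumed example data (not shown in the slides).
temps = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]

thresholds = candidate_thresholds(temps, play)
best = max(thresholds, key=lambda c: gain_for_threshold(temps, play, c))
print(thresholds)  # [54.0, 85.0]
print(best)        # 54.0
```

Only boundaries where the class label changes need to be considered, which is why just two thresholds come out of six values.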
DT: Regression/Classification
Root Node
Count  14
Mean   39.7857143
S      9.32108647
CV     0.23428225

SDR = 9.32 - (4/14)*3.49 - (5/14)*7.78 - (5/14)*10.87

Humidity  S           Mean        Count
High      9.3634112   37.5714286  7
Normal    8.73416935  42          7
SDR = 0.27

Windy     S           Mean        Count
TRUE      10.5934991  37.6666667  6
SDR = 0.28
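The standard deviation reduction (SDR) used for regression trees follows the same pattern as information gain, with the standard deviation S in place of entropy. A minimal sketch using only numbers from the slides; the function name `sdr` is an assumption.

```python
def sdr(parent_std, branches):
    """Parent S minus the count-weighted S of each branch.

    `branches` is a list of (std, count) pairs.
    """
    total = sum(count for _, count in branches)
    return parent_std - sum(count / total * std for std, count in branches)


# Humidity split: (S, Count) for High and Normal.
humidity = [(9.3634112, 7), (8.73416935, 7)]
print(round(sdr(9.32108647, humidity), 2))  # 0.27

# The three-branch split written out in the SDR formula on the slide.
three_way = [(3.49, 4), (7.78, 5), (10.87, 5)]
print(round(sdr(9.32, three_way), 2))  # 1.66
```

The attribute with the largest SDR is chosen for the split, so here the three-branch attribute would be preferred over Humidity.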
KNN Classifier
Height Weight Gender
1 175 80 Male
2 160 58 Female
3 179 78 Male
4 163 68 Female
5 159 75 Female
6 180 77 Male
7 183 75 Male
8 158 69 ?
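A minimal KNN classification sketch for the table above, predicting the label of row 8 by majority vote. It assumes the two numeric columns are height and weight, uses plain Euclidean distance without feature scaling, and picks k = 3; all three choices, and the name `knn_classify`, are assumptions rather than something the slides specify.

```python
from collections import Counter
from math import dist

# Rows 1-7 of the table: ((col1, col2), label).
train = [
    ((175, 80), "Male"),
    ((160, 58), "Female"),
    ((179, 78), "Male"),
    ((163, 68), "Female"),
    ((159, 75), "Female"),
    ((180, 77), "Male"),
    ((183, 75), "Male"),
]


def knn_classify(query, data, k=3):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(data, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((158, 69), train))  # Female
```

With k = 3 the nearest neighbours are rows 4, 5 and 2, all labelled Female.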
KNN Regressor
Height Age Weight
1 5 45 77
2 5.11 26 47
3 5.6 30 55
4 5.9 34 59
5 4.8 40 72
6 5.8 36 60
7 5.3 19 40
8 5.8 28 60
9 5.5 23 45
10 5.6 32 58
11 5.5 38 ?
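For regression, KNN averages the target of the nearest neighbours instead of voting. The sketch below predicts the missing weight in row 11 under assumed settings: k = 3, Euclidean distance on the raw (height, age) values, and an unweighted mean. Because the features are not scaled, age dominates the distance here; the name `knn_regress` is hypothetical.

```python
from math import dist

# Rows 1-10 of the table: ((height, age), weight).
train = [
    ((5.0, 45), 77), ((5.11, 26), 47), ((5.6, 30), 55), ((5.9, 34), 59),
    ((4.8, 40), 72), ((5.8, 36), 60), ((5.3, 19), 40), ((5.8, 28), 60),
    ((5.5, 23), 45), ((5.6, 32), 58),
]


def knn_regress(query, data, k=3):
    """Mean target of the k training points closest to `query`."""
    nearest = sorted(data, key=lambda row: dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / k


prediction = knn_regress((5.5, 38), train)
print(round(prediction, 2))  # 63.67
```

The three nearest rows are 6, 5 and 4 (weights 60, 72, 59), so the prediction is their mean.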
[Figure: example items grouped into clusters: Milk, Bread, Butter, Sugar, Biscuit, Rice, Veggie, Toothpaste, Mouthwash, Internet, Call Duration]
[Figure: taxonomy of clustering techniques]
• Hierarchical: Divisive, Agglomerative
• Bayesian: Decision Based, Non-Parametric
• Partitional: Centroid (K-means), Model Based, Graph Theoretic, Spectral
Clustering Tech.
• Hierarchical: find successive clusters using previously found clusters
John Snow
Image Segmentation
https://www.mathworks.com/matlabcentral/fileexchange/41967-fast-fuzzy-c-means-image-segmentation
Hierarchical Clustering
Distance Measure
• Minimum distance — distance between nearest points
between clusters (Single Linkage/ Nearest Neighbour)
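To make the single-linkage measure concrete, here is a small sketch of agglomerative clustering that repeatedly merges the two clusters with the smallest minimum distance. The sample points, the stopping rule (a target number of clusters) and the function names are assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Distance between the nearest pair of points, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters):
    """Start with one cluster per point; merge the closest pair until done."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j is safe after merging.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D points, not taken from the slides.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerate(points, 3))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

Each merge builds on clusters found in earlier steps, which is the sense in which hierarchical methods find successive clusters from previously found ones.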
https://stanford.edu/~cpiech/cs221/handouts/kmeans.html