
Example: Face Detection and Recognition

2
Example: Text Clustering

3
Improve Healthcare, Win $3M
• Motivation:
◦ 71M Americans are admitted to hospitals per year
◦ $30 billion was spent on unnecessary hospital admissions
◦ Can we identify earlier those most at risk and ensure they get the treatment they need?
• Objective:
◦ Identify patients who will be admitted to a hospital within the next year, using historical claims data
◦ Develop new care plans and strategies to reach patients before emergencies occur, thereby reducing the number of unnecessary hospitalizations
• Competition:
◦ Grand Prize: $3M
◦ Milestone Prizes: $230K/6
◦ Time: 4 April 2011 - 3 April 2013

5
Methods/algorithms for data analysis / data mining
ICDM’06 survey (votes) / KDnuggets poll ’11:

1. C4.5 (61)
2. K-Means (60)
3. SVM (58)
4. Apriori (52)
5. EM (48)
6. PageRank (46)
7. AdaBoost (45)
8. k-NN (45)
9. Naïve Bayes (45)
10. CART (34)

6
CLASSIFICATION BY DECISION TREE INDUCTION

7
BuyComputer Data

8
Classification by Decision Tree

Internal node: a condition (a test on an attribute)

Leaf: a conclusion (a class label)

9
General Algorithm
• Create a new node N
• If all the data belongs to the same class C Then
◦ Return N as a leaf node labeled with C
• Select the “best” attribute A
• Label N with A
• For each value Ai of attribute A
◦ Select the subset Di of examples according to Ai
◦ Iterate the algorithm on Di
• EndFor
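The slides give no code, but a minimal Python sketch of this recursion, assuming categorical attributes stored as dicts and information gain (defined on the following slides) as the “best”-attribute criterion, could look like this; all function and variable names here are illustrative only:

import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def build_tree(examples, labels, attributes):
    """Recursive decision-tree induction; each example is a dict attribute -> value."""
    # If all the data belongs to the same class C, return a leaf labeled with C.
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes remain, fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Select the "best" attribute: the one with the highest information gain.
    def gain(a):
        remainder = 0.0
        for v in {ex[a] for ex in examples}:
            subset = [lab for ex, lab in zip(examples, labels) if ex[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)

    # One branch per value Ai of the chosen attribute; recurse on each subset Di.
    node = {best: {}}
    for v in {ex[best] for ex in examples}:
        sub_ex = [ex for ex in examples if ex[best] == v]
        sub_lab = [lab for ex, lab in zip(examples, labels) if ex[best] == v]
        node[best][v] = build_tree(sub_ex, sub_lab,
                                   [a for a in attributes if a != best])
    return node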

10
Which Attribute is “best”?

[Figure: two candidate splits of the same sample [29+, 35-]. A1 = True/False yields [21+, 5-] and [8+, 30-]; A2 = True/False yields [18+, 33-] and [11+, 2-].]

12
Entropy
• Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this Boolean classification is

H(S) = -p+ log₂(p+) - p- log₂(p-)

◦ S is a sample of training examples


◦ p+ is the proportion of positive examples
◦ p- is the proportion of negative examples

• C-class case: H(X) = -∑_{i=1}^{c} p_i log₂(p_i)
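As a quick sanity check of these formulas, a small illustrative helper (not from the slides) evaluating the entropy of the samples used on the following slides:

import math

def entropy(counts):
    """H = -sum_i p_i * log2(p_i), where p_i are the class proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([29, 35]))  # ~0.99: the [29+, 35-] sample from the next slides
print(entropy([32, 32]))  # 1.0: a 50/50 split has maximum entropy
print(entropy([64, 0]))   # -0.0 (i.e. zero): a pure sample has no entropy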

13
ID3: Information Gain
Gain(S, A) = Entropy(S) - ∑_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

◦ S – a collection of examples
◦ A – an attribute
◦ Values(A) – possible values of attribute A
◦ Sv – the subset of S for which attribute A has value v

• Entropy: a measure of the impurity of a sample

• Information gain: a measure of the effectiveness of an attribute (the expected reduction in entropy obtained by partitioning the examples according to that attribute)

14
Which Attribute is “best”?
E([29+,35-]) = 0.99 for both candidate splits.

Split on A: subsets [21+, 5-] with E = 0.71 and [8+, 30-] with E = 0.74
Gain(S,A) = 0.99 - (26/64)·0.71 - (38/64)·0.74 ≈ 0.27

Split on B: subsets [18+, 33-] with E = 0.94 and [11+, 2-] with E = 0.62
Gain(S,B) = 0.99 - (51/64)·0.94 - (13/64)·0.62 ≈ 0.12

A provides greater information gain than B.

A is therefore the better attribute to split on.
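The gain values above can be reproduced with a short sketch (again illustrative, not from the slides), reusing the entropy helper from the earlier snippet:

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    """Information gain: parent entropy minus the size-weighted child entropies."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

print(gain([29, 35], [[21, 5], [8, 30]]))   # split on A: ~0.27
print(gain([29, 35], [[18, 33], [11, 2]]))  # split on B: ~0.12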
15
Search in Decision Tree Learning
• Space:
◦ The set of possible decision trees
• Strategy:
◦ Simple to complex: begin with the empty tree, then consider progressively more elaborate hypotheses
◦ The information gain measure guides the hill-climbing search
• Properties:
◦ Maintains a single hypothesis
◦ No backtracking
◦ Robust to noisy data
16
Attribute Selection
• Gain ratio

• Gini index
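For reference, since the slide only names these criteria: the gain ratio divides the information gain by the split information, SplitInfo(S, A) = -∑_{v ∈ Values(A)} (|S_v|/|S|) log₂(|S_v|/|S|), which penalizes attributes with many values, and the Gini index of a sample is Gini(S) = 1 - ∑_i p_i². A tiny illustrative sketch of the latter:

def gini(counts):
    """Gini index: 1 - sum_i p_i^2 over the class proportions (lower = purer)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([29, 35]))  # ~0.50: close to the worst case for two classes
print(gini([21, 5]))   # ~0.31: a much purer subset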

17
Tree Pruning

Occam’s razor:
prefer the simplest hypothesis that fits the data
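In practice this is done either by pre-pruning (stop splitting a node early, e.g. when the information gain or the number of remaining examples falls below a threshold) or by post-pruning (grow the full tree, then remove or collapse subtrees that do not improve accuracy on held-out validation data).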
18
SUPPORT VECTOR MACHINES

19
CLUSTERING

20
Visualization of the Iraq War Logs

21
Image Clustering

Break up the image into meaningful or perceptually similar regions.
22
Classification vs. Clustering
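The key distinction: classification is supervised (class labels are given for the training examples), whereas clustering is unsupervised (the groups must be discovered from unlabeled data).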

23
K-means Clustering: Example
• Step 1: pick K points at random as the initial cluster centers

24
K-means Clustering: Example
• Step 2: assign each data point to the closest cluster center

25
K-means Clustering: Example
• Step 3: change each cluster center to the mean/average of its assigned data points

26
K-means Clustering: Example
• Repeat until convergence

27
K-means Clustering
• Initialize
◦ Pick K random points as the cluster centers
• Repeat
1. Assign each data point to the closest cluster center
2. Change each cluster center to the mean/average of its assigned data points
Until no point’s assignment changes
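A minimal NumPy sketch of this loop (illustrative only, not the lecture’s code; it assumes numeric data in an (n, d) array X):

import numpy as np

def kmeans(X, K, seed=0):
    """Plain K-means on an (n, d) array X with K clusters (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize: pick K distinct data points as the starting cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    while True:
        # Step 1: assign every point to its closest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: move each center to the mean of the points assigned to it.
        new_centers = centers.copy()
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:          # keep the old center if a cluster went empty
                new_centers[k] = members.mean(axis=0)
        # Stop once the centers (and hence the assignments) no longer change.
        if np.allclose(new_centers, centers):
            return centers, assign
        centers = new_centers

# Example: three blobs in 2-D
X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
centers, labels = kmeans(X, K=3)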

28
K-means Clustering

29
K-means: Features
• Guaranteed to converge in a finite number of iterations

• Complexity (per iteration)
1. Assign each data point to the closest cluster center: O(Kn)
2. Change each cluster center to the average of its assigned points: O(n)

30
K-means: Convergence

31
K-means: Randomness

[Figure: the same input data set clustered twice with different random initializations, producing two different results (Output 1 and Output 2).]

32
K-means: Local Minimum
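Because K-means converges only to a local minimum of the within-cluster sum of squared distances, a common remedy is to run it several times with different random initializations (or to use smarter seeding such as k-means++) and keep the solution with the lowest total distance.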

33
Hierarchical Clustering

34
Distance Measures

35
36
Algorithms
Evaluation
• Unsolved problem: the number of clusters is unknown

• Solution: evaluate using supervised (labeled) data
◦ Purity of clusters: F-measure, confusion matrix, …
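A small illustrative sketch of one such purity computation against ground-truth labels (both given as flat lists; the names are not from the slides):

from collections import Counter

def purity(cluster_ids, true_labels):
    """Fraction of points whose cluster's majority class matches their true class."""
    n = len(true_labels)
    total = 0
    for c in set(cluster_ids):
        # Count the true labels of the points assigned to cluster c ...
        labels_in_c = [t for cid, t in zip(cluster_ids, true_labels) if cid == c]
        # ... and credit the size of the largest (majority) class.
        total += Counter(labels_in_c).most_common(1)[0][1]
    return total / n

print(purity([0, 0, 0, 1, 1, 1], ['a', 'a', 'b', 'b', 'b', 'a']))  # 4/6 ≈ 0.67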

37
Segmentation – Classification

38
Report: Decision Tree Learning
• Algorithm

• Attribute selection
◦ Information gain
◦ Gain ratio
◦ …

• Tree pruning
◦ Pre-pruning
◦ Post-pruning

• Numerical attributes
◦ Discretization

39
