
Example: Face Detection and Recognition

2
Example: Text Clustering

3
Improve Healthcare, Win $3M
• Motivation:
◦ 71M Americans are admitted to hospitals per year
◦ $30 billion was spent on unnecessary hospital admissions
◦ Can we identify earlier those most at risk and ensure they get the treatment they need?
• Objective:
◦ Identify patients who will be admitted to a hospital within the next year, using historical claims data
◦ Develop new care plans and strategies to reach patients before emergencies occur, thereby reducing the number of unnecessary hospitalizations
• Competition:
◦ Grand Prize: $3M
◦ Milestone Prizes: $230K/6
◦ Time: 4 April 2011 - 3 April 2013

5
Methods/algorithms for data analysis / data mining
ICDM’06 survey (votes) / KDnuggets poll ’11:

1. C4.5 (61)
2. K-Means (60)
3. SVM (58)
4. Apriori (52)
5. EM (48)
6. PageRank (46)
7. AdaBoost (45)
8. k-NN (45)
9. Naïve Bayes (45)
10. CART (34)

6
CLASSIFICATION BY DECISION TREE INDUCTION

7
BuyComputer Data

8
Classification by Decision Tree

Internal node: a condition (a test on an attribute)

Leaf: a conclusion (a class label)

9
General Algorithm
• Create a new node N
• If all the data belongs to the same class C Then
◦ Return N as a leaf node labeled with C
• Select the “best” attribute A
• Label N with A
• For each value Ai of attribute A
◦ Select the subset Di of examples according to Ai
◦ Iterate the algorithm on Di
• EndFor
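The slides give no code, but a minimal Python sketch of this recursion, assuming categorical attributes stored as dicts and information gain (defined on the following slides) as the “best”-attribute criterion, could look like this; all function and variable names here are illustrative only:

import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def build_tree(examples, labels, attributes):
    """Recursive decision-tree induction; each example is a dict attribute -> value."""
    # If all the data belongs to the same class C, return a leaf labeled with C.
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes remain, fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Select the "best" attribute: the one with the highest information gain.
    def gain(a):
        remainder = 0.0
        for v in {ex[a] for ex in examples}:
            subset = [lab for ex, lab in zip(examples, labels) if ex[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)

    # One branch per value Ai of the chosen attribute; recurse on each subset Di.
    node = {best: {}}
    for v in {ex[best] for ex in examples}:
        sub_ex = [ex for ex in examples if ex[best] == v]
        sub_lab = [lab for ex, lab in zip(examples, labels) if ex[best] == v]
        node[best][v] = build_tree(sub_ex, sub_lab,
                                   [a for a in attributes if a != best])
    return node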

10
Which Attribute is “best”?

[Figure: two candidate splits of the same sample [29+, 35-]. A1 = True/False yields [21+, 5-] and [8+, 30-]; A2 = True/False yields [18+, 33-] and [11+, 2-].]

12
Entropy
• Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this Boolean classification is

H(S) = -p+ log₂(p+) - p- log₂(p-)

◦ S is a sample of training examples


◦ p+ is the proportion of positive examples
◦ p- is the proportion of negative examples

• C-class case: H(X) = -∑_{i=1}^{c} p_i log₂(p_i)
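As a quick sanity check of these formulas, a small illustrative helper (not from the slides) evaluating the entropy of the samples used on the following slides:

import math

def entropy(counts):
    """H = -sum_i p_i * log2(p_i), where p_i are the class proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([29, 35]))  # ~0.99: the [29+, 35-] sample from the next slides
print(entropy([32, 32]))  # 1.0: a 50/50 split has maximum entropy
print(entropy([64, 0]))   # -0.0 (i.e. zero): a pure sample has no entropy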

13
ID3: Information Gain
Gain(S, A) = Entropy(S) - ∑_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

◦ S – a collection of examples
◦ A – an attribute
◦ Values(A) – possible values of attribute A
◦ Sv – the subset of S for which attribute A has value v

• Entropy: a measure of the impurity of a sample

• Information gain: a measure of the effectiveness of an attribute (the expected reduction in entropy obtained by partitioning the examples according to that attribute)

14
Which Attribute is “best”?
E([29+,35-]) = 0.99 for both candidate splits.

Split on A: subsets [21+, 5-] with E = 0.71 and [8+, 30-] with E = 0.74
Gain(S,A) = 0.99 - (26/64)·0.71 - (38/64)·0.74 ≈ 0.27

Split on B: subsets [18+, 33-] with E = 0.94 and [11+, 2-] with E = 0.62
Gain(S,B) = 0.99 - (51/64)·0.94 - (13/64)·0.62 ≈ 0.12

A provides greater information gain than B.

A is therefore the better attribute to split on.
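The gain values above can be reproduced with a short sketch (again illustrative, not from the slides), reusing the entropy helper from the earlier snippet:

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    """Information gain: parent entropy minus the size-weighted child entropies."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

print(gain([29, 35], [[21, 5], [8, 30]]))   # split on A: ~0.27
print(gain([29, 35], [[18, 33], [11, 2]]))  # split on B: ~0.12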
15
Search in Decision Tree Learning
• Space:
◦ The set of possible decision trees
• Strategy:
◦ Simple to complex: begin with the empty tree, then consider progressively more elaborate hypotheses
◦ The information gain measure guides the hill-climbing search
• Properties:
◦ Maintains a single hypothesis
◦ No backtracking
◦ Robust to noisy data
16
Attribute Selection
• Gain ratio

• Gini index
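For reference, since the slide only names these criteria: the gain ratio divides the information gain by the split information, SplitInfo(S, A) = -∑_{v ∈ Values(A)} (|S_v|/|S|) log₂(|S_v|/|S|), which penalizes attributes with many values, and the Gini index of a sample is Gini(S) = 1 - ∑_i p_i². A tiny illustrative sketch of the latter:

def gini(counts):
    """Gini index: 1 - sum_i p_i^2 over the class proportions (lower = purer)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([29, 35]))  # ~0.50: close to the worst case for two classes
print(gini([21, 5]))   # ~0.31: a much purer subset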

17
Tree Pruning

Occam’s razor:
prefer the simplest hypothesis that fits the data
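In practice this is done either by pre-pruning (stop splitting a node early, e.g. when the information gain or the number of remaining examples falls below a threshold) or by post-pruning (grow the full tree, then remove or collapse subtrees that do not improve accuracy on held-out validation data).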
18
SUPPORT VECTOR MACHINES

19
CLUSTERING

20
Visualization of the Iraq War Logs

21
Image Clustering

Break up the image into meaningful or perceptually similar regions.
22
Classification vs. Clustering
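The key distinction: classification is supervised (class labels are given for the training examples), whereas clustering is unsupervised (the groups must be discovered from unlabeled data).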

23
K-means Clustering: Example
• Step 1: pick K points at random as the initial cluster centers

24
K-means Clustering: Example
• Step 2: assign each data point to the closest cluster center

25
K-means Clustering: Example
• Step 3: change each cluster center to the mean/average of its assigned data points

26
K-means Clustering: Example
• Repeat until convergence

27
K-means Clustering
• Initialize
◦ Pick K random points as the cluster centers
• Repeat
1. Assign each data point to the closest cluster center
2. Change each cluster center to the mean/average of its assigned data points
Until no point’s assignment changes
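A minimal NumPy sketch of this loop (illustrative only, not the lecture’s code; it assumes numeric data in an (n, d) array X):

import numpy as np

def kmeans(X, K, seed=0):
    """Plain K-means on an (n, d) array X with K clusters (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize: pick K distinct data points as the starting cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    while True:
        # Step 1: assign every point to its closest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: move each center to the mean of the points assigned to it.
        new_centers = centers.copy()
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:          # keep the old center if a cluster went empty
                new_centers[k] = members.mean(axis=0)
        # Stop once the centers (and hence the assignments) no longer change.
        if np.allclose(new_centers, centers):
            return centers, assign
        centers = new_centers

# Example: three blobs in 2-D
X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
centers, labels = kmeans(X, K=3)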

28
K-means Clustering

29
K-means: Features
• Guaranteed to converge in a finite number of iterations

• Complexity (per iteration)
1. Assign each data point to the closest cluster center: O(Kn)
2. Change each cluster center to the average of its assigned points: O(n)

30
K-means: Convergence

31
K-means: Randomness

[Figure: the same input data set clustered twice with different random initializations, producing two different results (Output 1 and Output 2).]

32
K-means: Local Minimum
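Because K-means converges only to a local minimum of the within-cluster sum of squared distances, a common remedy is to run it several times with different random initializations (or to use smarter seeding such as k-means++) and keep the solution with the lowest total distance.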

33
Hierarchical Clustering

34
Distance Measures

35
36
Algorithms
Evaluation
• Unsolved problem: the number of clusters is unknown

• Solution: evaluate using supervised (labeled) data
◦ Purity of clusters: F-measure, confusion matrix, …
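A small illustrative sketch of one such purity computation against ground-truth labels (both given as flat lists; the names are not from the slides):

from collections import Counter

def purity(cluster_ids, true_labels):
    """Fraction of points whose cluster's majority class matches their true class."""
    n = len(true_labels)
    total = 0
    for c in set(cluster_ids):
        # Count the true labels of the points assigned to cluster c ...
        labels_in_c = [t for cid, t in zip(cluster_ids, true_labels) if cid == c]
        # ... and credit the size of the largest (majority) class.
        total += Counter(labels_in_c).most_common(1)[0][1]
    return total / n

print(purity([0, 0, 0, 1, 1, 1], ['a', 'a', 'b', 'b', 'b', 'a']))  # 4/6 ≈ 0.67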

37
Segmentation – Classification

38
Report: Decision Tree Learning
• Algorithm

• Attribute selection
◦ Information gain
◦ Gain ratio
◦ …

• Tree pruning
◦ Pre-pruning
◦ Post-pruning

• Numerical attributes
◦ Discretization

39
