[Diagram: TRAINING DATA is fed to a learning algorithm, which produces a trained machine that maps new inputs to answers. Surrounding labels name application areas: OCR, machine vision, handwriting recognition (HWR), bioinformatics.]
• Medicine:
– Screening
– Diagnosis and prognosis
– Drug discovery
• Security:
– Face recognition
– Signature / fingerprint / iris verification
– DNA fingerprinting
Computer / Internet
• Computer interfaces:
– Troubleshooting wizards
– Handwriting and speech
– Brain waves
• Internet
– Hit ranking
– Spam filtering
– Text categorization
– Text translation
– Recommendation
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Hundreds of new ones every year
• Every machine learning algorithm has three components:
– Representation
– Evaluation
– Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• Etc.
Optimization
• Combinatorial optimization
– E.g.: Greedy search
• Convex optimization
– E.g.: Gradient descent
• Constrained optimization
– E.g.: Linear programming
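As a concrete illustration of this component, here is a minimal gradient descent sketch in Python; the quadratic objective, step size, and iteration count are illustrative assumptions, not from the slides:

```python
# Minimal gradient descent sketch: minimize f(w) = (w - 3)^2.
# Objective, learning rate, and iteration count are illustrative choices.

def grad(w):
    return 2 * (w - 3)       # derivative of (w - 3)^2

w = 0.0                      # initial guess
for _ in range(100):
    w -= 0.1 * grad(w)       # step against the gradient

print(w)                     # approaches the minimizer w = 3
```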
Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Semi-supervised learning
– Training data includes a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
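A rough sketch of how the training data differs across these settings (the arrays and labels are made up for illustration):

```python
# Illustrative data shapes for each learning setting (all values made up).
supervised      = ([[1.0], [2.0], [3.0]], ["yes", "no", "yes"])  # inputs + desired outputs
unsupervised    =  [[1.0], [2.0], [3.0]]                         # inputs only
semi_supervised = ([[1.0], [2.0], [3.0]], ["yes", None, None])   # a few desired outputs known
# Reinforcement learning has no fixed dataset: the learner acts and
# receives rewards for sequences of actions.
```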
Supervised Learning
Learning Through Examples
Supervised Learning
[Figure: training pairs (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36); the learner must infer the underlying mapping, here suggesting f(x) = x².]
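A quick sketch of recovering that mapping from the example pairs, assuming the f(x) = x² reading of the figure (numpy's polynomial fit is used for brevity):

```python
import numpy as np

# The learner only sees these input-output examples (from the figure).
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([1, 4, 9, 16, 25, 36])

# Supervised learning: infer f from labeled pairs; here a degree-2 polynomial fit.
coeffs = np.polyfit(x, y, deg=2)
print(np.round(coeffs, 6))   # ~[1. 0. 0.], i.e. f(x) = x**2
```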
What We’ll Cover
Supervised learning
Decision tree induction
Neural networks
Rule induction
Instance-based learning
Bayesian learning
Support vector machines
Model ensembles
Learning theory
Classification: Decision Trees
[Figure: example decision-tree classification of labeled points in a 2-D input space.]
Classification: Neural Nets
[Figure: example neural-network decision boundary on the same kind of data.]
Decision Tree Learning
Learning Through Examples
Learning decision trees
Problem: decide whether to wait for a table at a restaurant,
based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
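A sketch of how one training example could be represented with these attributes; the particular values below are hypothetical, not from the slides:

```python
# One hypothetical training example for the restaurant problem.
# Keys follow the ten attributes above; the values are made up.
example = {
    "Alternate": True,  "Bar": False, "FriSat": False, "Hungry": True,
    "Patrons": "Full",  "Price": "$", "Raining": False,
    "Reservation": False, "Type": "Thai", "WaitEstimate": "10-30",
}
label = True   # desired output: wait for a table
```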
Attribute-based representations
• Examples described by attribute values (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait for a table:
Boolean functions with the same number of ones and zeros have the largest entropy.
Does Entropy Make Sense?
– If an event conveys information, that means it’s a surprise.
– If an event always occurs, P(Ai) = 1, then it carries no information: -log2(1) = 0.
– If an event rarely occurs (e.g. P(Ai) = 0.001), it carries a lot of information: -log2(0.001) ≈ 9.97.
– The less likely (more uncertain) the event, the more information it carries, since for 0 < P(Ai) < 1, -log2(P(Ai)) increases as P(Ai) goes from 1 to 0.
• (Note: ignore events with P(Ai) = 0, since they never occur.)
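These quantities are easy to check numerically; a minimal sketch using only the standard library (the probabilities are the ones discussed above):

```python
import math

def info(p):
    """Information content (surprise) of an event with probability p, in bits."""
    return -math.log2(p)

print(info(1.0))     # 0.0   -> a certain event carries no information
print(info(0.001))   # ~9.97 -> a rare event carries a lot of information

def entropy(probs):
    """Entropy in bits; terms with p == 0 are ignored, as noted above."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 -> equal ones and zeros gives maximal entropy
```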
Information Gain
• Measures the expected reduction in entropy caused by partitioning the examples on an attribute
• A statistical quantity measuring how well an attribute classifies the data.
$\mathrm{Gain}(S, A_i) = H(S) - \sum_{v \in \mathrm{Values}(A_i)} P(A_i = v)\, H(S_v)$
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples
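A sketch of computing this gain from labeled examples; the Patrons split below (None: 0+/2-, Some: 4+/0-, Full: 2+/4-) follows the standard restaurant example and should be treated as illustrative:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr):
    """Gain(S, A) = H(S) - sum over v in Values(A) of P(A = v) * H(S_v).
    examples: list of (attribute-dict, label) pairs."""
    labels = [y for _, y in examples]
    total = entropy(labels)
    n = len(examples)
    for v in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == v]
        total -= len(subset) / n * entropy(subset)
    return total

# Illustrative 12-example split on Patrons: None 0+/2-, Some 4+/0-, Full 2+/4-.
S = ([({"Patrons": "None"}, False)] * 2 +
     [({"Patrons": "Some"}, True)] * 4 +
     [({"Patrons": "Full"}, True)] * 2 +
     [({"Patrons": "Full"}, False)] * 4)
print(round(gain(S, "Patrons"), 3))   # ~0.541 bits
```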
Ockham’s Razor
• Prefer the simplest hypothesis consistent with the data.
Metrics for Performance Evaluation

                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes       TP          FN
CLASS    Class=No        FP          TN

• TP (true positive): predicted to be in YES, and is actually in it
• FP (false positive): predicted to be in YES, but is not actually in it
• TN (true negative): predicted not to be in YES, and is not actually in it
• FN (false negative): predicted not to be in YES, but is actually in it
Metrics for Performance Evaluation…
• Accuracy, computed from the confusion matrix above:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
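A sketch of tallying the four counts and the accuracy from prediction vectors (the label lists are made up for illustration):

```python
# Tally confusion-matrix cells from actual vs. predicted labels (made-up data).
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

tp = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))
fn = sum(a == "yes" and p == "no"  for a, p in zip(actual, predicted))
fp = sum(a == "no"  and p == "yes" for a, p in zip(actual, predicted))
tn = sum(a == "no"  and p == "no"  for a, p in zip(actual, predicted))

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, fn, fp, tn, accuracy)   # 2 1 1 2 0.666...
```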
Classifier Evaluation Metrics:
Precision and Recall, and F-measures
• Precision: exactness – what % of tuples that the classifier labeled as positive are actually positive

precision = TP / (TP + FP)
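Continuing the sketch above with the same illustrative counts (recall and F1 follow the standard definitions named in the slide title):

```python
# Precision, recall, and F-measure from the counts tallied above (illustrative).
tp, fp, fn = 2, 1, 1

precision = tp / (tp + fp)   # fraction of predicted positives that are correct
recall    = tp / (tp + fn)   # fraction of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1) # 0.666..., 0.666..., 0.666...
```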