OF STATISTICS AND
MACHINE LEARNING
STATISTICS - boring
MACHINE LEARNING – exciting!
Predictions Model
“IT’S DIFFICULT TO MAKE PREDICTIONS,
ESPECIALLY ABOUT THE FUTURE”
Example of ML predictions:
Observation: Among flu patients, those who take medicine have a longer recovery.
Prediction: If you take medicine for the flu, then you will have a longer recovery.
Obviously wrong!
Just because a machine learning, data mining, or data analysis application outputs a result, it doesn’t mean that it’s right.
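The flu observation can be reproduced with a small sketch. The slides do not say why the prediction is wrong; the code below assumes one possible explanation, that illness severity influences both who takes medicine and how long recovery takes. The patient records are made up for illustration and are not from any real dataset.

```python
from statistics import mean

# Hypothetical, made-up patients: (severity, took_medicine, recovery_days).
# Assumption: severe cases take medicine more often AND recover more slowly.
patients = (
    [("mild", False, 5)] * 8 + [("mild", True, 4)] * 2
    + [("severe", False, 10)] * 2 + [("severe", True, 9)] * 8
)

def mean_recovery(rows, took_medicine, severity=None):
    """Average recovery days for one group, optionally within one severity level."""
    days = [d for s, m, d in rows
            if m == took_medicine and (severity is None or s == severity)]
    return mean(days)

# Pooled over all patients, medicine takers recover more slowly on average...
overall_with = mean_recovery(patients, True)
overall_without = mean_recovery(patients, False)
print("overall:", overall_with, "days with medicine vs", overall_without, "without")

# ...but within each severity level, medicine takers recover faster.
for level in ("mild", "severe"):
    print(level, ":", mean_recovery(patients, True, level), "days with medicine vs",
          mean_recovery(patients, False, level), "without")
```

In this toy data the pooled comparison and the per-severity comparison point in opposite directions, which is why a result produced by an analysis still needs to be questioned.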
MACHINE LEARNING PROBLEMS
[Figure: grid of machine learning problem types. Surviving labels: classification or categorization; clustering; Continuous]
EXAMPLE: TITANIC DATASET
Label Features
DECISION TREE
[Figure: decision tree for the label survive. Split on pclass (branches labeled 1, 2, 3), with a further split on age]
WHAT IS A CLASSIFIER
f([image of an apple]) = “apple”
f([image of a tomato]) = “tomato”
f([image of a cow]) = “cow”
Slide credit: L. Lazebnik
THE MACHINE LEARNING FRAMEWORK
y = f(x)
y: output
f: prediction function
x: features
[Figure: scatter plot of two classes of points, marked x and o, with axes x1 and x2]
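To make y = f(x) concrete, here is a minimal sketch of a prediction function learned from labeled points with two features (x1, x2) and two classes ("x" and "o"). The choice of a 1-nearest-neighbor rule, the helper names `train` and `f`, and the sample points are all assumptions for illustration; the slides do not name a specific method here.

```python
def train(examples):
    """'Training' for 1-nearest-neighbor just stores the labeled examples."""
    return list(examples)

def f(model, x):
    """Prediction function: return the label of the closest stored example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(model, key=lambda example: sq_dist(example[0], x))
    return label

# Made-up training set: ((x1, x2), label)
training_set = [
    ((1.0, 1.0), "o"), ((1.5, 0.5), "o"),
    ((4.0, 4.5), "x"), ((5.0, 4.0), "x"),
]
model = train(training_set)

y = f(model, (4.2, 4.1))   # features in, predicted label out
print(y)
```

The point of the sketch is the shape of the interface: features go in, a predicted output comes out, and the function itself is determined by the training examples.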
CLASSIFIER OVERVIEW
RECAP: DECISION TREES
Representation
• A set of rules: IF…THEN conditions
Evaluation
• coverage: # of data points that satisfy conditions
• accuracy = # of correct predictions / coverage
Optimization
• Build the decision tree that maximizes accuracy
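The evaluation measures above can be sketched for a single IF…THEN rule. The passenger rows below are made up (they only borrow the pclass and age column names from the Titanic example), and `evaluate_rule` is a hypothetical helper, not part of any library.

```python
# Made-up rows in the spirit of the Titanic example; not real passenger data.
passengers = [
    {"pclass": 1, "age": 29, "survived": True},
    {"pclass": 1, "age": 54, "survived": True},
    {"pclass": 1, "age": 40, "survived": False},
    {"pclass": 2, "age": 8,  "survived": True},
    {"pclass": 3, "age": 22, "survived": False},
    {"pclass": 3, "age": 35, "survived": False},
]

def evaluate_rule(rows, condition, predicted_label, label_key="survived"):
    """Return (coverage, accuracy) for the rule: IF condition THEN predicted_label."""
    covered = [r for r in rows if condition(r)]
    coverage = len(covered)                       # of data points that satisfy the condition
    correct = sum(1 for r in covered if r[label_key] == predicted_label)
    accuracy = correct / coverage if coverage else 0.0
    return coverage, accuracy

# Rule: IF pclass == 1 THEN survived
coverage, accuracy = evaluate_rule(passengers, lambda r: r["pclass"] == 1, True)
print(coverage, accuracy)
```

A tree-building procedure would compare many candidate conditions this way and keep the ones that score best; that search is not shown here.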
ML PIPELINE (SUPERVISED)
FEATURES
Fact Table
- Shop_ID
- Customer_ID
- Date_ID
- Product_ID
- Amount
- Volume
- Profit
- Delivery Time
- …
Product
- Product_ID
- Type_ID
- Brand_ID
- Length
- Height
- Depth
- Weight
- …
Product_Type
- Type_ID
- Name
- Description
- …
Brand
- Brand_ID
- Name
- …
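One way to turn a schema like this into features is to follow the ID columns from each fact row into the related tables and flatten the result into one record. The sketch below assumes that reading of the diagram; all values are made up, and `build_features` is a hypothetical helper.

```python
# Made-up lookup tables keyed by their ID columns.
products = {101: {"Type_ID": 1, "Brand_ID": 10, "Weight": 0.5}}
product_types = {1: {"Name": "Snack"}}
brands = {10: {"Name": "Acme"}}

# Made-up fact rows using column names from the schema.
fact_rows = [
    {"Shop_ID": 7, "Customer_ID": 42, "Date_ID": 20240101,
     "Product_ID": 101, "Amount": 3, "Profit": 1.2},
]

def build_features(fact, products, product_types, brands):
    """Flatten one fact row plus its related product, type, and brand into one record."""
    product = products[fact["Product_ID"]]
    return {
        "Amount": fact["Amount"],
        "Profit": fact["Profit"],
        "Weight": product["Weight"],
        "Type_Name": product_types[product["Type_ID"]]["Name"],
        "Brand_Name": brands[product["Brand_ID"]]["Name"],
    }

features = [build_features(row, products, product_types, brands) for row in fact_rows]
print(features[0])
```

Which columns to keep as features depends on the prediction task; the selection above is arbitrary.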
Histograms
Spam