Pierre Dupont
ICTEAM Institute
Université catholique de Louvain – Belgium
Machine Learning
- The science of getting computers to act without being explicitly programmed
- Construction of computer programs that automatically improve with experience: spam filtering software, autonomous driving, speech recognition, chess programs, etc.
- Induction of a general theory (i.e. a model) from observed examples (the training set) in order to apply the model to previously unseen examples (the test set)
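The induce-then-apply loop above can be sketched in a few lines of plain Python. This is a toy example of my own (not course code): fit a least-squares line through the origin on a training set, then apply the learned model to previously unseen test inputs.

```python
# Toy illustration: induce a model (the slope of a line through the origin)
# from a training set, then apply it to previously unseen test examples.

def fit(train):
    """Least-squares slope w minimizing sum of (y - w*x)^2 over the training set."""
    num = sum(x * y for x, y in train)
    den = sum(x * x for x, _ in train)
    return num / den

def predict(w, x):
    return w * x

train_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # observed (x, y) examples
w = fit(train_set)

test_set = [4.0, 5.0]  # previously unseen inputs
predictions = [predict(w, x) for x in test_set]
print(predictions)
```

The training set is only used to estimate w; the model is then evaluated on inputs it has never seen, which is exactly the induction step described above.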
Autonomous Driving
The actual winning time of Stanley [Thrun et al. 05] was 6 hours 54
minutes.
Medical prognosis
Data:
Patient103 time=1 Patient103 time=2 ... Patient103 time=n
PageRank
Likely the most frequently used algorithm in the world
The PageRank algorithm learns the ranking of web pages from the
(changing) hyperlink structure of the Internet, i.e. it improves
based on experience E.
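The core of PageRank can be sketched with power iteration on a tiny made-up link graph (my own toy example, not the course's code; the damping factor 0.85 is the value commonly cited for the original algorithm):

```python
# Minimal PageRank sketch: power iteration on a 3-page link graph.
links = {0: [1, 2], 1: [2], 2: [0]}  # page -> pages it links to
n, d = 3, 0.85                       # number of pages, damping factor

rank = [1.0 / n] * n                 # start from a uniform ranking
for _ in range(100):
    new = [(1 - d) / n] * n          # teleportation mass
    for page, outs in links.items():
        for out in outs:
            new[out] += d * rank[page] / len(outs)  # spread rank along links
    rank = new

print([round(r, 3) for r in rank])   # ranks sum to 1
```

If the hyperlink structure changes (the "experience"), rerunning the iteration yields an updated ranking.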
Target function
- A direct mapping:
  f : Image → Gender
- An integer coding for the person's gender:
  f : Image → N
- A probability estimate of the person's gender:
  f : Image, Gender → [0, 1]
- A probability distribution?
  ...
Mathematical viewpoint
Supervised learning is the estimation of a function f : X → Y mapping
the space X of input data to some output space Y. The estimation is based
on a finite training set of input data for which the mapping is known:
{(x1, y1), (x2, y2), ..., (xn, yn)}
A linear model is one such candidate function:
g(x) = Σj wj · xj
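The linear model g(x) = Σj wj · xj is just a dot product between a weight vector and an input vector; the weights below are made-up numbers for illustration only:

```python
def g(w, x):
    """Linear model g(x) = sum_j w_j * x_j (a dot product)."""
    return sum(wj * xj for wj, xj in zip(w, x))

w = [0.5, -1.0, 2.0]   # hypothetical learned weights
x = [1.0, 2.0, 3.0]    # one input vector
print(g(w, x))         # 0.5 - 2.0 + 6.0 = 4.5
```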
Design Choices
- Type of training experience
- Target function
- Feature extraction
- Function representation (e.g. a polynomial function)
- Learning algorithm (e.g. convex optimization: perceptron, gradient descent)
- Task: regression, time series prediction
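One chain of such design choices can be instantiated end to end. The sketch below is my own minimal example, under assumed choices: squared-error regression as the task, a linear function g(x) = w0 + w1·x as the representation, and plain gradient descent as the learning algorithm.

```python
# One concrete instantiation of the design choices:
#   task:            regression
#   representation:  g(x) = w0 + w1 * x
#   algorithm:       gradient descent on the mean squared error
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # noise-free data from y = 2x + 1

w0, w1 = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    dw0 = sum(2 * (w0 + w1 * x - y) for x, y in train) / len(train)
    dw1 = sum(2 * (w0 + w1 * x - y) * x for x, y in train) / len(train)
    w0 -= lr * dw0
    w1 -= lr * dw1

print(round(w0, 3), round(w1, 3))  # converges near (1.0, 2.0)
```

Swapping any single choice (features, representation, or optimizer) yields a different learner for the same data.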
Good news
Bayes classifier is optimal
Bad news
Bayes classifier is often impossible to estimate reliably
P(y) and P(x|y) need to be reliably estimated from the training
data and/or prior knowledge
- P̂(y) is easy
- P̂(x|y) is often very hard
Example: Gender classification from grey-coded pictures
- P̂(y = Female) = .52, P̂(y = Male) = .48
- P̂(x|y) = ?
  1024 × 768 = 786432 pixels ⇒ x1, x2, ..., x786432
  256 intensity values in [0, 255]
  For each of the 2 classes: 256^786432 ≈ 10^(10^6) parameters to estimate!!
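A number this large is easiest to check in log scale; a quick stdlib-only computation confirms the order of magnitude:

```python
import math

n_pixels = 1024 * 768   # 786432 pixels
n_values = 256          # intensity values per pixel
# A full joint distribution P(x|y) has n_values**n_pixels configurations;
# working in log10 avoids building the astronomically large integer.
log10_count = n_pixels * math.log10(n_values)
print(f"256^786432 is about 10^{log10_count:.2e}")  # exponent ~ 1.9 * 10^6
```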
Inductive bias
Need to restrict the set of possible distributions (e.g. Gaussian)
The relevant inductive bias is hard to choose
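As a concrete illustration of such a restriction (my own sketch, not course code): assuming each pixel is Gaussian and independent given the class, a naive Bayes style bias, reduces the per-class parameter count from about 256^786432 to just a mean and a variance per pixel. The tiny fake dataset below stands in for real images.

```python
import random
import statistics

random.seed(0)
n_pixels = 8  # tiny stand-in for the 786432 pixels of the example above

# Fake training images for one class: each pixel drawn around its own mean.
images = [[random.gauss(100 + p, 5.0) for p in range(n_pixels)]
          for _ in range(50)]

# Under the Gaussian + conditional-independence assumption, estimating
# P(x|y) needs only 2 parameters per pixel: a mean and a variance.
means = [statistics.fmean(img[p] for img in images) for p in range(n_pixels)]
vars_ = [statistics.variance([img[p] for img in images]) for p in range(n_pixels)]

print(len(means) + len(vars_))  # 16 parameters instead of 256**8 - 1
```

Whether this bias is appropriate for a given problem is exactly the hard choice mentioned above.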
P. Dupont (UCL Machine Learning Group) LINFO2262 24.
Some Machine Learning Challenges: Good and bad news
Good news
There is a wide range of mathematical functions to build a model
Generalization as an (often implicit) search through a space of possible
models. Example: regression with M-degree polynomials
[Figure: polynomial fits of increasing degree M to the training data, compared with the target function and the resulting predictive model]
Illustrations from Pattern Recognition and Machine Learning, C. Bishop, Springer, 2006
Bad news
Learning is impossible without inductive bias
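The search through model space can be made explicit with a minimal pure-Python sketch (made-up data, my own implementation of least squares via the normal equations): as the polynomial degree M grows, the training error can only decrease, until the polynomial interpolates the data exactly. Nothing in the training error alone tells the learner where to stop, which is why an inductive bias is needed.

```python
# Fitting polynomials of increasing degree M to a fixed training set:
# the training error is non-increasing in M, so overfitting risk grows.

def fit_poly(xs, ys, M):
    """Least-squares degree-M polynomial via the normal equations,
    solved with Gaussian elimination (no external libraries)."""
    n = M + 1
    A = [[x ** j for j in range(n)] for x in xs]          # design matrix
    G = [[sum(A[i][r] * A[i][c] for i in range(len(xs))) for c in range(n)]
         for r in range(n)]                               # G = A^T A
    b = [sum(A[i][r] * ys[i] for i in range(len(xs))) for r in range(n)]
    for col in range(n):  # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = G[r][col] / G[col][col]
            for c in range(col, n):
                G[r][c] -= f * G[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (b[r] - sum(G[r][c] * w[c] for c in range(r + 1, n))) / G[r][r]
    return w

def sse(w, xs, ys):
    """Sum of squared errors of the polynomial w on the data."""
    return sum((sum(wj * x ** j for j, wj in enumerate(w)) - y) ** 2
               for x, y in zip(xs, ys))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.1, 1.1, 0.2, -0.9, 0.3]  # noisy, wiggly data
errors = [sse(fit_poly(xs, ys, M), xs, ys) for M in range(5)]
print(errors)  # non-increasing; M = 4 interpolates the 5 points (error ~ 0)
```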
Course organization
- Course objectives
- Some references
- Instructors
- Course organization
- Evaluation
Course website
https://moodle.uclouvain.be/course/view.php?id=1836
Assignments
Submit your results and/or Python code through
inginious.info.ucl.ac.be/course/LINFO2262
on time ⇒ check the INGInious deadlines, e.g. at 23:00 the day
before the next lecture
Online feedback
- Theory: feedback after submission is closed
- Practical Problems, warm-up questions: real-time feedback (no
  impact on the grade)
- Practical Problems, test questions: feedback after submission is
  closed
Do not expect the INGInious server to play the role of a Python
debugger!
Machine Learning
ML is about making computers learn from experience