Pierre Dupont
ICTEAM Institute
Université catholique de Louvain – Belgium
Machine Learning
- The science of getting computers to act without being explicitly programmed
- Construction of computer programs that automatically improve with experience: spam filtering software, autonomous driving, speech recognition, chess programs, etc.
- Induction of a general theory (i.e. a model) from observed examples (the training set) in order to apply the model to previously unseen examples (the test set)
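The induce-then-apply loop above can be sketched in a few lines of plain Python. This is a toy example of my own (not course code): fit a least-squares line through the origin on a training set, then apply the learned model to previously unseen test inputs.

```python
# Toy illustration: induce a model (the slope of a line through the origin)
# from a training set, then apply it to previously unseen test examples.

def fit(train):
    """Least-squares slope w minimizing sum of (y - w*x)^2 over the training set."""
    num = sum(x * y for x, y in train)
    den = sum(x * x for x, _ in train)
    return num / den

def predict(w, x):
    return w * x

train_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # observed (x, y) examples
w = fit(train_set)

test_set = [4.0, 5.0]  # previously unseen inputs
predictions = [predict(w, x) for x in test_set]
print(predictions)
```

The training set is only used to estimate w; the model is then evaluated on inputs it has never seen, which is exactly the induction step described above.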
Autonomous Driving
The actual winning time of Stanley [Thrun et al. 05] was 6 hours 54
minutes.
Medical prognosis
Data:
Patient103 time=1 Patient103 time=2 ... Patient103 time=n
PageRank
Likely the most frequently used algorithm in the world
The PageRank algorithm learns the ranking of web pages from the
(changing) hyperlink structure of the Internet, i.e. it improves
based on experience E.
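The core of PageRank can be sketched with power iteration on a tiny made-up link graph (my own toy example, not the course's code; the damping factor 0.85 is the value commonly cited for the original algorithm):

```python
# Minimal PageRank sketch: power iteration on a 3-page link graph.
links = {0: [1, 2], 1: [2], 2: [0]}  # page -> pages it links to
n, d = 3, 0.85                       # number of pages, damping factor

rank = [1.0 / n] * n                 # start from a uniform ranking
for _ in range(100):
    new = [(1 - d) / n] * n          # teleportation mass
    for page, outs in links.items():
        for out in outs:
            new[out] += d * rank[page] / len(outs)  # spread rank along links
    rank = new

print([round(r, 3) for r in rank])   # ranks sum to 1
```

If the hyperlink structure changes (the "experience"), rerunning the iteration yields an updated ranking.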
Target function
- A direct mapping:
  f : Image → Gender
- An integer coding for the person's gender:
  f : Image → N
- A probability estimate of the person's gender:
  f : Image, Gender → [0, 1]
- A probability distribution?
  ...
Mathematical viewpoint
Supervised learning is the estimation of a function f : X → Y mapping
the space X of input data to some output space Y. The estimation is based
on a finite training set of input data for which the mapping is known:
{(x1, y1), (x2, y2), ..., (xn, yn)}
A linear model is one such candidate function:
g(x) = Σj wj · xj
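The linear model g(x) = Σj wj · xj is just a dot product between a weight vector and an input vector; the weights below are made-up numbers for illustration only:

```python
def g(w, x):
    """Linear model g(x) = sum_j w_j * x_j (a dot product)."""
    return sum(wj * xj for wj, xj in zip(w, x))

w = [0.5, -1.0, 2.0]   # hypothetical learned weights
x = [1.0, 2.0, 3.0]    # one input vector
print(g(w, x))         # 0.5 - 2.0 + 6.0 = 4.5
```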
Design Choices
- Type of training experience
- Target function
- Feature extraction
- Function representation (e.g. a polynomial function)
- Learning algorithm (e.g. convex optimization: perceptron, gradient descent)
- Task: regression, time series prediction
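One chain of such design choices can be instantiated end to end. The sketch below is my own minimal example, under assumed choices: squared-error regression as the task, a linear function g(x) = w0 + w1·x as the representation, and plain gradient descent as the learning algorithm.

```python
# One concrete instantiation of the design choices:
#   task:            regression
#   representation:  g(x) = w0 + w1 * x
#   algorithm:       gradient descent on the mean squared error
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # noise-free data from y = 2x + 1

w0, w1 = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    dw0 = sum(2 * (w0 + w1 * x - y) for x, y in train) / len(train)
    dw1 = sum(2 * (w0 + w1 * x - y) * x for x, y in train) / len(train)
    w0 -= lr * dw0
    w1 -= lr * dw1

print(round(w0, 3), round(w1, 3))  # converges near (1.0, 2.0)
```

Swapping any single choice (features, representation, or optimizer) yields a different learner for the same data.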
Good news
Bayes classifier is optimal
Bad news
Bayes classifier is often impossible to estimate reliably
P(y) and P(x|y) need to be reliably estimated from the training
data and/or prior knowledge
- P̂(y) is easy
- P̂(x|y) is often very hard
Example: Gender classification from grey-coded pictures
- P̂(y = Female) = .52, P̂(y = Male) = .48
- P̂(x|y) = ?
  1024 × 768 = 786432 pixels ⇒ x1, x2, ..., x786432
  256 intensity values in [0, 255]
  For each of the 2 classes: 256^786432 ≈ 10^(10^6) parameters to estimate!!
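A number this large is easiest to check in log scale; a quick stdlib-only computation confirms the order of magnitude:

```python
import math

n_pixels = 1024 * 768   # 786432 pixels
n_values = 256          # intensity values per pixel
# A full joint distribution P(x|y) has n_values**n_pixels configurations;
# working in log10 avoids building the astronomically large integer.
log10_count = n_pixels * math.log10(n_values)
print(f"256^786432 is about 10^{log10_count:.2e}")  # exponent ~ 1.9 * 10^6
```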
Inductive bias
Need to restrict the set of possible distributions (e.g. Gaussian)
The relevant inductive bias is hard to choose
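As a concrete illustration of such a restriction (my own sketch, not course code): assuming each pixel is Gaussian and independent given the class, a naive Bayes style bias, reduces the per-class parameter count from about 256^786432 to just a mean and a variance per pixel. The tiny fake dataset below stands in for real images.

```python
import random
import statistics

random.seed(0)
n_pixels = 8  # tiny stand-in for the 786432 pixels of the example above

# Fake training images for one class: each pixel drawn around its own mean.
images = [[random.gauss(100 + p, 5.0) for p in range(n_pixels)]
          for _ in range(50)]

# Under the Gaussian + conditional-independence assumption, estimating
# P(x|y) needs only 2 parameters per pixel: a mean and a variance.
means = [statistics.fmean(img[p] for img in images) for p in range(n_pixels)]
vars_ = [statistics.variance([img[p] for img in images]) for p in range(n_pixels)]

print(len(means) + len(vars_))  # 16 parameters instead of 256**8 - 1
```

Whether this bias is appropriate for a given problem is exactly the hard choice mentioned above.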
P. Dupont (UCL Machine Learning Group) LINFO2262 24.
Some Machine Learning Challenges: Good and bad news
Good news
There is a wide range of mathematical functions to build a model
Generalization as an (often implicit) search through a space of possible
models. Example: regression with M-degree polynomials
[Figure: polynomial fits of increasing degree M to the training data, compared with the target function and the resulting predictive model]
Illustrations from Pattern Recognition and Machine Learning, C. Bishop, Springer, 2006
Bad news
Learning is impossible without inductive bias
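The search through model space can be made explicit with a minimal pure-Python sketch (made-up data, my own implementation of least squares via the normal equations): as the polynomial degree M grows, the training error can only decrease, until the polynomial interpolates the data exactly. Nothing in the training error alone tells the learner where to stop, which is why an inductive bias is needed.

```python
# Fitting polynomials of increasing degree M to a fixed training set:
# the training error is non-increasing in M, so overfitting risk grows.

def fit_poly(xs, ys, M):
    """Least-squares degree-M polynomial via the normal equations,
    solved with Gaussian elimination (no external libraries)."""
    n = M + 1
    A = [[x ** j for j in range(n)] for x in xs]          # design matrix
    G = [[sum(A[i][r] * A[i][c] for i in range(len(xs))) for c in range(n)]
         for r in range(n)]                               # G = A^T A
    b = [sum(A[i][r] * ys[i] for i in range(len(xs))) for r in range(n)]
    for col in range(n):  # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = G[r][col] / G[col][col]
            for c in range(col, n):
                G[r][c] -= f * G[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (b[r] - sum(G[r][c] * w[c] for c in range(r + 1, n))) / G[r][r]
    return w

def sse(w, xs, ys):
    """Sum of squared errors of the polynomial w on the data."""
    return sum((sum(wj * x ** j for j, wj in enumerate(w)) - y) ** 2
               for x, y in zip(xs, ys))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.1, 1.1, 0.2, -0.9, 0.3]  # noisy, wiggly data
errors = [sse(fit_poly(xs, ys, M), xs, ys) for M in range(5)]
print(errors)  # non-increasing; M = 4 interpolates the 5 points (error ~ 0)
```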
Course organization
- Course objectives
- Some references
- Instructors
- Course organization
- Evaluation
Course website
https://moodle.uclouvain.be/course/view.php?id=1836
Assignments
Submit your results and/or Python code through
inginious.info.ucl.ac.be/course/LINFO2262
on time ⇒ check the INGInious deadlines, e.g. at 23:00 the day
before the next lecture
Online feedback
- Theory: feedback after submission is closed
- Practical Problems, warm-up questions: real-time feedback (no
  impact on the grade)
- Practical Problems, test questions: feedback after submission is
  closed
Do not expect the INGInious server to play the role of a Python
debugger!
Machine Learning
ML is about making computers learn from experience