TBMI 26
Ola Friman
ola.friman@foi.se
Lecture 1
Introduction
Course information
www.imt.liu.se/edu/courses/TBMI26
Please register!
The Course: Overview
Lectures
Classes
1 laboratory exercise
4 assignments
Exam
The Course: Lectures
1. Introduction
2. Supervised learning: Linear models
3. Supervised learning: Nonlinear models
4. Unsupervised learning: Clustering
5. Unsupervised learning: Discovering structure
6. Semi-supervised learning & Kernel methods
7. Ensemble learning & Boosting methods
8. Reinforcement learning
9. Genetic algorithms
The Course: Classes
One class after each lecture
Two groups: A & B
Exercises
Preparations and help with assignments
The Course: Laboratory exercise
Introduction lab
In pairs or alone.
30 places in the computer lab at the
Department of Biomedical Engineering
Two groups
Matlab
How to find the Department of Biomedical Engineering / Institutionen för Medicinsk Teknik?
http://www.imt.liu.se
The Course: Assignments
4 assignments
Neural networks and Support Vector Machines
Unsupervised learning: PCA & CCA
Boosting techniques
Reinforcement learning
In pairs or alone.
4 h supervision/help: you may not be able to complete the assignments in this time.
Preparations required!
Matlab
Written report that convinces us that you have
completed the assignment.
Course literature
C. M. Bishop, Pattern Recognition and Machine Learning
Supplementary material:
 Canonical correlation analysis
 Independent component analysis
 Genetic algorithms
Exercise collection
Lecture notes!
Prerequisites
Linear algebra
Vectors, scalar product, norm, eigenvalues
Multidimensional calculus
Partial derivatives
Mathematical statistics
Mean, variance, covariance, correlation, Gaussian distribution, etc.
Programming
General programming knowledge
Matlab knowledge helps a lot
Applications of AI & machine learning
Pattern recognition
Robots & autonomous systems
Models of the brain
Games
Evolutionary systems
Data mining
Expert systems & decision support
Pattern recognition examples
Face recognition
Medical image segmentation
Pedestrian detection
Speech recognition
Optical Character Recognition
Source: www.gecameras.com.au
Source: Autoliv Night Vision
Source: http://blog.damiles.com
Machine learning vs. AI
[Figure: two diagrams showing AI and ML in different relations to each other]
Depends on who you ask.
AI: symbolic processing, logic, rule-based, expert systems, knowledge representation, databases, planning, reasoning, etc.
ML: "adapt to new circumstances and to detect and extrapolate patterns, generalize from limited sets of data"
What is learning?
Bishop:
"The core objective of a learner is to generalize from its experience."
Narendra & Thathachar:
"Learning is defined as any relatively permanent change in behaviour resulting from past experience, and a learning system is characterized by its ability to improve its behaviour with time, in some sense towards an ultimate goal."
How can a machine learn?
The behaviour of the machine is
determined by its parameters.
The behaviour is changed by changing the
parameter values.
"Learning is defined as any relatively permanent change in behaviour resulting from past experience ..."
Why machine learning?
Application too complex for a human to
implement an algorithm, but we can easily
come up with examples of what we want the
algorithm to learn.
Relationships in high-dimensional data too complex for a human to see, but a computer can find these.
The computer should learn and adapt
continuously to new situations and conditions.
Three main categories of machine
learning methods
Supervised learning: Learn to generalize and
classify new data based on labeled training data.
Unsupervised learning: Discover structure and relationships in complex high-dimensional data.
Reinforcement learning: Generate
policies/strategies that lead to a (possibly
delayed) reward.
Mathematical foundations
of machine learning
[Figure: Machine Learning resting on the following foundations]
Optimization theory: optimize parameters for optimal behaviour
Probability theory: model uncertainty
Multidimensional calculus
Linear algebra
Continuous vs. discrete data
This course focusses mostly on continuous measurements (signals, images, etc.).
Examples of discrete symbols include text, genome (DNA), etc.
Continuous signals make it easier to interpolate, i.e. generalize to new cases.
Features
A feature is a measurement or scalar number that
describes some aspect of a phenomenon or
object, for example
Size
Length
Intensity/Color (RGB)
Position (x,y)
Velocity
Shape
Feature extraction is the process of measuring
features from data.
Features: Iris dataset
Iris setosa
Iris versicolor Iris virginica
From Wikipedia
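Feature vectors:
$$
\mathbf{x}_1 = \begin{pmatrix} 5.1 \\ 3.5 \\ 1.4 \\ 0.2 \end{pmatrix}, \quad
\mathbf{x}_2 = \begin{pmatrix} 4.9 \\ 3.0 \\ 1.4 \\ 0.2 \end{pmatrix}, \quad
\mathbf{x}_3 = \begin{pmatrix} 4.7 \\ 3.2 \\ 1.3 \\ 0.2 \end{pmatrix}
$$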
Feature space
[Figure: scatter plot of petal width against sepal length for Iris setosa, Iris versicolor, and Iris virginica]
Supervised learning
Task: Learn to predict/classify new data from
labeled examples.
Input: Training data examples $\{\mathbf{x}_i, y_i\}$, $i = 1, \dots, N$, where $\mathbf{x}_i$ is a feature vector and $y_i$ is a class label in the set $O$.
Output: A function $f(\mathbf{x}; w_1, \dots, w_k) \in O$
Find a function $f$ and adjust the parameters $w_1, \dots, w_k$ so that new feature vectors are classified correctly. Generalization!
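As a minimal sketch of what "a function with adjustable parameters" can mean, the Python snippet below uses a linear threshold function. The linear form, the bias term `w[0]`, and the parameter values are assumptions chosen for illustration; the slide itself does not prescribe any particular form for f.

```python
# Hypothetical example of a parametrized classifier f(x; w).
# The linear threshold form is an assumption, not something the slide specifies.

def f(x, w):
    """Return class 1 if w[0] + sum_j w[j + 1] * x[j] is positive, else class 0."""
    activation = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if activation > 0 else 0


# One feature vector: (sepal length, petal width).
x = [5.1, 0.2]

# Two made-up parameter settings. Only the parameters differ, not the function.
w_a = [-1.0, 0.0, 1.0]   # class 1 if petal width > 1.0
w_b = [-0.1, 0.0, 1.0]   # class 1 if petal width > 0.1

print(f(x, w_a))  # 0
print(f(x, w_b))  # 1
```

The same input is assigned to different classes purely because the parameter values changed, which is the sense in which learning amounts to adjusting parameters.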
Classification vs. Regression
Classification: Select one of a discrete set of
classes (the set O is discrete).
Which horse is going to win this race?
Which letter does this image depict?
Regression: Learn to predict a continuous value ($O = \mathbb{R}$).
Learn to predict the temperature tomorrow.
What is the probability that this image depicts the
letter a?
Supervised learning
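[Figure: block diagram with the environment, the learning system $f(\mathbf{x}; w_1, \dots, w_k)$, and the teacher; the outputs are compared (+/−) to form an error signal]
Requires a teacher
Can never become better than the teacher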
Example
New samples?
k-Nearest Neighbors (kNN)
Save all training data.
For a new case, find similar examples among
the training data.
Requires a similarity measure (metric), for example the Euclidean distance.
A majority vote among the k nearest
neighbors decides the class, where k can be
1,2,3,4...
Euclidean distance:
$$
\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_i (x_i - y_i)^2}
$$
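The kNN procedure described on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the course's reference implementation: the function names and the training samples are made up, and the course itself works in Matlab.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_classify(x_new, train_x, train_y, k=3):
    """Classify x_new by a majority vote among its k nearest training samples."""
    # Rank all stored training samples by their distance to the new sample.
    ranked = sorted(range(len(train_x)),
                    key=lambda i: euclidean_distance(x_new, train_x[i]))
    # Count the class labels of the k nearest neighbours.
    votes = Counter(train_y[i] for i in ranked[:k])
    # On a tie, most_common returns the label that was counted first,
    # i.e. the one belonging to the nearer neighbour.
    return votes.most_common(1)[0][0]


# Made-up (sepal length, petal width) samples, NOT taken from the Iris data set.
train_x = [(5.0, 0.2), (4.8, 0.3), (5.2, 0.2),
           (6.0, 1.3), (5.8, 1.2), (6.2, 1.5),
           (6.8, 2.1), (7.0, 2.3), (6.5, 2.0)]
train_y = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

print(knn_classify((5.1, 0.25), train_x, train_y, k=3))  # setosa
print(knn_classify((6.6, 2.0), train_x, train_y, k=3))   # virginica
```

Note that there is no training step: all the work happens at classification time, when the new sample is compared with every stored sample.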
Classification areas
[Figure: kNN classification areas in the feature space (sepal length vs. petal width) for k = 1]
Classification areas
[Figure: classification areas for k = 1, k = 3, and k = 7]
Pros and cons of kNN
Very simple: no training or modelling required.
Must store a lot of data: a problem for large data sets.
Slow classification for large data sets: must compare new samples with all stored samples.
Evaluating classifiers
How can we compare the performance of
different classifiers?
What happens if we use the same data for
training and evaluation?
How can we train and test a classifier if we
only have a finite amount of collected data?
Confusion matrix
Rows: actual class. Columns: predicted class.

             Setosa   Versicol.   Virginica
Setosa         50         0           0
Versicol.       0        45           5
Virginica       0         7          43

Accuracy:
$$
\frac{50 + 45 + 43}{150} = 92\,\%
$$
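The accuracy on this slide is the sum of the diagonal of the confusion matrix divided by the total number of samples. A short Python sketch of that arithmetic, using the counts from the slide (the function name is only illustrative):

```python
# Confusion matrix from the slide: rows are actual classes, columns are
# predicted classes, in the order setosa, versicolor, virginica.
confusion = [
    [50, 0, 0],
    [0, 45, 5],
    [0, 7, 43],
]


def accuracy(confusion):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


print(accuracy(confusion))  # 0.92
```

The off-diagonal entries show where the errors are: 5 versicolor samples were predicted as virginica and 7 virginica samples as versicolor, while setosa is never confused with the other two.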
Training data vs. test data
A classifier must be able to generalize, i.e., it
must be tested using previously unseen data.
Evaluating using training data will give an
overly optimistic accuracy.
Three ways to perform the evaluation:
Hold out
Cross validation
Leave one out
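To see why evaluating on the training data is overly optimistic, consider a 1-nearest-neighbour classifier: every training sample is its own nearest neighbour, so the training accuracy is always 100 % when the stored samples are distinct. The Python sketch below uses made-up scalar features and labels to show the gap to unseen data.

```python
def nearest_neighbour(x_new, train_x, train_y):
    """1-NN on scalar features: return the label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda i: abs(x_new - train_x[i]))
    return train_y[best]


def accuracy(xs, ys, train_x, train_y):
    """Fraction of the samples (xs, ys) that the 1-NN classifier gets right."""
    correct = sum(nearest_neighbour(x, train_x, train_y) == y
                  for x, y in zip(xs, ys))
    return correct / len(xs)


# Made-up data: the two classes overlap around x = 3 to 4.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = ["a", "a", "b", "a", "b", "b"]
test_x = [1.4, 2.9, 4.2, 5.6]
test_y = ["a", "a", "b", "b"]

train_accuracy = accuracy(train_x, train_y, train_x, train_y)
test_accuracy = accuracy(test_x, test_y, train_x, train_y)
print(train_accuracy)  # 1.0
print(test_accuracy)   # 0.5
```

The classifier has simply memorized the training samples, so only the score on previously unseen data says anything about generalization.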
Hold out
Simplest approach, hold out one part of the
entire data set as test data.
| Training data | Test data |
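A hold-out split could be sketched as follows in Python. The 30 % test fraction, the fixed random seed, and the function name are assumptions made for this example; the slide does not say how large the held-out part should be.

```python
import random


def hold_out_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into (training, test) parts."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(data)          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))          # stand-in for 10 labelled examples
train, test = hold_out_split(samples)
print(len(train), len(test))       # 7 3
```

Shuffling before the split matters when the data is stored in a sorted order, for example class by class, since otherwise the test part could contain only one class.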
n-fold Cross-Validation
Divide the data set into n segments. Train using n - 1 segments and evaluate using the n-th.
Example of 3-fold Cross-Validation:
| Training data | Training data | Test data     |
| Training data | Test data     | Training data |
| Test data     | Training data | Training data |
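The rotation of the test segment can be sketched as an index generator in Python. The function name is illustrative, and the folds are visited from the first segment to the last, which is the reverse of the order drawn on the slide but covers the same three splits.

```python
def cross_validation_folds(n_samples, n_folds):
    """Yield (train_indices, test_indices), using each segment once as test data."""
    indices = list(range(n_samples))
    # Segment sizes differ by at most one when n_samples is not divisible by n_folds.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in cross_validation_folds(6, 3):
    print(train, test)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Every sample ends up in the test set exactly once, so the n accuracy values can be averaged into a single estimate.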
Leave-one-out
Extreme case of Cross-Validation: Use all data but one example for training and use the last one to evaluate.
| Training data | Training data | Training data | Training data | Test example |
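Leave-one-out can be sketched in Python by combining it with a 1-nearest-neighbour classifier. The data values, labels, and function names below are made up for illustration; any classifier could be plugged in instead.

```python
def nearest_neighbour(x_new, train_x, train_y):
    """1-NN on scalar features: return the label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda i: abs(x_new - train_x[i]))
    return train_y[best]


def leave_one_out_accuracy(xs, ys):
    """Hold out each example in turn, train on the rest, and test on the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]      # all data but example i
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour(xs[i], train_x, train_y) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up scalar features with two overlapping classes.
xs = [1.0, 1.8, 3.0, 3.9, 5.0, 6.2]
ys = ["a", "a", "b", "a", "b", "b"]

print(leave_one_out_accuracy(xs, ys))  # 0.5
```

With N examples this requires N rounds of training and testing, which is why leave-one-out is mostly practical for small data sets or for classifiers that are cheap to train.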