The Data Matrix for Supervised Learning

X_j                                      j-th input variable
X = (X_0, ..., X_{M-1})^T                vector of input variables
M                                        number of input variables
N                                        number of data points
Y                                        output variable
x_i = (x_{i,0}, ..., x_{i,M-1})^T        i-th input vector
x_{i,j}                                  j-th component of x_i
y_i                                      i-th target value
d_i = (x_{i,0}, ..., x_{i,M-1}, y_i)^T   i-th pattern
D = {d_1, ..., d_N}                      (training) data set
z                                        test input vector
t                                        unknown test target for z
X = (x_1, ..., x_N)^T                    design matrix
Recap: Linear Models
Estimate: w, b
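For reference, the linear model whose parameters are estimated here can be written in the standard form (notation as above; the slide's own rendering is not preserved in the extracted text):

f(x) = w^T x + b, \qquad \hat{y} = sign(f(x))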
Recap: Perceptron – Components
Model class
Learning algorithm
Optimization criterion
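A minimal sketch of how these three components fit together (illustrative only, not the lecture's exact pseudocode): the model class is linear threshold functions, the optimization criterion counts misclassified training points, and the learning algorithm is the classical perceptron update rule.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=100):
    # model class: f(x) = sign(w.x + b); criterion: number of misclassifications
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):                 # labels yi are -1 or +1
            if yi * (xi @ w + b) <= 0:           # misclassified (or on the boundary)
                w, b = w + lr * yi * xi, b + lr * yi   # perceptron update rule
                errors += 1
        if errors == 0:                          # criterion reached: no mistakes left
            break
    return w, b

# AND with the -1/+1 encoding used on the next slide is linearly separable:
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = perceptron_train(X, y)
print(w, b, np.sign(X @ w + b))                  # w=[1. 1.], b=-1.0, all predictions correct
```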
Linear Separability and the XOR problem
Famous perceptron tasks: simulate simple Boolean functions (i.e. encode corresponding truth tables) by encoding inputs as -1 (for false) and +1 (for true).
AND: w_1 = 1, w_2 = 1, b = -1
XOR: ?? (no such weights exist; see the check below)
Figure: the four input patterns in the plane for AND and for XOR; AND is linearly separable, XOR is not.
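A short worked check, added for completeness (it is not part of the extracted slide text). With the AND weights, f(x) = sign(x_1 + x_2 - 1), so

f(-1,-1) = sign(-3) = -1, \quad f(-1,+1) = f(+1,-1) = sign(-1) = -1, \quad f(+1,+1) = sign(+1) = +1.

For XOR a single linear unit would have to satisfy

-w_1 - w_2 + b < 0, \quad +w_1 + w_2 + b < 0, \quad +w_1 - w_2 + b > 0, \quad -w_1 + w_2 + b > 0.

Adding the first two inequalities gives b < 0, adding the last two gives b > 0, a contradiction; hence no weight setting exists and XOR is not linearly separable.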
Linear Separability and the Iris Dataset…
Linear Separability and the XOR problem
Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA, USA.
Basis expansion (Prelude for ANNs & SVMs)
Linear model: f(x_i, w) = w_0 + \sum_{j=1}^{M-1} w_j x_{i,j}

w = (1, -2, -2, 4)
f(x) = 1 - 2x_1 - 2x_2 + 4 x_1 x_2
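A brief sketch (not from the slides) of what the basis expansion buys: the model stays linear in the expanded feature vector (1, x_1, x_2, x_1 x_2) but is non-linear in the original inputs, so it can distinguish patterns that no purely linear model of x_1 and x_2 can; the reconstructed weights above are used for illustration.

```python
import numpy as np

# Expanded feature vector for the example above: (1, x1, x2, x1*x2).
def expand(x1, x2):
    return np.array([1.0, x1, x2, x1 * x2])

w = np.array([1.0, -2.0, -2.0, 4.0])   # reconstructed weights; signs inferred from f(x)

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), w @ expand(x1, x2))
# -> positive for equal inputs, negative for different inputs: a separation that is
#    impossible for a model that is linear in (x1, x2) alone (cf. the XOR slide).
```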
Example 2: Basis Expansion
Basis expansion: z = (1, x, x^2, x^3)
Weights: w = (0, 1, 0, 0.3)
Resulting model: f(x) = x + 0.3 x^3
Popular Basis Expansions
\phi_j(x) = x_1, x_2, ..., x_{M-1}  (linear / identity features)
\phi_j(x) = x_1^2, x_1 x_2, ...  (polynomial terms: squares and cross-products)
\phi_j(x) = log(x_1), \sqrt{x_1}, ...  (nonlinear transformations of single inputs)
Piecewise polynomials, splines, wavelet bases
\phi_j(x) = exp(-v \|x - x_m\|^2)  (radial basis functions)
\phi(x) = sig(\sum_{j=1}^{M-1} v_j x_j), with sig(a) = 1 / (1 + e^{-a})  (sigmoidal basis functions)
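A hedged sketch of a few of the expansions listed above (the function names, the centre x_m and the parameters v are illustrative choices, not taken from the slides):

```python
import numpy as np

def polynomial_features(x):
    # squares and pairwise products: x_j^2, x_j * x_k
    cross = [x[j] * x[k] for j in range(len(x)) for k in range(j + 1, len(x))]
    return np.concatenate([x ** 2, cross])

def log_sqrt_features(x):
    # log and square-root transforms (assumes strictly positive inputs)
    return np.concatenate([np.log(x), np.sqrt(x)])

def rbf_feature(x, centre, v=1.0):
    # radial basis function exp(-v * ||x - centre||^2)
    return np.exp(-v * np.sum((x - centre) ** 2))

def sigmoid_feature(x, v):
    # sigmoidal basis function sig(v.x) with sig(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-(v @ x)))

x = np.array([1.0, 2.0])
print(polynomial_features(x))                          # [1. 4. 2.]
print(log_sqrt_features(x))                            # [0.  0.69  1.  1.41] (rounded)
print(rbf_feature(x, centre=np.zeros(2)))              # exp(-5)
print(sigmoid_feature(x, v=np.array([0.5, -0.25])))    # 0.5
```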
Chapter 4.2.
Abstract Model of the Neuron
Figure: abstract model of the neuron. The dendrites carry the inputs to the cell body, which performs the weighted summation and applies the activation function; the axon transmits the resulting output.
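In formulas (a standard rendering of the components listed in the figure; the symbols net, g and o are mine):

net = \sum_j w_j x_j, \qquad o = g(net)

The inputs x_j arrive via the dendrites, the cell body computes the weighted sum net, g is the activation function, and the output o is transmitted along the axon.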
Feed-Forward Networks
Application: ALVINN [Mitchell 1997]
Training Algorithm
"Learning" means determining appropriate connection and
threshold weights
Multi-Layer Neural Networks
Expressivity of Multi-Layer ANNs
Figure: a two-layer network of threshold units computing XOR. The input layer provides x1 and x2; the hidden layer contains two units, one with threshold t=1 and one with threshold t=2, each receiving both inputs with weight 1; the output layer has a single unit with threshold t=1 that weights the first hidden unit with +1 and the second with -2.
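A small check of this construction (a sketch; it assumes the 0/1 input encoding and the weights and thresholds as far as they are legible in the figure):

```python
def step(a, t):
    """Threshold unit: fires (1) iff its net input reaches the threshold t."""
    return 1 if a >= t else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2, 1)        # hidden unit, threshold t=1: behaves like OR
    h2 = step(x1 + x2, 2)        # hidden unit, threshold t=2: behaves like AND
    return step(h1 - 2 * h2, 1)  # output unit, threshold t=1: "OR but not AND" = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```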
Interpretation of Multi-Layer ANNs
Sigmoid Activation Function
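The standard logistic sigmoid (presumably the function plotted on this slide), together with the derivative identity that backpropagation exploits below:

sig(a) = \frac{1}{1 + e^{-a}}, \qquad sig'(a) = sig(a)\,(1 - sig(a))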
Notation
Example – 2-layer ANN:
final activation at output layer
Figure: a 2-layer ANN with input units feeding hidden units 1, 2, ... through weights w_{11}^{(1)}, ..., and the hidden units feeding the output unit through weights w_{11}^{(2)}, w_{21}^{(2)}, ...
NB: To simplify notation we again drop the bias parameter b and include it in the weight vector: w_0 = b. (Imagine this to be a dummy feature that equals 1 for all inputs.)
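Written out (a reconstruction from the notation; the slide's own formula did not survive extraction), the final activation of the single output unit of this 2-layer ANN is

o = g\Big(\sum_k w_{k1}^{(2)} \; g\big(\sum_j w_{jk}^{(1)} x_j\big)\Big),

with the bias absorbed into the weights via the dummy feature x_0 = 1 as noted above.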
Backpropagation – General Considerations
Recall: To simplify notation we will again drop the bias term b and include it in the weight vector: w_0 = b.
Backpropagation – Illustration
Figure: illustration of backpropagation; the input data is passed forward through the network and the resulting error is propagated back through it.
Backpropagation – General Considerations (cont)
For any weight w_{jk}^{(l)} in the network we would like to know its influence on the final error E, i.e. the corresponding component of the gradient.
Note that the weight w_{jk}^{(l)} influences the rest of the network (and thus the error) only through the net input net_k^{(l)} of the neuron k it connects to.
We can therefore rewrite the partial derivative, regardless of further details, as:
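The decomposition referred to (the formula itself is missing from the extracted text; this is the standard chain-rule step):

\frac{\partial E}{\partial w_{jk}^{(l)}} = \frac{\partial E}{\partial net_k^{(l)}} \cdot \frac{\partial net_k^{(l)}}{\partial w_{jk}^{(l)}} = \delta_k^{(l)} \, o_j^{(l-1)}

where \delta_k^{(l)} := \partial E / \partial net_k^{(l)} and o_j^{(l-1)} is the output of neuron j in the preceding layer (the one the weight comes from).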
Computing δ – Case I: Units at the output layer (n)
We thus have:
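The formula itself is not preserved in the extracted text; assuming the squared-error criterion E = \tfrac{1}{2}\sum_k (o_k - y_k)^2, the standard result for an output unit k is

\delta_k^{(n)} = g'(net_k^{(n)})\,(o_k - y_k),

which for the sigmoid activation becomes \delta_k^{(n)} = o_k (1 - o_k)(o_k - y_k).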
Computing δ – Case II: Units at the hidden layer (h)
We thus have:
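Again the formula is reconstructed rather than copied from the slide: a hidden unit j receives its error signal from all units k it feeds into,

\delta_j^{(h)} = g'(net_j^{(h)}) \sum_k w_{jk}^{(h+1)} \, \delta_k^{(h+1)}.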
Backpropagation – Algorithm
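Since the algorithm box itself did not survive extraction, here is a minimal NumPy sketch of batch backpropagation for a 2-layer sigmoid network with squared error; the layer sizes, learning rate and variable names are my own choices, not the lecture's.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_train(X, y, hidden=4, lr=0.5, epochs=20000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((len(X), 1)), X])        # dummy feature x0 = 1 absorbs the bias
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    for _ in range(epochs):
        # forward pass
        o1 = sigmoid(X @ W1)                         # hidden activations
        o2 = sigmoid(o1 @ W2)                        # output activation
        # backward pass (cf. Case I and Case II above)
        d2 = (o2 - y) * o2 * (1 - o2)                # deltas at the output layer
        d1 = (d2 @ W2.T) * o1 * (1 - o1)             # deltas at the hidden layer
        # gradient-descent weight updates
        W2 -= lr * o1.T @ d2
        W1 -= lr * X.T @ d1
    return W1, W2

# XOR in 0/1 encoding, the task a single-layer perceptron cannot solve:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = backprop_train(X, y)
Xb = np.hstack([np.ones((4, 1)), X])
print(sigmoid(sigmoid(Xb @ W1) @ W2).round(2))       # typically close to [[0],[1],[1],[0]]
```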
Multi-Layer ANNs – Properties
Review: Multi-Layer ANNs – Components
Model class
Learning algorithm
Optimization criterion
Problems of Multi-Layer ANNs
Problems of Multi-Layer ANNs
Long oscillations in 'narrow valleys' of the error surface E
Stagnation on flat regions of the error surface E
Local minima of E
Problems of Multi-Layer ANNs: Overfitting
Figure 8a: plots of error E as a function of the number of weight updates, for two different robot perception tasks (figure taken from Mitchell, 1997).
Chapter 3.2.c
Other Network Types
Recurrent Networks
Deep Learning
Autoencoders
Deep Learning - Motivation
Sparse Coding [Olshausen & Field, 1996]
Sparse Coding – Learned Bases (Example)
Figure: natural images and the learned bases ("edges"); a new sample is expressed as a sparse coefficient vector [0, 0, ..., 0.8, ..., 0.3, ..., 0.5, ...] = coefficients (feature representation).
Sparse Coding - Application
Sparse Coding (Training)
Optimization Approach
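The optimization problem behind sparse coding, in the usual Olshausen & Field formulation (presumably what this slide presents; the symbol λ and the L1 penalty are assumptions on my part):

\min_{\{\phi_j\},\,\{a_{i,j}\}} \; \sum_i \Big\| x_i - \sum_j a_{i,j}\,\phi_j \Big\|_2^2 \; + \; \lambda \sum_{i,j} |a_{i,j}|

Training alternates between solving for the sparse coefficients a_{i,j} with the bases fixed and updating the bases \phi_j with the coefficients fixed; at test time only the coefficient step is solved for the new sample.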
Sparse Coding (Testing)
Autoencoder [Hinton & Zemel, 1994]
Figure: autoencoder architecture. The encoder maps the input bottom-up (feed-forward) to the feature representation; the decoder maps it back top-down (generative, feed-back) to reconstruct the input.
Autoencoder - Application
Autoencoder (Training)
Given an input x, the encoder maps it to a hidden code and the decoder produces a reconstruction of x, where D denotes the input/output dimensionality and K the hidden-layer dimensionality (K < D).
Optimization: the encoder and decoder weights are determined by minimizing the reconstruction error.
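In symbols (the names W_enc and W_dec for the encoder and decoder weight matrices, and f, g for their activations, are mine, not the slide's):

\min_{W_{enc},\,W_{dec}} \; \sum_i \big\| x_i - g\big(W_{dec}\, f(W_{enc}\, x_i)\big) \big\|^2

with the hidden code f(W_{enc} x_i) of dimension K < D.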
Knowledge Discovery Lecture WS14/15
22.10.2014  Introduction (Einführung)          Basics, Overview
29.10.2014  Design of KD-experiments
05.11.2014  Linear Classifiers
12.11.2014  Data Warehousing & OLAP
19.11.2014  Non-Linear Classifiers (ANNs)      Supervised Techniques,
26.11.2014  Kernels, SVM                       Vector+Label Representation
03.12.2014  cancelled (entfällt)
10.12.2014  Decision Trees
17.12.2014  IBL & Clustering                   Unsupervised Techniques
07.01.2015  Relational Learning I              Semi-supervised Techniques,
14.01.2015  Relational Learning II             Relational Representation
21.01.2015  Relational Learning III
28.01.2015  Textmining
04.01.2015  Guest lecture (Gastvortrag)        Meta-Topics
11.02.2015  CRISP, Visualization