ETHEM ALPAYDIN
© The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Outline
Last class: Chapter 13 Kernel Machines
-Non-separable case: Soft Margin Hyperplane
-Kernel Trick
-Vectorial Kernels
-Multiple Kernel Learning
-Multiclass Kernel Machines
Today: Finish Chapter 13 Kernel Machines
Chapter 16 Hidden Markov Models
SVM for Regression
Use a linear model (possibly kernelized)
f(x) = wᵀx + w₀
Use the ε-sensitive error function

$$e_\varepsilon(r^t, f(x^t)) = \begin{cases} 0 & \text{if } |r^t - f(x^t)| < \varepsilon \\ |r^t - f(x^t)| - \varepsilon & \text{otherwise} \end{cases}$$

Solve

$$\min \frac{1}{2}\|w\|^2 + C\sum_t \left(\xi_+^t + \xi_-^t\right)$$

subject to

$$r^t - (w^T x^t + w_0) \le \varepsilon + \xi_+^t$$
$$(w^T x^t + w_0) - r^t \le \varepsilon + \xi_-^t$$
$$\xi_+^t, \xi_-^t \ge 0$$
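As a minimal sketch (not from the slides), scikit-learn's SVR solves this same program: epsilon is the half-width of the ε-tube, C weighs the slack penalties, and an RBF kernel replaces the dot products via the kernel trick. The toy data are purely illustrative.

import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
r = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

# epsilon: tube half-width; C: slack penalty weight; RBF kernel
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, r)
print(svr.predict([[2.5]]))   # f(x) at x = 2.5
print(len(svr.support_))      # support vectors lie on or outside the tube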
Kernel Regression
[Figures: kernel regression fits using a polynomial kernel and a Gaussian kernel]
One-Class Kernel Machines
Consider a sphere with center a and radius R
$$\min R^2 + C\sum_t \xi^t$$

subject to

$$\|x^t - a\|^2 \le R^2 + \xi^t, \quad \xi^t \ge 0$$

The dual is

$$L_d = \sum_t \alpha^t \left(x^t\right)^T x^t - \sum_t \sum_s \alpha^t \alpha^s \left(x^t\right)^T x^s$$

subject to

$$0 \le \alpha^t \le C, \quad \sum_t \alpha^t = 1$$
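As a hedged illustration: scikit-learn's OneClassSVM implements the related ν-formulation of Schölkopf et al. rather than this sphere (SVDD), but with an RBF kernel the two give equivalent solutions; ν upper-bounds the fraction of outliers much as C controls the slacks here.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (100, 2))                 # illustrative "normal" data

ocs = OneClassSVM(kernel="rbf", nu=0.05).fit(X)
print(ocs.predict([[0.1, 0.2], [4.0, 4.0]]))   # +1 = inside, -1 = outlier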
Kernel Dimensionality Reduction
Kernel PCA does PCA on the kernel matrix (equivalent to canonical PCA with a linear kernel).
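A one-line sketch with scikit-learn's KernelPCA (the data are illustrative); setting kernel="linear" reproduces canonical PCA, as noted above.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # illustrative data

Z = KernelPCA(n_components=2, kernel="rbf", gamma=0.5).fit_transform(X)
print(Z.shape)                              # (100, 2): top two kernel components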
Introduction
Assumption: we now model dependencies in the input; observations are no longer iid (independent and identically distributed).
Sequences
-Temporal: in speech, phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language)
-Spatial: in a DNA sequence, base pairs
Such a sequence is characterized by a parametric probability distribution.
Discrete Markov Process
N states: S1, S2, ..., SN
State at “time” t, qt = Si
First-order Markov
P(qt+1=Sj | qt=Si, qt-1=Sk ,...) = P(qt+1=Sj | qt=Si)
Transition probabilities
aij ≡ P(qt+1=Sj | qt=Si), with aij ≥ 0 and Σ_{j=1}^{N} aij = 1
Initial probabilities
πi ≡ P(q1=Si), with Σ_{i=1}^{N} πi = 1
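As a small numpy sketch (not from the slides), a first-order Markov chain can be sampled directly from Π and A; the numbers reuse the balls-and-urns example below.

import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.2, 0.3])              # initial probabilities
A = np.array([[0.4, 0.3, 0.3],              # row i: P(q_{t+1} = S_j | q_t = S_i)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

q = [int(rng.choice(3, p=pi))]              # draw q1 from pi
for t in range(9):
    q.append(int(rng.choice(3, p=A[q[-1]])))  # draw q_{t+1} from row q_t of A
print(q)                                    # a state sequence of length 10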
Stochastic Automaton
$$P(O = Q \mid A, \Pi) = P(q_1)\prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1}\, a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$

For example, the sequence Q = {S3, S1, S2, S2, S3, S2, S1, ...} has probability

$$\pi_3\, a_{31}\, a_{12}\, a_{22}\, a_{23}\, a_{32}\, a_{21} \cdots$$
Example: Balls and Urns
Three urns, each full of balls of one color:
S1: red, S2: blue, S3: green

$$\Pi = [0.5, 0.2, 0.3]^T \qquad A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$

O = {S1, S1, S3, S3} = {red, red, green, green}
What is P(O | A, Π)?
$$P(O \mid A, \Pi) = P(S_1)\, P(S_1 \mid S_1)\, P(S_3 \mid S_1)\, P(S_3 \mid S_3) = \pi_1\, a_{11}\, a_{13}\, a_{33} = 0.5 \cdot 0.4 \cdot 0.3 \cdot 0.8 = 0.048$$
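A quick numeric check of this result (a sketch, not from the slides): the probability of an observed state sequence is π_{q1} times the product of the transition probabilities along it.

import numpy as np

pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

Q = [0, 0, 2, 2]                            # S1, S1, S3, S3 (0-indexed)
p = pi[Q[0]] * np.prod([A[i, j] for i, j in zip(Q, Q[1:])])
print(p)                                    # 0.048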
Balls and Urns: Learning
Learning the parameters of an observable Markov model
Given K example sequences of length T, how do we estimate the parameters? The maximum likelihood estimates are relative counts:

$$\hat{\pi}_i = \frac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}} \qquad \hat{a}_{ij} = \frac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}}$$
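A minimal sketch of these counting estimates (the function and variable names are mine, not the slides'), with sequences given as lists of 0-indexed states:

import numpy as np

def estimate_mm(sequences, N):
    pi = np.zeros(N)
    counts = np.zeros((N, N))
    for seq in sequences:
        pi[seq[0]] += 1                     # count starting states
        for i, j in zip(seq, seq[1:]):
            counts[i, j] += 1               # count S_i -> S_j transitions
    # normalize counts into probability estimates
    return pi / len(sequences), counts / counts.sum(axis=1, keepdims=True)

pi_hat, A_hat = estimate_mm([[0, 0, 2, 2], [1, 1, 2, 2], [0, 1, 1, 2]], N=3)
print(pi_hat)                               # fraction of sequences starting in each state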
Hidden Markov Models
States are not observable
Discrete observations {v1, v2, ..., vM} are recorded; each is a probabilistic function of the (hidden) state.
Emission probabilities
bj(m) ≡ P(Ot=vm | qt=Sj)
Example
In each urn, there are balls of different colors, with different color proportions per urn.
For each observation sequence, there are multiple possible state sequences that could have generated it.
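To make the generative story concrete, here is a numpy sketch (not from the slides) that samples an observation sequence from the urn HMM of the next slide; Π is not given there, so a uniform start is assumed.

import numpy as np

rng = np.random.default_rng(0)
pi = np.array([1/3, 1/3, 1/3])              # assumed uniform start
A = np.array([[0.1, 0.4, 0.5],              # urn-to-urn transitions
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])
B = np.array([[0.3, 0.5, 0.2],              # B[j, m] = b_j(m) for colors R, G, B
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])

q = rng.choice(3, p=pi)                     # hidden start state
O = []
for t in range(8):
    O.append(rng.choice(3, p=B[q]))         # emit a color from the current urn
    q = rng.choice(3, p=A[q])               # move to the next hidden urn
print("".join("RGB"[m] for m in O))         # only this string is observed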
Another Example
A colored-ball choosing example:
Observation: R R G G B R G R
State sequence: ??

Here S = {U1, U2, U3} (urns) and V = {R, G, B} (colors). For an observation sequence O = {o1, ..., on} there is an underlying state sequence Q = {q1, ..., qn}, with initial probabilities πi = P(q1 = Ui).

Transition matrix A:
      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

Emission matrix B:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3
Elements of an HMM
N: number of states, S = {S1, S2, ..., SN}
M: number of observation symbols, V = {v1, v2, ..., vM}
A = [aij]: N × N state transition probability matrix
B = [bj(m)]: N × M observation (emission) probability matrix
Π = [πi]: initial state probabilities, πi ≡ P(q1=Si)
λ = (A, B, Π): the parameter set of the HMM
Three Basic Problems of HMMs
1. Evaluation:
Given λ and O, calculate P(O | λ)
2. State sequence:
Given λ and O, find Q* such that
P(Q* | O, λ) = maxQ P(Q | O, λ)
3. Learning:
Given X = {Ok}k, find λ* such that
P(X | λ*) = maxλ P(X | λ)
(Rabiner, 1989)
Evaluation: Naïve solution
State sequence Q = {q1, ..., qT}
Assume independent observations given the states:

$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$$

The naïve solution sums P(O | Q, λ) P(Q | λ) over all possible Q, but there are N^T state sequences. The forward variable

$$\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$$

computes the same quantity efficiently.
Initialization:

$$\alpha_1(i) = \pi_i\, b_i(O_1)$$

Recursion:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$$
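The recursion translates directly into a few lines of numpy (a sketch, not from the slides); the rows of alpha are the α_t vectors, and P(O | λ) is the sum of the last row.

import numpy as np

def forward(O, pi, A, B):
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                          # initialization
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # recursion
    return alpha

# P(O | lambda) = sum_i alpha_T(i):
# print(forward(O, pi, A, B)[-1].sum())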
Evaluation
Backward variable:

$$\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$$

the probability of being in Si at time t and then observing the partial sequence {O_{t+1}, ..., O_T}.
Initialization:

$$\beta_T(i) = 1 \quad (= P(O_{T+1} \mid q_T = S_i, \lambda))$$

Recursion:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
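A companion numpy sketch for the backward pass (again not from the slides); both passes yield the same likelihood, since Σ_i α_T(i) = Σ_i π_i b_i(O_1) β_1(i).

import numpy as np

def backward(O, pi, A, B):
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                              # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # recursion
    return beta

# Consistency check against the forward pass:
# p1 = forward(O, pi, A, B)[-1].sum()
# p2 = (pi * B[:, O[0]] * backward(O, pi, A, B)[0]).sum()   # p1 == p2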