
Lecture Slides for Introduction to Machine Learning 2e

ETHEM ALPAYDIN
© The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Outline
Last class: Chapter 13 Kernel Machines
- Non-separable case: Soft Margin Hyperplane
- Kernel Trick
- Vectorial Kernels
- Multiple Kernel Learning
- Multiclass Kernel Machines
Today: Finish Chapter 13 Kernel Machines
Chapter 16 Hidden Markov Models

SVM for Regression
Use a linear model (possibly kernelized)
f(x) = w^T x + w_0
Use the ε-sensitive error function:

e_ε(r^t, f(x^t)) = 0                      if |r^t − f(x^t)| < ε
                 = |r^t − f(x^t)| − ε     otherwise

The primal problem is

min (1/2) ||w||^2 + C Σ_t (ξ_t^+ + ξ_t^-)

subject to

r^t − (w^T x^t + w_0) ≤ ε + ξ_t^+
(w^T x^t + w_0) − r^t ≤ ε + ξ_t^-
ξ_t^+, ξ_t^- ≥ 0
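A minimal sketch of ε-sensitive support vector regression using scikit-learn's SVR; the toy data and the values of C, epsilon, and gamma are illustrative assumptions, not part of the slides.

```python
# Kernelized epsilon-SVR on a small synthetic regression problem.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
r = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)   # noisy targets r^t

# C trades off flatness against violations of the epsilon-tube; epsilon sets the tube width.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=1.0)
svr.fit(X, r)
print(svr.predict([[2.5]]))   # prediction f(x), a kernel expansion over the support vectors
```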
Kernel Regression
[Figure: kernel regression fits using a polynomial kernel and a Gaussian kernel]
One-Class Kernel Machines
Consider a sphere with center a and radius R
min R^2 + C Σ_t ξ_t

subject to

||x^t − a||^2 ≤ R^2 + ξ_t,   ξ_t ≥ 0

The dual is

max L_d = Σ_t α_t (x^t)^T x^t − Σ_t Σ_s α_t α_s (x^t)^T x^s

subject to

0 ≤ α_t ≤ C,   Σ_t α_t = 1
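As a rough illustration (not from the slides), scikit-learn's OneClassSVM gives a kernelized one-class learner; it solves the ν-formulation of Schölkopf et al. rather than the sphere (SVDD) primal above, but with an RBF kernel (constant K(x, x)) the two give the same solution. The data, gamma, and nu values are assumptions.

```python
# One-class novelty detection with an RBF kernel.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # "normal" data only

# nu upper-bounds the fraction of training points left outside the boundary,
# playing a role similar to C in the sphere formulation (illustrative value).
oc = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)

X_test = np.array([[0.1, -0.2], [4.0, 4.0]])
print(oc.predict(X_test))   # +1 = inside the learned region, -1 = outlier
```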
Kernel Dimensionality Reduction
Kernel PCA does PCA on the kernel matrix (equal to canonical PCA with a linear kernel).
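A minimal sketch of kernel PCA with scikit-learn's KernelPCA; the data, kernel choice, gamma, and number of components are illustrative assumptions. With kernel="linear" it reduces to ordinary PCA.

```python
# Kernel PCA: project data onto the leading kernel principal components.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
Z = kpca.fit_transform(X)    # coordinates in the 2-dimensional kernel PCA space
print(Z.shape)               # (100, 2)
```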
Introduction
 Assumption
   Modeling dependencies in the input; examples are no longer iid (independent and identically distributed).
 Sequences
   Temporal:
    In speech: phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language).
    In handwriting: pen movements.
   Spatial:
    In a DNA sequence: base pairs.
    Base pairs in a DNA sequence cannot be modeled as a simple probability distribution.
Discrete Markov Process
 N states: S1, S2, ..., SN
 State at “time” t, qt = Si
 First-order Markov
P(qt+1=Sj | qt=Si, qt-1=Sk ,...) = P(qt+1=Sj | qt=Si)

 Transition probabilities
aij ≡ P(qt+1=Sj | qt=Si) aij ≥ 0 and Σj=1N aij=1

 Initial probabilities
πi ≡ P(q1=Si) and Σi=1N πi=1
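As an illustrative sketch (not on the slides), the following samples a state sequence from a first-order observable Markov model; the Π and A values are the ones used in the balls-and-urns example below.

```python
# Sampling a state sequence q_1, ..., q_T from (Pi, A).
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])            # pi_i = P(q1 = S_i)
A = np.array([[0.4, 0.3, 0.3],            # a_ij = P(q_{t+1} = S_j | q_t = S_i)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

rng = np.random.default_rng(0)
T = 10
q = [rng.choice(3, p=Pi)]                 # draw q1 from Pi
for t in range(1, T):
    q.append(rng.choice(3, p=A[q[-1]]))   # draw q_{t+1} from row q_t of A
print(q)                                  # 0-indexed states
```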
Stochastic Automaton

P(O = Q | A, Π) = P(q1) ∏_{t=2}^T P(qt | qt−1) = π_{q1} a_{q1q2} ... a_{qT−1 qT}

For example, for Q = {3, 1, 2, 2, 3, 2, 1, ...}:
P(O = Q | A, Π) = π_3 a_31 a_12 a_22 a_23 a_32 a_21 ...
Example: Balls and Urns
Three urns each full of balls of one color
S1: red, S2: blue, S3: green

Π = [0.5, 0.2, 0.3]

      0.4  0.3  0.3
A =   0.2  0.6  0.2
      0.1  0.1  0.8

O = {S1, S1, S3, S3} = {red, red, green, green}
P(O | A, Π) = P(S1) · P(S1 | S1) · P(S3 | S1) · P(S3 | S3)
            = π_1 · a_11 · a_13 · a_33
            = 0.5 · 0.4 · 0.3 · 0.8 = 0.048
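A minimal sketch (my own illustration) of this computation in code: for an observable Markov model, P(O | A, Π) is the initial probability of the first state times the product of transition probabilities along the observed sequence (states are 0-indexed below).

```python
# Evaluate P(O | A, Pi) when the state sequence itself is observed.
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_prob(q, A, Pi):
    """P(O | A, Pi) = pi_{q1} * prod_t a_{q_{t-1} q_t}."""
    p = Pi[q[0]]
    for prev, cur in zip(q[:-1], q[1:]):
        p *= A[prev, cur]
    return p

O = [0, 0, 2, 2]                   # S1, S1, S3, S3 = red, red, green, green
print(sequence_prob(O, A, Pi))     # 0.5 * 0.4 * 0.3 * 0.8 = 0.048
```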
Balls and Urns: Learning
Observable Markov Model
Given K example sequences of length T
How to estimate the parameters?

Balls and Urns: Learning
Given K example sequences of length T:

π̂_i = #{sequences starting with Si} / #{sequences}
     = Σ_k 1(q_1^k = Si) / K

â_ij = #{transitions from Si to Sj} / #{transitions from Si}
     = Σ_k Σ_{t=1}^{T−1} 1(q_t^k = Si and q_{t+1}^k = Sj) / Σ_k Σ_{t=1}^{T−1} 1(q_t^k = Si)
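A minimal sketch of these counting estimators; states are assumed to be 0-indexed integers, and the toy sequences are made-up values for illustration.

```python
# Estimate Pi and A from K observed state sequences by counting.
import numpy as np

def estimate_markov(sequences, N):
    Pi = np.zeros(N)
    counts = np.zeros((N, N))
    for q in sequences:
        Pi[q[0]] += 1                        # sequences starting in S_i
        for prev, cur in zip(q[:-1], q[1:]):
            counts[prev, cur] += 1           # transitions S_i -> S_j
    Pi /= len(sequences)
    A = counts / counts.sum(axis=1, keepdims=True)   # normalize each row (assumes every state occurs)
    return Pi, A

seqs = [[0, 0, 2, 2], [1, 1, 2, 2], [0, 2, 2, 2]]    # toy data (assumption)
Pi_hat, A_hat = estimate_markov(seqs, N=3)
print(Pi_hat, A_hat, sep="\n")
```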
Hidden Markov Models
 States are not observable
 Discrete observations {v1, v2, ..., vM} are recorded; a probabilistic function of the state
 Emission probabilities
   bj(m) ≡ P(Ot=vm | qt=Sj)
 Example
   In each urn, there are balls of different colors, with different color probabilities in each urn.
   For each observation sequence, there are multiple possible state sequences that could have generated it.
Another Example
A colored ball choosing example:

             Urn 1   Urn 2   Urn 3
# of Red       30      10      60
# of Green     50      40      10
# of Blue      20      50      30

Probability of transition to another urn after picking a ball:

      U1    U2    U3
U1    0.1   0.4   0.5
U2    0.6   0.2   0.2
U3    0.3   0.4   0.3
Example (contd.)
Given:
      A (transition)            B (emission)
      U1    U2    U3            R     G     B
U1    0.1   0.4   0.5     U1    0.3   0.5   0.2
U2    0.6   0.2   0.2     U2    0.1   0.4   0.5
U3    0.3   0.4   0.3     U3    0.6   0.1   0.3

Observation: RRGGBRGR

State sequence: ??

Not so easily computable.
Example (contd.)

Here:
S = {U1, U2, U3}
V = {R, G, B}
For an observation sequence O = {o1 ... on} and state sequence Q = {q1 ... qn}:

      0.1   0.4   0.5
A =   0.6   0.2   0.2
      0.3   0.4   0.3

      0.3   0.5   0.2
B =   0.1   0.4   0.5
      0.6   0.1   0.3

πi ≡ P(q1 = Ui)
Elements of an HMM
 N: number of states
   S = {S1, S2, ..., SN}
 M: number of observation symbols
   V = {v1, v2, ..., vM}
 A = [aij]: N by N state transition probability matrix
   aij ≡ P(qt+1=Sj | qt=Si)
 B = [bj(m)]: N by M observation probability matrix
   bj(m) ≡ P(Ot=vm | qt=Sj)
 Π = [πi]: N by 1 initial state probability vector
   πi ≡ P(q1=Si)
λ = (A, B, Π): parameter set of the HMM
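A minimal sketch of λ = (A, B, Π) as arrays, using the three-urn example above (N = 3 states, M = 3 symbols R, G, B); the uniform initial distribution Π is an assumption, since the slides do not specify it.

```python
# HMM parameter set lambda = (A, B, Pi) for the three-urn example.
import numpy as np

A = np.array([[0.1, 0.4, 0.5],    # a_ij = P(q_{t+1} = U_j | q_t = U_i)
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])
B = np.array([[0.3, 0.5, 0.2],    # b_j(m) = P(O_t = v_m | q_t = U_j); columns are R, G, B
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])
Pi = np.array([1/3, 1/3, 1/3])    # initial probabilities (assumed uniform; not given on the slide)
```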
Examples
• Gene regulation: O = {A, C, G, T}, S = {gene, transcription factor binding site, junk DNA, ...}
• Speech processing: O = speech signal, S = word or phoneme being uttered
• Text understanding: O = words, S = topic (e.g., sports, weather, etc.)
• Robot localization: O = sensor readings, S = discretized position of the robot
Three Basic Problems of HMMs
1. Evaluation:
   Given λ and O, calculate P(O | λ)
2. State sequence:
   Given λ and O, find Q* such that
   P(Q* | O, λ) = maxQ P(Q | O, λ)
3. Learning:
   Given X = {Ok}k, find λ* such that
   P(X | λ*) = maxλ P(X | λ)
(Rabiner, 1989)
Evaluation: Naïve solution
State sequence Q = {q1,…qT}
Assume independent observations:

P(O | Q, λ) = ∏_{t=1}^T P(Ot | qt, λ) = b_{q1}(O1) b_{q2}(O2) ... b_{qT}(OT)

Observations are mutually independent, given the hidden states.
Evaluation: Naïve solution
Observe that:

P(Q | λ) = π_{q1} a_{q1q2} a_{q2q3} ... a_{qT−1 qT}

And that:

P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ)
Evaluation: Naïve solution
Finally we get:

P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ)

- The above sum is over all state paths.
- There are N^T state paths, each 'costing' O(T) calculations, leading to O(T·N^T) time complexity.
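A minimal sketch (my own illustration) of this brute-force evaluation; it enumerates all N^T state paths, so it is only usable for tiny T and N. The observation indices in the usage comment and the A, B, Pi arrays from the earlier sketch are assumptions.

```python
# Naive O(T * N^T) evaluation of P(O | lambda) by summing over every state path.
from itertools import product

def naive_evaluate(O, A, B, Pi):
    """Sum P(O | Q, lambda) P(Q | lambda) over all N^T state paths."""
    N, T = len(Pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):           # every possible state path
        p = Pi[Q[0]] * B[Q[0], O[0]]                # pi_{q1} b_{q1}(O_1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]    # a_{q_{t-1} q_t} b_{q_t}(O_t)
        total += p
    return total

# Usage with the urn-example arrays defined in the earlier sketch (symbols 0=R, 1=G, 2=B):
# naive_evaluate([0, 0, 1, 1, 2], A, B, Pi)
```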
Evaluation
 Forward variable:
   αt(i) ≡ P(O1 ... Ot, qt = Si | λ)
   The probability of observing the partial sequence {O1, ..., Ot} until time t and being in Si at time t, given the model λ
 Initialization:
   α1(i) = πi bi(O1)
 Recursion:
   αt+1(j) = [ Σ_{i=1}^N αt(i) aij ] bj(Ot+1)
 Evaluation result:
   P(O | λ) = Σ_{i=1}^N αT(i)
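A minimal sketch of the forward recursion as code (my own illustration); alpha is stored as a (T, N) array with 0-indexed time, and the urn-example arrays from the earlier sketch are assumed in the usage comment.

```python
# Forward algorithm: O(N^2 T) evaluation of P(O | lambda).
import numpy as np

def forward(O, A, B, Pi):
    """Return alpha (T x N) and P(O | lambda) = sum_i alpha_T(i)."""
    T, N = len(O), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]                      # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]    # [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    return alpha, alpha[-1].sum()

# Usage with the urn-example A, B, Pi from the earlier sketch (symbols 0=R, 1=G, 2=B):
# alpha, prob = forward([0, 0, 1, 1, 2], A, B, Pi)   # prob matches naive_evaluate(...)
```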
Evaluation
 Backward variable:
   βt(i) ≡ P(Ot+1 ... OT | qt = Si, λ)
   The probability of being in Si at time t and observing the partial sequence {Ot+1, ..., OT}
 Initialization:
   βT(i) = 1   (= P(OT+1 | qT = Si, λ))
 Recursion:
   βt(i) = Σ_{j=1}^N aij bj(Ot+1) βt+1(j)
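A matching sketch of the backward recursion (again my own illustration, with 0-indexed time); as a check, P(O | λ) can also be recovered from beta.

```python
# Backward algorithm: beta[t, i] with beta_T(i) = 1.
import numpy as np

def backward(O, A, B):
    """Return beta (T x N) computed by the backward recursion above."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                          # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])    # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta

# Usage with the urn-example A, B, Pi from the earlier sketch:
# beta = backward([0, 0, 1, 1, 2], A, B)
# P(O | lambda) also equals (Pi * B[:, 0] * beta[0]).sum(), matching forward().
```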
