
Clustering applied to hidden Markov models

Ibrahim Kaddouri
Supervised by: Elisabeth Gassiat and Zacharie Naulet

Université Paris Saclay, Laboratoire de mathématiques d’Orsay

September 8, 2022


Outline

1 Clustering and Hidden Markov Models

2 Inference in HMMs

3 Reconstructing the hidden states

4 Efficiency of the plug-in empirical Bayes classifier

5 Asymptotic Bayes risk

6 Bounds on the asymptotic Bayes risk



Clustering and Hidden Markov Models

Clustering
Clustering is an ill-posed problem that aims to uncover interesting structure in the data or to derive a useful grouping of the observations.

[Figure: DBSCAN clustering (left) vs. K-means clustering (right) on the same dataset.]


Mixture models: Probabilistic definition of clusters

Observations Y = (Y_k)_{1≤k≤n} come from J populations.

Given that an observation comes from population j, it has emission distribution F_j.

Define latent variables X = (X_k)_{1≤k≤n} such that, for each k,

    Y_k | X_k = j ∼ F_j

Then Y_k has distribution

    Σ_{j=1}^{J} π_j F_j

where π_j is the probability of coming from population j.

Mixture models are useful to model data coming from heterogeneous populations.
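To make this generative definition concrete, here is a minimal Python sketch (the function name and the sampler interface are ours, not from the talk): the latent label is drawn first, then the observation is drawn from the corresponding component.

import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(pi, component_samplers, n):
    # Draw latent labels X_k with P(X_k = j) = pi[j], then Y_k ~ F_{X_k}.
    # `component_samplers` is a hypothetical list of functions, one per F_j.
    X = rng.choice(len(pi), size=n, p=pi)
    Y = np.array([component_samplers[j]() for j in X])
    return X, Y

# Example with J = 2 Gaussian components:
# X, Y = sample_mixture(np.array([0.3, 0.7]),
#                       [lambda: rng.normal(0.0, 1.0), lambda: rng.normal(3.0, 1.0)], 500)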

Mixture models: Identifiability

Mixture models are not identifiable:

    Σ_{j=1}^{J} π_j F_j = (π_1/2) F_1 + (π_1/2 + π_2) · [(π_1/2) F_1 + π_2 F_2] / (π_1/2 + π_2) + Σ_{j=3}^{J} π_j F_j

The same mixture distribution can thus be written with different components and weights.

Learning the population components is possible only under additional structural assumptions, such as:
Parametric mixtures
Shape restrictions (Gaussian, multinomial, ...)

→ This might lead to poor results in practice.


Hidden Markov Models and why they are useful

Markov process:   X_0 → X_1 → X_2 → ··· → X_{T−1}

Observations:     Y_0   Y_1   Y_2   ···   Y_{T−1}

Figure: A Hidden Markov Model.

The observations (Y_k)_k are conditionally independent given (X_k)_k.
The latent (unobserved) variables (X_k)_k form a Markov chain.
HMMs are identifiable without any additional assumption.
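As a minimal illustration, the following Python sketch (the function name and interface are ours, assuming discrete emissions with F[i, y] = P(Y_k = y | X_k = i)) simulates such an HMM:

import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(nu, Q, F, n):
    # nu: initial distribution, Q: transition matrix, F[i, y] = P(Y=y | X=i).
    r, m = F.shape
    X = np.zeros(n, dtype=int)
    Y = np.zeros(n, dtype=int)
    X[0] = rng.choice(r, p=nu)
    Y[0] = rng.choice(m, p=F[X[0]])
    for k in range(1, n):
        X[k] = rng.choice(r, p=Q[X[k - 1]])  # hidden chain transition
        Y[k] = rng.choice(m, p=F[X[k]])      # emission given the current state
    return X, Y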



Inference in HMMs


Inference in Hidden Markov Models

Purpose: estimate the model parameters and the hidden states associated with the observations.

The HMM parameters are:
The initial distribution ν.
The transition matrix Q.
The emission distributions (F_i)_{1≤i≤J}.


Definition (Smoothing, Filtering, Prediction)


For positive indices k, l and n with l ≥ k, denote by ϕ_{k:l|n} the conditional distribution of X_{k:l} = (X_k, .., X_l) given Y_{0:n} = (Y_0, .., Y_n).
Specific choices of k and l give rise to several cases of interest:
- Smoothing: ϕ_{k|n} for n ≥ k ≥ 0;
- Filtering: ϕ_{n|n} for n ≥ 0;
- Prediction filtering: ϕ_{n+1|n} for n ≥ 0.
Because the use of filtering will be preeminent in what follows, we shall most often abbreviate ϕ_{n|n} to ϕ_n.


Forward filtering (discrete emissions)

Algorithm 1: Forward filtering

Assume X = {0, .., r−1} and that (ν, Q, F) is given.
Initialization: ϕ_{0|−1}(i) = ν(i) for i = 0, .., r−1.
for k ∈ {0, .., n} do
    c_k ← Σ_{i=0}^{r−1} ϕ_{k|k−1}(i) F_i(Y_k)
    for j ∈ {0, .., r−1} do
        ϕ_k(j) ← ϕ_{k|k−1}(j) F_j(Y_k) / c_k
    for j ∈ {0, .., r−1} do
        ϕ_{k+1|k}(j) ← Σ_{i=0}^{r−1} ϕ_k(i) q(i, j)
end
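A minimal NumPy sketch of Algorithm 1 (the interface is ours; the state dimension is vectorized):

import numpy as np

def forward_filter(nu, Q, F, Y):
    # Returns the filtering distributions phi[k] = P(X_k = . | Y_{0:k})
    # and the normalizing constants c[k], following Algorithm 1.
    n, r = len(Y), len(nu)
    phi = np.zeros((n, r))
    c = np.zeros(n)
    pred = nu.copy()                  # prediction filter phi_{k|k-1}
    for k in range(n):
        weighted = pred * F[:, Y[k]]  # phi_{k|k-1}(i) F_i(Y_k)
        c[k] = weighted.sum()
        phi[k] = weighted / c[k]      # filtering update
        pred = phi[k] @ Q             # phi_{k+1|k}(j) = sum_i phi_k(i) q(i, j)
    return phi, c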


Backward smoothing

Algorithm 2: Backward smoothing

Assume X = {0, .., r−1} and that (ν, Q, F) is given.
Using Algorithm 1, compute c_0, .., c_n and ϕ_0, .., ϕ_n.
for j ∈ {0, .., r−1} do
    β_{n|n}(j) ← c_n^{−1}
for k ∈ {n−1, .., 0} do
    for i ∈ {0, .., r−1} do
        β_{k|n}(i) ← c_k^{−1} Σ_{j=0}^{r−1} q(i, j) g_{k+1}(j) β_{k+1|n}(j)
for k ∈ {0, .., n−1} do
    for i ∈ {0, .., r−1} do
        ϕ_{k|n}(i) ← ϕ_k(i) β_{k|n}(i) / Σ_{j=0}^{r−1} ϕ_k(j) β_{k|n}(j)

(Here g_k(j) = F_j(Y_k) denotes the emission likelihood of observation Y_k in state j.)
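A corresponding sketch of Algorithm 2, reusing forward_filter above; it also returns the normalized backward variables, which the EM sketch further below needs:

def backward_smooth(phi, c, Q, F, Y):
    # Backward recursion for beta_{k|n}, then the smoothing distributions
    # phi_{k|n}(i), proportional to phi_k(i) beta_{k|n}(i).
    n, r = phi.shape
    beta = np.zeros((n, r))
    beta[-1] = 1.0 / c[-1]
    for k in range(n - 2, -1, -1):
        # beta_{k|n}(i) = c_k^{-1} sum_j q(i, j) g_{k+1}(j) beta_{k+1|n}(j)
        beta[k] = (Q @ (F[:, Y[k + 1]] * beta[k + 1])) / c[k]
    smooth = phi * beta
    smooth /= smooth.sum(axis=1, keepdims=True)
    return smooth, beta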


EM algorithm

Algorithm 3: EM algorithm

Randomly initialize the matrices Q and F; set nb_iter, the number of iterations.
for t ∈ {0, .., nb_iter − 1} do
    Compute the marginal distributions (ϕ_{k|n}(·; θ′))_{0≤k≤n} and the joint distributions (ϕ_{k−1:k|n}(·; θ′))_{1≤k≤n} under the current parameters θ′, using Algorithms 1 and 2.
    for i, j ∈ {0, .., r−1} do
        q(i, j) ← Σ_{k=1}^{n} ϕ_{k−1:k|n}(i, j; θ′) / Σ_{k=1}^{n} Σ_{l=0}^{r−1} ϕ_{k−1:k|n}(i, l; θ′)
    for i ∈ {0, .., r−1} do
        for j ∈ {0, .., m−1} do
            f(i, j) ← Σ_{0≤k≤n, Y_k=j} ϕ_{k|n}(i; θ′) / Σ_{k=0}^{n} ϕ_{k|n}(i; θ′)
end
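One EM iteration can then be sketched as follows (a sketch reusing the functions above; ν is kept fixed for simplicity, and the pairwise probabilities ϕ_{k−1:k|n} are assembled from the forward and backward variables):

def em_step(nu, Q, F, Y):
    # E-step: smoothing marginals and pairwise probabilities under (Q, F).
    Y = np.asarray(Y)
    phi, c = forward_filter(nu, Q, F, Y)
    smooth, beta = backward_smooth(phi, c, Q, F, Y)
    # xi[k-1, i, j], proportional to phi_{k-1}(i) q(i, j) g_k(j) beta_{k|n}(j)
    xi = phi[:-1, :, None] * Q[None, :, :] * (F[:, Y[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # M-step: normalized expected transition counts ...
    Q_new = xi.sum(axis=0)
    Q_new /= Q_new.sum(axis=1, keepdims=True)
    # ... and expected emission counts.
    F_new = np.zeros_like(F)
    for j in range(F.shape[1]):
        F_new[:, j] = smooth[Y == j].sum(axis=0)
    F_new /= F_new.sum(axis=1, keepdims=True)
    return Q_new, F_new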


Example

Consider the following HMM with parameters

    Q = (0.22  0.78 ; 0.66  0.34),   F = (0.1  0.9 ; 0.85  0.15),   n = 7000

(rows of each matrix separated by ";").



Figure: The two plots show the estimated parameters at each EM iteration, together with the true values. Left: parameters Q_{0,1} and Q_{1,1}. Right: parameters F_{0,0} and F_{1,0}.



Figure: Log-likelihood growth during the iterations.


Behavior near the i.i.d. frontier


The dependence structure in a HMM is lost whenever the following
quantity approaches 0.
2 !
q−p

1− (1 − p − q) ∥f0 − f1 ∥
q+p

where : !
1−p p
Q= and F = (f0 , f1 ).
q 1−q
When p + q = 1, the Markov chain becomes:

p
1−p 0 1 p

1−p


− It becomes impossible to learn the model parameters.
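A small helper (ours; it assumes ‖·‖ is the L1 distance between the emission probability vectors, which is one natural choice) to evaluate this quantity:

def dependence_strength(p, q, f0, f1):
    # (1 - ((q-p)/(q+p))^2) * (1 - p - q) * ||f0 - f1||, with ||.|| taken
    # as the L1 distance between emission probability vectors (assumption).
    gap = np.abs(np.asarray(f0) - np.asarray(f1)).sum()
    return (1.0 - ((q - p) / (q + p)) ** 2) * (1.0 - p - q) * gap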
Reconstructing the hidden states


First loss function


Let θ = (ν, Q, F).
Consider the loss function

    L_1(x_{1:n}, x′_{1:n}) = 1_{x_{1:n} ≠ x′_{1:n}}

The risk associated with a classifier h is

    R_{n,HMM}(h) = E_θ[L_1(X_{1:n}, h(Y_{1:n}))] = P_θ(h(Y_{1:n}) ≠ X_{1:n})

The associated Bayes risk is

    R^⋆_{n,HMM} = E_θ[ min_{x_{1:n} ∈ X^n} P_θ(X_{1:n} ≠ x_{1:n} | Y_{1:n}) ]


Reconstruction algorithm

Algorithm 4: Viterbi algorithm

Initialization: m_0(i) ← log(ν(i) g_0(i)) for i = 0, .., r−1.
for k ∈ {0, .., n−1} do
    for j ∈ {0, .., r−1} do
        m_{k+1}(j) ← max_{i ∈ {0,..,r−1}} [m_k(i) + log q(i, j)] + log g_{k+1}(j)
    end
end
x_n ← arg max_{j ∈ {0,..,r−1}} m_n(j)
for k ∈ {n−1, .., 0} do
    x_k ← the state i at which the maximum is attained in the update of m_{k+1}(j) above, with j = x_{k+1}
end

(Again g_k(j) = F_j(Y_k).)
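A NumPy sketch of Algorithm 4 (interface ours), working in log space to avoid numerical underflow:

def viterbi(nu, Q, F, Y):
    n, r = len(Y), len(nu)
    m = np.zeros((n, r))                # m[k, j]: best log-probability ending in state j
    back = np.zeros((n, r), dtype=int)  # argmax pointers for backtracking
    m[0] = np.log(nu) + np.log(F[:, Y[0]])
    for k in range(n - 1):
        scores = m[k][:, None] + np.log(Q)  # scores[i, j] = m_k(i) + log q(i, j)
        back[k + 1] = scores.argmax(axis=0)
        m[k + 1] = scores.max(axis=0) + np.log(F[:, Y[k + 1]])
    x = np.zeros(n, dtype=int)
    x[-1] = m[-1].argmax()
    for k in range(n - 2, -1, -1):          # backtracking pass
        x[k] = back[k + 1, x[k + 1]]
    return x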


Second loss function

Consider the loss function

    L_2(x_{1:n}, x′_{1:n}) = (1/n) Σ_{k=1}^{n} 1_{x_k ≠ x′_k}

The risk associated with a classifier h is

    R_{n,HMM}(h) = E_θ[L_2(X_{1:n}, h(Y_{1:n}))] = E_θ[ (1/n) Σ_{i=1}^{n} 1_{[h(Y_{1:n})]_i ≠ X_i} ]

The associated Bayes risk is

    R^⋆_{n,HMM} = (1/n) Σ_{i=1}^{n} E_θ[ min_{x ∈ X} P_θ(X_i ≠ x | Y_{1:n}) ]

Reconstruction algorithm

Algorithm 5: MAP classifier algorithm

Assume X = {0, .., r−1} and that (ν, Q, F) is given.
Using Algorithm 2, compute ϕ_{0|n}, .., ϕ_{n|n}.
for k ∈ {0, .., n} do
    x_k ← arg max_{0≤i≤r−1} ϕ_{k|n}(i)
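Equivalently, in code (a sketch reusing forward_filter and backward_smooth above):

def map_classifier(nu, Q, F, Y):
    # Classify each X_k by the mode of its smoothing distribution phi_{k|n}.
    phi, c = forward_filter(nu, Q, F, Y)
    smooth, _ = backward_smooth(phi, c, Q, F, Y)
    return smooth.argmax(axis=1)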


Example

Consider the following HMM with parameters

    Q = (0.75  0.25 ; 0.3  0.7),   F = (0.1  0.9 ; 0.85  0.15),   n = 1000

Below are the observations and the associated states:

Figure: The observations and the associated hidden states. For the observations, the top vertical lines correspond to observations equal to 1; the lower ones, to observations equal to 0.

Reconstructions of the hidden states

The following are the reconstructions of the hidden states:



Efficiency of the plug-in empirical Bayes classifier


Notations

Bernoulli emission densities: (f_0(y), f_1(y)) = (p_0^y (1−p_0)^{1−y}, p_1^y (1−p_1)^{1−y})

θ = (ν, Q, (f_x)_{x=0,1}): the true parameters
θ̂ = (ν̂, Q̂, (f̂_x)_{x=0,1}): estimators of the true parameters
ϕ^⋆_{i|n}(·, Y_{1:n}): smoothing distribution under the true parameters θ
ϕ̂_{i|n}(·, Y_{1:n}): smoothing distribution under the estimated parameters θ̂
h^⋆(Y_{1:n}) = (1_{ϕ^⋆_{i|n}(1, Y_{1:n}) > 1/2})_{1≤i≤n}: Bayes classifier
ĥ(Y_{1:n}) = (1_{ϕ̂_{i|n}(1, Y_{1:n}) > 1/2})_{1≤i≤n}: plug-in empirical Bayes classifier
ϕ^⋆_{i+1|i}(·, Y_{0:i}): prediction filter under the true θ
ϕ̂_{i+1|i}(·, Y_{0:i}): prediction filter under the estimated θ̂
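In code, the plug-in empirical Bayes classifier simply runs the smoother under the estimated parameters and thresholds at 1/2 (a sketch reusing the functions above):

def plug_in_classifier(nu_hat, Q_hat, F_hat, Y):
    # hat h_i(Y) = 1 if hat phi_{i|n}(1, Y_{1:n}) > 1/2, else 0.
    phi, c = forward_filter(nu_hat, Q_hat, F_hat, Y)
    smooth, _ = backward_smooth(phi, c, Q_hat, F_hat, Y)
    return (smooth[:, 1] > 0.5).astype(int)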


Online vs offline clustering

We study the risk of clustering the observations in two frameworks:

Online: classification may use only past observations. Classifiers are of the form h(Y_{0:n−1}) = (h_i(Y_{0:i−1}))_{1≤i≤n}.
Offline: all observations are used in the classification procedure. Classifiers are of the form h(Y_{1:n}) = (h_i(Y_{1:n}))_{1≤i≤n}.


Control of the smoothing distribution

Theorem (De Castro, Gassiat, Le Corff (2018))


Suppose the initial distribution is the stationary distribution and δ = min_{i,j} Q_{i,j} > 0. Then, for all 1 ≤ i ≤ n and all y_{1:n} ∈ Y^n:

    ‖ϕ^⋆_{i|n}(·, y_{1:n}) − ϕ̂_{i|n}(·, y_{1:n})‖_TV ≤ (4(1−δ)/δ²) [ ρ^{i−1} ‖ν − ν̂‖_2 + (1/(1−ρ) + 1/(1−ρ̂)) ( ‖Q − Q̂‖_F + Σ_{l=1}^{n} ((ρ̂ ∨ ρ)^{|l−i|} / (f_0(y_l) ∨ f_1(y_l))) max_{x=0,1} |f_x(y_l) − f̂_x(y_l)| ) ]

where:
δ̂ = min_{i,j} Q̂_{i,j}
ρ = (1−2δ)/(1−δ) and ρ̂ = (1−2δ̂)/(1−δ̂)
f_x(y) = p_x^y (1−p_x)^{1−y}


Efficiency of the plug-in empirical Bayes classifier

Theorem
Assume in addition that c^⋆ = min(p_0, 1 − p_0, p_1, 1 − p_1) > 0. Then:

    R^{Offline}_{n,HMM}(ĥ) − R^{⋆,Offline}_{n,HMM} ≤ (4(1−δ)/δ²) E_θ[ ‖ν − ν̂‖_2 / (n(1−ρ)) + (1/(1−ρ) + 1/(1−ρ̂)) ( ‖Q − Q̂‖_F + (2/c^⋆) max_{x=0,1} ‖p_x − p̂_x‖_∞ ) ]

    R^{Online}_{n,HMM}(ĥ) − R^{⋆,Online}_{n,HMM} ≤ (8(1−δ)/δ²) E_θ[ ‖ν − ν̂‖_2 / (n(1−ρ)) + (1/(1−ρ) + 1/(1−ρ̂)) ( ‖Q − Q̂‖_F + (1/c^⋆) max_{x=0,1} ‖p_x − p̂_x‖_∞ ) ] + 2 E_θ[‖Q − Q̂‖_F]

where δ = min_{i,j} Q_{i,j} > 0 and ρ = (1−2δ)/(1−δ).
Asymptotic Bayes risk


Exponential forgetting
Proposition
Assume:
- The initial distribution is the stationary distribution;
- The HMM has two hidden states and discrete emissions;
- δ = min_{i,j} Q_{i,j} > 0.

Then, for 0 ≤ j′ ≤ j, k ≥ 0 and n ≥ 0:

    ‖P_θ(X_k ∈ · | Y_{−j:n}) − P_θ(X_k ∈ · | Y_{−j′:n})‖_TV ≤ 2 ρ_0^{(k∧n)+j′} ρ_1^{k−(k∧n)}

Similarly, for 0 ≤ k ≤ j′ ≤ j and n ≥ 0, one has:

    ‖P_θ(X_k ∈ · | Y_{−n:j}) − P_θ(X_k ∈ · | Y_{−n:j′})‖_TV ≤ 2 ρ_0^{j′−k}

where ρ_0 = (1−2δ)/(1−δ) and ρ_1 = 1 − 2δ.

Asymptotic Bayes risk

Consider the following quantities:

    R^{⋆,Offline}_{n,HMM} = (1/n) Σ_{i=1}^{n} E_θ[ min(P_θ(X_i = 1 | Y_{1:n}), P_θ(X_i = 0 | Y_{1:n})) ]

    R^{⋆,Offline}_{∞,HMM} = E_θ[ min(P_θ(X_0 = 1 | Y_{−∞:+∞}), P_θ(X_0 = 0 | Y_{−∞:+∞})) ]

    R^{⋆,Online}_{n,HMM} = (1/n) Σ_{i=1}^{n} E_θ[ min(P_θ(X_i = 1 | Y_{0:i−1}), P_θ(X_i = 0 | Y_{0:i−1})) ]

    R^{⋆,Online}_{∞,HMM} = E_θ[ min(P_θ(X_1 = 1 | Y_{−∞:0}), P_θ(X_1 = 0 | Y_{−∞:0})) ]
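These quantities can be approximated by Monte Carlo. For instance, for the offline case, one can average min_x P_θ(X_i = x | Y_{1:n}) along a simulated trajectory (a sketch reusing sample_hmm and the filtering/smoothing sketches above):

def offline_bayes_risk_mc(nu, Q, F, n):
    # Simulate one trajectory and average min(phi_{i|n}(0), phi_{i|n}(1)).
    _, Y = sample_hmm(nu, Q, F, n)
    phi, c = forward_filter(nu, Q, F, Y)
    smooth, _ = backward_smooth(phi, c, Q, F, Y)
    return smooth.min(axis=1).mean()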


Theorem
Under the same assumptions:

    R^{⋆,Offline}_{n,HMM} − R^{⋆,Offline}_{∞,HMM} ≤ 2 / (n(1−ρ_0))

    R^{⋆,Online}_{n,HMM} − R^{⋆,Online}_{∞,HMM} ≤ (ρ_1/(2n)) · 1/(1−ρ_0)

where ρ_0 = (1−2δ)/(1−δ), ρ_1 = 1 − 2δ and δ = min_{i,j} Q_{i,j} > 0.



Bounds on the asymptotic Bayes risk


Case of online clustering


We introduce the following reparametrization which will be useful in the
statement of the results:
q−p
 
ϕ(θ) = , 1 − p − q, ∥f0 − f1 ∥
q+p

Theorem
- The asymptotic Bayes risk for online clustering verifies:

1 |ϕ1 |
R⋆,Online
∞,HMM ≤ −
2 2
c c2
- Assuming in addition that min (f0 , f1 ) ≥ c and that |ϕ2 | ≤ 12 ∧ 4,
then one has:
1 |ϕ1 | (1 − ϕ21 )ϕ22 ϕ3
R⋆,Online
∞,HMM ≥ − −
2 2 2c