Beamer
Mollen KHAN
Model:
$$p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
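As a concrete reading of the mixture density, the following minimal Python sketch evaluates p(x) for a one-dimensional mixture. The function names `normal_pdf` and `gmm_density`, and the restriction to scalar x with variances in place of the covariance matrices Σk, are assumptions made for illustration only.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at a scalar point x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_density(x, weights, means, variances):
    """Mixture density p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2) for scalar x."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


if __name__ == "__main__":
    # Two equally weighted components centred at -2 and +2 with unit variance.
    print(gmm_density(0.0, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]))
```

With a single component of weight 1 the function reduces to the ordinary Gaussian density, which gives a quick sanity check.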
$$Q(\theta^{t+1}, \theta^{t}) \ge Q(\theta^{t}, \theta^{t})$$
G (θ) = θ + α∇Q(θ|θ)
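Population EM operator:
$$M(\theta) = \arg\max_{\theta'} Q(\theta' \mid \theta)$$
Self-Consistency:
Assume the vector θ∗ maximizes the population likelihood.
It must then satisfy the self-consistency condition:
$$\theta^* = \arg\max_{\theta} Q(\theta \mid \theta^*), \quad \text{i.e. } M(\theta^*) = \theta^*$$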
Theoretical Guarantee
If M is contractive on B2 (r ; θ∗ ) and θ0 ∈ B2 (r ; θ∗ ), then the population EM
iterates converge to θ∗ geometrically, at the rate set by the contraction parameter.
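The guarantee can be illustrated numerically. The sketch below is a minimal example under stated assumptions: a one-dimensional symmetric two-component mixture 0.5 N(θ∗, σ²) + 0.5 N(−θ∗, σ²) with known σ, for which the EM update has the closed form used in `em_operator`. It iterates the sample operator rather than the population operator M, so it converges to a sample fixed point near θ∗. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_symmetric_gmm(n, theta_star, sigma, rng):
    """Draw n points from 0.5*N(theta*, sigma^2) + 0.5*N(-theta*, sigma^2)."""
    return [rng.choice((-1.0, 1.0)) * theta_star + rng.gauss(0.0, sigma) for _ in range(n)]


def em_operator(theta, xs, sigma):
    """Sample EM update for this model: mean of tanh(theta * x / sigma^2) * x."""
    return sum(math.tanh(theta * x / sigma ** 2) * x for x in xs) / len(xs)


def run_em(theta0, xs, sigma, iters):
    """Return the list of iterates theta^0, ..., theta^iters."""
    path = [theta0]
    for _ in range(iters):
        path.append(em_operator(path[-1], xs, sigma))
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    theta_star, sigma = 2.0, 1.0
    xs = sample_symmetric_gmm(2000, theta_star, sigma, rng)
    # Start inside the ball of radius |theta*| / 4 around theta*.
    path = run_em(theta_star + 0.4, xs, sigma, iters=25)
    theta_hat = path[-1]  # proxy for the sample fixed point
    for t in range(5):
        print(t, abs(path[t] - theta_hat))
```

The printed distances to the final iterate shrink by a roughly constant factor per step, which is the behaviour a contraction predicts.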
Mollen KHAN (Université Paris 1 Panthéon-Sorbonne)
Statistical guarantees for the EM algorithm: From population to sample-based analysis
December 14, 2023
Guarantees for Population-level Gradient EM
G (θ) := θ + α∇Q(θ|θ).
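A minimal sketch of one gradient EM update, again assuming the one-dimensional symmetric two-component mixture with known σ (an assumption, not something fixed by this slide). For that model the gradient of Q(· | θ) at θ is proportional to the EM target minus θ; the 1/σ² factor is absorbed into the step size α here. The names `grad_q` and `gradient_em_step` are hypothetical.

```python
import math
import random


def grad_q(theta, xs, sigma):
    """Sample gradient of Q(. | theta) evaluated at theta, up to a 1/sigma^2 factor."""
    em_target = sum(math.tanh(theta * x / sigma ** 2) * x for x in xs) / len(xs)
    return em_target - theta


def gradient_em_step(theta, xs, sigma, alpha):
    """One gradient EM update: theta + alpha * grad Q(theta | theta)."""
    return theta + alpha * grad_q(theta, xs, sigma)


if __name__ == "__main__":
    rng = random.Random(1)
    theta_star, sigma = 2.0, 1.0
    xs = [rng.choice((-1.0, 1.0)) * theta_star + rng.gauss(0.0, sigma) for _ in range(2000)]
    theta = theta_star + 0.4  # initial point near theta*
    for _ in range(40):
        theta = gradient_em_step(theta, xs, sigma, alpha=0.5)
    print(theta)
```

With α = 1 and this scaling, the step coincides with the full EM update for the model; smaller α takes a partial step toward it.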
[Figure: the gradients ∇q(θ), ∇Q(θ), ∇Q(θ|θ1) and ∇Q(θ|θ2), with θ∗, θ1 and θ2 marked on the θ axis.]
∥Mn (θ) − M(θ)∥2 ≤ ϵG (n, δ), for any fixed vector θ ∈ B2 (r ; θ∗ ), with
probability at least 1 − δ.
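Uniform deviation analogue:
Let $\epsilon_G^{\mathrm{unif}}(n, \delta)$ be the smallest scalar such that the same bound holds
simultaneously for every θ ∈ B2 (r ; θ∗ ), with probability at least 1 − δ.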
$$\frac{\|\theta^*\|_2}{\sigma} > \eta, \qquad (6)$$
where η is a sufficiently large constant.
The population EM operator is contractive over the ball B2 (r ; θ∗ ) with
radius r and contractivity parameter κ, which decreases exponentially
with η²:
$$r = \frac{\|\theta^*\|_2}{4}, \qquad \kappa(\eta) \le e^{-c\eta^{2}}.$$
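To make explicit how contractivity translates into a convergence rate, the bound can be unrolled over iterations. This is a sketch assuming self-consistency, M(θ∗) = θ∗, and an initial point θ0 in the ball B2(r; θ∗); the superscript t indexes iterations and c is the unspecified constant from the bound on κ.

```latex
\begin{align*}
\|\theta^{t+1} - \theta^*\|_2
  &= \|M(\theta^{t}) - M(\theta^*)\|_2
   \le \kappa \, \|\theta^{t} - \theta^*\|_2, \\
\|\theta^{t} - \theta^*\|_2
  &\le \kappa^{t} \, \|\theta^{0} - \theta^*\|_2
   \le e^{-c \eta^{2} t} \, \frac{\|\theta^*\|_2}{4}.
\end{align*}
```

Each iteration shrinks the error by at least the factor κ, so a larger η makes the per-step shrinkage exponentially stronger.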
[Plots: log error versus iteration count, showing optimization error (Opt. error) and statistical error (Stat. error) curves.]
Figure: Plot of the iteration count versus the (log) optimization error
log(∥θt − θ̂∥) for different values of the SNR ∥θ∗∥2 /σ. For each SNR, we performed
10 independent trials of a Gaussian mixture model with dimension d = 10 and
sample size n = 1000. Larger values of SNR lead to faster convergence rates,
consistent with Corollary 2.
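The experiment in the caption can be approximated with the following sketch. It is not the code behind the figure: it assumes a symmetric two-component mixture 0.5 N(θ∗, σ²I) + 0.5 N(−θ∗, σ²I) with known σ, places θ∗ along the first coordinate axis, runs a single trial per seed, and uses the last iterate as a proxy for θ̂. All names and defaults other than d = 10 and n = 1000 are hypothetical.

```python
import math
import random


def sample_gmm(n, theta_star, sigma, rng):
    """Draw n points from 0.5*N(theta*, sigma^2 I) + 0.5*N(-theta*, sigma^2 I)."""
    data = []
    for _ in range(n):
        sign = rng.choice((-1.0, 1.0))
        data.append([sign * t + rng.gauss(0.0, sigma) for t in theta_star])
    return data


def em_step(theta, data, sigma):
    """Sample EM update: average of tanh(<theta, x> / sigma^2) * x."""
    d = len(theta)
    total = [0.0] * d
    for x in data:
        weight = math.tanh(sum(a * b for a, b in zip(theta, x)) / sigma ** 2)
        for j in range(d):
            total[j] += weight * x[j]
    return [v / len(data) for v in total]


def log_opt_errors(snr, d=10, n=1000, sigma=1.0, iters=40, seed=0):
    """Log optimization error log||theta^t - theta_hat||_2 for one trial."""
    rng = random.Random(seed)
    # theta* along the first axis with ||theta*||_2 = snr * sigma (arbitrary direction).
    theta_star = [snr * sigma] + [0.0] * (d - 1)
    data = sample_gmm(n, theta_star, sigma, rng)
    # Start at distance ||theta*||_2 / 5 from theta*.
    theta = [1.2 * t for t in theta_star]
    path = [theta]
    for _ in range(iters):
        theta = em_step(theta, data, sigma)
        path.append(theta)
    theta_hat = path[-1]  # proxy for the sample fixed point
    logs = []
    for th in path[:-1]:
        err = math.dist(th, theta_hat)
        if err == 0.0:
            break  # reached machine precision
        logs.append(math.log(err))
    return logs


if __name__ == "__main__":
    for snr in (1.0, 2.0):
        logs = log_opt_errors(snr)
        print(snr, [round(v, 2) for v in logs[:6]])
```

Averaging over several seeds per SNR and plotting the returned lists against the iteration index would give curves comparable in spirit to the figure.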
Conclusion and Perspectives