Professional Documents
Culture Documents
M. H. da Silva Bassani
Departamento de Engenharia Mecânica, DEPES,CEFET-RJ
Av. Maracanã 229, Rio de Janeiro, RJ, BRAZIL, CEP 20271-110
E-mail: marcos@cefet-rj.br
does not continuously depend on the data due to the fact that
ABSTRACT motion estimation is highly sensitive to the presence of
Pel-recursive motion estimation is a well-established approach. observation noise in video images.
However, in the presence of noise, it becomes an ill-posed In this article, we plan to use the MAP estimate to
problem that requires regularization. In this paper, motion find u, that is, the update of the motion, from our observation
vectors are estimated in an iterative fashion by means of the model by means of the Espectation-Maximization technique.
Expectation-Maximization (EM) algorithm and a Gaussian data The main advantage of the EM method is that the final
model. Our proposed algorithm also utilizes the local image algorithm deals with closed-form expressions and does not
properties of the scene to improve the motion vector estimates require the use of optimization techniques.
following a spatially adaptive approach. Numerical experiments We organized this work as follows. Section 2 provides
are presented that demonstrate the merits of our method. some necessary background on the pel-recursive motion
estimation problem. Section 3 introduces our spatially adaptive
approach. Section 4 describes the EM framework. Section 5
Keywords: expectation-maximization; EM; computer vision;
defines the metric used to evaluate our results. Section 6
motion estimation; pel-recursive, maximum likelihood.
describes some implementation aspects and the experiments
used to access the performance of our proposed algorithm.
Finally, Section 7 has some conclusions.
1. INTRODUCTION
where the displacement update vector u=[ux, uy]T = d(r) – di(r), 4.1. The Ordinary ML Estimate and Its Limitations
e(r, d(r)) represents the error resulting from the truncation of The MAP estimate of u is given by [11]
the higher order terms (linearization error) and ∇=[∂/∂x, ∂/∂y]T uˆ MAP = ΛuG T ( G ΛuG T + Λn )−1 z . (4)
represents the spatial gradient operator. Applying (2) to all The computation of ûMAP requires knowledge of the
points in a neighborhood R gives
covariance matrices of the update vector (Λu) and the
z = Gu+ n, (3) noise/linearization error term (Λn), respectively.
where the temporal gradients ∆(r, r-di(r)) have been stacked to The estimate in Eq. (4) was derived assuming that u and n have
form the N×1 observation vector z containing DFD information zero mean, and are uncorrelated. The MAP for the linear
on all the pixels in a neighborhood R, the N×2 matrix G is observation model in Eq. (3) is also the maximum a posteriori
obtained by stacking the spatial gradient operators at each (MAP) estimate, assuming a Gaussian prior on u and Gaussian
observation, and the error terms have formed the N×1 noise noise n (for more details, see [11]).
vector n which is assumed Gaussian with n~N(0, σn2I). Each In this work, we resort to the maximum likelihood
row of G has entries [gxi, gyi]T, with i = 1, …, N. The spatial (ML) estimation of the second-order statistics Λu and Λn of
gradients of Ik-1 are calculated through a bilinear interpolation the model. The calculation of the ML estimates of Λu and Λn
scheme [2]. is done iteratively by means of the Expectation-Maximization
(EM) algorithm formalized by Dempster et al [3] which has
3. SPATIAL ADAPTATION been used in a variety of applications [7, 8, 10]. The EM
algorithm was previously used for motion estimation in [6].
Aiming to improve the estimates given by the pel-recursive However, this work deals with estimate parameters of affine
algorithm, we introduced an adaptive scheme for determining models and it uses a block-matching framework.
the optimal shape of the neighborhood of pixels with the same For the model described by Eq. (3), let us assume that
DV used to generate the overdetermined system of equations the update vector u and the noise n are normally-distributed
given by (3). More specifically, the masks in Fig. 1 show the with mean zero and covariance matrices equal to Λu and Λn,
geometries of the neighborhoods used. respectively. If we consider u and n as being uncorrelated, then
Errors can be caused by the basic underlying the pdf of z is
1
assumption of uniform motion inside R (the smoothness − 1
f z ( z ) = 2π ( G ΛuG H + Λn ) 2 × exp − z H ( G ΛuG H + Λn )−1 z .
constraint), by not grouping pixels adequately, and by the way 2
gradient vectors are estimated, among other things. Since it is Let us take Φ = {Λu, Λn} as the parameter set to be
known that in a noiseless image not containing pixels with estimated. The pdf of z can be considered a continuous and
constant intensity, most errors, when estimating motion, occur differentiable function of Φ. This dependency can be better
close to motion boundaries, we propose a hypothesis testing captured by the notation fz(z; Φ). Furthermore, we can assume
(HT) approach to determine the best neighborhood shape for a that the additive noise is white, with covariance matrix Λn =
given pixel. We pick up the neighborhood from the finite set of σn2I, where I is an N × N identity matrix.
templates shown in Fig. 1, according to the smallest DFD The ML estimation of Φ is the ΦML that maximizes
criterion, in an attempt to adapt the model to local features
the logarithm of the likelihood function of fz(z;Φ), which is
associated to motion boundaries.
given by
u
z = [G ; I ] = Hx .
n b11( p) ( p)
b12 ( p)
" b1m
( p) ( p) ( p )
The foundation of the EM algorithm is the b b22 " b2m
= 21 ,
maximization of the expectation of log{fx(x;Φ)} given the # # % #
incomplete observed data z and the current estimate of the ( p) ( p) ( p)
parameter set Φ. fx(x;Φ) is the pdf of the complete data and it m1
b bm2 " bmm
is given by c( p )
c( p ) = µ u|z
( p)
= 1( p ) = Λu( p )G H ( G Λu( p )G H + Λn( p ) )−1 z , (7)
−
1
1 c2
f x ( x ;Φ ) = 2πΛx 2 exp − x H Λx−1 x . (6)
2 e1( p )
Taking the logarithm of both sides of Eq. (11) leads to
and e( p ) = µ n|z
( p)
= # = Λn( p ) ( G Λu( p )G H + Λn( p ) )−1 z .
1 1
log {f x ( x ;Φ )} = − log 2πΛx − x H Λx−1 x em( p )
2 2
1
= K − [Λx + x H Λ ]. M-Step:
2
where Φ = Λx and K is a constant independent of Φ. The E-step
requires the computation of the expectation In order to obtain this step, we need to find the minimum of
F(Φ;Φ(p)). Differentiating F(Φ;Φ(p)) with respect to σn2 and
making it equal to zero yields
Q( Φ ;Φ ( p ) ) = E[log{ f x ( x ;Φ )} | z ;Φ ( p ) ] ,
1
m
{ 2
σ n2( p + 1 ) = Tr B( p ) + e( p ) . (8) }
where Φ(p)=[Λu(p);Λn(p)] is the estimate of the parameter set
Applying similar procedure for σ12 and σ22, gives
Φ={Λu;Λn} at the p-th iteration. The M-step
maximizes Q( Φ ;Φ ( p ) ) :
σ 12( p + 1 ) = a11
( p)
+ c12( p ) , and (9)
Φ ( p +1 )
= arg{ max Q( Φ ;Φ ( p)
)} . σ =a +c2( p + 1 )
2 . ( p)
22
2( p )
2 (10)
Φ
Now, we are ready to state the resulting EM algorithm using
The E-step can be re-written in terms of the parameter set Φ by multiple masks.
means of the relationships developed in terms of the complete
data x and the pdf f(x|z;Φ(p)) as
4.3. The EM-Based Motion Estimation Descrptive
F( Φ ;Φ ( p ) ) = log Λu + log Λn + tr( Λu−1Λu|z
( p)
) Algorithm
. For each pixel in the current frame k, located at r, do the
+ tr( Λn−1Λn|z
( p)
) + µ u|z
( p )H
Λu−1µu|z
( p)
+ µ n|z
( p )H
Λn−1µ n|z
( p)
following:
1) Initialize the system: d 0 ( r ) , σ n2( 0 ) , σ 12( 0 ) , and σ 22( 0 ) ,
Our goal is to estimate the update vector u. The MAP/MMSE
( p)
m ← 0 (m=mask counter), and p ← 0 (p= iteration
estimate of u is µu|z which corresponds to its conditional
counter).
2) If DFD < T , then stop. T= threshold for DFD .
3) Calculate Gi and zi for the current mask and current initial 6. EXPERIMENTS
estimate.
4) Calculate the current update vector by means of Eq. (7) , This section presents results illustrating the effectiveness of the
(byproduct of the E-Step). EM algorithm and compare it to the Wiener filter described by
5) Perform the M-Step and update the parameter estimates
u Wiener =
u LMMSE = (G T G + µ I ) −1 G T z ,
using Eqs. (8), (9), and (10).
6) Calculate the new displacement vector: with µ=50 and constant for the entire frame [1]. The algorithms
d p +1( r ) = d p ( r ) + u p . (11) were tested using two real video sequences: the "Foreman" and
7) For the current mask m: the "Mother and Daughter" (MD). The sequences are 144 x
If Φ p + 1 − Φ p ≤ ξ (convergence test for the EM 176, 8-bit (QCIF). The algorithm called “EM” uses one
mask , while algorithm “EMm” employs multiple masks.
algorithm), d p + 1 ( r ) − d p ( r ) ≤ ε and DFD < T , then stop.
Otherwise, if p < ( I − 1 ) , where I is the maximum number
Mother and Daughter Sequence - Noiseless Case
of iterations allowed, then go to step 3 with p ← p + 1 . 10.5
If p=I-1, try another neighborhood geometry
(m←m+1),and reset variables: p ← 0 , d 0 ( r ) , σ n2( 0 ) , σ 12( 0 ) , 10
8.5
o---o EMm
5. METRIC
+---+ EM
8
*---* Wiener
This work assesses the motion field quality through the use of
the fowlling metric [2, 4, 5]: 7.5
7
5.1 Improvement in Motion Compensation 31 32 33 34 35 36 37 38 39 40
Frame number, k
The IMC( dB ) between two consecutive frames is given by
[ I k ( r ) − I k −1( r )]
∑
2
(a)
IMC k ( dB ) = 10 log10 r ∈S
2
, where S is
∑ I k ( r ) − I k −1 ( r − d ( r ))
r ∈S
Mother and Daughter Sequence - SNR=20 dB
the frame being currently analyzed. It shows the ratio in 9.5
2
decibel (dB) between the mean-squared frame difference ( FD ) 9
defined by
Improvement in Motion Compensation (dB)
2
∑ [ I k ( r ) − I k −1( r )] 2 8.5
FD = r ∈S
RC 8
2 o---o EMm
and the DFD between frames k and (k-1) . +---+ EM
7.5
As far as the use of the this metric goes, we chose to *---* Wiener
apply it to a sequence of K frames, resulting in the following 7
equation for the average improvement in motion compensation:
6.5
K
∑∑ [ I k ( r ) − I k −1( r )]
2
6
IMC( dB ) = 10 log10 K k = 2 r ∈S
31 32 33 34 35 36
Frame number, k
37 38 39 40
∑∑ I k ( r ) − I k −1 ( r − d ( r )) 2
k = 2 r∈S
(b)
When it comes to motion estimation, we seek
Figure2: IMC( dB ) for the noiseless (a) and noisy (b)
algorithms that have high values of IMC (dB) . If we could
detect motion without any error, then the denominator of the
cases for the “MD” sequence.
previous expression would be zero (perfect registration of
motion) and we would have IMC( dB ) =∞.
Foreman Sequence - Noiseless Case
12.5
12
11
9.5
8.5
11 12 13 14 15 16 17 18 19 20
Frame number, k
(a) (a)
10.5
9.5
o---o EMm
9 +---+ EM
*---* Wiener
8.5
7.5
11 12 13 14 15 16 17 18 19 20
Frame number, k
(b) (b)
Figure 3: Motion-compensated errors for frame 32 of the Figure 4: IMC ( dB) for frames 11-20 of the noiseless (a)
“Mother and Daughter” sequence with SNR=20dB: the Wiener and noisy cases with SNR=20dB (b) for the “Foreman”
(a) and the EMA (b) algorithms. sequence.
has great potential for applications such as video coding
and image segmentation.
8. REFERENCES