

A Modified Expectation Maximization Algorithm for Penalized Likelihood Estimation in Emission Tomography

Alvaro R. De Pierro

Abstract- The maximum likelihood (ML) expectation maximization (EM) approach in emission tomography has been very popular in medical imaging for several years. In spite of this, no satisfactory convergent modifications have been proposed for the regularized approach. In this paper, a modification of the EM algorithm is presented. The new method is a natural extension of the EM for maximizing likelihood with concave priors. Convergence proofs are given.

The Editor responsible for coordinating the review of this paper and recommending its publication was G. T. Herman. The author is with the Applied Mathematics Department, State University of Campinas, CP 6065, 13081 Campinas, SP, Brazil (e-mail: alvar@ime.unicamp.br). IEEE Log Number 9408758.

I. INTRODUCTION

IN THIS paper, a new method for maximizing penalized likelihoods arising in Emission Computed Tomography (ECT) (see [1]) is presented. The goal of ECT is the quantitative determination of the moment-to-moment changes in the chemistry and flow physiology of radioactive labeled compounds inside the body. The mathematical problem consists of reconstructing a function representing the distribution of radioactivity in a body cross-section from measured data that are the total activity along lines of known location. One of the things that distinguishes this problem from that arising in X-ray tomography [2] is that the measurements tend to be much more noisy; so, in ECT, it is desirable to have a reconstruction approach incorporating an estimation procedure that takes into account the statistical nature of the noise.

For example, in Positron Emission Tomography (PET) [3], the isotope used emits positrons which annihilate with nearby electrons, generating two photons travelling away from each other in (nearly) opposite directions; the number of such photon pairs (detected in time coincidence) for each line or pair of detectors is related to the integral of the concentration of the isotope along the line. The emission of positrons is a Poisson process [2], the mean of which is determined by the concentration of the isotope that we wish to estimate; more exactly, we aim at recovering a function whose value is the expected number of emissions at each point, given measured values of coincidences along (many) lines during the data collection period.

Suppose now that we discretize the problem by subdividing the reconstruction region into n small abutting square-shaped picture elements (pixels, for short) and we assume that the activity in each pixel j is a constant, denoted by x_j. If we count y_i coincidences along m lines and a_ij denotes the probability that a photon emitted by pixel j is detected by pair i, then y_i is a sample from a Poisson distribution whose expected value is $\sum_{j=1}^{n} a_{ij} x_j$ and, for the sake of simplicity [4], we assume that

$$\sum_{i=1}^{m} a_{ij} = 1. \qquad (1)$$

If x represents the image vector $x = (x_1, \dots, x_n)^T$ and $y = (y_1, \dots, y_m)^T$ the data vector, the likelihood function, that is, the probability of obtaining y if the image is x, is

$$P_L(y|x) = \prod_{i=1}^{m} e^{-\langle a_i, x\rangle}\,\frac{\langle a_i, x\rangle^{y_i}}{y_i!}, \qquad (2)$$

where $\langle\cdot,\cdot\rangle$ denotes the standard inner product in R^n and the a_i are the row-vectors of the matrix A = {a_ij} in R^{m x n}.

In order to estimate x from y, one possible approach is to maximize P_L(y|x) subject to nonnegativity constraints on x, or equivalently

$$\max_{x \ge 0}\; L(x) = \sum_{i=1}^{m}\left[ y_i\log\langle a_i, x\rangle - \langle a_i, x\rangle\right], \qquad (3)$$

where L(.) denotes the log-likelihood function. This approach was first proposed in [5], and in [6] the use of the EM algorithm [7] was suggested for solving (3), obtaining very good results. Unfortunately, in its practical application, the EM algorithm applied to (3) has to be stopped before a deteriorating "checkerboard effect" shows up [8]. This effect comes from the fact that we are dealing with an ill-posed problem and some a priori knowledge, not contained in the maximum likelihood (ML) model, is needed. To overcome this difficulty, several authors [9]-[11] proposed a maximum a posteriori approach, that is, maximizing P_B(x|y), the conditional probability distribution of the image vector given the measurement vector y. Using Bayes' equation we get that

$$P_B(x|y) = \frac{P_L(y|x)\, P_A(x)}{P(y)}, \qquad (4)$$

where P_A(x) and P(y) are the a priori probability distributions of the image and measurement vectors, respectively.
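Before turning to the penalized problem, the discretized model (1)-(3) can be made concrete with a short NumPy sketch; the sizes, the random system matrix, and all names below are illustrative stand-ins rather than anything prescribed by the paper.

```python
import numpy as np

# Minimal sketch of the discretized emission model (1)-(3).
# A (m x n), x_true (n,), y (m,) are illustrative stand-ins.
rng = np.random.default_rng(0)
m, n = 200, 100
A = rng.random((m, n))
A /= A.sum(axis=0, keepdims=True)   # enforce sum_i a_ij = 1, as in (1)
x_true = rng.random(n) * 10.0
y = rng.poisson(A @ x_true)         # y_i ~ Poisson(<a_i, x_true>)

def log_likelihood(x, A, y, eps=1e-12):
    """Poisson log-likelihood L(x) of (3), up to a constant independent of x."""
    ax = A @ x                      # <a_i, x> for every line i
    return float(np.sum(y * np.log(ax + eps) - ax))
```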

02784062/95$04.00 0 1995 IEEE


DE PIERRO: A MODIFIED EXPECTATION MAXIMIZATION ALGORITHM 133

Since P(y) is known and independent of x, and P_A(x) contains the a priori knowledge of x, taking the logarithm of both sides of (4), the new problem becomes

$$\max_{x \ge 0}\; G(x) = L(x) + F(x), \qquad (5)$$

where F(x) = log P_A(x). The EM algorithm applied to (5) for a given differentiable F needs to solve at each iteration step a system of equations of the form

$$x_j^k \sum_{i=1}^{m}\frac{y_i\, a_{ij}}{\langle a_i, x^k\rangle}\,\frac{1}{x_j} - 1 + \frac{\partial F}{\partial x_j}(x) = 0, \qquad j = 1, \dots, n. \qquad (6)$$

(Observe that the presence of the logarithm forces the solution to be positive.)

If the variables in F are separated, x_j will be the solution of the corresponding single-unknown equation in (6); but this is a very strong limitation for F, since F should contain information on the correlation between variables. In the general case (correlated variables), a huge system of nonlinear equations would have to be solved in each iteration (typical values for PET are m, n of the order of 10^5). Several modifications have been proposed to overcome this problem. In [13], variables are sequentially updated by blocks using the equations in (6) in such a way that for each block the variables are separated; this modification converges, but it is quite expensive (see [10] for numerical experiments). This is very similar to the approach in [14]. Another possibility is to split F into two parts, linearizing the one with nonseparated variables [15], i.e., to set F = F_1 + F_2 and solve instead of (6)

$$x_j^k \sum_{i=1}^{m}\frac{y_i\, a_{ij}}{\langle a_i, x^k\rangle}\,\frac{1}{x_j} - 1 + \frac{\partial F_1}{\partial x_j}(x) + \frac{\partial F_2}{\partial x_j}(x^k) = 0, \qquad j = 1, \dots, n. \qquad (7)$$

F_1 = 0 is the alternative suggested in [9], and F_1 equal to the diagonal part of the quadratic (Gaussian prior) is the choice in [10]. But, for these examples, no general convergence results are available and divergence may occur. In the following sections, we present a modification of the EM algorithm, together with a full convergence proof, that solves the problem stated above for very general concave priors.
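For orientation, the F_1 = 0 choice of [9] turns (7) into a closed-form update, often called a one-step-late update. The sketch below, with illustrative names and not the method proposed in this paper, shows one way the divergence mentioned above can appear: nothing keeps the denominator positive.

```python
def osl_update(xk, A, y, grad_F, eps=1e-12):
    """Update obtained from (7) with F1 = 0:
        x_j <- x_j^k * A_j^k / (1 - dF/dx_j(x^k)),
    where A_j^k = sum_i y_i a_ij / <a_i, x^k>.  Purely illustrative sketch:
    the denominator is not guaranteed to stay positive, so the iterates
    may become negative or blow up for strong priors."""
    Ak = A.T @ (y / (A @ xk + eps))   # the EM factor A_j^k
    denom = 1.0 - grad_F(xk)          # grad_F returns the gradient of F at x^k
    return xk * Ak / denom            # no safeguard on the sign of denom
```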
In Section II, we give a description of the EM algorithm in its general form. Then, we analyze its application to the ML problem (3) and how it separates the variables. Using this analysis, we present, in Section III, the new algorithm that, when applied to the penalized problem (5) with concave F, gives rise to n independent equations, each with a single unknown. We prove that the new algorithm is convergent under mild conditions.

II. THE EXPECTATION MAXIMIZATION (EM) ALGORITHM

The basic idea of the EM algorithm is very simple. Suppose the observed data in some experiment or sequence of experiments is a random vector y with density function g(Y, x), where x is some vector of parameters to be estimated. In general, it may be difficult to maximize g(Y, x) with respect to x, especially if x is a very large vector, as is the case in ECT. A possible solution to this problem is to perform the thought experiment of embedding the sample space for Y in a richer or larger sample space Z in which optimization problems are easier to solve. The observed data are a realization from Y (incomplete data). The corresponding z in Z is not observed directly, but through y. Specifically, we assume that there is a mapping $z \to y(z)$ from Z to Y, and that z is known only to lie in Z(y), the subset of Z determined by the equation y = y(z), where y is the observed data; z will be referred to as the complete data.

Our problem is now finding a value of x which maximizes g(Y, x) given an observed y; in other words, to find the $\hat{x} \in \Omega$ which maximizes

$$g(Y, \hat{x}), \qquad (8)$$

where $\Omega$ is a given convex set in R^n.

If Z has a density function f(Z, x),

$$k(Z|Y, x) = \begin{cases} f(Z, x)/g(Y, x), & \text{if } y(Z) = y,\\ 0, & \text{otherwise},\end{cases} \qquad (9)$$

is the conditional density of Z given Y and x. In this case, L(x) = log g(Y, x) can be written in the form

$$L(x) = \log f(Z, x) - \log k(Z|Y, x). \qquad (10)$$

We define

$$H(\hat{x}, x) = E\bigl(\log k(Z|Y, \hat{x})\,\big|\,Y, x\bigr), \qquad (11)$$

where E(.|Y, x) denotes the expectation given Y and x. Using (10) and (11), define

$$Q(\hat{x}, x) = L(\hat{x}) + H(\hat{x}, x). \qquad (12)$$

The following property of H motivates the EM algorithm and its generalizations.

Lemma 1: For all pairs $(\hat{x}, x)$ in $(\mathbb{R}^n)^2$,

$$H(\hat{x}, x) \le H(x, x), \qquad (13)$$

with equality if and only if $k(Z|Y, \hat{x}) = k(Z|Y, x)$ almost everywhere.

Proof: See [7], Lemma 1.

Equation (13) is a straightforward consequence of Jensen's inequality, and the Lemma essentially says that $H(\hat{x}, x)$, as a function of its first variable, has a maximum for $\hat{x} = x$.

Using (12), for a given x^k and for all x, we have that

$$L(x) - L(x^k) = \bigl[Q(x, x^k) - Q(x^k, x^k)\bigr] + \bigl[H(x^k, x^k) - H(x, x^k)\bigr]. \qquad (14)$$

By Lemma 1, the second term between brackets is always nonnegative; then, if we choose x such that

$$Q(x, x^k) \ge Q(x^k, x^k), \qquad (15)$$

L(x) will be greater than or equal to L(x^k). Therefore, the function L will be nondecreasing, and this is the first step toward an algorithm for maximizing L. Taking into account this property, we define the following.

EM Algorithm: Given $x^0 \in \Omega$, for k = 0, 1, 2, ...,

E-step: Compute the conditional expectation

$$E\bigl(\log f(Z, x)\,\big|\,Y, x^k\bigr) = Q(x, x^k). \qquad (16)$$

M-step: Choose x^{k+1} to be equal to

$$\arg\max_{x \in \Omega}\; Q(x, x^k). \qquad (17)$$

In the ECT problem, the complete data can be taken to be the (unobserved) numbers z_ij of photons emitted by pixel j and detected by detector pair i, so that

$$y_i = \sum_{j=1}^{n} z_{ij}, \qquad (18)$$

and the expected value of z_ij (unknown) given y_i and x^k is

$$z_{ij}^k = \frac{y_i\, a_{ij}\, x_j^k}{\langle a_i, x^k\rangle}. \qquad (19)$$

The extended log-likelihood for this problem is (except for a constant that will not be considered, because we are dealing with a maximization problem)

$$\log f(z, x) = \sum_{i=1}^{m}\sum_{j=1}^{n}\bigl[ z_{ij}\log(a_{ij} x_j) - a_{ij} x_j\bigr], \qquad (20)$$

and, applying the expectation given y and x^k to (20), we obtain

$$Q(x, x^k) = \sum_{i=1}^{m}\sum_{j=1}^{n}\left[\frac{y_i\, a_{ij}\, x_j^k}{\langle a_i, x^k\rangle}\log(a_{ij} x_j) - a_{ij} x_j\right], \qquad (21)$$

using (19) and the linearity of the expectation. In the same way,

$$H(x, x^k) = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{y_i\, a_{ij}\, x_j^k}{\langle a_i, x^k\rangle}\log(a_{ij} x_j) - \sum_{i=1}^{m} y_i\log\langle a_i, x\rangle, \qquad (22)$$

and the sequence defined by (16) and (17) is, for k = 0, 1, 2, ...,

$$x_j^{k+1} = x_j^k\, A_j^k, \qquad j = 1, \dots, n, \qquad (23)$$

where

$$A_j^k = \sum_{i=1}^{m} \frac{y_i\, a_{ij}}{\langle a_i, x^k\rangle}. \qquad (24)$$
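Continuing the numerical sketch from the Introduction, the update (23)-(24) takes only a couple of lines of NumPy (purely illustrative, and relying on the normalization (1)):

```python
def mlem_update(x, A, y, eps=1e-12):
    """One EM iteration (23)-(24): x_j <- x_j^k * sum_i y_i a_ij / <a_i, x^k>.
    Assumes the columns of A are normalized as in (1)."""
    ratio = y / (A @ x + eps)        # y_i / <a_i, x^k>
    return x * (A.T @ ratio)         # x_j^k * A_j^k

x = np.ones(n)                       # any positive starting image
for _ in range(50):
    x = mlem_update(x, A, y)
```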
Analyzing the EM algorithm in this particular case in a nonstatistical framework, we can make the following observation: the expectation step is equivalent to substituting the original problem of maximizing L by another, simpler problem of maximizing in each iteration a function Q whose variables are separated. Moreover, L and Q coincide at the current point x^k up to the first derivative. The function Q is obtained from L and x^k using the fact that $\langle a_i, x\rangle$ can be represented as a convex combination of the x_j's with coefficients $a_{ij} x_j^k / \langle a_i, x^k\rangle$ and the concavity of log x, as follows:

$$L(x) = \sum_{i=1}^{m}\left[ y_i \log\left(\sum_{j=1}^{n} \frac{a_{ij} x_j^k}{\langle a_i, x^k\rangle}\,\frac{x_j}{x_j^k}\,\langle a_i, x^k\rangle\right) - \langle a_i, x\rangle\right]$$
$$\ge \sum_{i=1}^{m}\sum_{j=1}^{n}\left[ \frac{y_i\, a_{ij}\, x_j^k}{\langle a_i, x^k\rangle}\log\left(\frac{x_j}{x_j^k}\,\langle a_i, x^k\rangle\right) - a_{ij} x_j\right] = Q_1(x, x^k). \qquad (25)$$

III. THE MODIFIED EM ALGORITHM

Let S be a real p x n matrix without zero columns, with rows $s_\ell$, and let $f_\ell$ ($\ell = 1, \dots, p$) be real-valued, strictly concave functions, twice continuously differentiable in R and bounded above. We will consider general penalty functions of the form

$$F(x) = \sum_{\ell=1}^{p} f_\ell(\langle s_\ell, x\rangle). \qquad (26)$$

For $f_\ell(x) = -\tfrac{\gamma}{2}x^2$, we obtain penalty functions derived from Gaussian priors [2]; $f_\ell(x) = -c_1 \log\cosh(c_2 x)$ (with c_1, c_2 possibly depending on $\ell$) gives the penalization suggested in [17].

Another condition on the functions $f_\ell$ that will be required is that, for every bounded set B, there exists a constant $\gamma_\ell^B > 0$ such that

$$-\frac{d^2 f_\ell}{dt^2}(t) \ge \gamma_\ell^B \quad \text{for all } t \in B. \qquad (27)$$

Note that (27) is satisfied by the examples just mentioned and by many reasonable existing concave regularizations.

Now let $\lambda_j^\ell$ ($\ell = 1, \dots, p$; $j = 1, \dots, n$) be nonnegative numbers such that

$$\sum_{j=1}^{n} \lambda_j^\ell = 1, \qquad \text{for } \ell = 1, \dots, p, \qquad (28)$$

$$c_j^\ell = s_{\ell j}/\lambda_j^\ell, \qquad \text{for } j = 1, \dots, n,\ \ell = 1, \dots, p \qquad (29)$$

(if $s_{\ell j} = 0$, we can define $c_j^\ell = 0$ and choose $\lambda_j^\ell = 0$; if $s_{\ell j} \ne 0$, then $\lambda_j^\ell \ne 0$), and

$$d_j^{\ell k} = \begin{cases} \langle s_\ell, x^k\rangle - c_j^\ell x_j^k, & \text{if } \lambda_j^\ell \ne 0,\\ \langle s_\ell, x^k\rangle, & \text{otherwise},\end{cases} \qquad \text{for } j = 1, \dots, n. \qquad (30)$$

Using (28)-(30) and concavity, we obtain

$$F(x) = \sum_{\ell=1}^{p} f_\ell\left(\sum_{j=1}^{n} \lambda_j^\ell\,(c_j^\ell x_j + d_j^{\ell k})\right) \ge \sum_{\ell=1}^{p}\sum_{j=1}^{n} \lambda_j^\ell\, f_\ell(c_j^\ell x_j + d_j^{\ell k}). \qquad (31)$$
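As a small, purely illustrative instance of (28)-(30), the following sketch builds the coefficients lambda, c, and d for rows $s_\ell$ that encode pairwise differences on a 1-D signal, takes the quadratic choice of $f_\ell$, and checks the separable lower bound (31) numerically; it reuses the arrays of the earlier sketches.

```python
def f(u, gamma=0.05):
    """A concave f_l; here the quadratic example f(u) = -(gamma/2) u^2."""
    return -0.5 * gamma * u**2

# Rows s_l with two nonzero entries: <s_l, x> = x_j - x_t for neighbor pairs.
pairs = [(j, j + 1) for j in range(n - 1)]

def penalty_F(x):
    return sum(f(x[j] - x[t]) for j, t in pairs)

def surrogate_Q2(x, xk):
    """Separable lower bound (31) with lambda_j = lambda_t = 1/2,
    hence c_j = 2, c_t = -2 and d as in (30)."""
    val = 0.0
    for j, t in pairs:
        sk = xk[j] - xk[t]                                  # <s_l, x^k>
        val += 0.5 * f(2.0 * x[j] + (sk - 2.0 * xk[j]))     # pixel j term
        val += 0.5 * f(-2.0 * x[t] + (sk + 2.0 * xk[t]))    # pixel t term
    return val

xk = rng.random(n)
xtest = rng.random(n)
assert penalty_F(xtest) >= surrogate_Q2(xtest, xk) - 1e-9   # inequality (31)
assert abs(penalty_F(xk) - surrogate_Q2(xk, xk)) < 1e-9     # equality at x = x^k
```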

Now we define

$$Q_2(x, x^k) = \sum_{\ell=1}^{p}\sum_{j=1}^{n} \lambda_j^\ell\, f_\ell(c_j^\ell x_j + d_j^{\ell k}) \qquad (32)$$

and

$$H_1(x, x^k) = Q_1(x, x^k) - L(x), \qquad H_2(x, x^k) = Q_2(x, x^k) - F(x). \qquad (33)$$

Differentiating H_1 and H_2 with respect to x_j at x_j^k, we get

$$\frac{\partial H_1}{\partial x_j}(x^k, x^k) = \frac{\partial H_2}{\partial x_j}(x^k, x^k) = 0, \qquad j = 1, \dots, n, \qquad (34)$$

so that G and Q_1 + Q_2 coincide at x^k up to the first derivative. On the other hand,

$$H_2(x, x^k) = Q_2(x, x^k) - F(x) \le 0 = Q_2(x^k, x^k) - F(x^k) = H_2(x^k, x^k), \qquad (35)\text{-}(37)$$

where the inequality is just (31) and the equality a simple substitution in (33).

Lemma 2: For all pairs (x, x^k) in $(\mathbb{R}^n)^2$,

$$H_2(x, x^k) \le H_2(x^k, x^k). \qquad (38)$$

Proof: Equation (38) is an immediate consequence of (35)-(37).

From the definitions of G, Q_1, Q_2, H_1, and H_2, and using Lemma 2, we get that

$$G(x) - G(x^k) = Q_1(x, x^k) + Q_2(x, x^k) - \bigl[Q_1(x^k, x^k) + Q_2(x^k, x^k)\bigr] + \bigl[-H_1(x, x^k) - H_2(x, x^k)\bigr]. \qquad (40)$$

By Lemmas 1 and 2, the second term between brackets in (40) is always nonnegative, motivating the following modification of the EM algorithm for solving (5).

1) Modified EM Algorithm: Given x^0, a positive vector, for k = 0, 1, 2, ...,

$$x^{k+1} = \arg\max_{x \ge 0}\; \bigl[Q_1(x, x^k) + Q_2(x, x^k)\bigr]. \qquad (41)$$
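Maximizing Q_1 + Q_2 in (41) decouples into n single-variable concave problems. For the pairwise-difference quadratic penalty of the earlier sketch, each of them reduces to a quadratic equation (compare Example 1 below), so one full iteration can be sketched as follows; this is an illustration under those simplifying assumptions, reusing `A`, `y`, `n`, and `pairs` from the earlier sketches, and not a reference implementation of the algorithm.

```python
def modified_em_update(xk, A, y, gamma=0.05, eps=1e-12):
    """One iteration of (41) for the pairwise-difference quadratic penalty
    of the previous sketch.  Setting the derivative of Q1 + Q2 to zero and
    multiplying by x_j gives, per pixel, a x_j^2 + b x_j - c = 0."""
    Ak = A.T @ (y / (A @ xk + eps))        # ML-EM factor A_j^k of (24)
    nbr_count = np.zeros(n)                # number of neighbors of pixel j
    nbr_sum = np.zeros(n)                  # sum of x_t^k over those neighbors
    for j, t in pairs:
        nbr_count[j] += 1; nbr_count[t] += 1
        nbr_sum[j] += xk[t]; nbr_sum[t] += xk[j]
    a = 2.0 * gamma * nbr_count
    b = 1.0 - gamma * (nbr_count * xk + nbr_sum)
    c = xk * Ak
    disc = np.sqrt(b**2 + 4.0 * a * c)     # a, c >= 0, so disc is real
    # unique positive root; pixels without neighbors fall back to ML-EM (c)
    return np.where(a > 0, (-b + disc) / (2.0 * a + eps), c)

x = np.ones(n)
for _ in range(100):
    x = modified_em_update(x, A, y)
```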

Proposition 1: For the sequence defined by (41), the following properties are valid for k = 0, 1, 2, ...:

1) G(x^{k+1}) >= G(x^k), with equality if and only if x^{k+1} = x^k;
2) {x^k} is bounded;
3) G(x^{k+1}) - G(x^k) >= gamma ||x^{k+1} - x^k||^2 for some positive constant gamma independent of k.

Proof:

1) From (40), (41), and Lemmas 1 and 2,

$$G(x^{k+1}) - G(x^k) \ge Q_1(x^{k+1}, x^k) - Q_1(x^k, x^k) + Q_2(x^{k+1}, x^k) - Q_2(x^k, x^k) \ge 0. \qquad (42)$$

If G(x^{k+1}) = G(x^k), the fact that x^{k+1} = x^k will follow from 3).

2) The level sets of L are bounded and F(x) is bounded above by some U; so, since F(x) <= U for every x, consider $G_\alpha = \{x \in \mathbb{R}^n_+ : G(x) \ge \alpha\} \subset \{x \in \mathbb{R}^n_+ : L(x) \ge \beta\} = L_\beta$, where $\beta = \alpha - U$. But $L_\beta$ is bounded, so $G_\alpha$ is bounded for every real $\alpha$. Taking $\alpha = G(x^0)$, the result follows from 1).

3) By differentiating Q_1 + Q_2, we deduce that $x_j^{k+1}$ is the solution of

$$\frac{x_j^k}{x_j}\, A_j^k - 1 + B_j^x = 0, \qquad j = 1, \dots, n, \qquad (43)$$

where $A_j^k$ is defined in (24) and

$$B_j^x = \sum_{\ell=1}^{p} s_{\ell j}\, f_\ell'(c_j^\ell x_j + d_j^{\ell k}). \qquad (44)$$

(Observe that $B_j^x$ depends on x_j and on x^k.) Moreover, $x_j^{k+1}$ is the only strictly positive solution of (43), because it is the unique maximum of a strictly concave problem, the $f_\ell$'s are twice continuously differentiable, and the logarithm enforces positivity. Expanding $Q_1(x, x^k) + Q_2(x, x^k)$ in a Taylor series to second order about the point $x^{k+1}$, and taking into account that $\nabla(Q_1 + Q_2)(x^{k+1}, x^k) = 0$,

$$G(x^{k+1}) - G(x^k) \ge Q_1(x^{k+1}, x^k) - Q_1(x^k, x^k) + Q_2(x^{k+1}, x^k) - Q_2(x^k, x^k)$$
$$= -\tfrac{1}{2}(x^{k+1} - x^k)^T\, \nabla^2 Q_1(\tilde{x}, x^k)\,(x^{k+1} - x^k) - \tfrac{1}{2}(x^{k+1} - x^k)^T\, \nabla^2 Q_2(\tilde{x}, x^k)\,(x^{k+1} - x^k), \qquad (45)$$

where $\tilde{x}$ lies in the line segment between x^k and x^{k+1}. But $-\nabla^2 Q_1$ is a positive definite matrix and $-\nabla^2 Q_2$ is a diagonal matrix with elements

$$-\frac{\partial^2 Q_2}{\partial x_j^2}(\tilde{x}, x^k) = \sum_{\ell=1}^{p} \lambda_j^\ell\,(c_j^\ell)^2\left(-f_\ell''(c_j^\ell \tilde{x}_j + d_j^{\ell k})\right), \qquad j = 1, \dots, n, \qquad (46)$$

bounded below by

$$\gamma_j = \sum_{\ell=1}^{p} \lambda_j^\ell\,(c_j^\ell)^2\, \gamma_\ell^B, \qquad (47)$$

where $\gamma_\ell^B$ comes from (27) (with $B = G_\alpha$ and $\alpha = G(x^0)$). Taking $\gamma = \min_j \gamma_j$, the result follows.

We are now able to prove convergence of (41) to a solution of (5) along the lines of [18]. To do this, our only additional hypothesis will be that (5) has a unique solution. Of course, this is true in the application that motivates the algorithm. As a matter of fact, in real situations, L and F do not necessarily have a unique maximum if considered separately, but one of the reasons to add the penalization term is to avoid this uncertainty.

Theorem 1: The sequence generated by (41) converges to a solution of (5).

Proof: From Proposition 1, {G(x^k)} is a nondecreasing sequence bounded above, hence convergent. This implies that G(x^{k+1}) - G(x^k) -> 0, and {x^k} is bounded by Proposition 1, 2). Let {x^{k_s}} be a convergent subsequence such that x^{k_s} -> x*. Proposition 1, 3) implies that x^{k+1} - x^k -> 0, and from this we deduce that x^{k_s+1} -> x* (since ||x^{k_s+1} - x*|| <= ||x^{k_s+1} - x^{k_s}|| + ||x^{k_s} - x*|| -> 0). So, multiplying (43) by x_j^{k_s+1} (j = 1, ..., n) and taking limits, we have

$$x_j^*\,(A_j^* - 1 + B_j^*) = 0, \qquad (48)$$

where, by continuity,

$$A_j^* = \sum_{i=1}^{m} \frac{y_i\, a_{ij}}{\langle a_i, x^*\rangle} \qquad (49)$$

and

$$B_j^* = \sum_{\ell=1}^{p} s_{\ell j}\, f_\ell'(\langle s_\ell, x^*\rangle). \qquad (50)$$

In order to prove that x* is a solution of (5), we need to verify that x* satisfies the Kuhn-Tucker conditions [19], i.e., for j = 1, ..., n,

$$x_j^*\, \nabla G(x^*)_j = 0, \qquad (51)$$

$$x_j^* \ge 0, \qquad (52)$$

$$\nabla G(x^*)_j \le 0. \qquad (53)$$

Equation (51) is just (48); (52) is true because x* is a limit of positive vectors. If x_j^* > 0, then $\nabla G(x^*)_j = 0$; but if x_j^* = 0, proving (53) needs a more complicated rationale.

We need to prove first that the whole sequence converges, i.e., that there is only one limit point. Consider two such points x* and x**, and the sets

$$N = \{1, 2, \dots, n\}, \qquad (54)$$

$$W^* = \{j \in N : x_j^* = 0\}, \qquad (55)$$

$$W^{**} = \{j \in N : x_j^{**} = 0\}. \qquad (56)$$

Let $G_{S^*}(x)$ be the restriction of G(x) to the set $S^* = \{x : x_j = 0 \text{ for } j \in W^*\}$. $G_{S^*}(x)$ is strictly concave on S* and has a unique stationary point, which is necessarily the maximum. Therefore, if W* = W**, x* and x** must be the same. The number of limit points is thus bounded by the number of subsets of N, which is finite. Now we can use Ostrowski's theorem [20] (the set of limit points of a sequence {x^k} such that ||x^{k+1} - x^k|| -> 0 is connected) and deduce that x^k -> x*.

Suppose now that x_j^* = 0; $\nabla G(x^*)_j > 0$ would imply that

$$A_j^* + B_j^* > 1. \qquad (57)$$

For k large enough, (57) and continuity imply that $A_j^k + B_j^{k+1} > 1$, where $B_j^{k+1}$ denotes $B_j^x$ evaluated at $x_j = x_j^{k+1}$; so, if $x_j^{k+1} < x_j^k$, using (43) and the fact that $A_j^k > 0$,

$$x_j^{k+1} = x_j^k A_j^k + x_j^{k+1} B_j^{k+1} > x_j^{k+1}\,(A_j^k + B_j^{k+1}) > x_j^{k+1}, \qquad (58)$$

a contradiction; so $x_j^{k+1} \ge x_j^k$ for all k large enough, which contradicts the fact that $x_j^k \to 0$. Therefore, $\nabla G(x^*)_j \le 0$.

A. Examples

1) Quadratic Penalization [10], [8]: In this case, $f_\ell(x) = -\tfrac{\gamma}{2}x^2$, $\gamma > 0$, and $\langle s_\ell, x\rangle = x_\ell - \tfrac{1}{n_\ell}\sum_{t \in N_\ell} x_t$, where $N_\ell$ denotes the set of pixels in a given neighborhood of $x_\ell$ and $n_\ell$ the total number of those pixels (without $x_\ell$). We can choose $\lambda_j^\ell = 1/(n_\ell + 1)$; since $f_\ell'$ is linear, (44) becomes

$$B_j^x = -\gamma \sum_{\ell=1}^{p} s_{\ell j}\,\bigl(c_j^\ell x_j + d_j^{\ell k}\bigr), \qquad (59)$$

an affine function of x_j. So, multiplying (43) by x_j, $x_j^{k+1}$ will be the unique positive solution of a quadratic equation.

2) Logcosh Penalization [17]: In this case, the penalty is built from terms $f_\ell(u) = -\gamma\, w_{\ell t}\,\log\cosh(u/\delta)$, one for each pair of neighboring pixels $\ell$ and t, with $\langle s_\ell, x\rangle$ the corresponding pixel difference; $\gamma$, $\delta$ are given positive parameters and $w_{\ell t}$ is a weight coding the strength of neighborliness between pixels $\ell$ and t: $w_{\ell t} = 1$ for orthogonal nearest neighbors, $1/\sqrt{2}$ for diagonal neighbors, and zero otherwise. Choosing $\lambda_j^\ell = 1/2$, for interior pixels we will have

$$B_j^x = -\gamma \sum_{t \sim j} \frac{w_{tj}}{\delta}\, \tanh\!\left(\frac{2 x_j - x_j^k - x_t^k}{\delta}\right). \qquad (60)$$

Substituting in (43), the nonlinear equation in x_j can be solved very fast using a few steps of Newton's method.
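As an indication of how cheap this is, a per-pixel Newton solver for the log cosh case can be sketched as follows; `Akj` stands for the factor A_j^k of (24), `nbr_xk` and `weights` hold the neighboring values x_t^k and the weights w_tj, and the sign conventions follow (60). All of this is illustrative only.

```python
def newton_pixel_update(xkj, Akj, nbr_xk, weights, gamma=0.05, delta=1.0,
                        n_steps=3, eps=1e-12):
    """A few Newton steps on the single-pixel equation (43), multiplied by x_j,
    for the log cosh penalty with B_j as in (60).  Illustrative sketch only."""
    u = max(xkj, eps)                        # warm start at the current value
    for _ in range(n_steps):
        t = np.tanh((2.0 * u - xkj - nbr_xk) / delta)
        B = -gamma * np.sum(weights * t) / delta
        dB = -2.0 * gamma * np.sum(weights * (1.0 - t**2)) / delta**2
        phi = xkj * Akj - u + u * B          # fixed-point form of (43)
        dphi = -1.0 + B + u * dB
        u = max(u - phi / dphi, eps)         # Newton step, kept positive
    return u
```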
IV. CONCLUDING REMARKS

A modification of the EM algorithm for penalized likelihood which is convergent for general concave penalizations has been presented. In [4], numerical experiments were presented showing that the new method performs as well as those in [10], while converging for every value of the penalization parameter for quadratic priors. Future research directions are: 1) obtaining a reasonable implementation for nonquadratic regularization terms, and 2) applying a similar technique to deal with nonconvex penalizations such as the one proposed in [12].

ACKNOWLEDGMENT

The author would like to thank R. Lewitt for carefully reading and correcting a first version of this article, and J. Browne for many fruitful discussions regarding the final version of the paper. Thanks are also due to M. A. Blue for typing the manuscript.

REFERENCES

[1] T. F. Budinger, G. T. Gullberg, and R. H. Huesman, "Emission computed tomography," in Image Reconstruction from Projections: Implementation and Applications, G. T. Herman, Ed. Berlin, Heidelberg: Springer-Verlag, 1979, ch. 5, pp. 147-246.
[2] G. T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. New York: Academic Press, 1980.
[3] M. M. Ter-Pogossian, M. Raichle, and B. E. Sobel, "Positron emission tomography," Sci. Amer., vol. 243, no. 4, pp. 170-181, 1980.
[4] G. T. Herman, A. R. De Pierro, and N. Gai, "On methods for maximum a posteriori image reconstruction with a normal prior," J. Visual Comm. Image Repres., vol. 3, no. 4, pp. 1-9, 1992.
[5] A. Rockmore and A. Macovski, "A maximum likelihood approach to emission image reconstruction from projections," IEEE Trans. Nucl. Sci., vol. NS-23, pp. 1428-1432, 1976.
[6] L. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Trans. Med. Imag., vol. MI-1, pp. 113-121, 1982.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Stat. Soc., Series B, vol. 39, pp. 1-38, 1977.
[8] G. T. Herman and D. Odhner, "Performance evaluation of an iterative image reconstruction algorithm for positron emission tomography," IEEE Trans. Med. Imag., vol. 10, no. 3, pp. 336-346, 1991.
[9] P. Green, "On the use of the EM algorithm for penalized likelihood estimation," J. Royal Stat. Soc. B, vol. 52, no. 2, pp. 443-452, 1990.
[10] G. T. Herman, D. Odhner, K. D. Toennies, and S. A. Zenios, "A parallelized algorithm for image reconstruction from noisy projections," in Large Scale Numerical Optimization, Proceedings in Applied Mathematics, vol. 46, T. Coleman and Y. Li, Eds. Philadelphia, PA: SIAM Publications, 1990, pp. 3-21.
[11] E. Levitan and G. Herman, "A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography," IEEE Trans. Med. Imag., vol. MI-6, no. 3, pp. 185-192, 1987.
[12] S. Geman and D. McClure, "Bayesian image analysis: An application to single photon emission tomography," in Proc. Statist. Comput. Sect., Amer. Statist. Assoc., Washington, DC, 1985, pp. 12-18.
[13] A. R. De Pierro, "A generalization of the EM algorithm for maximum likelihood estimates from incomplete data," Dep. Radiology, University of Pennsylvania, Tech. Rep. MIPG119, 1987.
[14] T. Hebert and R. Leahy, "A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors," IEEE Trans. Med. Imag., vol. 8, pp. 194-202, 1989.
[15] A. R. De Pierro, "Multiplicative iterative methods in computed tomography," in Mathematical Methods in Tomography, Lecture Notes in Mathematics, vol. 1497, G. T. Herman, A. K. Louis, and F. Natterer, Eds. New York: Springer-Verlag, 1991, pp. 167-186.
[16] I. Csiszar and G. Tusnady, "Information geometry and alternating minimization procedures," in Statistics and Decisions. Munchen: R. Oldenbourg Verlag, 1984, pp. 205-237.
[17] P. Green, "Bayesian reconstruction for emission tomography data using a modified EM algorithm," IEEE Trans. Med. Imag., vol. 9, no. 1, pp. 84-93, 1990.
[18] K. Lange and R. Carson, "EM reconstruction algorithms for emission and transmission tomography," J. Comput. Assist. Tomog., vol. 8, pp. 306-316, 1984.
[19] M. Avriel, Nonlinear Programming: Analysis and Methods. Englewood Cliffs, NJ: Prentice-Hall, 1976.
[20] A. Ostrowski, Solution of Equations in Euclidean and Banach Spaces. New York: Academic Press, 1973.
