Abstract—The maximum likelihood (ML) expectation maximization (EM) approach in emission tomography has been very popular in medical imaging for several years. In spite of this, no satisfactory convergent modifications have been proposed for the regularized approach. In this paper, a modification of the EM algorithm is presented. The new method is a natural extension of the EM for maximizing likelihood with concave priors. Convergence proofs are given.

The Editor responsible for coordinating the review of this paper and recommending its publication was G. T. Herman. The author is with the Applied Mathematics Department, State University of Campinas, CP 6065, 13081, Campinas, SP, Brazil; e-mail: alvar@ime.unicamp.br. IEEE Log Number 9408758.

I. INTRODUCTION

IN THIS paper, a new method for maximizing penalized likelihoods arising in Emission Computed Tomography (ECT) (see [1]) is presented. The goal of ECT is the quantitative determination of the distribution of a radioactive substance inside the body.

Suppose now that we discretize the problem by subdividing the reconstruction region into n small abutting square-shaped picture elements (pixels, for short) and we assume that the activity in each pixel j is a constant, denoted by x_j. If we count y_i coincidences along m lines and a_ij denotes the probability that a photon emitted by pixel j is detected by pair i, then y_i is a sample from a Poisson distribution whose expected value is (a_i, x) = Σ_{j=1}^n a_ij x_j and, for the sake of simplicity [4], we assume that

Σ_{i=1}^m a_ij = 1.

Under this model, the log-likelihood of observing y given the image x is, up to an additive constant, L(x) = Σ_{i=1}^m [y_i log(a_i, x) - (a_i, x)], and the ML problem is

max L(x) subject to x ≥ 0.   (3)

The Bayesian approach consists instead of maximizing P_B(x|y), the conditional probability distribution of the image vector x given the measurement vector y. Using Bayes' equation we get that

P_B(x|y) = P(y|x) P_A(x) / P(y),   (4)

where P_A(x) and P(y) are the a priori probability distributions of the image and measurement vectors, respectively.
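Before proceeding, the measurement model just described is easy to simulate; the following minimal sketch (a hypothetical random system matrix and toy sizes, with names of our own choosing, not the paper's) produces data consistent with the assumptions above:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 16, 64                        # pixels and detection pairs (toy sizes)
A = rng.random((m, n))               # a_ij: probability that pair i detects a photon from pixel j
A /= A.sum(axis=0, keepdims=True)    # enforce the normalization sum_i a_ij = 1

x_true = rng.gamma(2.0, 1.0, size=n) # activity x_j in each pixel
y = rng.poisson(A @ x_true)          # y_i ~ Poisson((a_i, x))
```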
Since P(y) is known and independent of x, and P_A(x) contains the a priori knowledge of x, taking the logarithm of both sides of (4), the new problem becomes

max G(x) = L(x) + F(x),   (5)

where F(x) = log P_A(x). The EM algorithm applied to (5) for a given differentiable F needs to solve at each iteration step a system of equations of the form

x_j [1 - (∂F_1/∂x_j)(x) - (∂F_2/∂x_j)(x^k)] = x_j^k Σ_{i=1}^m a_ij y_i / (a_i, x^k),   j = 1, …, n,   (7)

in which the penalty is split as F = F_1 + F_2 and only F_1 is treated implicitly, that is, evaluated at the unknown new iterate: F_1 = 0 is the alternative suggested in [9] and F_1 the diagonal part of the quadratic (Gaussian prior) in [10]. But, for these examples, no general convergence results are available and divergence may occur. In the following sections, we present a modification of the EM algorithm together with a full convergence proof that solves the problem stated above for very general concave priors.

In Section II, we give a description of the EM algorithm in its general form. Then, we analyze its application to the ML problem (3) and how it separates the variables. Using this analysis, we present, in Section III, the new algorithm that, when applied to the penalized problem (5) with concave F, gives rise to n independent equations, each with a single unknown. We prove that the new algorithm is convergent under mild conditions.

II. THE EM ALGORITHM

A possible solution to this problem is to perform the thought experiment of embedding the sample space for Y in a richer or larger sample space Z in which optimization problems are easier to solve. The observed data are a realization from Y (incomplete data). The corresponding z in Z is not observed directly, but through y. Specifically, we assume that there is a mapping z → y(z) from Z to Y, and that z is known only to lie in X(y), the subset of Z determined by the equation y = y(z), where y is the observed data; z will be referred to as the complete data.

Our problem is now finding a value of x which maximizes the incomplete-data likelihood g(y, x) given an observed y; in other words, to find the x in Ω (the nonnegative orthant of R^n) which maximizes L(x) = log g(y, x). If f(z, x) denotes the likelihood of the complete data, the conditional expectation of its logarithm, Q(x̄, x) = E(log f(Z, x̄) | Y, x), decomposes as

Q(x̄, x) = L(x̄) + H(x̄, x).   (12)

The following property of H motivates the EM algorithm and its generalizations.

Lemma 1: For all pairs (x̄, x) in Ω²,

H(x̄, x) ≤ H(x, x),   (13)

with equality if and only if k(z|y, x̄) = k(z|y, x) almost everywhere, k being the conditional density of the complete data given the observed data.

Proof: See [7], Lemma 1. □

Equation (13) is a straightforward consequence of Jensen's inequality, and the Lemma essentially says that H(x̄, x), as a function of its first variable, has a maximum at x̄ = x.

Using (12), for a given x^k and for all x, we have that

L(x) - L(x^k) = [Q(x, x^k) - Q(x^k, x^k)] + [H(x^k, x^k) - H(x, x^k)].   (14)

Both brackets on the right-hand side of (14) are nonnegative whenever x increases Q(·, x^k) above Q(x^k, x^k) (the second one by Lemma 1), so any x that increases Q also increases L; this observation is the core of the EM algorithm.
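Lemma 1 is essentially Gibbs' inequality for the conditional density k; a minimal numerical illustration, with a categorical distribution standing in for k(z|y, x) (the array names are ours, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def H(p_bar, p):
    # H(x_bar, x) = E[log k(Z|y, x_bar) | y, x] for a categorical k
    return np.sum(p * np.log(p_bar))

p = rng.dirichlet(np.ones(5))        # k(z|y, x): distribution under x
p_bar = rng.dirichlet(np.ones(5))    # k(z|y, x_bar): distribution under x_bar

assert H(p_bar, p) <= H(p, p)        # inequality (13); equality only if p_bar = p
```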
EM Algorithm: Given x⁰ ∈ Ω with x⁰ > 0, for k = 0, 1, 2, …:

E-step: Compute

E(log f(Z, x) | Y, x^k) = Q(x, x^k).   (16)

M-step: Choose x^{k+1} to be equal to

arg max_{x ∈ Ω} Q(x, x^k).   (17)

In the ECT problem, the complete data z can be taken to be the array {y_ij}, where y_ij is the (unobserved) number of photons emitted by pixel j and detected by pair i, so that

y_i = Σ_{j=1}^n y_ij,

and the expected value of y_ij (unknown) given y_i and x^k is

E(y_ij | y_i, x^k) = y_i a_ij x_j^k / (a_i, x^k).   (19)

The extended log-likelihood for this problem is (except for a constant that will not be considered because we are dealing with a maximization problem)

Σ_{i=1}^m Σ_{j=1}^n [y_ij log(a_ij x_j) - a_ij x_j],

and, applying the expectation given y and x^k to it, we obtain

Q(x, x^k) = Σ_{i=1}^m Σ_{j=1}^n [(y_i a_ij x_j^k / (a_i, x^k)) log(a_ij x_j) - a_ij x_j] = Q_1(x, x^k),   (21)

using (19) and the linearity of the expectation. In the same way,

H(x, x^k) = Σ_{i=1}^m Σ_{j=1}^n (y_i a_ij x_j^k / (a_i, x^k)) log(a_ij x_j) - Σ_{i=1}^m y_i log(a_i, x),   (22)

and the sequence defined by (16) and (17) is, for k = 0, 1, 2, …,

x_j^{k+1} = x_j^k Σ_{i=1}^m a_ij y_i / (a_i, x^k).   (23)

Note what has happened in (21): the logarithm of the inner product (a_i, x), which couples all the variables in L, has disappeared, because (a_i, x) can be written as a convex combination of the x_j's (more precisely, of the quantities (a_i, x^k) x_j / x_j^k) with coefficients a_ij x_j^k / (a_i, x^k). For fixed x^k, Q_1(·, x^k) is thus a sum of concave functions of the individual variables x_j; this separation of the variables is what makes the M-step (17) explicit and yields (23).

III. THE MODIFIED EM ALGORITHM

Let S be a real p × n matrix without zero columns, s_ℓ its ℓth row, and let f_ℓ (ℓ = 1, …, p) be real-valued strictly concave functions, twice continuously differentiable in R and bounded above. We will consider general penalty functions of the form

F(x) = Σ_{ℓ=1}^p f_ℓ((s_ℓ, x)).   (26)

For f_ℓ(x) = -(γ/2) x², we obtain penalty functions derived from Gaussian priors [11]; f_ℓ(x) = -c_1 log cosh(c_2 x) (with c_1, c_2 possibly depending on ℓ) gives the penalization suggested in [17].

Another condition on the functions f_ℓ that will be required is that for every bounded set B there exists a constant γ_ℓ^B > 0 such that

-(d²f_ℓ/dx²)(x) ≥ γ_ℓ^B for every x in B.   (27)

Note that (27) is satisfied by the examples just mentioned and by many reasonable existing concave regularizations.

Now let λ_j^ℓ (ℓ = 1, …, p; j = 1, …, n) be nonnegative numbers such that

Σ_{j=1}^n λ_j^ℓ = 1, for ℓ = 1, …, p,   (28)

and define

c_j^ℓ = s_ℓj / λ_j^ℓ, for j = 1, …, n, ℓ = 1, …, p.   (29)

(If s_ℓj = 0, we can define c_j^ℓ = 0 and choose λ_j^ℓ = 0; if s_ℓj ≠ 0, then λ_j^ℓ ≠ 0.)
Recall that Q_1(x, x^k) denotes the separated surrogate (21) for the likelihood part. Now we define, for the penalty part,

Q_2(x, x^k) = Σ_{ℓ=1}^p Σ_{j=1}^n λ_j^ℓ f_ℓ(c_j^ℓ (x_j - x_j^k) + (s_ℓ, x^k)).

Lemma 2: For all x and x^k in Ω,

F(x) - F(x^k) ≥ Q_2(x, x^k) - Q_2(x^k, x^k).

Proof: Because of (28) and (29), Σ_{j=1}^n λ_j^ℓ [c_j^ℓ(x_j - x_j^k) + (s_ℓ, x^k)] = (s_ℓ, x); the inequality is then Jensen's inequality applied to each concave f_ℓ, and Q_2(x^k, x^k) = F(x^k) follows by a simple substitution in the definition of Q_2. □

Moreover, differentiating H_j^ℓ(x_j) = λ_j^ℓ f_ℓ(c_j^ℓ(x_j - x_j^k) + (s_ℓ, x^k)) with respect to x_j at x_j^k, we get s_ℓj f_ℓ'((s_ℓ, x^k)); on the other hand, this is precisely ∂f_ℓ((s_ℓ, x))/∂x_j evaluated at x^k, so that Q_2(·, x^k) and F also share the same gradient at x^k.

Combining Lemma 1 (through (14)) and Lemma 2, we obtain

G(x) - G(x^k) ≥ [Q_1(x, x^k) + Q_2(x, x^k)] - [Q_1(x^k, x^k) + Q_2(x^k, x^k)],   (40)

and the modified EM algorithm is defined by

x^{k+1} = arg max_{x ∈ Ω} [Q_1(x, x^k) + Q_2(x, x^k)].   (41)

Q_1 + Q_2 is strictly concave and, unlike G, separates the variables: by differentiating Q_1 + Q_2, we deduce that x_j^{k+1} is the solution of

A_j^k x_j^k / x_j - 1 + B_j^k(x_j) = 0,   (43)

where

A_j^k = Σ_{i=1}^m a_ij y_i / (a_i, x^k)  and  B_j^k(x_j) = Σ_{ℓ=1}^p s_ℓj f_ℓ'(c_j^ℓ(x_j - x_j^k) + (s_ℓ, x^k)),

that is, of n independent equations, each with a single unknown.

Proposition 1: For the sequence defined by (41), the following properties are valid for k = 0, 1, 2, …:

1) G(x^{k+1}) ≥ G(x^k), with equality if and only if x^{k+1} = x^k.
2) {x^k} is bounded.
3) G(x^{k+1}) - G(x^k) ≥ γ ‖x^{k+1} - x^k‖² for some positive constant γ independent of k.

Proof: 1) From (40), (41), and Lemmas 1 and 2,

G(x^{k+1}) - G(x^k) ≥ Q_1(x^{k+1}, x^k) - Q_1(x^k, x^k) + Q_2(x^{k+1}, x^k) - Q_2(x^k, x^k) ≥ 0.   (42)

If G(x^{k+1}) = G(x^k), the fact that x^{k+1} = x^k will follow from 3).

2) The level sets of L are bounded and F(x) is bounded above by some U; so, if F(x) ≤ U for every x, consider G_α = {x ∈ R^n_+ : G(x) ≥ α} ⊂ {x ∈ R^n_+ : L(x) ≥ β} = L_β, where β = α - U. But L_β is bounded; so G_α is bounded for every α ∈ R. Taking α = G(x⁰), the result follows from 1).

3) Let γ_j = (1/2) Σ_{ℓ=1}^p λ_j^ℓ (c_j^ℓ)² γ_ℓ^B, where γ_ℓ^B comes from (27) (B = G_α with α = G(x⁰)). Taking γ = min_j γ_j, the result follows from the resulting strong concavity of Q_2(·, x^k) on G_α and the optimality of x^{k+1}. □

We are now able to prove convergence of (41) to a solution of (5) along the lines of [18]. To do this, our only additional hypothesis will be that (5) has a unique solution. Of course, this is true in the application that motivates the algorithm. As a matter of fact, in real situations, L and F do not necessarily have a unique maximum if considered separately, but one of the reasons to add the penalization term is to avoid this uncertainty.

Theorem 1: The sequence generated by (41) converges to a solution of (5).

Proof: From Proposition 1, {G(x^k)} is a nondecreasing sequence bounded above, hence convergent. This implies that G(x^{k+1}) - G(x^k) → 0, and {x^k} is bounded by Proposition 1, 2). Let {x^{k_s}} be a convergent subsequence such that x^{k_s} → x*. Proposition 1, 3) implies that x^{k+1} - x^k → 0, and from this we deduce that x^{k_s+1} → x* (‖x^{k_s+1} - x*‖ ≤ ‖x^{k_s+1} - x^{k_s}‖ + ‖x^{k_s} - x*‖ → 0). So, multiplying (43) by x_j^{k_s+1} (j = 1, …, n) and taking limits, we have

x_j*(A_j* - 1 + B_j*) = 0,   (48)

where, by continuity, A_j* and B_j* are the values of A_j^k and B_j^k at x*. Since A_j* - 1 + B_j* = ∂G/∂x_j(x*), (48) says that the gradient of G vanishes at x* along every coordinate j with x_j* > 0.

Suppose now that x_j* = 0; ∂G/∂x_j(x*) > 0 would imply that

A_j* + B_j* > 1,   (57)

so that, for k large enough along the subsequence, A_j^k + B_j^k > 1 by continuity (writing B_j^k for B_j^k(x_j^{k+1})); if x_j^{k+1} < x_j^k, using (43) multiplied by x_j^{k+1} and the fact that A_j^k > 0,

x_j^{k+1} = x_j^k A_j^k + x_j^{k+1} B_j^k > x_j^{k+1}(A_j^k + B_j^k),   (58)

a contradiction; so x_j^{k+1} ≥ x_j^k, but this contradicts the fact that x_j^k → 0. Therefore, ∂G/∂x_j(x*) ≤ 0. Thus x* satisfies the optimality (Kuhn-Tucker) conditions of the concave problem (5), hence it is its unique solution; since every convergent subsequence of the bounded sequence {x^k} has this same limit, the whole sequence converges to x*. □

A. Examples

1) Quadratic Penalization [10], [8]: In this case, f_ℓ(x) = -(γ/2) x², γ > 0, and (s_ℓ, x) = x_ℓ - (1/n_ℓ) Σ_{t ∈ N_ℓ} x_t, where N_ℓ denotes the set of pixels in a given neighborhood of pixel ℓ and n_ℓ the total number of those pixels (without x_ℓ). We can choose λ_ℓ^ℓ = 1/2 and λ_t^ℓ = 1/(2 n_ℓ) for t ∈ N_ℓ (so that c_ℓ^ℓ = 2 and c_t^ℓ = -2), and for interior pixels (43) reduces to a quadratic equation in the single unknown x_j, whose unique positive root is x_j^{k+1}.
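As a concrete illustration of Section III, the following sketch carries out one iteration of (41) for the quadratic penalty of Example 1, solving the decoupled one-dimensional equation (43) for each pixel by bisection. It is a transcription of the formulas above under our own (hypothetical) naming, not code from the paper:

```python
import numpy as np

def modified_em_step(x, A, y, S, lam, gamma=1.0, eps=1e-12):
    """One iteration of (41) for the quadratic penalty f_l(t) = -(gamma/2) t**2.

    A   : (m, n) system matrix a_ij, columns summing to 1
    S   : (p, n) penalty matrix s_lj (no zero columns)
    lam : (p, n) nonnegative lambda_j^l satisfying (28), zero exactly where S is zero
    Assumes x > 0 componentwise and A.T @ y > 0.
    """
    proj = A @ x                                   # (a_i, x^k)
    Ak = A.T @ (y / np.maximum(proj, eps))         # A_j^k = sum_i a_ij y_i / (a_i, x^k)
    Sxk = S @ x                                    # (s_l, x^k)
    c = np.divide(S, lam, out=np.zeros_like(S), where=lam > 0)   # c_j^l of (29)

    x_new = np.empty_like(x)
    for j in range(x.size):
        def phi(t):
            # left-hand side of (43): A_j^k x_j^k / t - 1 + B_j^k(t), with f'(u) = -gamma u
            B = np.sum(S[:, j] * (-gamma) * (c[:, j] * (t - x[j]) + Sxk))
            return Ak[j] * x[j] / t - 1.0 + B
        lo, hi = eps, 2.0 * x[j] + 1.0
        while phi(hi) > 0.0:                       # phi is strictly decreasing, positive near 0
            hi *= 2.0
        for _ in range(60):                        # bisection on the single unknown x_j
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if phi(mid) > 0.0 else (lo, mid)
        x_new[j] = 0.5 * (lo + hi)
    return x_new
```

For quadratic f_ℓ, (43) is itself a quadratic equation in x_j with a closed-form positive root; bisection is used here only to keep the sketch generic, so the same loop covers other concave penalties (e.g., log cosh) once the derivative inside B_j^k is replaced.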
REFERENCES

[1] T. F. Budinger, G. T. Gullberg, and R. H. Huesman, "Emission computed tomography," in Image Reconstruction from Projections: Implementation and Applications, G. T. Herman, Ed. Berlin, Heidelberg: Springer-Verlag, 1979, ch. 5, pp. 147-246.
[2] G. T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. New York: Academic Press, 1980.
[3] M. M. Ter-Pogossian, M. Raichle, and B. E. Sobel, "Positron emission tomography," Sci. Amer., vol. 243, no. 4, pp. 170-181, 1980.
[4] G. T. Herman, A. R. De Pierro, and N. Gai, "On methods for maximum a posteriori image reconstruction with a normal prior," J. Visual Commun. Image Represent., vol. 3, no. 4, pp. 1-9, 1992.
[5] A. Rockmore and A. Macovski, "A maximum likelihood approach to emission image reconstruction from projections," IEEE Trans. Nucl. Sci., vol. NS-23, pp. 1428-1432, 1976.
[6] L. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Trans. Med. Imag., vol. MI-1, pp. 113-121, 1982.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Stat. Soc., Ser. B, vol. 39, pp. 1-38, 1977.
[8] G. T. Herman and D. Odhner, "Performance evaluation of an iterative image reconstruction algorithm for positron emission tomography," IEEE Trans. Med. Imag., vol. 10, no. 3, pp. 336-346, 1991.
[9] P. Green, "On the use of the EM algorithm for penalized likelihood estimation," J. Roy. Stat. Soc., Ser. B, vol. 52, no. 2, pp. 443-452, 1990.
[10] G. T. Herman, D. Odhner, K. D. Toennies, and S. A. Zenios, "A parallelized algorithm for image reconstruction from noisy projections," in Large Scale Numerical Optimization, Proceedings in Applied Mathematics, vol. 46, T. Coleman and Y. Li, Eds. Philadelphia, PA: SIAM Publications, 1990, pp. 3-21.
[11] E. Levitan and G. T. Herman, "A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography," IEEE Trans. Med. Imag., vol. MI-6, no. 3, pp. 185-192, 1987.
[12] S. Geman and D. McClure, "Bayesian image analysis: An application to single photon emission tomography," in Proc. Statist. Comput. Sect., Amer. Statist. Assoc., Washington, DC, 1985, pp. 12-18.
[13] A. R. De Pierro, "A generalization of the EM algorithm for maximum likelihood estimates from incomplete data," Dep. Radiology, Univ. of Pennsylvania, Tech. Rep. MIPG119, 1987.
[14] T. Hebert and R. Leahy, "A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors," IEEE Trans. Med. Imag., vol. 8, pp. 194-202, 1989.
[15] A. R. De Pierro, "Multiplicative iterative methods in computed tomography," in Mathematical Methods in Tomography, Lecture Notes in Mathematics, vol. 1497, G. T. Herman, A. K. Louis, and F. Natterer, Eds. New York: Springer-Verlag, 1991, pp. 167-186.
[16] I. Csiszár and G. Tusnády, "Information geometry and alternating minimization procedures," in Statistics and Decisions. München: R. Oldenbourg Verlag, 1984, pp. 205-237.
[17] P. Green, "Bayesian reconstruction for emission tomography data using a modified EM algorithm," IEEE Trans. Med. Imag., vol. 9, no. 1, pp. 84-93, 1990.
[18] K. Lange and R. Carson, "EM reconstruction algorithms for emission and transmission tomography," J. Comput. Assist. Tomogr., vol. 8, pp. 306-316, 1984.
[19] M. Avriel, Nonlinear Programming: Analysis and Methods. Englewood Cliffs, NJ: Prentice-Hall, 1976.
[20] A. Ostrowski, Solution of Equations in Euclidean and Banach Spaces. New York: Academic Press, 1973.