
Network Tomography: Estimating Source-Destination Traffic Intensities from Link Data

Y. Vardi

Journal of the American Statistical Association, Volume 91, Issue 433 (Mar., 1996), 365-377.

Stable URL:
http://links.jstor.org/sici?sici=0162-1459%28199603%2991%3A433%3C365%3ANTESTI%3E2.0.CO%3B2-2

Network Tomography: Estimating Source-Destination
Traffic Intensities From Link Data

Y. VARDI

The problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is
formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path
between each directed pair of nodes) and Markovian (a random path between each directed pair of nodes, determined according
to a known Markov chain fixed for that pair). Maximum likelihood estimation and related approximations are discussed, and
computational difficulties are pointed out. A detailed methodology is presented for estimates based on the method of moments.
The estimates are derived algorithmically, taking advantage of the fact that the first and second moment equations give rise to a
linear inverse problem with positivity restrictions that can be approached by an EM algorithm, resulting in a particularly simple
solution to a hard problem. A small simulation study is carried out.
KEY WORDS: Communication network; Computer network; EM algorithm; Linear inverse problems; Maximum-likelihood;
Moment method; Origin-destination tables; Poisson traffic; Positivity constraints; Trip matrix.

1. INTRODUCTION

We consider the problem of estimating the traffic intensity between all source-destination (directed) pairs (SD pairs heretofore) of nodes of a network from repeated measurements of traffic flow along the directed links of the network. For concreteness and simplicity of presentation, we discuss the problem in terms of "packets" or "messages" transmitted over a communication network, although the development is applicable to any network for which our formal assumptions (described later) hold. We assume that the SD pairs transmit messages over a strongly connected directed network (i.e., there exists a directed path between any two nodes). The networks are divided into two categories: fixed-routing (deterministic) and random-routing (Markovian) networks. In fixed-routing networks, the directed path between any two nodes is fixed and known and remains the same for all messages traveling between these two nodes. In random-routing networks, the directed path taken by a message traveling from the source node j_1 to the destination node j_2 is determined according to a fixed known Markov chain specific to the SD pair (j_1, j_2). From a purely technical standpoint, fixed routing is a special case of random routing, much as a sequence of iid random variables is a special case of a Markovian sequence. But there are many advantages to treating the two cases separately, and so, for now, we proceed with the case of fixed routing, postponing the discussion and treatment of random routing until Section 5.

Let c denote the number of SD pairs (c = n(n - 1) in a network of n nodes), and let X_j^(k) be the number of transmitted messages for SD pair j at measurement period k. We assume that X_j^(k) ~ Poisson(λ_j), j = 1, ..., c, k = 1, ..., K, independent. We denote vectors and matrices by boldface and transpose by a prime. X^(k) = (X_1^(k), ..., X_c^(k))' is the SD transmission vector at period k, k = 1, ..., K. Let r be the number of directed links in the network, and let A denote the r × c routing matrix for a fixed-routing network. A is a zero-one matrix; its rows correspond to the directed links, its columns correspond to the SD pairs, and its entries, the a_ij's, are "1" or "0" according to whether link i does or does not belong to the directed path of SD pair j. (For simplicity, we index the SD pairs and the directed links with one-dimensional indices, typically j and i. This may occasionally lead to a slight ambiguity, as we may refer to the third SD pair (say) as "j = 3" and also as j = (j_1, j_2) = (a, b) (say). The alternative, however, would be overly cluttered expressions with too many subscripts and superscripts throughout the article.) The measured data are the traffic counts on all the links of the network for K time periods, Y^(k) = (Y_1^(k), ..., Y_r^(k))', k = 1, ..., K, where

    Y^(k) = A X^(k),    k = 1, ..., K.    (1)

Our goal is to estimate λ = (λ_1, ..., λ_c)' from Y^(1), ..., Y^(K) within the framework of this model. Because of the network interpretation of the model, c (= n(n - 1)) is typically larger than r (= O(n)), and this raises interesting questions regarding identifiability of the parameters and consistency of the estimates. Note that although the model is linear, it is neither a "typical" linear regression nor a random-effects model, because of the zero-one structure of A, the nonnegativity constraints on the parameters, and the Poisson assumption.
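As a concrete illustration of the model (1), the following sketch draws the unobserved SD counts X^(k) from independent Poisson distributions and forms the observed link counts Y^(k) = A X^(k). The 2-link, 4-pair routing matrix and the rates here are hypothetical stand-ins for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical zero-one routing matrix: r = 2 directed links,
# c = 4 SD pairs (a_ij = 1 iff link i lies on the fixed path of pair j).
A = np.array([[1, 1, 0, 1],
              [0, 1, 1, 0]])
lam = np.array([1.0, 2.0, 0.5, 3.0])   # unknown SD intensities lambda

K = 5                                   # number of measurement periods
X = rng.poisson(lam, size=(K, 4))       # unobserved SD counts X^(k)
Y = X @ A.T                             # observed link counts Y^(k) = A X^(k)

print(Y.shape)                          # (5, 2): K rows of r link counts
```

Only the Y's are observed; the estimation problem of the article is to recover lam from such link counts.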
Example 1.1. Consider the network of Figure 1. There are (4 × 3 =) 12 SD pairs and 7 directed links. The routing matrix A is given by the table below.

Y. Vardi is Professor of Statistics, Department of Statistics, Rutgers University, Busch Campus, Piscataway, NJ 08855. The problem originated in a consulting project for DARPA (Becker and Wilks 1993) regarding the statistical analysis of computer data networks, and was brought to his attention by D. Pregibon. C. L. Mallows and P. McCullagh contributed useful comments. Jeanne Yue helped with programming, and the National Science Foundation provided financial support. The works of Sherali et al. and Bell were brought to his attention, at the proofreading stage of this article, by R. Becker.

© 1996 American Statistical Association
Journal of the American Statistical Association
March 1996, Vol. 91, No. 433, Theory and Methods

366    Journal of the American Statistical Association, March 1996

         1  2  3  4  5  6  7  8  9  10 11 12
         ab ac ad ba bc bd ca cb cd da db dc
    1 (a → b)   1  0  0  0  0  0  0  0  0  0  0  0
    2 (a → c)   0  1  1  0  0  1  0  0  0  0  0  0
    3 (b → a)   0  0  0  1  0  1  1  0  0  1  0  0
    4 (b → c)   0  0  0  0  1  0  0  0  0  0  0  0
    5 (c → b)   0  0  0  0  0  0  1  1  0  1  1  0
    6 (c → d)   0  0  1  0  0  1  0  0  1  0  0  0
    7 (d → c)   0  0  0  0  0  0  0  0  0  1  1  1

Figure 1. A Sketch of a Four-Node Directed Network. (The actual routes between any two nodes are separately specified by the matrix A of Example 1.1.)

Note: The routing is deterministic and is prespecified for each SD pair. (This assumption is relaxed in Sec. 5.) For instance, although in Figure 1 both paths, b → c → d and b → a → c → d, connect the SD pair (bd), the routing matrix specifies that messages from b to d always travel on the path b → a → c → d. Traffic from d to b, on the other hand, always travels on the path d → c → b.

In Section 2 we show that if the columns of A are all nonzero and different from each other, then λ is identifiable. In Section 3 we discuss estimation via the methods of maximum likelihood (ML) and moments. ML estimation is numerically complicated for the problem at hand. But the method of moments is simple, because the moment equations make up a LININPOS (LINear INverse POSitive) problem and so can be solved using a simple EM/ML algorithm (Snyder, Schultz, and O'Sullivan 1992; Vardi and Lee 1993) to produce a method-of-moments estimate. We then simulate a small network to check the performance of such an estimate. The simulation results suggest that the estimate tends toward unbiasedness already for a moderate number of repeated link-traffic measurements. In Section 4 we offer some possible modifications and discuss an estimation strategy for when the Poisson assumption can be viewed only as a crude approximation. The proposed method of weighted moment equations then amounts to estimation based on the first moments (which do not depend on the Poisson assumption) regularized by the second moments (which do depend on the Poisson assumption). Finally, in Section 5 we derive method-of-moments estimates for networks with Markovian routing.

2. IDENTIFIABILITY

The conditions for identifiability of λ are surprisingly simple and will typically be satisfied by any real-life routing matrix A. This is somewhat unexpected, because in the simplest example, when r = 1, the Y's are sums of independent Poisson variables and so the intensity vector λ is not identifiable (e.g., A = (1, 1, 0, 1), so that Y = X_1 + X_2 + X_4, and hence only the sum λ_1 + λ_2 + λ_4 is identifiable, but not the individual components λ_1, λ_2, λ_4).

The following holds.

Lemma. If the columns of the routing matrix A are all distinct, and each column has at least one nonzero entry, then λ is identifiable.

Note. A zero column means that the corresponding SD pair is not connected by a path. Likewise, a zero row means that the corresponding link is not a part of the network. Thus any real-life routing matrix A would have neither columns nor rows of all zeros.

Proof. See the Appendix.

Corollary. If c > 2^r − 1, then some of the individual rates λ_1, ..., λ_c cannot be estimated separately.

Proof. 2^r − 1 is the maximal number of distinct zero-one vectors with at least one "1" in each vector.

3. ESTIMATION (FIXED ROUTING)

We start with a discussion of the maximum likelihood estimator (MLE) and an iterative procedure based on the EM algorithm. The trouble with this approach, however, is that the iteration formula is very hard to compute. In the course of the discussion, to demonstrate the difficulty of the problem, we give a simple but constructive example where the likelihood equation has a unique solution that differs from the MLE, which is also unique. Alternative methods that are potentially more computationally tractable are discussed next. We then propose and experiment with a particularly simple method motivated by the method of moments. Given that we are estimating parameters within the exponential family, it is reasonable to expect that this method is relatively efficient. Our simulation experiment (discussed in the next section) indicates that the method produces approximately unbiased estimates of λ already for moderate values of K.

3.1 Maximum Likelihood Estimation

We start with an interesting "toy" example that demonstrates the difficulty of the problem due to boundary solutions.

Example 3.1 (r = 2, c = 3, K = 1). Let X_i ~ Poisson(λ_i), i = 1, 2, 3, independent, and suppose that we are given

    Y_1 = X_1 + X_2 = 1  and  Y_2 = X_1 + X_3 = 2.    (2)

Our goal is to derive the ML estimate of λ. In the notation of the earlier sections, we have

    A = ( 1  1  0 ),    Y = (1, 2)'.    (3)
        ( 1  0  1 )
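For this toy example, the lattice solutions of AX = Y and the likelihood comparison that settles the MLE (discussed next) can be checked by brute force. A minimal sketch; the search bound of 2 on each coordinate is justified because no X_i can exceed max(Y):

```python
import itertools
import math

import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1]])
Y = np.array([1, 2])

# All solutions of AX = Y in natural numbers (each coordinate <= 2).
solutions = [x for x in itertools.product(range(3), repeat=3)
             if (A @ np.array(x) == Y).all()]
print(solutions)                        # [(0, 1, 2), (1, 0, 1)]

def likelihood(lam):
    """L(lam) = P_lam{Y = (1, 2)'}: sum of Poisson probabilities over
    the solutions of AX = Y.  (Python's 0 ** 0 == 1 handles zero rates.)"""
    return sum(math.prod(l ** v / math.factorial(v) * math.exp(-l)
                         for l, v in zip(lam, x))
               for x in solutions)

# The unique solution of the likelihood equations vs. the boundary MLE:
print(likelihood((0, 1, 2)) < likelihood((1, 0, 1)))    # True
```

Here L((0, 1, 2)') = 2e^{-3} ≈ .0996 and L((1, 0, 1)') = e^{-2} ≈ .1353, in agreement with the comparison carried out analytically below.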

The following holds: The unique solution of the likelihood equations is λ = (0, 1, 2)'. But the unique MLE is λ̂ = (1, 0, 1)'. To verify this, we first derive the likelihood of the data: (2) has only two possible solutions in natural numbers for the X's,

    X = (1, 0, 1)'  and  X = (0, 1, 2)',    (4)

and so the likelihood of the data is

    L(λ) = P_λ{Y = (1, 2)'}
         = P_λ{X = (1, 0, 1)'} + P_λ{X = (0, 1, 2)'}
         = (λ_1 λ_3 + λ_2 λ_3^2 / 2) exp(−λ_1 − λ_2 − λ_3).    (5)

Let l(λ) = log L(λ). Setting the partial derivatives of l equal to zero, we get the likelihood equations

    (a) (2 λ_1 λ_3 + λ_2 λ_3^2)^{-1} 2 λ_3 = 1,
    (b) (2 λ_1 λ_3 + λ_2 λ_3^2)^{-1} λ_3^2 = 1,
    (c) (2 λ_1 λ_3 + λ_2 λ_3^2)^{-1} 2 (λ_1 + λ_2 λ_3) = 1.    (6)

Equations (6a) and (6b) imply that λ_3 = 2. Substituting this back into (6) results in the equations λ_1 + λ_2 = 1 and λ_1 = 0, which have a unique solution λ_1 = 0 and λ_2 = 1. Thus λ = (0, 1, 2)' is the unique solution of the likelihood equations. But the MLE is λ̂ = (1, 0, 1)'. To prove this, first note that the MLE must be a boundary point of the region λ_i ≥ 0, i = 1, 2, 3, because no interior point satisfies the likelihood equations (as we just proved). From (5), it is clear that any candidate for an MLE must have λ_3 > 0 and one of λ_1 or λ_2 strictly positive (but not both, because otherwise it will be an interior point). If λ_1 = 0, then we need to maximize (λ_2 λ_3^2 / 2) exp(−λ_2 − λ_3) subject to λ_2, λ_3 > 0, and this maximum is attained at λ_2 = 1 and λ_3 = 2. If λ_2 = 0, then we need to maximize λ_1 λ_3 exp(−λ_1 − λ_3) subject to λ_1, λ_3 > 0, and this maximum is attained at λ_1 = λ_3 = 1. Thus there are two possible candidates for the MLE: (0, 1, 2)' and (1, 0, 1)'. Comparing the likelihoods of these candidates, one verifies that L((0, 1, 2)') < L((1, 0, 1)'), and so the unique MLE is λ̂ = (1, 0, 1)'.

3.2 An EM Derivation of the MLE

The likelihood equations (0 = ∂l/∂λ_j, j = 1, ..., c, l = log L) in vector notation can be expressed as

    0 = (1/K) Σ_{k=1}^K E_λ[X^(k) | Y^(k) = A X^(k)] − λ,    (8)

and, theoretically, the EM algorithm (Dempster, Laird, and Rubin 1977) can be used to search for a solution. But the implementation is a different story. If we treat the X^(k), k = 1, ..., K, as the complete (unobserved) data and the Y^(k), k = 1, ..., K, as the incomplete (observed) data, then the EM algorithm, in vector notation, is

    λ^(n+1) = E[X̄ | Y^(1), ..., Y^(K), λ^(n)],    n = 0, 1, ...

(λ^(0) > 0, arbitrary). Because of the linearity of E and the independence across k's, this becomes

    λ^(n+1) = (1/K) Σ_{k=1}^K E[X^(k) | Y^(k), λ^(n)],    n = 0, 1, ....    (9)

The trouble with this iteration formula is that the summands

    E[X | Y, λ]    (10)

(superscripts ignored for simplicity) are extremely hard to calculate, as they require first finding all the solutions in natural numbers of AX = Y. To demonstrate the details of the procedure, consider again our "toy" example, where A and Y are given by (3). From (4), we derive

    [X | Y, λ] = (1, 0, 1)'  with prob. 2 λ_1 / (2 λ_1 + λ_2 λ_3),
               = (0, 1, 2)'  with prob. λ_2 λ_3 / (2 λ_1 + λ_2 λ_3),    (11)

and so (9) leads to the iteration

    λ^(n+1) = p^(n) (1, 0, 1)' + (1 − p^(n)) (0, 1, 2)',
        p^(n) = 2 λ_1^(n) / (2 λ_1^(n) + λ_2^(n) λ_3^(n)).    (12)

The computational difficulties in replicating this process and in calculating (10) for anything but a "toy problem" are apparent. Unconnected to this computational difficulty, note here that both (0, 1, 2)' and (1, 0, 1)' are stationary points of the iteration (12), and so, depending on the starting point, the algorithm may converge to a non-MLE point.

Another interesting question is whether the log-likelihood is concave, which would have simplified the likelihood optimization problem. After some algebra, it is possible to express the matrix of second derivatives of the log-likelihood function l as

    H = ( ∂^2 l / ∂λ_i ∂λ_j )
      = Σ_{k=1}^K cov[X^(k)/λ | Y^(k), λ]
        − diag( Σ_{k=1}^K E[X^(k)/λ^2 | Y^(k), λ] ),    (13)

where the quotient vectors X^(k)/λ and X^(k)/λ^2 are the vectors of coordinate-wise ratios. It follows that

    z'Hz = Σ_{k=1}^K var[ Σ_i (z_i/λ_i) X_i^(k) | Y^(k), λ ]
           − Σ_{k=1}^K E[ Σ_i (z_i^2/λ_i^2) X_i^(k) | Y^(k), λ ].    (14)

It can then be shown that this quantity is not necessarily ≤ 0 for all λ, and so l is not necessarily concave. Note, however, that when the true λ, say λ*, is an interior point, then for large K's, l is concave in the neighborhood of λ*. To see this, apply the law of large numbers to (14) to get

    (1/K) Σ_{i,j} z_i z_j H_ij → E_{λ*} var_λ[ Σ_i (z_i/λ_i) X_i | Y ]
                                 − E_{λ*} E_λ[ Σ_i (z_i^2/λ_i^2) X_i | Y ],    (15)

where Y = AX. Setting λ = λ* in (15), we can apply the general formula

    var[U] = E(var[U | V]) + var(E[U | V])

to get

    (right side of (15) with λ = λ*)
        = var_{λ*}( Σ_i (z_i/λ_i) X_i )
          − var_{λ*}( E_{λ*}[ Σ_i (z_i/λ_i) X_i | Y ] )
          − E_{λ*}[ Σ_i (z_i/λ_i)^2 X_i ]
        = − var_{λ*}( E_{λ*}[ Σ_i (z_i/λ_i) X_i | Y ] ) ≤ 0,    (16)

where the last equality follows from the equality of the mean and the variance for Poisson random variables and from the independence of the X_i's. Thus if K is large and the algorithm is iterated in the vicinity of the true λ (assumed interior), then the algorithm converges to an MLE near the true λ. In fact, irrespective of the numerical difficulties in calculating the MLE, for large K the standard asymptotic properties of the MLE should hold for the case at hand.

3.3 MLE and Normal Approximations

Two types of normal approximations can be considered "natural" here. The first is a normal approximation of each of the summands in (9), which amounts to approximating (10) using multivariate normal theory. The second is based on the central limit theory (CLT) approximation of Ȳ for large K. We start with the first. Assume that

    Λ = diag(λ),    (17)

and consider the joint distribution of (X', Y')' when Y = AX. Then, approximately,

    ( X )        ( ( λ  )   ( Λ     ΛA'  ) )
    (   )  ~  N  ( (    ) , (            ) ),    (18)
    ( Y )        ( ( Aλ )   ( AΛ    AΛA' ) )

and

    X | Y ~ N_c{ λ + ΛA'(AΛA')^{-1}(Y − Aλ), Λ − ΛA'(AΛA')^{-1}AΛ }.    (19)

In particular, (10) is approximated as λ + ΛA'(AΛA')^{-1}(Y − Aλ). Applying this to (9) results in the iteration formula

    λ^(n+1) = (1/K) Σ_{k=1}^K [ λ^(n) + Λ^(n) A'(A Λ^(n) A')^{-1}(Y^(k) − A λ^(n)) ].    (20)

We have not experimented with this method, but we suspect that it may perform badly for some cases, because the normal approximation on which the iteration is based is a poor approximation unless we know a priori that all of the λ_i's are large. Other potential difficulties with this approach are the cumulative approximation error in each step of the iteration (because we apply the normal approximation to each term in (9) separately), potential numerical instability from the matrix inversion in each iteration, and the possibility of converging to negative values of the λ_i's.

In the second normal approach, we approximate the distribution of Ȳ when K is large with a normal distribution:

    Ȳ = (1/K) Σ_{k=1}^K Y^(k)  ~  N_r( Aλ, K^{-1} AΛA' ),  approximately,  Λ = diag(λ).    (21)

Treating the right side of (21) as the exact distribution of Ȳ, the log-likelihood of Ȳ (modulo multiplicative and additive constants) is

    l(λ) = − log |AΛA'| − K (Ȳ − Aλ)'(AΛA')^{-1}(Ȳ − Aλ).    (22)

An MLE based on this approximation would seek to maximize l(λ) of (22) subject to the constraints λ_i ≥ 0, i = 1, ..., c. When K is large, the second term is the dominant term in (22), and this suggests argmin_{λ ≥ 0} K (Ȳ − Aλ)'(AΛA')^{-1}(Ȳ − Aλ) as a reasonable large-sample substitute for the MLE. Note that this is a weighted least squares with positivity constraints and with weights depending on λ, which can be estimated by the sample covariance matrix of the Y's.
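Returning to the toy Example 3.1, the EM iteration (12) of Section 3.2 is simple enough to run directly: with K = 1 the update is the conditional mixture of the two lattice solutions, with the mixing probability p of (11). The sketch below (starting points chosen arbitrarily) illustrates that both candidate points are stationary and that the limit depends on the start:

```python
def em_step(lam):
    """One EM update (12) for the toy example: returns E[X | Y, lam]."""
    l1, l2, l3 = lam
    p = 2 * l1 / (2 * l1 + l2 * l3)     # P{X = (1,0,1)' | Y, lam}, cf. (11)
    return (p, 1 - p, 2 - p)

# From this interior starting point the iteration moves to the MLE (1, 0, 1)'.
lam = (1.0, 1.0, 1.0)
for _ in range(200):
    lam = em_step(lam)
print([round(v, 6) for v in lam])       # [1.0, 0.0, 1.0]

# The non-MLE stationary point (0, 1, 2)' is also fixed: an iteration
# started there never leaves it.
print(em_step((0.0, 1.0, 2.0)))         # (0.0, 1.0, 2.0)
```

This matches the remark in Section 3.2 that, depending on the starting point, the algorithm may converge to a non-MLE stationary point.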

3.4 Estimation Based on Sample Moments

3.4.1 First and Second Moment Equations. The approximate normal distribution of Ȳ is completely determined by the mean vector, Aλ, and the covariance matrix, AΛA', of Y. Thus we can equate the sample's first and second moments to their theoretical values to obtain a linear (in λ) system of estimating equations:

    (a) E(Y_i) = Ȳ_i = Σ_{l=1}^c a_il λ_l,    i = 1, ..., r,

and

    (b) cov(Y_i, Y_i') = (1/K) Σ_k Y_i^(k) Y_i'^(k) − Ȳ_i Ȳ_i'
                       = Σ_{l=1}^c a_il a_i'l λ_l,    1 ≤ i ≤ i' ≤ r.    (23)

These are r(r + 3)/2 linear equations that can be written, in vector notation, as

    ( Ȳ )     ( A )
    (   )  =  (   ) λ,    (24)
    ( S )     ( B )

where S is the sample covariance matrix stretched out as a vector of length r(r + 1)/2 (which, for simplicity of presentation here, we index by a double subscript (i, i'), 1 ≤ i ≤ i' ≤ r, ordered lexicographically, say),

    S_ii' = (1/K) Σ_k Y_i^(k) Y_i'^(k) − Ȳ_i Ȳ_i',

and B is an (r(r + 1)/2) × c matrix (rows indexed by (i, i'), 1 ≤ i ≤ i' ≤ r, to match the indexing of S), with the (i, i')th row of B being the element-wise product of row i and row i' of the matrix A.

Example 3.2. Suppose that

    A = ( 1 0 1 1 1 )
        ( 0 1 1 0 1 )
        ( 0 1 0 1 1 ).

Then

        ( 1 0 1 1 1 )        ( S_11 )
        ( 0 0 1 0 1 )        ( S_12 )
    B = ( 0 0 0 1 1 ),   S = ( S_13 )
        ( 0 1 1 0 1 )        ( S_22 )
        ( 0 1 0 0 1 )        ( S_23 )
        ( 0 1 0 1 1 )        ( S_33 ).

3.4.2 A Closer Look at the Moment Equations (24). First we recall that because the X_l's are independent and the a_ij's are zeros or 1s, the marginal distributions of the Y_i's are also Poisson, and so

    E(Y_i) = var(Y_i),    i = 1, ..., r.    (25)

On the other hand, sampling variability will typically result in Ȳ_i ≠ S_ii, i = 1, ..., r, and so (24) will be an algebraically inconsistent system of equations (with probability ≈ 1). Furthermore, due to sampling variability, we can also expect to observe some negative sample covariance values, S_ii', whereas the theoretical covariance values, Σ_l a_il a_i'l λ_l, are always nonnegative. Finally, when Σ_l a_il a_i'l = 0 and S_ii' ≠ 0, we get a degenerate false identity S_ii' = 0, which, as before, is attributable to sampling variability and should be deleted. These algebraically inconsistent equations are to be expected, and a discussion of how to handle them follows the simulation experiment discussed later. For now, suppose that all the constants on the left side of (24) are strictly positive (i.e., Ȳ_i, S_ii' > 0) and that B has no rows of zeros. (Clearly, A has no such rows.) Then, because all of the entries of A and B are nonnegative, and because (Ȳ', S')' > 0 and λ is constrained to be ≥ 0, (24) is of the general form of a LININPOS problem, and we propose using an EM algorithm to "solve" it. Formally, this solution is the minimizer of the Kullback-Leibler directed distance between the observed sample moments and their theoretical values (modulo a certain natural scaling), and it can also be viewed as an MLE of the algebraic solution under a multinomial model. (Details have been given in Snyder et al. 1992 and in Vardi and Lee 1993.)

3.4.3 On the Relation to Maximum Likelihood Estimation. Left-multiplying the likelihood equation (8) by A, we see that any stationary point of the log-likelihood function, l, also satisfies the first-moment equations Ȳ = Aλ. The reverse implication is not true, but a partial converse holds. Specifically, if the columns of A are linearly independent (which is not the case in our typical application, where c > r), and λ* solves the first-moment equations, Ȳ = Aλ*, then λ* solves (8) and hence is a stationary point of l. To prove this, denote the right side of (8) by η(λ) and note that Ȳ = Aλ* implies Aη(λ*) = 0, which implies η(λ*) = 0 for A with linearly independent columns, so that λ* is a solution of (8).

It is important to note, however, that because of the nonnegativity constraints on λ, the MLE may be a boundary point and so is not necessarily a stationary point of l, as demonstrated in Example 3.1.

3.4.4 Details of the Method: Implementation for Matrices in Block Form. The canonical form of the EM iteration for solving the LININPOS problem Ȳ = Aλ is

    λ_j^(n+1) = (λ_j^(n) / a_.j) Σ_i a_ij Ȳ_i / Σ_k a_ik λ_k^(n),    j = 1, ..., c.    (26)

If the linear system is given in a block form as

    ( Ȳ )     ( A )
    (   )  =  (   ) λ,    (27)
    ( S )     ( B )

where A is r × c, Ȳ is r × 1, S is m × 1 (indexed as r + 1, ..., r + m), and B is m × c (rows indexed as r + 1, ..., r + m), then (26) becomes the following iteration:

    λ_j^(n+1) = [ λ_j^(n) / ( Σ_{i=1}^r a_ij + Σ_{i=r+1}^{r+m} b_ij ) ]
                × [ Σ_{i=1}^r a_ij Ȳ_i / Σ_k a_ik λ_k^(n)
                  + Σ_{i=r+1}^{r+m} b_ij S_i / Σ_k b_ik λ_k^(n) ],    j = 1, ..., c.    (28)

A similar block decomposition was first utilized by Iusem and Svaiter (1994). Equation (28) can now be written as

    λ_j^(n+1) = ( a_.j / (a_.j + b_.j) ) ξ_j(A, Ȳ, λ^(n))
                + ( b_.j / (a_.j + b_.j) ) ξ_j(B, S, λ^(n)),    (29)

with "." denoting summation over the subscript that it replaces (e.g., a_.j = Σ_i a_ij). Here ξ_j is defined by the canonical EM iteration for LININPOS problems (as noted in Vardi and Lee 1993),

    ξ_j(A, Y, λ) = (λ_j / a_.j) Σ_i a_ij Y_i / Σ_k a_ik λ_k,    j = 1, ..., c,    (30)

for any matrix A and vectors Y and λ, with their respective positivity constraints.

3.4.5 A Simulation Experiment. To simulate the network described in Example 1.1, we took λ = (λ_1, λ_2, ..., λ_12)' = (1, 2, ..., 12)' to be the "daily" transmission rates between all 12 pairs and generated daily data on the seven links for 100 days, Y^(1) (= (Y_1^(1), ..., Y_7^(1))'), ..., Y^(100). We then calculated the sample means, Ȳ, and covariances, S, and applied the algorithm (29). This led to an estimate λ̂ = (λ̂_1, λ̂_2, ..., λ̂_12)'. We repeated this whole process 50 times to produce 50 such estimates, λ̂^(1), ..., λ̂^(50). The mean vector and covariance matrix based on these 50 estimates were λ̄ = (1.01, 2.37, 2.68, 4.72, 5.06, 5.79, 6.84, 7.92, 9.25, 9.87, 11.33, 12.14)' and

    cov(λ̂) =
      .01  -.01  -.01  -.10  -.00   .03  -.01  -.09  -.01  -.07   .08  -.02
           1.41  -.62   .44   .02 -1.22  -.40   .15   .89  -.10  -.81  -.07
                 3.00   .88  -.27 -1.48  -.18  -.04  -.51   .10   .69  -.62
                       8.62   .15 -1.55 -2.15  6.56   .46 -7.05 -1.88  3.87
                              .22   .20  -.24   .33   .10   .08  -.27   .05
                                   3.78  -.51  -.27  -.97  -.01   .67   .55
                                         6.96 -5.03   .46 -2.84  2.46 -1.15
                                              10.79  -.13 -3.48 -7.38  5.76
                                                     1.78   .24  -.87   .07
                                                          12.69 -1.60 -4.52
                                                                11.50 -5.93
                                                                       8.54

(entries below the diagonal, which reflect the symmetry of the matrix, are omitted). The simulation results suggest that the method leads to approximately unbiased estimation of λ and that the variance of the estimated λ_j increases, on the whole, with λ_j.

3.4.6 On Algebraic Inconsistency and Bad Equations. Algebraic inconsistency of (24) of the type described in connection with Equation (25) is not a real problem. In fact, this inconsistency often implies that the LININPOS problem (24) has a unique ML/EM solution. (This follows from Byrne 1993 and from Roeder, Devlin, and Lindsay 1989.) Intuitively, the point is that both equations Ȳ_i = E(Y_i) and S_ii = E(Y_i) are believable and valid as moment equations under the Poisson model, and so we let the "geometry of Kullback-Leibler distances" determine the best compromise between all these inconsistent equations. But the situation is different when we have infeasible equations (e.g., when S_ii' < 0 but Σ_l a_il a_i'l > 0) or false identities (e.g., when S_ii' ≠ 0 but Σ_l a_il a_i'l = 0). These are known to be impossible moment equations under the Poisson model and should be given no weight by comparison to feasible moment equations. But assigning no weight to these equations amounts to simply deleting them. Because there is no shortage of moment equations, deleting the bad ones seems a reasonable strategy. It is also related to our later discussion of giving reduced weight to some moment equations.

4. REGULARIZATION: WEIGHTED MOMENT EQUATIONS

The first-moment equations Ȳ = Aλ follow from the linearity of the system (1) and are independent of the Poisson assumption. The second-moment equations depend strongly on the Poisson model, and so in applications where this model is only a crude approximation, one may want to give these equations smaller weight relative to the first-moment equations. This is achieved by replacing S and B in (27) with εS and εB, for 0 < ε < 1. This in turn changes (29) into

    λ_j^(n+1) = ( a_.j / (a_.j + ε b_.j) ) ξ_j(A, Ȳ, λ^(n))
                + ( ε b_.j / (a_.j + ε b_.j) ) ξ_j(B, S, λ^(n)).    (31)

Note that when ε → 0, the solution is based entirely on the first moments, and as such it is independent of any probabilistic assumption (but most likely is not a good estimate of λ because of the shortage of estimating equations). On the other hand, when ε → 1, the solution is based on the first- and second-moment equations for the Poisson model. Thus ε can be interpreted as a regularization parameter, where the regularization in this case attempts to pull the model toward a Poisson model. This approach, in a different context and with a goal similar in spirit to Silverman, Jones, Wilson, and Nychka's (1990) EMS and Green's (1990a, b) one-step-late EM, was suggested by Iusem and Svaiter (1994), who discussed convergence properties of (31).

5. RANDOM (MARKOVIAN) ROUTING

Here we consider the same problem under the same model assumptions as in Sections 1-4, except that the assumption of fixed routing is replaced by Markovian routing. For specificity and ease of description, we use the term SD address (and sometimes just address) to mean the combined sender and receiver addresses, so that a packet traveling from source j_1 to destination j_2 is said to have an SD address j (= (j_1, j_2)). (This is like a standard letter, which usually includes both sender and receiver addresses.) As before, the routing regimen is specified by an r × c matrix A, but now the entries are conditional probabilities:

    a_ij is the conditional probability that a packet with SD address j passes through link i = (i_1, i_2), conditioned on the packet being at node i_1. (It is assumed that this probability is independent of how the packet arrived at i_1 but depends only on its SD address j.)

For any fixed SD pair, the Markovian nature of the routing is a consequence of the assumption that the routing of any packet at each particular node does not depend on the path through which it arrived at the node, but depends only on it being there and on its SD address. Thus each column of the matrix A corresponds to a directed network with n nodes and r links, with probabilities attached to each link. Typically, if the probability of passing through a given link is zero, then we do not include the link in the diagram of the network. We further assume that the routing matrix A is such that a packet cannot revisit a node, and that for each SD address j = (j_1, j_2), the corresponding column of A specifies a network of paths all leading from j_1 (source) to j_2 (destination).

Example 5.1. Consider a network of 4 nodes (and hence 12 SD addresses) and 9 possible links, given by the following matrix A (blank entries are equal to zero):

    [A 9 × 12 table of link-transition probabilities, with rows indexed by the directed links a → b, a → c, b → a, b → c, b → d, c → b, c → d, d → b, d → c and columns indexed by the SD addresses ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc.]

The 12 columns of A correspond to the 12 Markovian networks depicted in Figure 2.

As in the earlier sections, we observe the traffic on the links and are interested in estimating the transmission rates for all SD addresses, based on these data and the matrix A. The problem at hand resembles the estimation problem in PET (Shepp and Vardi 1982; Vardi, Shepp, and Kaufman 1985), where the link data (also Poisson, as sums of "thinned" Poisson variables) are the equivalent of the "detector tubes" data in the PET problem. But there is a crucial difference that makes the current problem much harder. Although the detector tube data in PET are mutually independent, the link data are highly dependent. In fact, the current problem is similar to a PET problem in which the detector tubes overlap. Thus ML estimation appears to be very complicated. Because of this, we focus again on the method of moments.

Despite the significantly more complicated nature of the random-routing regimen, it turns out that the first and second moments of the observed link data are linear in λ and, as in the case of fixed routing, the moment equations constitute a LININPOS problem that can be "solved" using an EM/ML algorithm. The hard part is then the calculation of E(Y_i) and cov(Y_i, Y_i'), 1 ≤ i, i' ≤ r, which are needed to set up the moment equations. These are derived later.

5.1 Estimation: First and Second Moments

5.1.1 Network Thinning of Poisson Variables. In general, "thinning" of Poisson variables is a reference to the following general property: If X ~ Poisson(λ) and [Y_1, ..., Y_k | X] ~ multinomial(X; π_1, ..., π_k), then Y_1, ..., Y_k are mutually independent and Y_i ~ Poisson(λ π_i), i = 1, ..., k. (The index k here should not be confused with the index k for the measurement period in earlier sections, as the derivation of the moments in this section requires consideration of only a single measurement period.)

We can use this property as follows: Let X_j be the number of packets with SD address j; X_j ~ Poisson(λ_j). Let Y_i^j be the number of packets with address j passing through link i (= (i_1, i_2)). It follows from the "thinning property" that

    Y_i^j ~ Poisson(λ_j P_i^j),    (32)

where

    P_i^j = probability that a packet with address j passes through link i.

Furthermore, because of the independence of the X_j's, the Y_i^j's are independent across j for fixed i, and so

    Y_i = Σ_j Y_i^j ~ Poisson( Σ_j λ_j P_i^j ),    i = 1, ..., r.    (33)

Next, we express P_i^j in terms of the routing matrix A. We define a chain of connected links as a path, and we say that a path is possible if it has positive probability. Specifically, a path (i_1, ..., i_k) is possible for SD address j if it
Figure 2. The Four-Node Network of Example 5.1 With Its Markovian Routing Scheme (as Specified by the Matrix A) Gives Rise to 12 Markov Networks Corresponding to the 12 Possible SD Addresses, as Depicted Here.

is possible to travel in sequence through i1, i2, ..., ik under the network probabilities for SD address j. We say that a path is complete for SD address j if it is possible and leads from the source (j1) to the destination (j2). Let B_i^j denote the set of all complete paths for address j that pass through link i. Then the chain rule for probabilities (P(D3 D2 D1) = P(D3 | D2 D1) P(D2 | D1) P(D1)) and the Markovian nature of the routing imply that the probability for a packet to travel on a possible path (i1, ..., ik) in B_i^j is the product a_{i1 j} ··· a_{ik j}. Hence

p_i^j = Σ_{(i1, ..., ik) ∈ B_i^j} a_{i1 j} ··· a_{ik j}   (i = 1, ..., r, j = 1, ..., c).   (34)

Example 5.1 (continued). We get a 9 × 12 matrix P = (p_i^j) of these probabilities, i = 1, ..., 9 and j = 1, ..., 12, with rows and columns indexed as in the matrix A (blank entries are equal to zero). For instance, the third column of P (address ad) is (.2, .8, 0, 0, .36, .16, .64, 0, 0)'. For example,

p_5^{ad} = .2 × 1.0 + .8 × .2 × 1.0 = .36 (= a_{13} a_{53} + a_{23} a_{63} a_{53}).

5.1.2 Calculating the First and Second Moments of Y. As it stands, formula (34) is somewhat hard to calculate because of the complexity of constructing the set of paths B_i^j. Here we give a relatively simple method for deriving P from A, using Markov chain techniques, and demonstrate it on the foregoing example. Expressions (33) and (34) give us the necessary ingredients for the moment equations based on the mean and variance, because

E Y = Pλ,   var Y = Pλ,   (35)

where var Y = (var Y1, ..., var Yr)' and P is given in (34). To construct additional second-moment equations corresponding to cov(Yi, Yi'), we need the following. Let

Y_{i,i'}^j = the number of packets with SD address j passing through both link i and link i'.

Then, again from the "thinning property" of Poisson variables, we have

Y_{i,i'}^j ~ Poisson(λ_j p_{i,i'}^j),   1 ≤ i, i' ≤ r, j = 1, ..., c,   (36)

where

p_{i,i'}^j = probability that a packet with SD address j passes through both link i and link i'.

Now we can decompose Y_i^j and Y_{i'}^j as follows: Y_i^j = Y_{i,i'}^j + U_{i,i'}^j and Y_{i'}^j = Y_{i,i'}^j + U_{i',i}^j, where U_{i,i'}^j is the number of packets with address j passing through link i but not i', and U_{i',i}^j is the number of packets with address j passing through link i' but not i. Because of the thinning property, U_{i,i'}^j, U_{i',i}^j, and Y_{i,i'}^j are statistically independent, and thus we get

cov(Yi, Yi') = cov(Σ_j Y_i^j, Σ_j Y_{i'}^j) (1)= Σ_j cov(Y_i^j, Y_{i'}^j) (2)= Σ_j cov(Y_{i,i'}^j + U_{i,i'}^j, Y_{i,i'}^j + U_{i',i}^j) = Σ_j var Y_{i,i'}^j (3)= Σ_j λ_j p_{i,i'}^j.   (37)

(Here (1) follows from the independence of the Y^j's across j; (2) follows from the independence of Y_{i,i'}^j, U_{i,i'}^j, and U_{i',i}^j, as explained earlier; and (3) follows from (36).)

As in the derivation of (34), the chain rule for probabilities and the Markovian nature of the routing lead to

p_{i,i'}^j = Σ_{(i1, ..., ik) ∈ B_{i,i'}^j} a_{i1 j} ··· a_{ik j},   1 ≤ i, i' ≤ r, j = 1, ..., c,   (38)

where

B_{i,i'}^j = set of all complete paths for address j which pass through both link i and link i'.

Figure 3. A Diagram of a Network Corresponding to SD Address j (= (j1, j2)) With 14 Links Labeled 1, ..., 14.

Example 5.2. Consider the network depicted in Figure 3. We have

B_{9,8}^j = {(1, 5, 8, 9, 13), (2, 8, 9, 13), (3, 6, 8, 9, 13), (1, 5, 8, 9, 11, 14), (2, 8, 9, 11, 14), (3, 6, 8, 9, 11, 14)},

B_{4,5}^j = ∅,   B_{5,6}^j = ∅,

B_{8,8}^j = B_8^j = {(1, 5, 8, 9, 13), (1, 5, 8, 9, 11, 14), (1, 5, 8, 12), (2, 8, 9, 13), (2, 8, 9, 11, 14), (2, 8, 12), (3, 6, 8, 9, 13), (3, 6, 8, 9, 11, 14), (3, 6, 8, 12)},

and

B_{6,10}^j = {(3, 6, 10, 13), (3, 6, 10, 11, 14)}.

Note: Because the sum of the probabilities over all (partial) paths originating at any given node and terminating at the destination is 1, we can modify the definition of B_{i,i'}^j to disregard the tails of all paths in B_{i,i'}^j beyond the point at which they pass both link i and link i'. (This is so because the absence of loops implies that all such paths either pass first through i and then through i', or the other way around, for all such paths; otherwise, there would be a loop between i and i', contradicting our assumption. This permits modifying the definition of B_{i,i'}^j to be the set of all possible paths for address j starting at j1, passing through both link i and link i', and terminating at either link i or link i' (whichever is furthest from j1).) An analogous statement is of course true for B_i^j. With this modified definition, we get in Example 5.2

B_{9,8}^j = {(1, 5, 8, 9), (2, 8, 9), (3, 6, 8, 9)},

B_{5,6}^j = ∅,

B_{8,8}^j = B_8^j = {(1, 5, 8), (2, 8), (3, 6, 8)},

and

B_{6,10}^j = {(3, 6, 10)}.

This simplifies the calculation of (34) and (38) considerably, as it reduces the number of summands and the length of the products. Putting (37) and (38) together, we get

cov(Yi, Yi') = Σ_{j=1}^{c} λ_j Σ_{(i1, ..., ik) ∈ B_{i,i'}^j} a_{i1 j} ··· a_{ik j},   (39)

which is of the same general form as (35) combined with (34).
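The path sums in (34) and (38) can be made concrete by brute-force enumeration of the complete paths. The sketch below is our own illustration, not code from the paper; it hard-codes the routing probabilities of address (a, d) from Example 5.1 (the one column spelled out in the worked computations), and the function names are ours. It reproduces p_{b→d}^{ad} = .36 and the active-link vector (.2, .8, .36, .16, .64):

```python
# Markovian routing network for a single SD address, as in Example 5.1's
# address (a, d): the link probabilities below are that column of A.
ROUTES = {  # node -> list of (next_node, link_probability)
    "a": [("b", 0.2), ("c", 0.8)],
    "b": [("d", 1.0)],
    "c": [("b", 0.2), ("d", 0.8)],
}

def complete_paths(src, dst):
    """Yield (node_sequence, probability) for every complete path from src
    to dst; the probability is the product of link probabilities (chain rule)."""
    stack = [([src], 1.0)]
    while stack:
        nodes, prob = stack.pop()
        if nodes[-1] == dst:
            yield nodes, prob
            continue
        for nxt, p in ROUTES.get(nodes[-1], []):
            if nxt not in nodes:  # no revisits: the loop-free assumption
                stack.append((nodes + [nxt], prob * p))

def p_link(src, dst, link):
    """p_i^j of (34): total probability of complete paths through link i."""
    return sum(prob for nodes, prob in complete_paths(src, dst)
               if link in zip(nodes, nodes[1:]))

def p_link_pair(src, dst, link1, link2):
    """p_{i,i'}^j of (38): probability of passing through both links."""
    return sum(prob for nodes, prob in complete_paths(src, dst)
               if link1 in zip(nodes, nodes[1:])
               and link2 in list(zip(nodes, nodes[1:])))

# The five active links for address (a, d), in the order used in the text.
links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "b"), ("c", "d")]
print([round(p_link("a", "d", l), 2) for l in links])  # -> [0.2, 0.8, 0.36, 0.16, 0.64]
print(round(p_link_pair("a", "d", ("a", "c"), ("b", "d")), 2))  # -> 0.16
```

For larger networks this enumeration grows quickly, which is exactly the motivation for the Markov chain construction of Section 5.1.3.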
5.1.3 A Simple Method for Constructing P of (34). For the following discussion, we fix the address j and, without loss of generality, assume that the "active" links (i.e., links with positive probabilities) are labeled 1, ..., M. Consider a Markov chain with M + 1 states, where the states are the M links plus an additional dummy link that is an absorbing state labeled M + 1 (or "end"), corresponding to an imaginary state that follows the last link of the path. (All paths are eventually absorbed at this state.) Let Q(j) be the (M + 1) × (M + 1) transition probability matrix between these links. To reduce the notational clutter, we suppress the superscript j and use Q for Q(j). Q is derived directly from the matrix A as follows: For two links, i = (i1, i2) and i' = (i1', i2'), if a direct transition from i to i' is possible, so that i2 = i1', then we set q_{i,i'} = a_{i'j}. If i2 ≠ i1' (i.e., i is not directly connected to i'), then q_{i,i'} = 0. If i2 = j2 (i.e., i2 = the destination node), then the next state is necessarily M + 1, so that q_{i,M+1} = 1 if i2 = j2. Also q_{M+1,M+1} = 1. This is somewhat hard to see formally but easier to see via an example.

Figure 4. The Markov Network of SD Address (a, d) of Example 5.1.

Example 5.1 (continued). Consider the address j = (ad) (= "3") corresponding to the third column of A in Example 5.1. The corresponding network is depicted in Figure 4. There are five active links, and the matrix Q is the following 6 × 6 transition probability matrix:

              1       2       3       4       5       6
            (a → b) (a → c) (b → d) (c → b) (c → d) (end)
    1 (a → b)   0       0       1       0       0      0
    2 (a → c)   0       0       0      .2      .8      0
Q = 3 (b → d)   0       0       0       0       0      1        (40)
    4 (c → b)   0       0       1       0       0      0
    5 (c → d)   0       0       0       0       0      1
    6 (end)     0       0       0       0       0      1

We shall also need the two-step transition probability matrix

       0   0   0   0   0   1
       0   0  .2   0   0  .8
Q² =   0   0   0   0   0   1        (41)
       0   0   0   0   0   1
       0   0   0   0   0   1
       0   0   0   0   0   1

and the three-step transition probability matrix

       0   0   0   0   0   1
       0   0   0   0   0   1
Q³ =   0   0   0   0   0   1        (42)
       0   0   0   0   0   1
       0   0   0   0   0   1
       0   0   0   0   0   1

Note that because there are no loops (by assumption), such a chain is always absorbed at state M + 1 after at most M steps, and so Q^M is a matrix with columns of all zeros except the last column, which is all 1s.

The initial probabilities of the chain, say π, are also read off the matrix A. These probabilities are positive only for links connected to the source (j1) of address j, and in this case π_i = a_{ij}. In Example 5.1, for j = (a, d) = "3", we get

π' = (.2, .8, 0, 0, 0, 0)   (π_end = 0).   (43)

The probability that the chain is in "state i," i = 1, ..., M, after k steps is then (π'Q^k)_i, and the probability that the chain ever visits state i is then

Σ_{k=0}^{M} (π'Q^k)_i,   i = 1, ..., M   (Q⁰ = I).

This is seen as follows: Because there are no loops in the chain, the event [chain visits state i] is a union of mutually exclusive events, ∪_{k=0}^{M} [chain visits state i at step k], and so we get (cf. (34))

P[chain visits state i] = Σ_{k=0}^{M} P[chain visits state i at step k] = Σ_{k=0}^{M} (π'Q^k)_i.   (44)

Bringing back into (44) the suppressed dependency of Q and π on the SD address j, and noting that the event "chain visits state i" is equivalent to "a packet with SD address j passes through link i," we get

p_i^j = Σ_{k=0}^{M} (π(j)'Q(j)^k)_i,   i = 1, ..., M.   (45)

Example 5.1 (continued). For the address j = (ad), we get (p_1, ..., p_5) = (.2, .8, .36, .16, .64). This is obtained as the first five (= M) coordinates of π(j)'[Q⁰ + Q¹ + Q²]:

(.2, .8, 0, 0, 0, 0)[I + Q + Q²] = (.2, .8, 0, 0, 0, 0) + (0, 0, .2, .16, .64, 0) + (0, 0, .16, 0, 0, .84) = (.2, .8, .36, .16, .64, .84).

Compare the vector (.2, .8, .36, .16, .64) with the third column of the matrix P in Example 5.1 and note that they are the same, except that we deleted the inactive links of the network in the current calculation, because for these links p_i^{ad} = 0.

5.1.4 A Simple Method for Constructing P of (38). The absence of loops induces a partial order on the network with respect to each address j. For any pair of links, i and i', they may be unconnected, in which case they are unordered, or they may be connected by a possible path, in which case they are ordered relative to their distance from j1. We say that i <_j i' if i and i' are connected and i precedes i' relative to the address j. If i and i' are unconnected, then p_{i,i'}^j = 0. If they are connected, and supposing (without loss of generality) that i <_j i', then a conditional probability argument gives the following factorization:

p_{i,i'}^j = p_i^j p_{i'}^{(i2, j2)},   (46)

where i = (i1, i2), i' = (i1', i2'), and j = (j1, j2). Here p_i^j is as defined before (33), and p_{i'}^{(i2, j2)} is the probability that a packet with address (i2, j2) (starting at node i2 and ending at node j2) passes through link i', under the transition probabilities of the network with the original address j = (j1, j2). Both p_i^j and p_{i'}^{(i2, j2)} can be calculated using the method given in (45).

5.1.5 Putting It Together. Expressions (45) and (35) together give the mean and variance of the Yi's, and expressions (46) and (37) together give the covariances of the Yi's. For estimation, we compare these quantities to the empirical moments, based on repeated measurements at the links (exactly as in Sec. 3), and treat the resulting system of linear equations as a LININPOS problem. Much of the discussion in Sections 1-4, which relates to implementation of the method in connection with bad equations, regularization toward the Poisson model via reduced weights to second-moment equations, and so on, is all relevant to the random-routing case.

6. CONCLUSIONS AND FINAL REMARKS

6.1 On the Problem (Formulation and Methodology)

The problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions. The MLE and related approximations are discussed, and numerical difficulties are pointed out. A detailed methodology is presented for the method of moments. For the Poisson model, estimates based on the method of moments are consistent and asymptotically normal. The estimates are derived algorithmically, taking advantage of the fact that the first- and second-moment equations give rise to a linear inverse problem with positivity restrictions, which can be approached with an EM algorithm. This is done for two different types of network routing schemes: fixed (deterministic) and random (Markovian). A small simulation study is presented for a network with fixed routing, and the results indicate a tendency toward unbiasedness already for moderate sample sizes.

6.2 On the Poisson Assumption

Without any probability model, one is left only with the linear equations (1), in which the X(k) are from unspecified distribution(s). This cannot lead to anything useful. Thus a model is needed to tie all these equations together and relate the data to some underlying SD traffic pattern. A Poisson model is the most natural starting point for various reasons, including the popularity of Poisson-related models for time-dependent processes such as conventional telephone and packet traffic (Heffes and Lucantoni 1986), as well as vehicular traffic; the simplicity of transition from inference for Poisson variables (as in this article) to inference for Poisson processes; and the linearity and positivity in λ of the first two moments, which facilitate a simple estimation procedure. The method of weighted moment equations of Section 4 addresses estimation under deviations from the Poisson assumption and can also be viewed as a regularization technique for the Poisson model estimation. Clearly, however, there are situations of network data where Poisson-related models are totally inappropriate (see, for example, Leland et al. 1993). Such cases are governed by different probability models, and, accordingly, different estimation procedures, tailored to the specific models, would need to be derived.

6.3 On Applications to Road Planning

Knowing the node-to-node traffic intensity is a great design tool for traffic engineers and urban planners, as it is essential for deciding which nodes need to be linked directly to reduce network congestion. Historically, roadside interviews and other types of surveys have been used for estimating the SD traffic intensities in transportation studies. In recent years more algebraic and statistical methods based on link data have been proposed. See Sherali, Sivanandan, and Hobeika (1994), Bell (1991), and the references therein for important recent work and current practices. A forthcoming paper will discuss our methodology in the context of these applications and its relation to the methods proposed in this field.

6.4 On an Early Interest in Networks

Finally, we would like to include earlier references (suggested by W. Whitt and M. Segal) to a related problem. Kruithof (1937) posed the problem of projecting from measured node-to-node teletraffic data to some future values, based on estimates of total originating and terminating traffic only. Krupp (1979) presented Kruithof's method, treated it with modern mathematical tools, and related it to more recent mathematical concepts, such as Kullback-Leibler information divergence. Kruithof's method can be considered a dual of the EM algorithm for LININPOS problems, as it also minimizes a Kullback-Leibler criterion function for such problems, but it does the minimization with respect to the first argument of the Kullback-Leibler function, whereas the EM does it with respect to the second argument (see also Byrne 1993 for a related analysis).

APPENDIX: PROOF OF LEMMA OF SECTION 2

Let

s_j = Σ_{i=1}^{r} A_i^(j) = number of 1's in column A^(j).   (A.1)

Without loss of generality, assume that the columns of A are arranged monotonically in s_j, 1 ≤ s_1 ≤ ··· ≤ s_c. To prove identifiability, we need to show that if

P_Y(y | λ) = P_Y(y | λ') for a set of y's of measure 1 (under λ),   (A.2)

then necessarily λ = λ'. The proof has three steps. In Step 1 we show that (A.2) implies Σ λ_i = Σ λ'_i, in Step 2 we show that it implies λ_1 = λ'_1, and in Step 3 we use induction to complete the proof. In all three steps we calculate the probabilities of the possible solution vectors, x, to the equations y = Ax under the Poisson model.

Step 1: Take y = 0 (vector of all zeros). Then, because the components of x must be natural numbers (because the x_j's are Poisson variables) and because A has no zero columns, the only feasible solution of y = Ax is x = 0. Combining this with (A.2), we have e^{-Σλ_i} = P_Y(0 | λ) = P_Y(0 | λ') = e^{-Σλ'_i}, so that

Σ λ_i = Σ λ'_i.   (A.3)

Step 2: Now take y = A^(1) and consider again the solution, in x, to y = Ax. Clearly, x = δ_1 ≡ (1, 0, ..., 0)' is a solution. To see that this is the unique solution, suppose (by negation) that there is another solution. Because A^(1) is a vector of zeros and 1s, so must be any feasible solution x. If the first coordinate of x is 1, then the remaining must all be zeros, because we have A^(1) = Σ_{i=1}^{c} x_i A^(i), so that s_1 = Σ_{i=1}^{c} x_i s_i = s_1 + Σ_{i=2}^{c} x_i s_i, and so x_2 = ··· = x_c = 0. If the first coordinate of x is zero, then we get s_1 = Σ_{i=2}^{c} x_i s_i, and because s_1 ≤ s_i, i = 2, ..., c, we can have at most one x_i (2 ≤ i ≤ c) equal to 1; but because x is a solution to A^(1) = Ax, it would imply A^(1) = A^(i) for some i ∈ {2, ..., c}, contradicting the assumption that all the columns of A are different. Thus we have

P_Y(A^(1) | λ) = P_X(δ_1 | λ) = e^{-Σλ_i} λ_1

and, similarly, P_Y(A^(1) | λ') = e^{-Σλ'_i} λ'_1. It then follows from (A.2) and (A.3) that

λ_1 = λ'_1.   (A.4)

Step 3 (induction step): Suppose that

λ_i = λ'_i,   i = 1, ..., j − 1, for any j ∈ {2, ..., c}.   (A.5)

We want to show that λ_j = λ'_j. Take y = A^(j) and consider again the equation y = Ax. Clearly, x = δ_j ≡ (0, ..., 0, 1, 0, ..., 0)' (1 at position j and zero elsewhere) is a solution. Also, because A^(j) is a vector of zeros and 1s, and because the x_i's are Poisson variables, any feasible solution must be a vector of zeros and 1s. For similar reasons as in Step 2, any solution must have x_{j+1} = ··· = x_c = 0. Furthermore, if a solution has x_j = 1, then it must also have x_1 = ··· = x_{j−1} = 0, because for such a solution, s_j = s_j + Σ_{i=1}^{j−1} x_i s_i, and hence x_1 = ··· = x_{j−1} = 0 too. In other words, the only feasible solution of A^(j) = Ax with x_j = 1 is x = δ_j. Therefore, if A^(j) = Ax has any solution in addition to x = δ_j, then it must be a vector of zeros and 1s and have a tail of zeros starting at position j or earlier, so that x_j = x_{j+1} = ··· = x_c = 0. Let x_1, ..., x_k be all such distinct feasible solutions to A^(j) = Ax, and let b_i ⊆ {1, ..., j − 1} index the coordinates with value 1 of x_i, i = 1, ..., k. (For example, if x_1 = (1, 0, 1, 0, ..., 0) and x_2 = (1, 1, 0, ..., 0), then b_1 = {1, 3} and b_2 = {1, 2}.) We then have

P_Y(A^(j) | λ) = P_X(δ_j or x_1 or x_2 ··· or x_k | λ) = e^{-Σλ_i} (λ_j + Π_{i∈b_1} λ_i + Π_{i∈b_2} λ_i + ··· + Π_{i∈b_k} λ_i).

(Note that δ_j and x_1, ..., x_k are all distinct.) Similarly,

P_Y(A^(j) | λ') = e^{-Σλ'_i} (λ'_j + Π_{i∈b_1} λ'_i + Π_{i∈b_2} λ'_i + ··· + Π_{i∈b_k} λ'_i).

It follows from (A.2), (A.3), and the induction assumption (A.5) that

λ_j + Σ_{l=1}^{k} Π_{i∈b_l} λ_i = λ'_j + Σ_{l=1}^{k} Π_{i∈b_l} λ'_i = λ'_j + Σ_{l=1}^{k} Π_{i∈b_l} λ_i,   (A.6)

where the last equality follows from the induction assumption because b_l ⊆ {1, ..., j − 1}, l = 1, ..., k. Hence λ_j = λ'_j. This completes the proof of the induction step and of the lemma.

[Received January 1994. Revised May 1995.]

REFERENCES

Becker, R. A., and Wilks, A. R. (1993), "Network Traffic Analysis," AT&T Bell Labs technical report.

Bell, M. G. (1991), "The Estimation of Origin-Destination Matrices by Constrained Generalised Least Squares," Transportation Research B, 25B, 13-22.

Byrne, C. L. (1993), "Iterative Image Reconstruction Algorithms Based on Cross-Entropy Minimization," IEEE Transactions on Image Processing, 2, 96-103.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum Likelihood from Incomplete Data via the EM Algorithm" (with discussion), Journal of the Royal Statistical Society, Ser. B, 39, 1-38.

Green, P. J. (1990a), "Bayesian Reconstructions From Emission Tomography Data Using a Modified EM Algorithm," IEEE Transactions on Medical Imaging, 9, 84-93.

——— (1990b), "On Use of the EM Algorithm for Penalized Likelihood Estimation," Journal of the Royal Statistical Society, Ser. B, 52, 443-452.

Heffes, H., and Lucantoni, D. M. (1986), "A Markov Modulated Characterization of Packetized Voice and Data Traffic and Related Statistical Multiplexer Performance," IEEE Journal on Selected Areas in Communications, 4, 856-868.

Iusem, A. N., and Svaiter, B. F. (1994), "A New Smoothing-Regularization Approach for a Maximum-Likelihood Estimation Problem," Applied Mathematics & Optimization, 29, 225-241.

Kruithof, J. (1937), "Telefoonverkeersrekening," De Ingenieur, 52, no. 8, E15-E25.

Krupp, R. S. (1979), "Properties of Kruithof's Projection Method," The Bell System Technical Journal, 58, 517-538.

Leland, W. E., Taqqu, M. S., Willinger, W., and Wilson, D. V. (1993), "On the Self-Similar Nature of Ethernet Traffic," presented at SIGCOMM '93.
Roeder, K., Devlin, B., and Lindsay, B. G. (1989), "Application of Maximum Likelihood Methods to Population Genetic Data for the Estimation of Individual Fertilities," Biometrics, 45, 363-379.

Sherali, H. D., Sivanandan, R., and Hobeika, A. G. (1994), "A Linear Programming Approach for Synthesizing Origin-Destination Trip Tables from Link Traffic Volumes," Transportation Research B, 28B, 213-233.

Shepp, L. A., and Vardi, Y. (1982), "Maximum Likelihood Reconstruction for Emission Tomography," IEEE Transactions on Medical Imaging, 1, 113-122.

Silverman, B. W., Jones, M. C., Wilson, J. D., and Nychka, D. W. (1990), "A Smoothed EM Approach to Indirect Estimation Problems, With Particular Reference to Stereology and Emission Tomography" (with discussion), Journal of the Royal Statistical Society, Ser. B, 52, 271-324.

Snyder, D. L., Schultz, T. J., and O'Sullivan, J. A. (1992), "Deblurring Subject to Nonnegativity Constraints," IEEE Transactions on Signal Processing, 40, 1143-1150.

Vardi, Y., and Lee, D. (1993), "From Image Deblurring to Optimal Investments: Maximum Likelihood Solutions for Positive Linear Inverse Problems" (with discussion), Journal of the Royal Statistical Society, Ser. B, 55, 569-612.

Vardi, Y., Shepp, L. A., and Kaufman, L. (1985), "A Statistical Model for Positron Emission Tomography" (with comments), Journal of the American Statistical Association, 80, 8-37.
