Abstract  Properties of multidimensional Poisson point processes (PPPs) are discussed using a constructive approach readily accessible to a broad audience. The processes are defined in terms of a two-step simulation procedure, and their fundamental properties are derived from the simulation. This reverses the traditional exposition, but it enables those new to the subject to understand quickly what PPPs are about, and to see that general nonhomogeneous processes are little more conceptually difficult than homogeneous processes. After reviewing the basic concepts on continuous spaces, several important and useful operations that map PPPs into other PPPs are discussed; these include superposition, thinning, nonlinear transformation, and stochastic transformation. Following these topics is an amusingly provocative demonstration that PPPs are inevitable. The chapter closes with a discussion of PPPs whose points lie in discrete spaces and in discrete-continuous spaces. In contrast to PPPs on continuous spaces, realizations of PPPs in these spaces often sample the discrete points repeatedly. This is important in applications such as multitarget tracking.
Keywords  Event space · Intensity function · Orderly PPP · Realizations · Likelihood functions · Expectations · Random sums · Campbell's Theorem · Characteristic functions · Superposition · Independent thinning · Independent scattering · Poisson gambit · Nonlinear transformations · Stochastic transformations · PPPs on discrete spaces · PPPs on discrete-continuous spaces
Readers new to PPPs are urged to read the first four subsections below in order.
After that, they are free to move about the chapter as their fancy dictates. There is a lot of information here. It cannot be otherwise, for there are many wonderful and useful properties of PPPs.
1 What he really said [27]: It can scarcely be denied that the supreme goal of all theory is to make
the irreducible basic elements as simple and as few as possible without having to surrender the
adequate representation of a single datum of experience.
The emphasis throughout the chapter is on the PPP itself, although applications are alluded to in several places. The event space of PPPs and other finite
point processes is described in Section 2.1. The concept of intensity is discussed
in Section 2.2. The important concept of orderliness is also defined. PPPs that are
orderly are discussed in Sections 2.3 through 2.11. PPPs that are not orderly are
discussed in the last section, which is largely devoted to PPPs on discrete and
discrete-continuous spaces.
$$x_j \in \mathbb{R}^m, \quad j = 1, \ldots, n. \tag{2.1}$$

The event space is clearly very much larger in some sense than the space S in which the individual points reside.
2.2 Intensity
Every PPP is parameterized by a quantity called the intensity. Intensity is an intuitive
concept, but it takes different mathematical forms depending largely on whether the
state space S is continuous, discrete, or discrete-continuous. The continuous case is considered first: the intensity is a nonnegative function λ(s) defined on S ⊆ ℝᵐ that satisfies

$$\int_R \lambda(s)\, ds < \infty \tag{2.2}$$
for all bounded subsets R of S, i.e., subsets contained in some m-dimensional sphere of finite radius. The sets R include, provided they are bounded, convex sets, sets with holes and internal voids, disconnected sets such as the union of disjoint spheres, and sets that are interwoven like chain links.

The intensity function λ(s) need not be continuous, e.g., it can have step discontinuities. The only requirement on λ(s) is the finiteness of the integral (2.2). The special case of homogeneous PPPs on S = ℝᵐ with R = S shows that the inequality (2.2) does not imply that ∫_S λ(s) ds < ∞. Finally, in physical problems, the integral (2.2) is a dimensionless number, so λ(s) has units of number per unit volume of ℝᵐ.
The intensity for general PPPs on the continuous space S takes the form

$$\lambda_D(s) = \lambda(s) + \sum_{j} w_j\, \delta(s - a_j), \quad s \in S, \tag{2.3}$$

where δ(·) is the Dirac delta function and, for all j, the weights w_j are nonnegative and the points a_j ∈ S are distinct: a_i ≠ a_j for i ≠ j. The intensity λ_D(s) is not a function in the strict meaning of the term, but a generalized function. It is seen in the next section that the PPP corresponding to the intensity λ_D(s) is orderly if and only if w_j = 0 for all j; equivalently, a PPP is orderly if and only if the intensity λ_D(s) is a function, not a generalized function.
The concept of orderliness can be generalized so that finite point processes other
than PPPs can also be described as orderly. There are several nonequivalent definitions of the general concept, as discussed in [118]; however, these variations are not
used here.
2.3 Realizations
The discussion in this section and through to Section 2.11 is implicitly restricted to orderly PPPs, that is, to PPPs with a well defined intensity function on a continuous space S ⊆ ℝᵐ. Realizations and other properties of PPPs on discrete and discrete-continuous spaces are discussed in Section 2.12.

Realizations are conceptually straightforward to simulate for bounded subsets of continuous spaces S ⊆ ℝᵐ. Bounded subsets are windows in which PPP
realizations are observed. Stipulating a window avoids issues with infinite sets; for example, realizations of homogeneous PPPs on S = ℝᵐ have an infinite number of points but only a finite number in any bounded window.

Every realization of a PPP on a bounded set R is an element of the event space E(R). The realization therefore comprises the number n ≥ 0 and the locations {x₁, …, x_n} of the points in R.
A two-step procedure, one step discrete and the other continuous, generates (or, simulates) one realization ξ ∈ E(R) of a nonhomogeneous PPP with intensity λ(s) on a bounded subset R of S. The procedure also fully reveals the basic statistical structure of the PPP. If ∫_R λ(s) ds = 0, then ξ is the trivial event with no points. If ∫_R λ(s) ds > 0, the realization is obtained as follows:

Step 1. The number n ≥ 0 of points is determined by sampling the discrete Poisson random variable, denoted by N, with probability mass function given by

$$p_N(n) = \frac{\left( \int_R \lambda(s)\, ds \right)^{n}}{n!}\; \exp\left( -\int_R \lambda(s)\, ds \right). \tag{2.4}$$

Step 2. The n points x₁, …, x_n are obtained as i.i.d. (independent and identically distributed) samples of the pdf

$$p_X(s) = \frac{\lambda(s)}{\int_R \lambda(s)\, ds}, \quad \text{for } s \in R. \tag{2.5}$$
The output is the ordered pair ξ_o = (n, (x₁, …, x_n)). Replacing the ordered n-tuple (x₁, …, x_n) with the set {x₁, …, x_n} gives the PPP realization ξ = (n, {x₁, …, x_n}).
The careful distinction between ξ_o and ξ is made to avoid annoying, and sometimes confusing, problems later when order is important. For example, it is seen in Section 2.4 that the pdfs (probability density functions) of ξ_o and ξ differ by a factor of n!. Also, the points {x₁, …, x_n} are i.i.d. when conditioned on the number n of points. The conditioning on n is implicit in the statement of Step 2.
For continuous spaces S ⊆ ℝᵐ, an immediate consequence of Step 2 is that the points {x₁, …, x_n} are distinct with probability one: repeated elements are allowed in theory, but in practice they never occur (with probability one). Another way to say this is that the list, or multiset, {x₁, …, x_n} is a set with probability one. The statement fails to hold when the PPP is not orderly, that is, when the intensity (2.3) has one or more Dirac delta function components. It also does not hold when the state space S is discrete or discrete-continuous (see Section 2.12).
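The two-step procedure translates directly into code. The following is a minimal sketch, assuming numpy; the function name and the homogeneous example are illustrative, not from the text.

```python
import numpy as np

def ppp_realization(total_mass, sampler, rng):
    """Two-step PPP simulation on a bounded window R.

    Step 1: draw the number of points n from a Poisson distribution whose
            mean is total_mass = integral of lambda(s) over R, as in (2.4).
    Step 2: draw n i.i.d. points from the normalized intensity pdf (2.5);
            sampler(n, rng) must return n such samples.
    """
    n = rng.poisson(total_mass)   # Step 1: Poisson number of points
    points = sampler(n, rng)      # Step 2: i.i.d. locations in R
    return n, points

# Homogeneous example: lambda(s) = 5 on the unit square R = [0, 1]^2, so the
# total mass is 5 and the normalized pdf (2.5) is uniform on R.
rng = np.random.default_rng(0)
n, pts = ppp_realization(5.0, lambda k, r: r.uniform(0.0, 1.0, size=(k, 2)), rng)
```

For a nonhomogeneous intensity, only the `sampler` argument changes; Step 1 is unaffected.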
An acceptance-rejection procedure (see, e.g., [56]) is used to generate the i.i.d. samples of (2.5). Let

$$\nu = \max_{s \in R} \frac{p_X(s)}{g(s)}, \tag{2.6}$$

where g(s) > 0 is any bounded pdf on R from which i.i.d. samples can be generated via a known procedure. The function g(·) is called the importance function. For each point x with pdf g, compute t = p_X(x)/(ν g(x)). Next, generate a uniform variate u on [0, 1] and compare u and t: if u > t, reject x; if u ≤ t, accept it. The accepted samples are distributed as p_X(x).
The acceptance-rejection procedure is inefficient for some problems; that is, large numbers of i.i.d. samples from the pdf (2.5) may be drawn before finally accepting n samples. As is well known, efficiency depends heavily on the choice of the importance function g(·). Table 2.1 outlines the overall procedure and indicates how the inefficiency can occur. If inefficiency is a concern, other numerical procedures may be preferred in practice. Also, evaluating ∫_R λ(s) ds may require care in some problems.
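A minimal acceptance-rejection sampler along the lines of (2.6) might look as follows; this is a sketch assuming numpy, and the triangular target pdf is an illustration, not from the text.

```python
import numpy as np

def accept_reject(p_target, g_pdf, g_sampler, nu, n, rng):
    """Draw n i.i.d. samples from p_target by acceptance-rejection, where
    g is the importance pdf and nu >= max_s p_target(s) / g_pdf(s)."""
    out = []
    while len(out) < n:
        x = g_sampler(rng)                   # candidate from importance pdf g
        t = p_target(x) / (nu * g_pdf(x))    # acceptance probability
        if rng.uniform() <= t:               # accept if u <= t, else reject
            out.append(x)
    return np.array(out)

# Illustration: target pdf p(x) = 2x on [0, 1], uniform importance g(x) = 1,
# so the bound (2.6) is nu = max p/g = 2.
rng = np.random.default_rng(1)
samples = accept_reject(lambda x: 2.0 * x, lambda x: 1.0,
                        lambda r: r.uniform(0.0, 1.0), 2.0, 5000, rng)
```

The expected acceptance rate is 1/ν, so a poor importance function makes ν large and the loop correspondingly slow, which is exactly the inefficiency noted above.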
Table 2.1 Realization of a PPP with intensity λ(s) on bounded set R

Preliminaries:
  Select importance function g(s) > 0, s ∈ R
  Set efficiency scale c_eff = 1000 (corresponds to a 0.1% acceptance rate)
Step 1:
  Compute Λ = ∫_R λ(s) ds
  Compute ν = max_{s ∈ R} λ(s)/g(s)
Example 2.1 The two-step procedure is used to generate i.i.d. samples from a PPP
whose intensity function is nontrivially structured. These samples also show the
difficulty of observing this structure in small sample sets. Denote the multivariate Gaussian pdf on ℝᵐ with mean μ and covariance matrix Σ by

$$\mathcal{N}(s\,;\, \mu, \Sigma) = \frac{1}{\sqrt{\det(2\pi \Sigma)}}\, \exp\left( -\frac{1}{2}\, (s - \mu)^{T}\, \Sigma^{-1}\, (s - \mu) \right). \tag{2.7}$$
The intensity function is

$$\lambda(x, y) = \frac{a}{64 \pi^2} + b\, f(x, y)\, \mathcal{N}\!\left( \begin{bmatrix} x \\ y \end{bmatrix};\; \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; \sigma^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right), \tag{2.8}$$

where

$$f(x, y) = \begin{cases} 0, & \text{if } -0.65 \le y \le -0.45 \\ 1, & \text{otherwise.} \end{cases}$$
Fig. 2.1 Realizations of the pdf (2.9) of the intensity function (2.8) for σ = 1, a = 20, and b = 80. Samples are generated by the acceptance-rejection method. The prominent horizontal notch in the intensity is hard to see from the samples alone
For σ = 1, numerical integration gives the mean intensity Λ = ∫_R λ(x, y) dx dy = 92.25, approximately. A pseudo-random integer realization of the discrete Poisson variable (2.4) is n = 90, so 90 i.i.d. samples of the pdf (cf. (2.5))

$$p_X(s) \equiv p(x, y) = \lambda(x, y)\,/\,92.25 \tag{2.9}$$

are drawn via the acceptance-rejection procedure with g(x, y) = 1/(64π²). The pdf (2.9) is shown as the 3-D plot in Fig. 2.1a and as a set of equispaced contours in Fig. 2.1b; Figs. 2.1c and 2.1d, respectively, show the 90 sample points with and without reference to the intensity contours.

The horizontal notch is easily missed using these 90 samples in Fig. 2.1c. The detailed structure of an intensity function can be estimated reliably only in special circumstances, e.g., when a large number of realizations is available, or when the PPP has a known parametric form (see Section 3.1).
2.4 Likelihood Function

The pdf of a PPP realization ξ = (n, {x₁, …, x_n}) follows directly from the two-step procedure via the conditional factorization

$$p_\Xi(\xi) = p_N(n)\, p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n). \tag{2.10}$$

The conditional pdf of the unordered point set is

$$p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n) = n! \prod_{j=1}^{n} p_X(x_j), \tag{2.11}$$
where X is the random variable corresponding to a single sample point whose pdf is (2.5). The n! in (2.11) arises from the fact that there are n! equally likely ordered i.i.d. trials that generate the unordered set {x₁, …, x_n}. Substituting (2.4) and (2.11) into (2.10) gives the pdf of Ξ evaluated at ξ = (n, {x₁, …, x_n}) ∈ E(R):

$$p_\Xi(\xi) = p_N(n)\, p_{\mathcal{X}|N}(\{x_1, \ldots, x_n\} \mid n)$$
$$= \exp\left( -\int_R \lambda(s)\, ds \right) \frac{\left( \int_R \lambda(s)\, ds \right)^{n}}{n!}\; n! \prod_{j=1}^{n} \frac{\lambda(x_j)}{\int_R \lambda(s)\, ds}$$
$$= \exp\left( -\int_R \lambda(s)\, ds \right) \prod_{j=1}^{n} \lambda(x_j), \quad \text{for } n \ge 1. \tag{2.12}$$
By Step 2, the ordered n-tuple (x₁, …, x_n) conditioned on n has pdf

$$p_{X|N}(x_1, \ldots, x_n \mid n) = \prod_{j=1}^{n} p_X(x_j) = \prod_{j=1}^{n} \frac{\lambda(x_j)}{\int_R \lambda(s)\, ds}. \tag{2.13}$$
(2.14)
(2.15)
Let ξ_o = (n, (x₁, …, x_n)). Using (2.15) and the definition of conditioning gives

$$p_{\Xi_o}(\xi_o) = p_N(n)\, p_{X|N}(x_1, \ldots, x_n \mid n) \tag{2.16}$$
$$= \frac{1}{n!}\, \exp\left( -\int_R \lambda(s)\, ds \right) \prod_{j=1}^{n} \lambda(x_j), \quad \text{for } n \ge 1. \tag{2.17}$$
This notation interprets arguments in the usual way, so it is easier to understand and manipulate than (2.12). For example, the discrete pdf p_N(n) of (2.4) is merely the integral of (2.17) over x₁, …, x_n, but taking the same integral of (2.12) requires additional thought to restore the missing n!.

The argument ξ_o in (2.17) is written simply as ξ below. This usage may cause some confusion, since the left hand side of (2.17) then becomes p_Ξ(ξ), which looks the same as the first expression in (2.12) even though it differs from it by a factor of n!. A similar ambiguity arises from using the same subscript X|N on both sides of (2.13). Context makes the intended meaning clear, so these abuses of notation will not cause confusion.
In practice, when the number of points in a realization is very large, the points of
a PPP realization are often replaced by a smaller data set. If the smaller data set also
reduces the information content, the likelihood function obtained in this section no
longer applies. An example of a smaller data set (called histogram count data) and
its likelihood function is given in Section 2.9.1.
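In applications the unordered pdf (2.12) is almost always evaluated in logarithmic form, log p(ξ) = −∫_R λ(s) ds + Σ_j log λ(x_j). A minimal sketch, assuming numpy; the function name and the homogeneous example are illustrative:

```python
import numpy as np

def ppp_log_likelihood(points, log_intensity, total_mass):
    """Log of the unordered PPP pdf (2.12):
    log p(xi) = -(integral of lambda over R) + sum_j log lambda(x_j)."""
    return -total_mass + sum(log_intensity(x) for x in points)

# Homogeneous example: lambda(s) = 3 on R = [0, 1], so log lambda = log 3 and
# the total mass is 3; evaluate at a two-point realization.
ll = ppp_log_likelihood([0.2, 0.7], lambda x: np.log(3.0), 3.0)
```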
2.5 Expectations
Expectations are decidedly more interesting for point processes than for ordinary
random variables. Expectations are taken of real valued functions F defined on the event space E(R), where R is a bounded subset of S. Thus F(ξ) evaluates to a real number for all ξ ∈ E(R). The expectation of F(ξ) is written in the very general form

$$E[F] \equiv E_\Xi[F] = \sum_{\xi \in E(R)} F(\xi)\, p_\Xi(\xi), \tag{2.18}$$
where the sum, properly defined, is matched to the likelihood function of the point
process. In the case of PPPs, the likelihood function is that of the two-step simulation procedure. The sum is often referred to as an ensemble average over all
realizations of the point process.
The sum is daunting because of the huge size of the set E(R). Defining the
expectation carefully is the first and foremost task of this section. The second is
to show that for PPPs the expectation, though fearsome, can be evaluated explicitly
for many functions of considerable application interest.
2.5.1 Definition
Let ξ = (n, {x₁, …, x_n}). For analytical use, it is convenient to rewrite the function F(ξ) = F(n, {x₁, …, x_n}) in terms of a function that uses an easily understood argument list, that is, let

$$F(n, \{x_1, \ldots, x_n\}) \equiv F(n, x_1, \ldots, x_n). \tag{2.19}$$

With this notation the expectation (2.18) is

$$E[F] \equiv E_\Xi[F] \tag{2.20}$$
$$= \sum_{(n,\, x_1, \ldots,\, x_n)} F(n, x_1, \ldots, x_n)\, p_\Xi(n, x_1, \ldots, x_n). \tag{2.21}$$
The sum in (2.21) is an odd looking discrete-continuous sum that needs interpretation. The conditional factorization

$$p_\Xi(\xi) = p_N(n)\, p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n)$$

reduces it to

$$E[F] = \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R F(n, x_1, \ldots, x_n)\, p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n)\, dx_1 \cdots dx_n. \tag{2.22}$$

The expectation is formidable, but it is not as bad as it looks. Its inherently straightforward structure is revealed by verifying that E[F] = 1 for F(n, x₁, …, x_n) ≡ 1.
The details of this trivial exercise are omitted.
The expectation (2.22) is meaningful only for symmetric functions, that is, functions invariant to permutations of the arguments x₁, …, x_n; the expectation of non-symmetric functions is undefined. The definition is extended, formally, to general functions, say G(n, x₁, …, x_n), via the symmetrized version

$$G_{\mathrm{Sym}}(n, x_1, \ldots, x_n) = \frac{1}{n!} \sum_{\sigma \in \mathrm{Sym}(n)} G\big(n, x_{\sigma(1)}, \ldots, x_{\sigma(n)}\big), \tag{2.24}$$

where Sym(n) denotes the set of all permutations of {1, …, n}. The expectation of G is defined by E[G] = E[G_Sym]. This definition works because G_Sym is a symmetric function of its arguments, a fact that is straightforward to verify. The definition is clearly compatible with the definition for symmetric functions since G_Sym(n, x₁, …, x_n) ≡ G(n, x₁, …, x_n) if G is symmetric.
The expectation is defined by (2.23) for any finite point process with events in E(R), not just PPPs. For PPPs and other i.i.d. finite point processes (such as BPPs),

$$p_{\mathcal{X}|N}(x_1, \ldots, x_n \mid n) = \prod_{j=1}^{n} p_X(x_j), \tag{2.25}$$

so the expectation reduces to

$$E[F] = \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R F(n, x_1, \ldots, x_n) \prod_{j=1}^{n} p_X(x_j)\, dx_1 \cdots dx_n. \tag{2.26}$$
PPPs are assumed throughout the remainder of this chapter, so the discrete probability distribution p N (n) and pdf p X (x) are given by (2.4) and (2.5).
The expected number of points in R is E[N(R)]. When the context clearly identifies the set R, the expectation is written simply as E[N]. By substituting F(n, x₁, …, x_n) ≡ n into (2.26) and observing that the integrals all integrate to one,

$$E[N] = \sum_{n=0}^{\infty} n\, p_N(n) = \int_R \lambda(s)\, ds. \tag{2.27}$$

The variance of the number of points is, similarly,

$$\mathrm{Var}[N] = \sum_{n=0}^{\infty} \big( n - E[N] \big)^2\, p_N(n) = \int_R \lambda(s)\, ds. \tag{2.28}$$

The explicit sums in (2.27) and (2.28) are easily verified by direct calculation using (2.4).
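Equations (2.27) and (2.28) say that the count N(R) has mean and variance both equal to ∫_R λ(s) ds. This is easy to check empirically; the following sketch assumes numpy, and the particular intensity is illustrative.

```python
import numpy as np

# Take lambda(s) = 4s on R = [0, 1], so the integral of the intensity is 2.
# By (2.27)-(2.28), the count N(R) is Poisson with mean = variance = 2.
rng = np.random.default_rng(2)
mass = 2.0
counts = rng.poisson(mass, size=200_000)   # Step 1 of the simulation procedure
mean, var = counts.mean(), counts.var()
```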
An important class of expectations is that of random sums, that is, sums of the form

$$F = \sum_{j=1}^{N} f(X_j), \tag{2.29}$$

defined on realizations ξ = (n, {x₁, …, x_n}) by

$$F(n, x_1, \ldots, x_n) = \sum_{j=1}^{n} f(x_j), \quad \text{for } n \ge 1, \tag{2.30}$$

and, for n = 0, by F(0, ∅) ≡ 0. The special case of (2.30) for which f(x) ≡ 1 reduces to F(n, x₁, …, x_n) = n, the number of points in R. The mean of F is given by
$$E[F] = E\left[ \sum_{j=1}^{N} f(X_j) \right] \tag{2.31}$$
$$= \int_R f(x)\, \lambda(x)\, dx. \tag{2.32}$$

To see this, substitute (2.30) into (2.26):

$$E[F] = \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R \left( \sum_{j=1}^{n} f(x_j) \right) \prod_{j=1}^{n} p_X(x_j)\, dx_1 \cdots dx_n = \sum_{n=0}^{\infty} p_N(n)\; n \int_R f(x)\, p_X(x)\, dx,$$

and then use E[N] = ∫_R λ(s) ds together with p_X(x) = λ(x)/∫_R λ(s) ds. Now consider a second random sum,

$$G = \sum_{j=1}^{N} g(X_j), \tag{2.33}$$
where g(x) is a real valued function. Then the expected value of the product is

$$E[F\, G] = \int_R f(x)\, \lambda(x)\, dx \int_R g(x)\, \lambda(x)\, dx + \int_R f(x)\, g(x)\, \lambda(x)\, dx. \tag{2.34}$$

Before verifying this result in the next paragraph, note that since the means of F and G are determined as in (2.32), the result is equivalent to

$$\mathrm{cov}[F, G] \equiv E\big[ (F - E[F])\, (G - E[G]) \big] \tag{2.35}$$
$$= \int_R f(x)\, g(x)\, \lambda(x)\, dx. \tag{2.36}$$
2.6 Campbell's Theorem

In the special case f(x) ≡ 1, (2.36) reduces to the variance (2.28) of the number of points in R.
The result (2.34) is verified by direct evaluation. Write

$$F(\xi)\, G(\xi) = \sum_{\substack{i, j = 1 \\ i \ne j}}^{n} f(x_i)\, g(x_j) + \sum_{j=1}^{n} f(x_j)\, g(x_j). \tag{2.37}$$

The second term in (2.37) is (2.30) with f(x_j) g(x_j) replacing f(x_j), so its expectation is the second term of (2.34). The expectation of the first term is evaluated in much the same way as (2.32); details are omitted. The identity (2.34) is sometimes written

$$E\left[ \sum_{\substack{i, j = 1 \\ i \ne j}}^{N} f(X_i)\, g(X_j) \right] = \int_R f(x)\, \lambda(x)\, dx \int_R g(x)\, \lambda(x)\, dx. \tag{2.38}$$

For vector-valued f and g, the analogous identity is

$$E\big[ F\, G^{T} \big] = \int_R f(x)\, \lambda(x)\, dx \int_R g^{T}(x)\, \lambda(x)\, dx + \int_R f(x)\, g^{T}(x)\, \lambda(x)\, dx.$$
Under mild regularity conditions, Campbell's Theorem says that when θ is purely imaginary,

$$E\big[ e^{\theta F} \big] = \exp\left( \int_R \big( e^{\theta f(x)} - 1 \big)\, \lambda(x)\, dx \right), \tag{2.40}$$

where f(x) is a real valued function. The expectation exists for any complex θ for which the integral converges. It is obtained by algebraic manipulation. Substitute the explicit form (2.17) into the definition of expectation and churn:
$$E\big[ e^{\theta F} \big] = \sum_{n=0}^{\infty} p_N(n) \int_R \cdots \int_R \exp\left( \theta \sum_{j=1}^{n} f(x_j) \right) \prod_{j=1}^{n} \frac{\lambda(x_j)}{\int_R \lambda(s)\, ds}\, dx_1 \cdots dx_n$$
$$= e^{-\int_R \lambda(s)\, ds} \sum_{n=0}^{\infty} \frac{1}{n!} \left( \int_R e^{\theta f(s)}\, \lambda(s)\, ds \right)^{n}$$
$$= e^{-\int_R \lambda(s)\, ds}\, \exp\left( \int_R e^{\theta f(s)}\, \lambda(s)\, ds \right). \tag{2.41}$$
The last expression is obviously equivalent to (2.40). See [49, 57, 63] for further
discussion.
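Campbell's Theorem (2.40) can be checked by Monte Carlo simulation. The following sketch assumes numpy, and the particular λ and f are illustrative: take λ(s) = 3 on R = [0, 1], f(x) = x, and θ = −1, so the right-hand side of (2.40) is exp(∫₀¹ (e^(−x) − 1) · 3 dx) = exp(−3/e).

```python
import numpy as np

rng = np.random.default_rng(3)
trials = 100_000
vals = np.empty(trials)
for i in range(trials):
    n = rng.poisson(3.0)                  # Step 1: Poisson count, mean 3
    x = rng.uniform(0.0, 1.0, size=n)     # Step 2: p_X is uniform (lambda const)
    vals[i] = np.exp(-x.sum())            # exp(theta * F) with theta = -1
mc = vals.mean()                          # Monte Carlo estimate of E[exp(-F)]
exact = np.exp(-3.0 / np.e)               # right-hand side of (2.40)
```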
The characteristic function of F is given by (2.40) with θ = iω, where ω is real and i = √−1, and R = ℝ. The convergence of the integral requires that the Fourier transform of f exist as an ordinary function, i.e., it cannot be a generalized function. As is well known, the moment generating function is closely related to the characteristic function [93, Section 7.3]. Expanding the exponential gives

$$E\big[ e^{i\omega F} \big] = E\left[ 1 + i\omega F + (i\omega)^2\, \frac{F^2}{2!} + \cdots \right] = 1 + i\omega\, E[F] + (i\omega)^2\, \frac{E[F^2]}{2!} + \cdots. \tag{2.42}$$
The joint characteristic function of two random sums F and G, defined as in (2.30) and (2.33), is

$$E\big[ e^{i\omega_1 F + i\omega_2 G} \big] = \exp\left( \int_R \big( e^{i\omega_1 f(x) + i\omega_2 g(x)} - 1 \big)\, \lambda(x)\, dx \right). \tag{2.43}$$

To see this, simply use ω₁ f(x) + ω₂ g(x) in place of ω f(x) in (2.40). An immediate by-product of this result is an expression for the joint moments of F and G. Expanding (2.43) in a joint power series and assuming term by term integration is valid gives
$$E\big[ e^{i\omega_1 F + i\omega_2 G} \big] = E\left[ 1 + i\omega_1 F + i\omega_2 G + \frac{(i\omega_1)^2}{2!}\, F^2 + \frac{2\, (i\omega_1)(i\omega_2)}{2!}\, F G + \frac{(i\omega_2)^2}{2!}\, G^2 + \cdots \right]$$
$$= 1 + i\omega_1\, E[F] + i\omega_2\, E[G] + \frac{(i\omega_1)^2}{2!}\, E[F^2] + \frac{2\, (i\omega_1)(i\omega_2)}{2!}\, E[F G] + \frac{(i\omega_2)^2}{2!}\, E[G^2] + \cdots,$$

where terms of order larger than two are omitted. Taking partial derivatives gives the joint moment of order (r, s) as

$$E\big[ F^r G^s \big] = (-i)^{r+s}\, \frac{\partial^{\,r+s}}{\partial \omega_1^{\,r}\, \partial \omega_2^{\,s}}\, E\big[ e^{i\omega_1 F + i\omega_2 G} \big] \Big|_{\omega_1 = \omega_2 = 0}. \tag{2.44}$$

In particular, a direct calculation for the case r = s = 1 verifies the earlier result (2.34).
The form (2.40) of the characteristic function also characterizes the PPP; that is, a finite point process whose expectations of random sums satisfy (2.40) is necessarily a PPP. The details are given in the next subsection.
Let Ξ be a finite point process, and consider random sums of the form

$$F(\xi) = \sum_{j=1}^{n} f(X_j), \quad n \ge 1, \tag{2.45}$$

with F ≡ 0 for n = 0, and suppose that for every nonnegative function f,

$$E\big[ e^{-F} \big] = \exp\left( \int_R \big( e^{-f(x)} - 1 \big)\, \lambda(x)\, dx \right) \tag{2.46}$$

for some nonnegative function λ(x). The goal is to show that (2.46) implies that the finite point process is necessarily a PPP with intensity function λ(x). This is done by showing that Ξ satisfies the independent scattering property for any finite number k of sets A_j such that S = ∪_{j=1}^{k} A_j and A_i ∩ A_j = ∅ for i ≠ j.
Consider a nonnegative function f with values f₁, f₂, …, f_k on the specified sets A₁, A₂, …, A_k, respectively, so that

$$A_j = \{ x : f(x) = f_j \}.$$

Let

$$m_j = \int_{A_j} \lambda(x)\, dx.$$

Then (2.46) takes the form

$$E\big[ e^{-F} \big] = \exp\left( \sum_{j=1}^{k} \big( e^{-f_j} - 1 \big)\, m_j \right). \tag{2.47}$$
Observe that

$$\sum_{j=1}^{N} f(X_j) = \sum_{j=1}^{k} f_j\, N(A_j), \tag{2.48}$$

where N(A_j) is the number of points in A_j. For the given function f, the assumed identity (2.46) is equivalent to

$$E\Big[ e^{-\sum_{j=1}^{k} f_j\, N(A_j)} \Big] = \exp\left( \sum_{j=1}^{k} \big( e^{-f_j} - 1 \big)\, m_j \right). \tag{2.49}$$

Substituting z_j = e^(−f_j) gives

$$E\left[ \prod_{j=1}^{k} z_j^{\,N(A_j)} \right] = \prod_{j=1}^{k} e^{\,m_j (z_j - 1)}. \tag{2.50}$$

By varying the choice of function values f_j ≥ 0, the result (2.50) is seen to hold for all z_j ∈ (0, 1).
The joint characteristic function of several random variables is the product of the individual characteristic functions if and only if the random variables are independent [93], and the characteristic function of the Poisson distribution with mean m_j is (in this notation) e^(m_j (z_j − 1)). Therefore, the counts N(A_j) are independent and Poisson distributed with mean m_j. Since the sets A_j are arbitrary, the finite point process is a PPP.

The class of functions for which the identity (2.46) holds must include the class of all nonnegative functions that are piecewise constant, with arbitrarily specified values f_j, on an arbitrarily specified finite number of disjoint sets A_j. The discussion here is due to Kingman [63].
The probability generating functional is

$$G_\Xi(f) = E\left[ \exp\left( \sum_{j=1}^{N} \log f(X_j) \right) \right] \tag{2.51}$$
$$= E\left[ \prod_{j=1}^{N} f(X_j) \right]. \tag{2.52}$$

The probability generating functional is the analog for finite point processes of the probability generating function for random variables.

The Laplace and probability generating functionals are defined for general finite point processes Ξ, not just PPPs. If Ξ is a PPP with intensity function λ(x), then

$$G_\Xi(f) = \exp\left( \int \big( f(x) - 1 \big)\, \lambda(x)\, dx \right). \tag{2.53}$$
2.7 Superposition
A very useful property of independent PPPs is that their sum is a PPP. Two PPPs on S are superposed, or summed, if realizations of each are combined into one event. Let Ξ and Υ denote these PPPs, and let their intensities be λ(s) and μ(s). If (m, {x₁, …, x_m}) and (n, {y₁, …, y_n}) are realizations of Ξ and Υ, then the combined event is (m + n, {x₁, …, x_m, y₁, …, y_n}). Knowledge of which points originated from which realization is assumed lost.

The combined event is probabilistically equivalent to a realization of a PPP whose intensity function is λ(s) + μ(s). To see this, let ξ = (r, {z₁, …, z_r}) ∈ E(R) be an event constructed in the manner just described. The partition of this event into an m point realization of Ξ and an r − m point realization of Υ is unknown. Let the sets P_m and its complement P_m^c be such a partition, where P_m ∪ P_m^c = {z₁, …, z_r}. Let 𝒫_m denote the collection of all partitions of size m.
There are

$$\binom{r}{m} = \frac{r!}{m!\, (r - m)!}$$

partitions in 𝒫_m. The partitions in 𝒫_m are equally likely, so the likelihood of ξ is the sum over partitions:

$$p(\xi) = \sum_{m=0}^{r}\; \sum_{P_m \in \mathcal{P}_m} p_\Xi(m, P_m)\; p_\Upsilon\big( r - m,\, P_m^c \big).$$

Substituting the PPP pdf (2.12) for each factor gives

$$p(\xi) = e^{-\Lambda} \sum_{m=0}^{r}\; \sum_{P_m \in \mathcal{P}_m}\; \prod_{z \in P_m} \lambda(z) \prod_{z \in P_m^c} \mu(z),$$

where Λ ≡ ∫_R (λ(s) + μ(s)) ds. The double sum in the last expression is recognized (after some thought) as an elaborate way to write an r-term product. Thus,

$$p(\xi) = e^{-\Lambda} \prod_{i=1}^{r} \big( \lambda(z_i) + \mu(z_i) \big). \tag{2.54}$$

Comparing (2.54) to (2.12) shows that p(ξ) is the pdf of a PPP with intensity function given by λ(s) + μ(s).
More refined methods that do not rely on partitions show that superposition holds
for a countable number of independent PPPs. The intensity of the superposed PPP
is the sum of the intensities of the constituent PPPs, provided the sum converges.
For details, see [63].
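A quick numerical illustration of superposition, as a sketch assuming numpy with illustrative rates: merging realizations of two independent homogeneous PPPs on [0, 1] with rates 2 and 3 should behave like a single PPP with rate 5, so the merged count should have mean and variance 5.

```python
import numpy as np

rng = np.random.default_rng(4)

def homogeneous_ppp(rate, rng):
    """One realization of a homogeneous PPP on [0, 1] with the given rate."""
    n = rng.poisson(rate)
    return rng.uniform(0.0, 1.0, size=n)

counts = np.empty(50_000)
for i in range(50_000):
    xs = homogeneous_ppp(2.0, rng)
    ys = homogeneous_ppp(3.0, rng)
    merged = np.concatenate([xs, ys])   # which points came from which is lost
    counts[i] = merged.size
```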
The Central Limit Theorem for sums of random variables has an analog for point
processes called the Poisson Limit Theorem: the superposition of a large number
of uniformly sparse independent point processes converges in distribution to a
homogeneous PPP. These point processes need not be PPPs. The first statement and
proof of this difficult result dates to the mid-twentieth century. For details on R1 ,
see [62, 92]. The Poisson Limit Theorem also holds in the multidimensional case.
For these details, see [15, 40].
Example 2.2 The sum of dispersed unimodal intensities is sometimes unimodal. Consider the intensity function

$$\lambda_c(x, y) = \sum_{i \in \{-1, 0, 1\}}\; \sum_{j \in \{-1, 0, 1\}} c\, \mathcal{N}\!\left( \begin{bmatrix} x \\ y \end{bmatrix};\; \begin{bmatrix} i \\ j \end{bmatrix},\; \sigma^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right). \tag{2.55}$$
Fig. 2.2 Superposition of an equispaced grid of nine PPPs with circular Gaussian intensities (2.55) of equal weight and spread, σ = 1. Samples from the PPP components are generated independently and superposed to generate samples from the unimodal flat-topped intensity function
2.8 Independent Thinning

Thinning a PPP Ξ with intensity function λ(x) retains each point x of a realization with probability π(x) and deletes it otherwise, independently of all other points, where 0 ≤ π(x) ≤ 1. The thinned process is a PPP with intensity function

$$\pi(x)\, \lambda(x). \tag{2.56}$$

To see this, consider first the special case that π(x) is constant on R. Let

$$\Lambda = \int_R \lambda(x)\, dx,$$

and write π for the constant retention probability. Conditioned on n points before thinning, the number m of retained points is binomially distributed:

$$\Pr[m \mid n] = \binom{n}{m}\, \pi^m\, (1 - \pi)^{n - m}, \quad m \le n.$$
The unconditional probability of m points after thinning is

$$\Pr[m] = \sum_{n=m}^{\infty} \binom{n}{m}\, \pi^m (1 - \pi)^{n - m}\, \Pr[n] \tag{2.57}$$
$$= \sum_{n=m}^{\infty} \frac{n!}{m!\, (n - m)!}\, \pi^m (1 - \pi)^{n - m}\, \frac{\Lambda^n}{n!}\, e^{-\Lambda}$$
$$= \frac{(\pi \Lambda)^m}{m!}\, e^{-\Lambda} \sum_{n=m}^{\infty} \frac{\big( (1 - \pi)\, \Lambda \big)^{n - m}}{(n - m)!}$$
$$= \frac{(\pi \Lambda)^m}{m!}\, e^{-\pi \Lambda}.$$

The number of retained points is therefore Poisson distributed with mean πΛ.
In the general case, π(x) varies over R. Partition R into small cells on which π(x) is approximately constant, and for a generic cell R′ let

$$\Lambda' = \int_{R'} \lambda(x)\, dx, \qquad \Lambda'_\pi = \int_{R'} \pi(x)\, \lambda(x)\, dx.$$

The probability that Ξ has m points in R′ after thinning is, by the preceding argument, Poisson distributed with mean Λ′_π. The samples x′₁, …, x′_m are i.i.d., and their pdf on R′ is π(x) λ(x)/Λ′_π. Now extend the intensity function from R′ to all R by setting it to zero outside the cell. Superposing these cell-level PPPs and taking the limit as cell size goes to zero shows that π(x) λ(x) is the intensity function on the full set R. Further details are omitted.
An alternative demonstration exploits the acceptance-rejection method. Generate a realization of the PPP with intensity function λ(x) from the homogeneous PPP with intensity ν = max_{x∈R} λ(x). Redefine Λ = ∫_R λ(x) dx, and let |R| = ∫_R dx. The probability that no points remain in R after thinning by λ(x)/ν is

$$v(R) = \sum_{n=0}^{\infty} \left[ \int_R \frac{1}{|R|} \left( 1 - \frac{\lambda(s)}{\nu} \right) ds \right]^{n} \frac{(\nu |R|)^n}{n!}\, e^{-\nu |R|}$$
$$= e^{-\nu |R|} \sum_{n=0}^{\infty} \frac{(\nu |R|)^n}{n!} \left( 1 - \frac{\Lambda}{\nu |R|} \right)^{n}$$
$$= e^{-\nu |R|}\, e^{\nu |R| - \Lambda} = e^{-\Lambda}.$$

The void probabilities v(R) for a sufficiently large class of test sets R characterize a PPP, a fact whose proof is unfortunately outside the scope of the present book. (A clean, relatively accessible derivation is given in [136, Theorem 1.2].) Given the result, it is clear that the thinned process is a PPP with intensity function λ(x).
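A numerical sketch of the thinning theorem, assuming numpy with an illustrative λ and π: thin a homogeneous PPP with intensity 10 on [0, 1] by the retention probability π(x) = x; the retained count should then be Poisson with mean ∫₀¹ x · 10 dx = 5.

```python
import numpy as np

rng = np.random.default_rng(5)
kept_counts = np.empty(50_000)
for i in range(50_000):
    n = rng.poisson(10.0)                 # Step 1: points before thinning
    x = rng.uniform(0.0, 1.0, size=n)     # Step 2: uniform since lambda is const
    keep = rng.uniform(size=n) < x        # Bernoulli trial, P(keep) = pi(x) = x
    kept_counts[i] = keep.sum()
```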
Example 2.3 Triple Thinning. The truncated and scaled zero-mean Gaussian intensity function on the rectangle [−2, 2] × [−2, 3],

$$\lambda_c(x, y) = c\, \mathcal{N}(x\,; 0, \sigma^2)\, \mathcal{N}(y\,; 0, \sigma^2),$$

is depicted in Fig. 2.3a for c = 2000 and σ = 1. Its mean intensity (i.e., the integral of λ₂₀₀₀ over the rectangle) is Λ₀ = 1862.99. Sampling the discrete Poisson variate with mean Λ₀ gives, in this realization, 1892 points. Boundary conditions are imposed by the thinning functions
Fig. 2.3 Triply thinning the Gaussian intensity function by (2.58) for σ = 1 and c = 2000 yields samples of an intensity with hard boundaries on three sides
$$\pi_1(x, y) = 1 - e^{-y}, \quad \text{if } y \ge 0,$$
$$\pi_2(x, y) = 1 - e^{\,x - 2}, \quad \text{if } x \le 2,$$
$$\pi_3(x, y) = 1 - e^{-(x + 2)}, \quad \text{if } x \ge -2, \tag{2.58}$$

where π_j(x, y) = 0 for conditions not specified in (2.58). The overall thinning function, π₁ π₂ π₃, is depicted in Fig. 2.3b overlaid on the surface corresponding to λ₁. The intensity of the thinned PPP, namely π₁ π₂ π₃ λ₂₀₀₀, is nonzero only on the rectangle [−2, 2] × [0, 3]. It is depicted in Fig. 2.3c. Thinning the 1892 points of the realization of λ₂₀₀₀ leaves the 264 points depicted in Fig. 2.3d. These 264 points are statistically equivalent to a sample generated directly from the thinned PPP. The mean thinned intensity is 283.19.
This name conveys genuine meaning in the point process context, but it seems of fairly recent
vintage [84, Section 3.1.2] and [123, p. 33]. It is more commonly called independent increments,
which can be confusing because the same name is used for a similar, but different, property of
stochastic processes. See Section 2.9.4.
2.9 Declarations of Independence

Let A ⊂ R and B ⊂ R denote bounded subsets of R. The point processes Ξ(A) and Ξ(B) are obtained by restricting realizations of Ξ to A and B, respectively. Simply put, the points in Ξ(A) are the points of Ξ that are in A ⊂ R, and the same for Ξ(B). This somewhat obscures the fact that the realizations ξ_A and ξ_B are obtained from the same realization ξ. Intuition may suggest that constructing ξ_A and ξ_B from the very same realization will force the point processes Ξ(A) and Ξ(B) to be highly correlated in some sense. Such intuition is in need of refinement, for it is incorrect. This is the subtlety mentioned above.
Let ξ denote an arbitrary realization of a point process Ξ(A ∪ B) on the set A ∪ B. The point process Ξ(A ∪ B) is an independent scattering process if

$$p_{\Xi(A \cup B)}(\xi) = p_{\Xi(A)}(\xi_A)\; p_{\Xi(B)}(\xi_B) \tag{2.59}$$

for all disjoint subsets A and B of R, that is, for all subsets such that A ∩ B = ∅. The pdfs in (2.59) are determined by the specific character of the point process, so they are not in general those of a PPP. The product in (2.59) is the reason the property is called independent scattering.
A nonhomogeneous multidimensional PPP is an independent scattering point process. To see this it is only necessary to verify that (2.59) holds. Define thinning probability functions α(x) and β(x) by

$$\alpha(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases} \qquad \text{and} \qquad \beta(x) = \begin{cases} 1, & \text{if } x \in B \\ 0, & \text{if } x \notin B. \end{cases}$$

The point processes Ξ(A) and Ξ(B) are obtained by α-thinning and β-thinning realizations of the PPP Ξ(A ∪ B), so they are PPPs. Let λ(x) be the intensity function of the PPP Ξ(A ∪ B). Let ξ = (n, {x₁, …, x_n}) be an arbitrary realization of Ξ(A ∪ B). The pdf of ξ is, from (2.12),
$$p_{\Xi(A \cup B)}(\xi) = e^{-\int_{A \cup B} \lambda(x)\, dx} \prod_{j=1}^{n} \lambda(x_j). \tag{2.60}$$
Because the points of the α-thinned and β-thinned realizations are on disjoint sets A and B, the realizations ξ_A = (i, {y₁, …, y_i}) and ξ_B = (k, {z₁, …, z_k}) are necessarily such that i + k = n and

$$\{ y_1, \ldots, y_i \} \cup \{ z_1, \ldots, z_k \} = \{ x_1, \ldots, x_n \}.$$

Because Ξ(A) and Ξ(B) are PPPs, the pdfs of ξ_A and ξ_B are

$$p_{\Xi(A)}(\xi_A) = e^{-\int_A \lambda(x)\, dx} \prod_{j=1}^{i} \lambda(y_j)$$

and

$$p_{\Xi(B)}(\xi_B) = e^{-\int_B \lambda(x)\, dx} \prod_{j=1}^{k} \lambda(z_j).$$

The product of these two pdfs is clearly equal to that of (2.60). The key elements of the argument are that the thinned processes are PPPs, and that the thinned realizations are free of overlap when the sets are disjoint. The argument extends easily to any finite number of disjoint sets.
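Independent scattering can be seen numerically; the following is a sketch assuming numpy, with an illustrative rate and an illustrative pair of sets. Restrict each realization of a homogeneous PPP of rate 6 on [0, 1] to the disjoint sets A = [0, 0.5) and B = [0.5, 1]; the two counts, though taken from the same realization, should be independent Poisson variables with mean 3 each.

```python
import numpy as np

rng = np.random.default_rng(7)
trials = 50_000
na = np.empty(trials)
nb = np.empty(trials)
for i in range(trials):
    pts = rng.uniform(0.0, 1.0, size=rng.poisson(6.0))  # one realization on [0,1]
    na[i] = (pts < 0.5).sum()    # count restricted to A = [0, 0.5)
    nb[i] = (pts >= 0.5).sum()   # count restricted to B = [0.5, 1]
cov = np.cov(na, nb)[0, 1]       # near 0: the two restrictions are uncorrelated
```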
Example 2.4 Likelihood Function for Histogram Data. A fine illustration of the utility of independent scattering is the way it makes the pdf of histogram data easy to determine. Denote the cells of a histogram by R₁, …, R_K, K ≥ 1. The cells are assumed disjoint, so R_i ∩ R_j = ∅ for i ≠ j. Histogram data are nonnegative integers that count the number of points of a realization of a point process that fall within the various cells. No record is kept of the locations of the points within any cell. Histogram data are very useful for compressing large volumes of sample (point) data.

Denote the histogram data by n_{1:K} ≡ {n₁, …, n_K}, where n_j ≥ 0 is the number of points of the process that lie in R_j. Let the point process Ξ be a PPP, and let Ξ(R_j) denote the PPP obtained by restricting Ξ to R_j. The number of points of Ξ(R_j) is Poisson distributed with mean ∫_{R_j} λ(s) ds. The histogram cells are disjoint. By independent scattering, the PPPs Ξ(R₁), …, Ξ(R_K) are independent and the pdf of the histogram data is
$$p(n_{1:K}) = \prod_{j=1}^{K} \exp\left( -\int_{R_j} \lambda(s)\, ds \right) \frac{\left( \int_{R_j} \lambda(s)\, ds \right)^{n_j}}{n_j!} \tag{2.61}$$
$$= \exp\left( -\int_R \lambda(s)\, ds \right) \prod_{j=1}^{K} \frac{\left( \int_{R_j} \lambda(s)\, ds \right)^{n_j}}{n_j!}. \tag{2.62}$$
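The histogram likelihood (2.61) is a product of Poisson pmfs, one per cell, with means m_j = ∫_{R_j} λ(s) ds. In log form it might be coded as follows; this is a sketch assuming numpy, and the cell masses and counts are illustrative.

```python
import numpy as np
from math import lgamma

def histogram_loglik(counts, cell_masses):
    """Log of (2.61): sum over cells of log Poisson(n_j; m_j), where
    m_j is the integral of the intensity over cell R_j."""
    lls = [-m + n * np.log(m) - lgamma(n + 1)
           for n, m in zip(counts, cell_masses)]
    return float(sum(lls))

# Two cells with masses 2 and 3 and observed counts 1 and 4.
ll = histogram_loglik([1, 4], [2.0, 3.0])
```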
(2.63)
The point process has realizations in the event space E([0, 1]), but it is not a PPP because of the way the points are sampled for n = 3.

For any c ∈ [0, 1], define the random variable

$$X_c(x) = \begin{cases} 1, & \text{if } x < c \\ 0, & \text{if } x \ge c. \end{cases} \tag{2.64}$$

The probability that exactly m points of a realization lie in the interval [a, b], conditioned on n points in [0, 1], is

$$G_n(a, b, m) = \Pr\big[ \text{exactly } m \text{ points of } \{ x_1, \ldots, x_n \} \text{ are in } [a, b] \big] \tag{2.65}$$
$$= \binom{n}{m}\, E\left[ \prod_{j=1}^{m} \big( X_b(x_j) - X_a(x_j) \big) \prod_{j=m+1}^{n} \big( X_a(x_j) + X_1(x_j) - X_b(x_j) \big) \right]. \tag{2.66}$$

For n ≠ 3, the points are i.i.d. conditioned on n, so for all c_j ∈ [0, 1],

$$E\big[ X_{c_1}(x_1) \cdots X_{c_n}(x_n) \big] = F[c_1, \ldots, c_n] = c_1 \cdots c_n. \tag{2.67}$$
4 A gambit in chess involves sacrifice or risk with hope of gain. The sacrifice here is loss of control
over the number of Bernoulli trials, and the gain is independence of the numbers of different
outcomes.
38
The final product is the statement that the numbers of heads and tails are independent and Poisson distributed with the required parameters. For further comments, see, e.g., [52, Section 9.3] or [42, p. 48].
Example 2.6 Independence of Thinned and Culled PPPs. The points of a PPP that are retained and those that are culled during Bernoulli thinning are both PPPs. Their intensities are p(x) λ(x) and (1 − p(x)) λ(x), respectively, where p(x) is the probability that a point at x ∈ S is retained. Poisson's gambit implies that the numbers of points in these two PPPs are independent. Step 2 of the realization procedure guarantees that the sample points of the two processes are independent. The thinned and culled PPPs are therefore independent, and superposing them recovers the original PPP, since the intensity function of the superposition is the sum of the component intensities. In other words, splitting a PPP into two parts using Bernoulli thinning and subsequently merging the parts via superposition recovers the original PPP.
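The split-and-merge property is easy to check numerically. Below is a minimal NumPy sketch (the intensity, interval length, and retention probability are illustrative choices, not values from the text): it realizes a homogeneous PPP on an interval, thins it with a constant retention probability, and confirms that the retained and culled counts have the predicted Poisson means and are uncorrelated, as Poisson's gambit asserts.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, p = 5.0, 10.0, 0.3   # intensity, interval length, retention probability

kept, culled = [], []
for _ in range(20000):
    n = rng.poisson(lam * T)           # Step 1: number of points in [0, T]
    # Step 2 would place n i.i.d. points; with constant p only the counts matter.
    keep = rng.uniform(size=n) < p     # independent Bernoulli trial per point
    kept.append(int(keep.sum()))
    culled.append(n - int(keep.sum()))

kept = np.asarray(kept, dtype=float)
culled = np.asarray(culled, dtype=float)
corr = np.corrcoef(kept, culled)[0, 1]
print(kept.mean(), culled.mean(), corr)   # near p*lam*T = 15, (1-p)*lam*T = 35, and 0
```

Even though both counts come from the same parent realization, their sample correlation is statistically indistinguishable from zero.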
Example 2.7 Coloring Theorem. Replace the Bernoulli trials in Example 2.6 by independent multinomial trials with k ≥ 2 different outcomes, called colors in [63, Chapter 5], with probabilities {p_1(x), …, p_k(x)}, where

p_1(x) + ⋯ + p_k(x) = 1 .

Every point x ∈ S of a realization of the PPP with intensity function λ(x) is colored according to the outcome of the multinomial trial. For every color j, let Π_j denote the point process that corresponds to points of color j. Then Π_j is a PPP, and its intensity is

λ_j(x) = p_j(x) λ(x) .

Poisson's gambit and Step 2 of the realization procedure show that the PPPs Π_j are independent. The intensity of their superposition is

∑_{j=1}^{k} λ_j(x) = ∑_{j=1}^{k} p_j(x) λ(x) = λ(x) .
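The coloring theorem admits the same kind of numerical check. The NumPy sketch below (intensity, interval, and color probabilities are illustrative) colors each point of a homogeneous PPP with one of three colors and verifies that the per-color counts have Poisson means p_j · λ · T and are pairwise uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T = 4.0, 5.0                     # constant intensity and interval length
probs = np.array([0.2, 0.3, 0.5])     # color probabilities p_1, p_2, p_3

trials = 20000
counts = np.zeros((trials, 3))
for i in range(trials):
    n = rng.poisson(lam * T)                     # Step 1: total number of points
    colors = rng.choice(3, size=n, p=probs)      # one multinomial trial per point
    counts[i] = np.bincount(colors, minlength=3)

means = counts.mean(axis=0)                      # expect p_j * lam * T = [4, 6, 10]
corr12 = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
print(means, corr12)
```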
not Poisson distributed for even one set R, then it is not an independent scattering
process, and hence not a PPP. To see this, a physics-style argument (due to Kingman
[63, pp. 9–10]) is adopted.
Given a set A ≠ ∅ with no holes, or voids, define the family of sets A_t, t ≥ 0, by

A_t = ∪_{a∈A} { x ∈ ℝ^m : ‖x − a‖ ≤ t } ,

where ‖·‖ is the usual Euclidean distance. Because A has no voids, the boundary of A_t encloses the boundary of A_s if t > s. Let
p_n(t) = Pr[N(A_t) = n]   and   q_n(t) = Pr[N(A_t) ≤ n] ,

where N(A_t) is the random variable that equals the number of points in a realization that lie in A_t. The point process is orderly, so it is assumed that the function p_n(t) is differentiable. Let

Λ(t) ≡ E[N(A_t)] .

Finding an explicit mathematical form for this expectation is not the goal here. The goal is to show that

p_n(t) = e^{−Λ(t)} Λ^n(t) / n! .
q_n(t) − q_n(t + h) = Pr[N(A_t) = n] Pr[N(A_t^h) = 1 | N(A_t) = n]
= Pr[N(A_t) = n] Pr[N(A_t^h) = 1]
= p_n(t) ( Λ(t + h) − Λ(t) ) ,

where A_t^h = A_{t+h} \ A_t is the shell between the two boundaries; the second equality holds by independent scattering, and the last holds to first order in h because the process is orderly.
Dividing by h and taking the limit as h → 0 gives

−dq_n(t)/dt = p_n(t) dΛ(t)/dt .  (2.69)
For n = 0, q_0(t) = p_0(t), so (2.69) gives

−dp_0(t)/dt = p_0(t) dΛ(t)/dt ,

or, equivalently,

(d/dt) ( Λ(t) + log p_0(t) ) = 0 .  (2.70)
From (2.69) and p_n(t) = q_n(t) − q_{n−1}(t), it follows that

dp_n(t)/dt = ( p_{n−1}(t) − p_n(t) ) dΛ(t)/dt .

Multiplying both sides by e^{Λ(t)} and using the product differentiation rule gives

(d/dt) ( p_n(t) e^{Λ(t)} ) = p_{n−1}(t) e^{Λ(t)} dΛ(t)/dt .
Integrating gives the recursion

p_n(t) = e^{−Λ(t)} ∫_0^t p_{n−1}(x) e^{Λ(x)} (dΛ(x)/dx) dx .  (2.71)
Solving the recursion starting with (2.70) gives p_n(t) = e^{−Λ(t)} Λ^n(t)/n!, the Poisson density (2.4) with mean Λ(t).
The class of sets without voids is a very large class of test sets. To see that
the Poisson distribution is inevitable for more general sets requires more elaborate
theoretical methods. Such methods are conceptually lovely and mathematically rigorous. They confirm but do not deepen the insights provided by the physics-style
argument, so they are not presented here.
(2.72)

where

Λ(t) = ∫_{t_0}^{t} λ(τ) dτ ,  t ≥ t_0 .
∫_R λ(x) dx = ∫_{f(R)} λ(f^{−1}(y)) |∂f^{−1}(y)/∂y| dy ,  (2.73)

so the intensity function of the transformed process f(Π) is

μ(y) = λ(f^{−1}(y)) |∂f^{−1}(y)/∂y| .  (2.74)
For the linear mapping y = Ax + b with nonsingular matrix A, (2.74) reduces to

μ(y) = (1 / |A|) λ( A^{−1}(y − b) ) ,  (2.75)

where |A| denotes the absolute value of the determinant of A.
2.10 Nonlinear Transformations
[100, Chapter 4]. Suppose that Π is a PPP with intensity function λ(x) > 0 for all x ∈ S ⊂ ℝ¹, and let

y = f(x) = ∫_{−∞}^{x} λ(t) dt   for −∞ < x < ∞ .  (2.76)
The point process f(Π) is a PPP with intensity one. To see this, use (2.74) to obtain

μ(y) = λ(f^{−1}(y)) |∂f^{−1}(y)/∂y| = λ(x) / |∂f(x)/∂x| = λ(x) / λ(x) = 1 ,  x = f^{−1}(y),

where the chain rule is used to show that |∂f^{−1}(y)/∂y| = 1/|∂f(x)/∂x|. An alternative, but more direct, way to see the same thing is to observe that since f is monotone, its inverse exists, and the mean number of points in any bounded interval [a, b] is

∫_{f^{−1}(a)}^{f^{−1}(b)} λ(x) dx = ∫_{f^{−1}(a)}^{f^{−1}(b)} df(x) = ∫_a^b dy = b − a .  (2.77)

Therefore, f(Π) is homogeneous with intensity function μ(y) ≡ 1. Obvious modifications are needed to make this method work for λ(x) ≥ 0.
A scalar multiple of the mapping (2.76) is used in the well-known algorithm for generating i.i.d. samples of a one-dimensional random variable via the inverse cumulative distribution function. The transformation fails for ℝ^m, m ≥ 2, because the inverse function is a one-to-many mapping. For the same reason, nonhomogeneous PPPs on spaces of dimension two or more do not transform to homogeneous ones of the same dimension.
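The mapping (2.76) also gives a practical sampler for nonhomogeneous one-dimensional PPPs: realize a unit-rate homogeneous PPP on the range of the cumulative intensity and pull the points back through its inverse. A minimal NumPy sketch, with the illustrative choice λ(x) = 2x on [0, 3], so the cumulative intensity is Λ(x) = x² and its inverse is the square root:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 3.0
Lam = lambda x: x ** 2    # cumulative intensity of lam(x) = 2x on [0, L]
Lam_inv = np.sqrt         # inverse of the mapping y = f(x) = Lam(x)

def sample_nonhomogeneous(rng):
    # Unit-rate homogeneous PPP on [0, Lam(L)], pulled back through f^{-1};
    # the result is a PPP on [0, L] with intensity lam(x) = 2x.
    n = rng.poisson(Lam(L))
    y = rng.uniform(0.0, Lam(L), size=n)
    return Lam_inv(y)

counts = np.array([np.sum(sample_nonhomogeneous(rng) < 1.0)
                   for _ in range(20000)], dtype=float)
print(counts.mean())   # expected count in [0, 1] is Lam(1) - Lam(0) = 1
```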
Transformations may alter all the statistical properties of the original PPP, not just the PPP intensity function. For instance, in Example 2.9, because f(Π) is a homogeneous PPP, the interval lengths between successive points of f(Π) are independent (see Section 2.9.4). However, the intervals between successive points of the original nonhomogeneous PPP are not independent [63, p. 51]. In practice, it is necessary to understand how the transformation affects all the statistical properties deemed important in the application.
An important class of many-to-one mappings are the projections from ℝ^m to ℝ^ℓ, where ℓ < m. Let π map the point x = (ξ_1, …, ξ_m) ∈ ℝ^m to the point y = π(x) = (ξ_1, …, ξ_ℓ) ∈ ℝ^ℓ. The set of all x ∈ ℝ^m that map to the point y is π^{−1}(y). This set is a continuous manifold in ℝ^m. Explicitly,

π^{−1}(y) = { (ξ_1, …, ξ_ℓ, ξ_{ℓ+1}, …, ξ_m) : ξ_{ℓ+1} ∈ ℝ, …, ξ_m ∈ ℝ } .  (2.78)

Integrating over the manifold π^{−1}(y) gives the intensity function
μ(ξ_1, …, ξ_ℓ) = ∫ λ(ξ_1, …, ξ_ℓ, ξ_{ℓ+1}, …, ξ_m) dξ_{ℓ+1} ⋯ dξ_m .  (2.79)
where dM(y) is the differential in the tangent space at the point f 1 (y) of the set
M(y). The special case of projection mappings provides the basic intuitive insight
into the nonlinear mapping property of PPPs. To see that the result holds requires
a more careful and mathematically subtle analysis than is deemed appropriate here.
See [63, Section 2.3] for further details.
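For projections, (2.79) can be checked by simulation. In the NumPy sketch below (the intensity and rectangle are illustrative choices), points of a homogeneous planar PPP on [0, a] × [0, b] are projected onto their first coordinate; the counts in a subinterval then match the marginal intensity μ(x₁) = λ · b.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, a, b = 6.0, 1.0, 2.0    # homogeneous intensity on the rectangle [0, a] x [0, b]

def projected_count(rng):
    n = rng.poisson(lam * a * b)       # Step 1 on the rectangle
    x1 = rng.uniform(0.0, a, size=n)   # first coordinates; x2 is discarded by pi
    return np.sum(x1 < 0.5)            # projected points falling in [0, 0.5]

counts = np.array([projected_count(rng) for _ in range(20000)], dtype=float)
# By (2.79), mu(x1) = lam * b = 12, so the count in [0, 0.5] is Poisson with mean 6.
print(counts.mean(), counts.var())
```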
In practice, the sets M(y) are commensurate for most nonlinear mappings. For
example, it is easy to see that the projections have this property. However, some
nonlinear functions do not. As the next example shows, the problem with forbidden
mappings is that they lead to intensities that are generalized functions.
Example 2.10 A Forbidden Nonlinear Mapping. The sets M(y) of the function f : ℝ² → ℝ¹ defined by

y = f(x_1, x_2) = { (x_1² + x_2²)^{1/2} − 1, if x_1² + x_2² > 1 ;  0, if x_1² + x_2² ≤ 1 }

are not commensurate: the set that maps to y = 0 is the closed unit disk

M(0) = { (x_1, x_2) : x_1² + x_2² ≤ 1 } ⊂ ℝ² ,

a two-dimensional set, while for y > 0 the set M(y) is a one-dimensional circle of radius y + 1.
μ(0) = ∫_{M(0)} 1 dx_1 dx_2 = π

and

μ(y) = ∫_{M(y)} 1 dσ = 2π(y + 1) ,  y > 0 .

This gives

μ(y) = π δ(y) + 2π(y + 1) ,  y ≥ 0 ,

a generalized function, not an ordinary intensity.
Example 2.11. The mapping to polar coordinates, with range y_1 = (x_1² + x_2²)^{1/2} and bearing y_2, maps a PPP with intensity function λ(x_1, x_2) on ℝ² to a PPP with intensity function

μ(y_1, y_2) = y_1 λ(y_1 cos y_2, y_1 sin y_2)  (2.81)

on the semi-infinite strip

{ (y_1, y_2) : y_1 > 0, 0 ≤ y_2 < 2π } .  (2.82)
If λ(x_1, x_2) ≡ 1, then μ(y_1, y_2) = y_1. From (2.79), the projection onto the range y_1 gives a PPP on [0, ∞) ⊂ ℝ¹ with intensity function μ(y_1) = 2π y_1, and the projection onto the angle y_2 is of infinite intensity on [0, 2π). Alternatively, if λ(x_1, x_2) = (x_1² + x_2²)^{−1/2}, then μ(y_1, y_2) ≡ 1. The projection onto range is μ(y_1) = 2π; the projection onto angle is of infinite intensity.
Historical Note. Example 2.11 is the two-dimensional (cylindrical propagation) version of Olbers' famous paradox (1823) in astronomy. It asks, "Why is the sky dark at night?" The argument is that if star locations form a homogeneous PPP
in R3 , at the time a seemingly reasonable model for stellar distributions, then an
easy calculation shows that the polar projection onto the unit sphere is a PPP with
infinite intensity. If stellar intensity falls off as the inverse square of distance (due
to spherical propagation), another easy calculation shows that the polar projection
still has infinite intensity. Resolving the paradox (e.g., by assuming the universe is
2.11 Stochastic Transformations

ν(y) = ∫_S ψ(y | x) λ(x) dx .  (2.83)
To see this, let R be any bounded subset of S. Let μ = ∫_R λ(s) ds and observe that the likelihood of the transition event η = (m, {y_1, …, y_m}) is, by construction,

p(η) = (e^{−μ} / m!) ∫_R ⋯ ∫_R ∏_{j=1}^{m} ψ(y_j | x_j) ∏_{j=1}^{m} λ(x_j) dx_1 ⋯ dx_m
= (e^{−μ} / m!) ∏_{j=1}^{m} ∫_R ψ(y_j | x_j) λ(x_j) dx_j
= (e^{−μ} / m!) ∏_{j=1}^{m} ν(y_j) .  (2.84)

Since

∫_R ν(y) dy = ∫_R ∫_R ψ(y | x) λ(x) dx dy = ∫_R λ(x) dx = μ ,  (2.85)

it follows from (2.12) that the transition point process f(Π) is also a PPP.
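The construction can be exercised numerically. In the NumPy sketch below (all parameters are illustrative), each point of a homogeneous PPP on a wide interval is perturbed with Gaussian noise ψ(y | x) = N(y; x, σ²); by (2.83) the transformed intensity is again approximately λ away from the interval edges, so counts in a central window remain Poisson with the same mean.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, sigma = 3.0, 0.5
A, B = -10.0, 10.0    # wide parent interval so edge effects near 0 are negligible

def transformed_count(rng):
    n = rng.poisson(lam * (B - A))           # parent PPP realization on [A, B]
    x = rng.uniform(A, B, size=n)
    y = x + sigma * rng.standard_normal(n)   # y_j drawn from psi(.|x_j) = N(x_j, sigma^2)
    return np.sum(np.abs(y) < 1.0)           # transformed points in the window [-1, 1]

counts = np.array([transformed_count(rng) for _ in range(20000)], dtype=float)
# nu(y) = integral of N(y; x, sigma^2) * lam over x, which equals lam away from
# the edges, so the window count is approximately Poisson with mean 2 * lam = 6.
print(counts.mean(), counts.var())
```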
where h(x) is the measurement the sensor produces of a target at x in the absence of noise, and the error w is zero-mean Gaussian distributed with covariance matrix Σ. The conditional pdf form of the very same equation is N(z | h(x), Σ). The pdf form is general and not limited to additive noise, so it is used here. Because ψ(z | x) is a pdf,

∫_T ψ(y | x) dy = 1   for every x ∈ S.
Now, as in the previous section, let ξ = (m, {x_1, …, x_m}) be the PPP realization and λ(x) the PPP intensity function. Each point x_j is observed by a sensor. The sensor generates a measurement z_j ∈ T ⊂ ℝ^k, k ≥ 1, for the target x_j. The pdf of this measurement is ψ(y | x); in words, ψ(z_j | x_j) is the pdf of z_j conditioned on x_j. Let η = (m, {z_1, …, z_m}). Then η is a realization of a PPP defined on the range T of the pdf ψ. To see this, it is only necessary to follow the same reasoning used to establish (2.83). The intensity function of this PPP is

ν(y) = ∫_S ψ(y | x) λ(x) dx ,  y ∈ T .  (2.86)
(2.87)
The errors in these measurements are assumed to be additive, zero-mean Gaussian distributed with variances σ_r² and σ_θ², respectively. The measurement pdf conditioned on target state is therefore

ψ(r, θ | x, y) = N( r ; (x² + y²)^{1/2}, σ_r² ) N( θ ; arctan(x, y), σ_θ² ) .  (2.88)
λ_c(x, y) = c N( x ; x_0, σ_x² ) N( y ; y_0, σ_y² ) ,  (2.89)

(2.90)
Fig. 2.4 The predicted measurement PPP intensity function in polar coordinates of a Gaussian-shaped PPP intensity function in the x–y plane: σ_x = σ_y = 1, σ_r = 0.1, σ_θ = 0.15 (radians), c = 200, x_0 = 6, and y_0 = 0
Figure 2.4a, b give the intensities (2.89) and (2.90), respectively. A realization of the PPP with intensity function λ_c(x, y) generated by the two-step procedure is given in Fig. 2.4c. Randomly perturbing each of these samples gives the realization in Fig. 2.4d. The predicted intensity ν(r, θ) is nearly Gaussian in the r–θ plane. If the likelihood function (2.88) is truncated to the semi-infinite strip (2.82), the predicted intensity (2.90) is also restricted to the semi-infinite strip.
2.12
p_{N_j}(n_j) = e^{−λ_j} λ_j^{n_j} / n_j! ,  n_j ≥ 0 .  (2.91)

Define λ(R) = ∑_{j∈R} λ_j .
In Step 1, the total number of samples n is drawn from the Poisson random variable with parameter λ(R). In Step 2, these n samples, denoted by x_j, are i.i.d. draws from the multinomial distribution with pdf

{ λ_j / λ(R) : j ∈ R } .

The integers x_j range over the set of indices of the discrete points in R, but they are otherwise unrestricted. The PPP realization is

ξ = (n, {x_1, …, x_n}) .
Nothing prevents the same discrete point, say j ∈ R, from occurring more than once in the list {x_1, …, x_n}; that is, repeated samples of the points in R are permitted. The number n_j of occurrences of j ∈ R as a point of the PPP realization is a Poisson distributed random variable with parameter λ_j and pdf (2.91). Because of Poisson's gambit, these Poisson variates are independent. The two definitions are therefore equivalent.
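The equivalence of the two definitions is visible directly in a simulation of the two-step procedure on a three-point discrete space (the intensity values are illustrative): each point's occurrence count comes out Poisson with its own parameter λ_j, and the counts are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(5)
lam = np.array([1.0, 2.0, 3.0])    # intensities lam_j at three discrete points
total = lam.sum()

trials = 20000
counts = np.zeros((trials, 3))
for i in range(trials):
    n = rng.poisson(total)                         # Step 1: n ~ Poisson(lam(R))
    draws = rng.choice(3, size=n, p=lam / total)   # Step 2: i.i.d. draws; repeats allowed
    counts[i] = np.bincount(draws, minlength=3)

print(counts.mean(axis=0), counts.var(axis=0))     # both close to lam = [1, 2, 3]
```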
The event space of PPPs on R is

E(R) = {(0, ∅)} ∪ ⋃_{n=1}^{∞} { (n, {x_1, …, x_n}) : x_j ∈ R, j = 1, …, n } .  (2.92)
Except for the small change in notation that highlights the indices x_j, it is identical to (2.1). The pdf of the unordered realization ξ is

p(ξ) = e^{−∑_{j∈R} λ_j} ∏_{j=1}^{n} λ_{x_j} .  (2.93)
This is the discrete space analog of the continuous space expression (2.12). The
expectation operator is changed only in that integrals are everywhere replaced by
sums over the discrete points of R . The notions of superposition and thinning
are also unchanged.
The intensity functions of transition and measurement processes are similar to (2.83) and (2.86), but are modified to accommodate discrete spaces. The transition pdf ψ(φ_j | φ_i) is now a transition matrix whose (i, j)-entry is the probability that the discrete state φ_i maps to the discrete state φ_j. The intensity of the transition process f(Π) is

ν(φ_j) = ∑_i ψ(φ_j | φ_i) λ(φ_i) .  (2.94)

If the conditioning variable is continuous, the measurement intensity is

ν(φ_j) = ∫_S ψ(φ_j | x) λ(x) dx ,  (2.95)
where λ(x) is the intensity function of a PPP, say Π, on the state space S. If the conditioning variable takes values u in a discrete space U, the pdf ψ(φ_j | u) is the probability of φ_j given u ∈ U, and the measurement intensity vector is

ν(φ_j) = ∑_{u∈U} ψ(φ_j | u) λ(u) ,  (2.96)

where in this case λ(u) is the intensity vector of the discrete PPP defined on U. The discrete-continuous case is discussed in the next section.
Example 2.13 Histograms. The cells {R_j} of a histogram are probably the most natural example of a set of discrete isolated points. Consider a PPP Π defined on the underlying continuous space in which the histogram cells reside. Aggregating, or quantizing, the i.i.d. points of realizations of Π into the nonoverlapping cells {R_j} and reporting only the total counts in each cell yields a realization of a PPP on a discrete space with points φ_j ≡ R_j. The intensity vector of this discrete PPP, call it Π_H, is

λ_j = ∫_{R_j} λ_c(s) ds ,
where λ_c(s) is the intensity function of Π. By independent scattering, since the histogram cells {R_j} are disjoint, the number of elements in cell R_j is Poisson distributed with parameter λ_j. The fact that the points φ_j are, or can be, repeated in realizations of the discrete PPP Π_H hardly needs saying.
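A short NumPy sketch of this quantization (the intensity λ_c(s) = 2s on [0, 1] and the two cells are illustrative choices): realizations of the continuous PPP are binned into cells, and the cell counts come out Poisson with parameters ∫_{R_j} λ_c(s) ds.

```python
import numpy as np

rng = np.random.default_rng(6)
edges = np.array([0.0, 0.5, 1.0])   # two histogram cells R_1 = [0, 0.5), R_2 = [0.5, 1]
mu = np.diff(edges ** 2)            # integral of lam_c(s) = 2s over each cell

trials = 40000
counts = np.zeros((trials, 2))
for i in range(trials):
    # Realize the continuous PPP with lam_c(s) = 2s by thinning a rate-2 homogeneous PPP.
    n = rng.poisson(2.0)
    s = rng.uniform(size=n)
    s = s[rng.uniform(size=n) < s]             # keep s with probability lam_c(s)/2 = s
    counts[i] = np.histogram(s, bins=edges)[0]

print(counts.mean(axis=0))   # cell counts are Poisson with parameters mu = [0.25, 0.75]
```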
Concrete examples of discrete spaces occur in emission and transmission tomography. In these examples, the discrete points correspond to the individual detectors in a detector array, and the number of occurrences of φ_j in a realization is the number of detected photons (or other particles) in the j-th detector. These topics are discussed in Chapter 5.
∫_{R^+} λ(s) ds ≡ λ(φ) + ∫_R λ(s) ds .  (2.97)
The event space of a PPP on the augmented space is E(R+ ). The event space E(R)
is a proper subset of E(R+ ).
Realizations are generated as before for the bounded sets R. For bounded sets R^+, the integrals in (2.4) are replaced by the integrals over R^+ as defined in (2.97); otherwise, Step 1 is unchanged. Step 2 is modified slightly. If n is the outcome of Step 1, then n i.i.d. Bernoulli trials with probabilities

Pr[φ] = λ(φ) / ( λ(φ) + ∫_R λ(s) ds )

and

Pr[R] = ( ∫_R λ(s) ds ) / ( λ(φ) + ∫_R λ(s) ds )
are performed. The number n(φ) is the number of occurrences of φ in the realization. The number of i.i.d. samples drawn from R is n − n(φ). The number n(φ) is a realization of a random variable, denoted by N(φ), that is Poisson distributed with parameter λ(φ). This is seen from the discussion in Section 2.9.2. The expected number of occurrences of φ is λ(φ). Also, the probability of repeated occurrences of φ is never zero. The possibility of repeated occurrences of φ is important to understanding augmented PPP models for applications such as multitarget tracking.
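The behavior of N(φ) is easy to verify with the modified two-step procedure. A minimal NumPy sketch (the intensity mass λ(φ) = 2 and continuous mass ∫_R λ = 3 are illustrative): the count of φ-occurrences is Poisson with parameter λ(φ), and repeats of φ are in fact common.

```python
import numpy as np

rng = np.random.default_rng(7)
lam_phi, lam_R = 2.0, 3.0    # intensity mass at phi and total mass on R = [0, 1]
total = lam_phi + lam_R

trials = 20000
n_phi = np.zeros(trials)
for i in range(trials):
    n = rng.poisson(total)                            # Step 1 on the augmented space
    is_phi = rng.uniform(size=n) < lam_phi / total    # Bernoulli trial: phi versus R
    n_phi[i] = is_phi.sum()                           # phi may repeat in a realization

frac_repeat = np.mean(n_phi >= 2)
print(n_phi.mean(), n_phi.var(), frac_repeat)   # mean and variance near lam_phi = 2
```

Here Pr[N(φ) ≥ 2] = 1 − 3e^{−2} ≈ 0.59, so well over half of the realizations repeat φ, which is exactly why the lists {x_1, …, x_n} on augmented spaces are usually not sets.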
The probability that the list {x_1, …, x_n} is a set is the probability that no more than one occurrence of φ appears in the n Bernoulli trials. Consequently, if λ(φ) > 0, the probability that the list {x_1, …, x_n} is a set is strictly less than one. In augmented spaces, random finite sets are more accurately described as random finite lists.
The likelihood function and expectation operator are unchanged, except that the integrals are over either R or R^+, as the case may be. Superposition and thinning are unchanged. The intensities of the diffusion and prediction processes are also unchanged from (2.83) and (2.86), except that the integrals are over S^+.
It is necessary to define the transitions ψ(y | φ) and ψ(φ | y) for all y ∈ S, as well as ψ(φ | φ) = Pr[φ | φ]. The measurement, or data, likelihood function L(z | φ) must also be defined. These quantities have natural interpretations in target tracking.
Example 2.14 Tracking Interpretations. A one-point augmented space is used in Chapter 6. The state φ is the hypothesis that no target is present in the tracking region R, and the point x ∈ R is the hypothesis that a target is present with state x. State transitions and the measurement likelihood function are interpreted in tracking applications as follows:
ψ(y | φ) is the likelihood that the transition initiates a target at the point y ∈ R.
ψ(φ | y) is the probability that the transition terminates a target at the point y ∈ R.
ψ(φ | φ) is the probability that no target is present both before and after the transition.
L(z | φ) is the likelihood that the data z are clutter-originated, i.e., the likelihood function of the data conditioned on the absence of a target in R.
Initiation and termination of target tracks are therefore an intrinsic part of the tracking function when using a Bayesian tracking method (see Appendix C) on an augmented state space S^+.
As is seen in Chapter 6, augmented spaces play an important role in simplifying difficult enumerations related to joint detection and tracking of targets. Only one augmented state is considered here, but there is no intrinsic limitation.
Example 2.15 Non-Orderly PPPs. The intensity of general PPPs on a continuous space S is given in (2.3) as the sum of an ordinary function λ(s) and a countable number of weighted Dirac delta functions located at the isolated points {a_j}. The points {a_j} are identified with the discrete points Φ = {φ_j}. Let S^+ = S ∪ Φ. Realizations on the augmented space S^+, generated in the manner outlined above for the one-point augmented case, map directly to realizations of the non-orderly PPP on S via the identification φ_j ↔ a_j. Other matters are similarly handled.
http://www.springer.com/978-1-4419-6922-4