
The Density Ratio of Generalized Binomial versus Poisson Distributions
Lutz Dümbgen (University of Bern)∗ and Jon A. Wellner (University of Washington, Seattle)†

October 9, 2019
arXiv:1910.03444v1 [math.ST] 8 Oct 2019

Abstract
Let b(x) be the probability that a sum of independent Bernoulli random variables with
parameters p1 , p2 , p3 , . . . ∈ [0, 1) equals x, where λ := p1 + p2 + p3 + · · · is finite. We prove
two inequalities for the maximal ratio b(x)/πλ (x), where πλ is the weight function of the
Poisson distribution with parameter λ.

Key words: Poisson approximation, relative errors, total variation distance.

1 Introduction

We consider independent Bernoulli random variables Z1, Z2, Z3, … ∈ {0, 1} with parameters IP(Zi = 1) = IE(Zi) = pi ∈ [0, 1) and their random sum X = Σ_{i≥1} Zi. By the first and second Borel–Cantelli lemmas, X is almost surely finite if and only if the sequence p = (pi)_{i≥1} satisfies

    λ := Σ_{k≥1} pk < ∞,                                            (1)

and we exclude the trivial case λ = 0. Under this assumption, the distribution Q of X is given by

    IP(X = x) =: b(x) = Σ_{J:#J=x} Π_{i∈J} pi Π_{k∈J^c} (1 − pk)

for x ∈ N0, where J denotes a generic subset of N and J^c := N \ J. It is well known that the distribution Q may be approximated by the Poisson distribution Poiss(λ) with weights

    π(x) = e^{−λ} λ^x / x!,



∗ Research supported by the Swiss National Science Foundation.
† Research supported in part by (a) NSF Grant DMS-1566514 and (b) NIAID Grant 2R01 AI291968-04.

provided that the quantity
    ∆ := λ^{−1} Σ_{i≥1} pi²
is small. Indeed, a suitable version of Stein’s method, developed by Chen (1975), leads to
the remarkable bound
    dTV(Q, Poiss(λ)) ≤ (1 − e^{−λ})∆ ≤ Σ_{i≥1} pi² / max(1, λ),

where dTV (·, ·) stands for total variation distance; see Theorem 2.M in Barbour et al.
(1992). Note also that
    Var(X) = Σ_{i≥1} pi (1 − pi) = λ(1 − ∆).
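These quantities are easy to check numerically. The following snippet is not part of the paper; it is a sanity check under the definitions above, computing the weights b(x) by convolving the Bernoulli distributions for one arbitrary example sequence p and verifying the variance identity and Chen's total variation bound.

```python
import math

def bernoulli_sum_pmf(p):
    """Weights b(0..n) of X = sum of independent Bernoulli(p_i), via convolution."""
    b = [1.0]
    for pi in p:
        new = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            new[x] += (1.0 - pi) * bx   # contribution of Z_i = 0
            new[x + 1] += pi * bx       # contribution of Z_i = 1
        b = new
    return b

p = [0.3, 0.2, 0.1, 0.05]              # arbitrary example sequence
lam = sum(p)
delta = sum(pi * pi for pi in p) / lam
b = bernoulli_sum_pmf(p)

# Variance identity: Var(X) = sum_i p_i (1 - p_i) = lambda (1 - Delta)
mean = sum(x * bx for x, bx in enumerate(b))
var = sum((x - mean) ** 2 * bx for x, bx in enumerate(b))
print(abs(var - lam * (1 - delta)))    # ~ 0 up to rounding

# Chen's bound: d_TV(Q, Poiss(lambda)) <= (1 - e^{-lambda}) * Delta
pois = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(51)]
d_tv = 0.5 * (sum(abs(bx - pois[x]) for x, bx in enumerate(b))
              + sum(pois[len(b):]))
print(d_tv <= (1 - math.exp(-lam)) * delta)   # True for this example
```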

Conjecture and main results. Motivated by Dümbgen et al. (2019), we are aiming
at upper bounds for the maximal density ratio

    ρ(Q, Poiss(λ)) := sup_{x≥0} r(x)

with

    r(x) := b(x)/π(x).
Note that for arbitrary sets A ⊂ N0, the probability Q(A) = IP(X ∈ A) is never larger than the corresponding Poisson probability times ρ(Q, Poiss(λ)), no matter how small the Poisson probability is. Moreover, dTV(Q, Poiss(λ)) ≤ 1 − ρ(Q, Poiss(λ))^{−1}. Hence, ρ(Q, Poiss(λ)) is a strong measure of error when Q is approximated by Poiss(λ). We conjecture that

    ρ(Q, Poiss(λ)) ≤ (1 − ∆)^{−1}.                                  (2)

In this note we prove that

    ρ(Q, Poiss(λ)) ≤ (1 − p∗)^{−1}                                  (3)

for arbitrary values of λ, where

    p∗ := max_{i≥1} pi ≥ ∆.

In addition, we prove that in case of λ ≤ 1, a stronger version of (2) is true:

    ρ(Q, Poiss(λ)) ≤ e^∆  if λ ≤ 1.                                 (4)

Note that e^{−∆} > 1 − ∆, whence e^∆ < (1 − ∆)^{−1}.
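Before turning to the proofs, the conjecture and the two bounds can be probed on a small example. The snippet below is not from the paper; it recomputes b by brute-force convolution and checks (2), (3) and (4) for one arbitrary sequence with λ ≤ 1.

```python
import math

def bernoulli_sum_pmf(p):
    """Weights b(0..n) of a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        new = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            new[x] += (1.0 - pi) * bx
            new[x + 1] += pi * bx
        b = new
    return b

p = [0.25, 0.2, 0.15, 0.1]             # example with lam = 0.7 <= 1
lam = sum(p)
delta = sum(pi * pi for pi in p) / lam
p_star = max(p)

b = bernoulli_sum_pmf(p)
rho = max(bx / (math.exp(-lam) * lam**x / math.factorial(x))
          for x, bx in enumerate(b))

print(rho <= 1 / (1 - delta))          # conjecture (2): True here
print(rho <= 1 / (1 - p_star))         # bound (3): True
print(rho <= math.exp(delta))          # bound (4), applicable since lam <= 1: True
```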


In Section 2 we provide some basic formulae for the weights b(x) and the ratios r(x). These lead to a preliminary bound for the maximizer(s) of r = b/π and a first bound for ρ(Q, Poiss(λ)). Then in Section 3 we derive the upper bound (3). In Section 4 we discuss the case 0 < λ ≤ 1 and provide lower and upper bounds for ρ(Q, Poiss(λ)).

2 Preparations

Discrete scores. With n := #{i ≥ 1 : pi > 0} ∈ N ∪ {∞}, note that b(x) > 0 if and only if x ≤ n. For any x ≥ 0,

    π(x + 1)/π(x) = λ/(x + 1),

so the “scores” r(x + 1)/r(x) are given by

    r(x + 1)/r(x) = (x + 1) b(x + 1) / (λ b(x))

for x ≥ 0 with b(x) > 0. If x is a maximizer of r(·), then

    (x + 1) b(x + 1)/b(x) ≤ λ ≤ x b(x)/b(x − 1)                     (5)

with b(−1) := 0.

Representing the weight function of Q. The weight function b may be written as

    b(x) = Σ_{J:#J=x} w(J)  with  w(J) := Π_{i∈J} pi Π_{k∈J^c} (1 − pk).

In particular,

    b(0) = Π_{k≥1} (1 − pk) = exp(Σ_{k≥1} log(1 − pk)) < exp(−λ) = π(0),

because log(1 + y) < y for −1 < y ≠ 0. Since

    w(J) = Π_{i∈J} (pi/(1 − pi)) · Π_{k≥1} (1 − pk) = b(0) Π_{i∈J} pi/(1 − pi),

we can also write w(J) = b(0) W(J) and

    b(x) = b(0) Σ_{J:#J=x} W(J)  with  W(J) := Π_{i∈J} qi,

where

    qi := pi/(1 − pi) ∈ [0, ∞),  pi = qi/(1 + qi).

Ratios of consecutive binomial weights. There are various ways to represent the ratios b(x + 1)/b(x). In the subsequent versions, the following notation will be useful: For any set J ⊂ N, we define

    s(J) := Σ_{i∈J} pi  and  S(J) := Σ_{i∈J} qi.

In case of #J < ∞ we set

    s̄(J) := s(J)/#J,
    S̄(J) := S(J)/#J,
    W̄(J) := W(J) / Σ_{L:#L=#J} W(L) = w(J) / Σ_{L:#L=#J} w(L)

with the convention 0/0 := 0. Then for any integer x ≥ 0 with b(x) > 0,

    b(x + 1)/b(0) = Σ_{L:#L=x+1} W(L)
                  = Σ_{L:#L=x+1} (x + 1)^{−1} Σ_{k∈L} W(L \ {k}) qk
                  = (x + 1)^{−1} Σ_{J:#J=x} W(J) Σ_{k∈J^c} qk
                  = (x + 1)^{−1} Σ_{J:#J=x} W(J) S(J^c).

Consequently,

    (x + 1) b(x + 1)/b(x) = Σ_{J:#J=x} W̄(J) S(J^c).                 (6)

Alternatively, if b(x + 1) > 0, then

    b(x)/b(0) = Σ_{J:#J=x} W(J) = Σ_{J:#J=x} W(J) Σ_{k∈J^c} qk/S(J^c)
              = Σ_{J:#J=x} Σ_{k∈J^c} W(J ∪ {k}) / (qk + S((J ∪ {k})^c))
              = Σ_{L:#L=x+1} W(L) Σ_{k∈L} 1/(qk + S(L^c)).

Consequently,

    b(x) / ((x + 1) b(x + 1)) = Σ_{L:#L=x+1} W̄(L) (x + 1)^{−1} Σ_{k∈L} 1/(qk + S(L^c)).   (7)
One can repeat the previous arguments with the sums Σ_{k∈J^c} pk/s(J^c) = 1 in place of Σ_{k∈J^c} qk/S(J^c) = 1. This leads to

    b(x)/b(0) = Σ_{J:#J=x} Σ_{k∈J^c} W(J) pk / (pk + s((J ∪ {k})^c))
              = Σ_{L:#L=x+1} W(L) Σ_{k∈L} (1 − pk)/(pk + s(L^c)),

because W(J) pk = W(J ∪ {k})(1 − pk) for k ∈ J^c. Consequently,

    b(x) / ((x + 1) b(x + 1)) = Σ_{L:#L=x+1} W̄(L) (x + 1)^{−1} Σ_{k∈L} (1 − pk)/(pk + s(L^c)).   (8)
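For finitely many nonzero pi, identities (6), (7) and (8) can be verified by brute force over all subsets. The check below is not part of the paper; the sequence p and the choice x = 2 are arbitrary illustrations.

```python
import math
from itertools import combinations

p = [0.4, 0.3, 0.2, 0.1]               # arbitrary finite example
n = len(p)
q = [pi / (1 - pi) for pi in p]        # q_i = p_i / (1 - p_i)

def w(J):
    """w(J) = prod_{i in J} p_i * prod_{k not in J} (1 - p_k)."""
    return math.prod(p[i] if i in J else 1 - p[i] for i in range(n))

def b(x):
    return sum(w(J) for J in combinations(range(n), x))

def W(J):
    return math.prod(q[i] for i in J)

def S(J):
    return sum(q[i] for i in J)

def s(J):
    return sum(p[i] for i in J)

x = 2
full = set(range(n))
Js = [set(J) for J in combinations(range(n), x)]
Ls = [set(L) for L in combinations(range(n), x + 1)]
totJ = sum(W(J) for J in Js)           # normalizer of Wbar over #J = x
totL = sum(W(L) for L in Ls)           # normalizer of Wbar over #L = x + 1

# identity (6)
lhs6 = (x + 1) * b(x + 1) / b(x)
rhs6 = sum(W(J) / totJ * S(full - J) for J in Js)

# identity (7)
lhs7 = b(x) / ((x + 1) * b(x + 1))
rhs7 = sum(W(L) / totL / (x + 1) * sum(1.0 / (q[k] + S(full - L)) for k in L)
           for L in Ls)

# identity (8)
rhs8 = sum(W(L) / totL / (x + 1) * sum((1 - p[k]) / (p[k] + s(full - L)) for k in L)
           for L in Ls)

print(abs(lhs6 - rhs6), abs(lhs7 - rhs7), abs(lhs7 - rhs8))  # all ~ 0
```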

Analyzing equation (8) will lead to a first result about the location of maximizers of r(·) plus a preliminary bound for ρ(Q, Poiss(λ)).

Proposition 1. Any maximizer x ∈ N0 of r(x) satisfies the inequalities

    1 ≤ x ≤ ⌈λ⌉.

Moreover,

    ρ(Q, Poiss(λ)) ≤ (1 + ((e^{p∗} − 1)/p∗) ∆)^{⌈λ⌉} ≤ e^{⌈λ⌉ p∗}.

Proof of Proposition 1. Since r(0) < 1, any maximizer xo of r(·) has to satisfy xo ≥ 1. To verify the inequality xo ≤ ⌈λ⌉, it suffices to show that for any x ≥ λ with b(x) > 0,

    r(x + 1)/r(x) ≤ 1.

This is equivalent to

    b(x) / ((x + 1) b(x + 1)) ≥ λ^{−1}.                             (9)

If b(x + 1) = 0, this inequality is trivial. Otherwise, according to (8), the left hand side of (9) equals

    Σ_{L:#L=x+1} W̄(L) (x + 1)^{−1} Σ_{k∈L} (1 − pk)/(pk + s(L^c)).

Since (1 − y)/(y + s(L^c)) is a convex function of y ≥ 0, Jensen’s inequality implies that

    (x + 1)^{−1} Σ_{k∈L} (1 − pk)/(pk + s(L^c)) ≥ (1 − s̄(L))/(s̄(L) + s(L^c)) = (1 − s̄(L))/(s̄(L) + λ − s(L)) = (1 − s̄(L))/(λ − x s̄(L)).

But in case of x ≥ λ,

    (1 − s̄(L))/(λ − x s̄(L)) ≥ (1 − s̄(L))/(λ − λ s̄(L)) = λ^{−1},
whence (9) holds true.
Now we only need an upper bound for r(x) and apply it with x ≤ ⌈λ⌉. First of all,

    r(x) = λ^{−x} x! e^λ Σ_{J:#J=x} Π_{i∈J} pi Π_{k∈J^c} (1 − pk)
         = λ^{−x} x! Σ_{J:#J=x} Π_{i∈J} pi e^{pi} Π_{k∈J^c} exp(pk + log(1 − pk))
         ≤ λ^{−x} x! Σ_{J:#J=x} Π_{i∈J} pi e^{pi}
         ≤ λ^{−x} Σ_{k(1),…,k(x)≥1} Π_{s=1}^{x} p_{k(s)} e^{p_{k(s)}} = (Σ_{k≥1} (pk/λ) e^{pk})^x.

Moreover,

    e^{pk} ≤ 1 + (pk/p∗)(e^{p∗} − 1) ≤ e^{p∗}

by convexity and monotonicity of the exponential function, whence

    Σ_{k≥1} (pk/λ) e^{pk} ≤ 1 + ((e^{p∗} − 1)/p∗) ∆ ≤ e^{p∗}.

Since x ≤ ⌈λ⌉, this yields the asserted bound for ρ(Q, Poiss(λ)).
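Proposition 1 is easy to stress-test numerically. The following sketch is not from the paper; it draws random finite sequences p, locates the maximizer of r by direct computation of b via convolution, and checks both the location claim and the two inequalities of the bound.

```python
import math
import random

def bernoulli_sum_pmf(p):
    """Weights b(0..n) of a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        new = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            new[x] += (1.0 - pi) * bx
            new[x + 1] += pi * bx
        b = new
    return b

random.seed(1)
for _ in range(200):
    p = [random.uniform(0.01, 0.9) for _ in range(random.randint(1, 8))]
    lam = sum(p)
    delta = sum(pi * pi for pi in p) / lam
    p_star = max(p)
    b = bernoulli_sum_pmf(p)
    r = [bx / (math.exp(-lam) * lam**x / math.factorial(x))
         for x, bx in enumerate(b)]
    x_max = max(range(len(r)), key=lambda x: r[x])
    assert 1 <= x_max <= math.ceil(lam)                         # location of the maximizer
    bound = (1 + (math.exp(p_star) - 1) / p_star * delta) ** math.ceil(lam)
    assert max(r) <= bound + 1e-9                               # first inequality
    assert bound <= math.exp(math.ceil(lam) * p_star) + 1e-9    # second inequality
print("Proposition 1 verified on 200 random examples")
```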

3 Bounds in terms of p∗
3.1 A general strategy to verify upper bounds

In what follows, the dependency of objects such as Q, b, r, w(J), … on the sequence p is indicated by a subscript p if necessary, leading to Qp, bp, rp, wp(J), …, and we write π = πλ. Let A = A(p) ∈ [0, 1) stand for a positively homogeneous functional of p, i.e.

    A(tp) = t A(p)  for t ∈ (0, 1].

Two examples of such a functional are A = ∆ and A = p∗.

Suppose we want to prove that

    log ρ(Q, Poiss(λ)) ≤ g(A)

for a given differentiable function g : [0, 1) → [0, ∞) with g(0) = 0 and g′(0) ≥ 1. An explicit example is given by g(s) := − log(1 − s). To verify this conjecture, we analyze the function f : (0, 1] → R given by

    f(t) := log ρ(Q_{tp}, Poiss(tλ)) − g(tA),

so the assertion is equivalent to f(1) ≤ 0. Hence, it suffices to show that f(0+) = 0 and that f is nonincreasing.

Note that replacing p with tp amounts to replacing λ and ∆ with tλ and t∆, respectively. By Proposition 1, we know that

    ρ(Q_{tp}, Poiss(tλ)) = max_{1≤x≤⌈tλ⌉} r_{tp}(x)

and

    f(t) ≤ ⌈tλ⌉ t p∗ − g(tA).

This implies already that f(0+) = 0. If we can show that for any fixed x ∈ {1, …, ⌈λ⌉}, the log-density ratio Lx(t) := log r_{tp}(x) is a continuously differentiable function of t ∈ (0, 1], then f is continuous on (0, 1] with limit f(0+) = 0, and for t < 1,

    f′(t+) = max_{x∈N(t)} L′x(t) − g̃(tA)/t,                        (10)

where

    N(t) := arg max_{1≤x≤⌈tλ⌉} r_{tp}(x)

and

    g̃(s) := s g′(s).

Then a sufficient condition for f(1) ≤ 0 is that f′(t+) ≤ 0 for all t ∈ (0, 1), and this can be rewritten as follows: For t ∈ (0, 1) and 1 ≤ x ≤ ⌈tλ⌉,

    L′x(t) ≤ g̃(tA)/t  if x ∈ N(t).

In view of (5), a sufficient condition for that is

    L′x(t) ≤ g̃(tA)/t  if x b_{tp}(x)/b_{tp}(x − 1) ≥ tλ.           (11)

Now it is high time to analyze the functions Lx(·) for 1 ≤ x ≤ ⌈λ⌉. The inequality x ≤ ⌈λ⌉ implies that b(x) > 0, because otherwise, λ would be a sum of x − 1 weights pi ∈ [0, 1), and this would lead to the contradiction ⌈λ⌉ ≤ x − 1. For a set J ⊂ N with #J = x,

    (∂/∂t) w_{tp}(J) = (∂/∂t) t^x Π_{i∈J} pi Π_{k∈J^c} (1 − t pk)
                     = x t^{x−1} Π_{i∈J} pi Π_{k∈J^c} (1 − t pk) − Σ_{ℓ∈J^c} pℓ t^x Π_{i∈J} pi Π_{k∈J^c\{ℓ}} (1 − t pk)
                     = x t^{x−1} Π_{i∈J} pi Π_{k∈J^c} (1 − t pk) − t^x Σ_{ℓ∈J^c} Π_{i∈J∪{ℓ}} pi Π_{k∈(J∪{ℓ})^c} (1 − t pk)
                     = (x/t) w_{tp}(J) − t^{−1} Σ_{ℓ∈J^c} w_{tp}(J ∪ {ℓ}).

Consequently,

    (∂/∂t) b_{tp}(x) = Σ_{J:#J=x} (∂/∂t) w_{tp}(J)
                     = (x/t) Σ_{J:#J=x} w_{tp}(J) − t^{−1} Σ_{J:#J=x} Σ_{ℓ∈J^c} w_{tp}(J ∪ {ℓ})
                     = (x/t) Σ_{J:#J=x} w_{tp}(J) − t^{−1} Σ_{L:#L=x+1} Σ_{ℓ∈L} w_{tp}(L)
                     = (x/t) b_{tp}(x) − ((x + 1)/t) b_{tp}(x + 1).

This gives us the identity

    (∂/∂t) log b_{tp}(x) = x/t − ((x + 1)/t) b_{tp}(x + 1)/b_{tp}(x).

An elementary calculation yields

    (∂/∂t) log π_{tλ}(x) = x/t − λ,

so

    L′x(t) = (∂/∂t) log r_{tp}(x) = λ − ((x + 1)/t) b_{tp}(x + 1)/b_{tp}(x).

Consequently, (11) may be rewritten as follows: For each t ∈ (0, 1) and 1 ≤ x ≤ ⌈tλ⌉,

    (x + 1) b_{tp}(x + 1)/b_{tp}(x) ≥ tλ − g̃(tA)  if  x b_{tp}(x)/b_{tp}(x − 1) ≥ tλ.

Since we could replace p with tp, it even suffices to show that for 1 ≤ x ≤ ⌈λ⌉,

    (x + 1) b(x + 1)/b(x) ≥ λ − g̃(A)  if  x b(x)/b(x − 1) ≥ λ.     (12)

Note that b(1)/b(0) = Σ_{i≥1} qi > Σ_{i≥1} pi = λ, so (12) implies that

    2 b(2)/b(1) ≥ λ − g̃(A).

3.2 The main result

In case of A = p∗ and g(s) = − log(1 − s), the strategy just outlined works nicely, leading to our first main result. Note that g̃(s) = s/(1 − s).

Theorem 1. For any sequence p of probabilities pi ∈ [0, 1) with λ = Σ_{i≥1} pi < ∞,

    ρ(Q, Poiss(λ)) ≤ (1 − p∗)^{−1}.

Proof of Theorem 1. For 1 ≤ x ≤ ⌈λ⌉, the representation (7) with x − 1 in place of x reads

    b(x − 1)/(x b(x)) = Σ_{J:#J=x} W̄(J) x^{−1} Σ_{i∈J} 1/(qi + S(J^c)).

By Jensen’s inequality,

    x^{−1} Σ_{i∈J} 1/(qi + S(J^c)) ≥ (x^{−1} Σ_{i∈J} (qi + S(J^c)))^{−1} = (S̄(J) + S(J^c))^{−1},

so

    b(x − 1)/(x b(x)) ≥ Σ_{J:#J=x} W̄(J) (S̄(J) + S(J^c))^{−1}.

A second application of Jensen’s inequality yields that

    b(x − 1)/(x b(x)) ≥ (Σ_{J:#J=x} W̄(J) (S̄(J) + S(J^c)))^{−1}.

Consequently, if x b(x)/b(x − 1) ≥ λ, then

    Σ_{J:#J=x} W̄(J) (S̄(J) + S(J^c)) ≥ λ.

On the other hand, (6) yields

    (x + 1) b(x + 1)/b(x) = Σ_{J:#J=x} W̄(J) S(J^c)
                          = Σ_{J:#J=x} W̄(J) (S̄(J) + S(J^c)) − Σ_{J:#J=x} W̄(J) S̄(J)
                          ≥ λ − Σ_{J:#J=x} W̄(J) S̄(J)
                          ≥ λ − p∗/(1 − p∗) = λ − g̃(p∗),

because for any set J with x elements,

    S̄(J) = x^{−1} Σ_{i∈J} qi ≤ p∗/(1 − p∗).

Consequently, (12) is satisfied with A = p∗, and this yields the assertion.
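Both the key step, condition (12) with A = p∗ and g̃(s) = s/(1 − s), and Theorem 1 itself can be checked on random finite sequences. The sketch below is not from the paper; b is extended by a zero so that the ratio in (12) is defined when x + 1 exceeds the support.

```python
import math
import random

def bernoulli_sum_pmf(p):
    """Weights b(0..n) of a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        new = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            new[x] += (1.0 - pi) * bx
            new[x + 1] += pi * bx
        b = new
    return b

random.seed(2)
for _ in range(200):
    p = [random.uniform(0.05, 0.85) for _ in range(random.randint(2, 7))]
    lam = sum(p)
    p_star = max(p)
    b = bernoulli_sum_pmf(p)
    b.append(0.0)                      # b(n + 1) = 0
    # condition (12) with A = p_star, g~(s) = s / (1 - s)
    for x in range(1, math.ceil(lam) + 1):
        if x * b[x] / b[x - 1] >= lam:
            assert (x + 1) * b[x + 1] / b[x] >= lam - p_star / (1 - p_star) - 1e-9
    # Theorem 1 itself
    rho = max(bx / (math.exp(-lam) * lam**x / math.factorial(x))
              for x, bx in enumerate(b))
    assert rho <= 1 / (1 - p_star) + 1e-9
print("condition (12) and Theorem 1 verified on 200 random examples")
```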

4 Bounds in terms of ∆

At the moment we do not know whether our general strategy works for A = ∆. Instead we derive some bounds via direct arguments. We start with an elementary result about the log-density ratio L1(t) = log r_{tp}(1).

Proposition 2. The function L1 : [0, 1] → R is twice differentiable with L1(0) = 0, L′1(0) = ∆ and L″1 ≤ 0, with equality if and only if #{i ≥ 1 : pi > 0} = 1.

Proof of Proposition 2. Note first that for t ∈ (0, 1],

    L1(t) = tλ + log((tλ)^{−1} Σ_{i≥1} (t pi) Π_{k≠i} (1 − t pk))
          = tλ + log(λ^{−1} Σ_{i≥1} pi Π_{k≠i} (1 − t pk))
          = Σ_{i≥1} (t pi + log(1 − t pi)) + log(λ^{−1} Σ_{i≥1} pi/(1 − t pi)).

The right hand side is a smooth function of t ∈ [0, 1] with L1(0) = 0. Moreover,

    L′1(t) = Σ_{i≥1} (pi − pi/(1 − t pi)) + (Σ_{i≥1} pi²/(1 − t pi)²) / (Σ_{i≥1} pi/(1 − t pi))
           = −t Σ_{i≥1} pi²/(1 − t pi) + (Σ_{i≥1} pi²/(1 − t pi)²) / (Σ_{i≥1} pi/(1 − t pi)),

    L′1(0) = Σ_{i≥1} pi² / Σ_{i≥1} pi = ∆.
i≥1 i≥1

Finally, with ai(t) := pi/(1 − t pi) and S(t) := Σ_{i≥1} ai(t),

    L″1(t) = −Σ_{i≥1} pi²/(1 − t pi)² + 2 (Σ_{i≥1} pi³/(1 − t pi)³) / (Σ_{i≥1} pi/(1 − t pi)) − ((Σ_{i≥1} pi²/(1 − t pi)²) / (Σ_{i≥1} pi/(1 − t pi)))²
           = −Σ_{i≥1} ai(t)² + 2 Σ_{i≥1} ai(t)³/S(t) − Σ_{i,j≥1} ai(t)² aj(t)²/S(t)²
           ≤ −Σ_{i≥1} (ai(t)² − 2 ai(t)³/S(t) + ai(t)⁴/S(t)²)
           = −Σ_{i≥1} ai(t)² (1 − ai(t)/S(t))²
           ≤ 0.

The second last inequality is strict, unless #{i ≥ 1 : pi > 0} = 1, and in that case both preceding inequalities are equalities.
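The closed form of L1 derived above lends itself to a quick numerical cross-check of the proposition via finite differences. This is not part of the paper; the sequence p and the step sizes are arbitrary choices.

```python
import math

p = [0.3, 0.25, 0.15, 0.1]             # example with more than one nonzero p_i
lam = sum(p)
delta = sum(pi * pi for pi in p) / lam

def L1(t):
    """L1(t) = log r_{tp}(1), in the closed form from the proof."""
    return (sum(t * pi + math.log(1 - t * pi) for pi in p)
            + math.log(sum(pi / (1 - t * pi) for pi in p) / lam))

# L1(0) = 0 and L1'(0) = Delta (forward difference)
print(abs(L1(0.0)))                                     # 0.0
print(abs((L1(1e-6) - L1(0.0)) / 1e-6 - delta) < 1e-4)  # True

# strict concavity: central second differences are negative on a grid
h = 1e-4
print(all(L1(t + h) - 2 * L1(t) + L1(t - h) < 0
          for t in [0.1 * k for k in range(1, 10)]))    # True
```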

Propositions 1 and 2 are the main ingredients for the following upper bound for log ρ(Q, Poiss(λ)).

Theorem 2. For any sequence p of probabilities pi ∈ [0, 1) with λ = Σ_{i≥1} pi ≤ 1,

    ∆ ≥ log ρ(Q, Poiss(λ)) ≥ ∆ (1 − ∆/2 − λ/(2(1 − p∗))).

Since ∆ ≤ p∗ ≤ λ, this theorem shows that

    log ρ(Q, Poiss(λ)) / ∆ → 1  as λ → 0.

Proof of Theorem 2. We know from Proposition 1 that in case of λ ≤ 1,

    log ρ(Q, Poiss(λ)) = log r(1) = L1(1).

But Proposition 2 implies that for some ξ ∈ (0, 1),

    L1(1) = L1(0) + L′1(0) + 2^{−1} L″1(ξ) = 0 + ∆ + 2^{−1} L″1(ξ) ≤ ∆.

As to the lower bound, recall that

    L1(1) = Σ_{i≥1} (pi + log(1 − pi)) + log(λ^{−1} Σ_{i≥1} pi/(1 − pi)).

On the one hand,

    pi + log(1 − pi) = −Σ_{k≥2} pi^k/k ≥ −(pi²/2) Σ_{ℓ≥0} p∗^ℓ = −pi²/(2(1 − p∗)),

so

    Σ_{i≥1} (pi + log(1 − pi)) ≥ −(2(1 − p∗))^{−1} Σ_{i≥1} pi² = −λ∆/(2(1 − p∗)).

Moreover,

    log(λ^{−1} Σ_{i≥1} pi/(1 − pi)) ≥ log(λ^{−1} Σ_{i≥1} (pi + pi²)) = log(1 + ∆) ≥ ∆ − ∆²/2,

and this implies the asserted lower bound for L1(1).
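Theorem 2 can likewise be stress-tested. The sketch below is not from the paper; it draws random sequences with λ ≤ 1 and checks both sides of the sandwich for log ρ(Q, Poiss(λ)).

```python
import math
import random

def bernoulli_sum_pmf(p):
    """Weights b(0..n) of a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        new = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            new[x] += (1.0 - pi) * bx
            new[x + 1] += pi * bx
        b = new
    return b

random.seed(3)
for _ in range(200):
    n = random.randint(1, 6)
    p = [random.uniform(0.01, 0.99 / n) for _ in range(n)]   # guarantees lam < 1
    lam = sum(p)
    delta = sum(pi * pi for pi in p) / lam
    p_star = max(p)
    b = bernoulli_sum_pmf(p)
    rho = max(bx / (math.exp(-lam) * lam**x / math.factorial(x))
              for x, bx in enumerate(b))
    lower = delta * (1 - delta / 2 - lam / (2 * (1 - p_star)))
    assert lower - 1e-9 <= math.log(rho) <= delta + 1e-9
print("Theorem 2 verified on 200 random examples")
```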

Remark 3 (Total variation distance). Since b(0) ≤ π(0), Theorem 2 implies that in case of λ ≤ 1,

    dTV(Q, Poiss(λ)) = sup_{A⊂{1,2,3,…}} (Q(A) − Poiss(λ)(A))
                     ≤ sup_{A⊂{1,2,3,…}} Q(A) (1 − ρ(Q, Poiss(λ))^{−1})
                     ≤ (1 − b(0))(1 − e^{−∆})
                     ≤ λ(1 − e^{−∆}) ≤ λ∆ = Σ_{i≥1} pi².

Here we used the elementary inequalities 1 − b(0) = 1 − Π_{i≥1} (1 − pi) ≤ Σ_{i≥1} pi = λ and 1 − e^{−∆} ≤ ∆. Consequently, Theorem 2 implies a reasonable upper bound for dTV(Q, Poiss(λ)).

Acknowledgement. Part of this research was conducted at the Mathematisches Forschungsinstitut Oberwolfach (MFO), Germany, in June and July 2019. We are grateful to the MFO for its generous hospitality and support.

References

Barbour, A. D., Holst, L. and Janson, S. (1992). Poisson Approximation, vol. 2 of Oxford Studies in Probability. The Clarendon Press, Oxford University Press, New York. Oxford Science Publications.

Chen, L. H. Y. (1975). Poisson approximation for dependent trials. Ann. Probability 3, 534–545.

Dümbgen, L., Samworth, R. J. and Wellner, J. A. (2019). Bounding distributional errors via density ratios. Tech. rep., University of Bern. arXiv:1905.03009.

