
Concentration of measure for the number of isolated vertices in the Erdős-Rényi random graph by size bias couplings



Subhankar Ghosh and Larry Goldstein

University of Southern California

Abstract
Let Y be a nonnegative random variable with mean µ and let Y^s, defined on the same space as Y, have the Y-size biased distribution, that is, the distribution characterized by

E[Y f(Y)] = µE[f(Y^s)] for all functions f for which these expectations exist.

The size bias coupling of Y to Y^s can be used to obtain the following concentration of measure result when Y counts the number of isolated vertices in an Erdős-Rényi random graph model on n vertices with edge probability p. With σ² denoting the variance of Y,
  P((Y − µ)/σ ≥ t) ≤ inf_{θ≥0} exp(−θt + H(θ))  where  H(θ) = (µ/(2σ²)) ∫_0^θ s γ_s ds

with

  γ_s = 2e^{2s}(1 + pe^s/(1−p))^n + (1−p)^{−n} + 1.

Left tail inequalities may be obtained in a similar fashion. When np → c for some constant c ∈ (0, ∞) as n → ∞, the bound is of order at most e^{−kt} for some positive constant k.

1 Introduction and main result


For some n ∈ {1, 2, . . .} and p ∈ (0, 1) let K be the random graph on the vertices V = {1, 2, . . . , n}, with the indicators X_{vw} of the presence of edges between two unequal vertices v and w being independent Bernoulli random variables with success probability p, and X_{vv} = 0 for all v ∈ V. Recall that the degree of a vertex v ∈ V is the number of edges incident on v,

  d(v) = Σ_{w∈V} X_{vw}.   (1)

Many authors have studied the distribution of

  Y = Σ_{v∈V} 1(d(v) = d)   (2)

∗ Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA, subhankg@usc.edu and larry@usc.edu
2000 Mathematics Subject Classification: Primary 60E15; Secondary 60C05, 60D05.
Keywords: Large deviations; graph degree; size biased couplings

counting the number of vertices v of K with degree d(v) = d for some fixed d. In this paper we derive upper
bounds, for fixed n, on the distribution function of the number of isolated vertices of K, that is, (2) for the
case d = 0 where Y counts the number of vertices having no incident edges.
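
To make the setting concrete, the following minimal Python sketch (ours, for illustration only, not part of the paper) samples K and counts its isolated vertices; the empirical mean can be compared with the exact value µ = n(1−p)^{n−1} recalled in (3) below.

```python
import numpy as np

def isolated_vertex_count(n, p, rng):
    """Sample an Erdős-Rényi graph G(n, p) and count its isolated vertices."""
    upper = np.triu(rng.random((n, n)) < p, k=1)  # independent edge indicators
    adj = upper | upper.T                         # symmetric adjacency matrix
    return int(np.sum(adj.sum(axis=1) == 0))      # vertices of degree zero

rng = np.random.default_rng(0)
n, p = 200, 0.01
samples = [isolated_vertex_count(n, p, rng) for _ in range(2000)]
print(np.mean(samples), n * (1 - p) ** (n - 1))   # empirical vs exact mean
```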
For d in general, and p = p_n depending on n, previously in [7] the asymptotic normality of Y was shown when n^{(d+1)/d} p_n → ∞ and np_n → 0, or np_n → ∞ and np_n − log n − d log log n → −∞; see also [10] and [3].
Asymptotic normality of Y when np_n → c > 0 was obtained by [2]. The size bias coupling considered here was used in [6] to study the rate of convergence to the multivariate normal distribution for a vector whose components count the number of vertices of some fixed degrees. In [8], the mean µ and variance σ² of Y for the particular case d = 0 are computed as

  µ = n(1−p)^{n−1} and σ² = n(1−p)^{n−1}(1 + np(1−p)^{n−2} − (1−p)^{n−2}).   (3)

In the same paper, Kolmogorov distance bounds to the normal were obtained and asymptotic normality
shown when

n²p → ∞ and np − log n → −∞.

O’Connell [9] showed that an asymptotic large deviation principle holds for Y . Raič [11] obtained
nonuniform large deviation bounds in some generality, for random variables W with E(W ) = 0 and Var(W ) =
1, of the form
  P(W ≥ t)/(1 − Φ(t)) ≤ e^{t³β(t)/6}(1 + Q(t)β(t)) for all t ≥ 0   (4)
where Φ(t) denotes the distribution function of a standard normal variate and Q(t) is a quadratic in t.
Although in general the expression for β(t) is not simple, when W equals Y properly standardized and
np → c as n → ∞, (4) holds for all n sufficiently large with
  β(t) = (C_1/√n) exp(C_2 t/√n + C_3(e^{C_4 t/√n} − 1))

for some constants C_1, C_2, C_3 and C_4. For t of order n^{1/2}, for instance, the function β(t) will be small as
for some constants C1 , C2 , C3 and C4 . For t of order n1/2 , for instance, the function β(t) will be small as
n → ∞, allowing an approximation of the deviation probability P (W ≥ t) by the normal, to within some
factors.
Theorem 1.1 below, by contrast, produces a non-asymptotic, explicit bound, that is, it does not require
any relation between n and p and is satisfied for all n. Moreover, by (9) and (7), the bound is of order e^{−λt²} over some range of t, and of worst case order e^{−ρt}, for the right tail, and e^{−ηt²} for the left tail, where λ, ρ and η are explicit, with the bound holding for all t ∈ R.
Theorem 1.1. For n ∈ {1, 2, . . .} and p ∈ (0, 1) let K denote the random graph on n vertices where each edge is present with probability p, independently of all other edges, and let Y denote the number of isolated vertices in K. Then for all t > 0,

  P((Y − µ)/σ ≥ t) ≤ inf_{θ≥0} exp(−θt + H(θ))  where  H(θ) = (µ/(2σ²)) ∫_0^θ s γ_s ds,   (5)

with the mean µ and variance σ² of Y given in (3), and

  γ_s = 2e^{2s}(1 + pe^s/(1−p))^n + β + 1  where  β = (1−p)^{−n}.   (6)

For the left tail, for all t > 0,

  P((Y − µ)/σ ≤ −t) ≤ exp(−(t²/2) · σ²/(µ(β + 1))).   (7)
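
Since every quantity in Theorem 1.1 is explicit, the bounds can be evaluated numerically. The sketch below (ours; the helper names are not from the paper) computes H(θ) by quadrature and approximates the infimum in (5) by a grid search over θ; restricting to a finite grid is safe, as every θ ≥ 0 yields a valid bound.

```python
import numpy as np
from scipy.integrate import quad

def moments(n, p):
    """Mean and variance of the isolated vertex count, from (3)."""
    mu = n * (1 - p) ** (n - 1)
    return mu, mu * (1 + n * p * (1 - p) ** (n - 2) - (1 - p) ** (n - 2))

def right_tail_bound(n, p, t, theta_max=1.0, grid=400):
    """Approximate the right tail bound (5) by a grid search over theta."""
    mu, sigma2 = moments(n, p)
    beta = (1 - p) ** (-n)
    gamma = lambda s: 2 * np.exp(2 * s) * (1 + p * np.exp(s) / (1 - p)) ** n + beta + 1
    H = lambda th: mu / (2 * sigma2) * quad(lambda s: s * gamma(s), 0, th)[0]
    # Keep theta_max moderate: gamma_s grows doubly exponentially in s.
    return float(np.exp(min(-th * t + H(th) for th in np.linspace(0, theta_max, grid))))

def left_tail_bound(n, p, t):
    """The closed form left tail bound (7)."""
    mu, sigma2 = moments(n, p)
    beta = (1 - p) ** (-n)
    return float(np.exp(-(t ** 2 / 2) * sigma2 / (mu * (beta + 1))))

print(right_tail_bound(200, 0.01, 3.0), left_tail_bound(200, 0.01, 3.0))
```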

Recall that for a nonnegative random variable Y with finite, nonzero mean µ, the size bias distribution of Y is given by the law of a variable Y^s satisfying

  E[Y f(Y)] = µE[f(Y^s)]   (8)

for all f for which the expectations above exist. The main tool used in proving Theorem 1.1 is size bias coupling, that is, constructing Y and Y^s, having the Y-size biased distribution, on the same space. In [4] and [5], size bias couplings were used to prove concentration of measure inequalities when |Y^s − Y| can be almost surely bounded by a constant independent of the problem size, the number of vertices in the present case. Here, when Y is the number of isolated vertices of K, we consider a coupling of Y to Y^s with the Y-size bias distribution where this boundedness condition is violated. Unlike the theorem used in [4] and [5], which can be applied to a wide variety of situations under a bounded coupling assumption, it seems that cases where the coupling is unbounded, such as the one we consider here, need application specific treatment, and cannot be handled by one single general result.
Remark 1.1. Useful bounds for the minimization in (5) may be obtained by restricting to θ ∈ [0, θ_0] for some θ_0. In this case, as γ_s is an increasing function of s, we have

  H(θ) ≤ (µγ_{θ_0}/(4σ²)) θ²  for θ ∈ [0, θ_0].

The quadratic −θt + µγ_{θ_0}θ²/(4σ²) in θ is minimized at θ = 2tσ²/(µγ_{θ_0}). When this value falls in [0, θ_0] we obtain the first bound in (9), and setting θ = θ_0 yields the second; thus,

  P((Y − µ)/σ ≥ t) ≤ exp(−t²σ²/(µγ_{θ_0}))               for t ∈ [0, θ_0 µγ_{θ_0}/(2σ²)],
  P((Y − µ)/σ ≥ t) ≤ exp(−θ_0 t + µγ_{θ_0}θ_0²/(4σ²))    for t ∈ (θ_0 µγ_{θ_0}/(2σ²), ∞).   (9)
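
In code, the two-regime bound (9) requires no numerical integration; the following self-contained sketch (ours, with an arbitrary illustrative choice of θ_0) evaluates it.

```python
import numpy as np

def remark_bound(n, p, t, theta0=0.5):
    """Right tail bound (9) for a fixed theta0 > 0."""
    mu = n * (1 - p) ** (n - 1)
    sigma2 = mu * (1 + n * p * (1 - p) ** (n - 2) - (1 - p) ** (n - 2))
    g = 2 * np.exp(2 * theta0) * (1 + p * np.exp(theta0) / (1 - p)) ** n \
        + (1 - p) ** (-n) + 1                  # gamma evaluated at theta0
    if t <= theta0 * mu * g / (2 * sigma2):    # minimizer lies in [0, theta0]
        return np.exp(-t ** 2 * sigma2 / (mu * g))
    return np.exp(-theta0 * t + mu * g * theta0 ** 2 / (4 * sigma2))
```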

Though Theorem 1.1 is not an asymptotic result, when np → c as n → ∞, (3) and (6) yield

  σ²/µ → 1 + ce^{−c} − e^{−c},  β → e^c  and  γ_s → 2e^{2s+ce^s} + e^c + 1  as n → ∞.

Since lim_{n→∞} γ_s and lim_{n→∞} µ/σ² exist, the right tail probability is at most of the exponential order e^{−ρt} for some ρ > 0. Also, the left tail bound (7) in this asymptotic behaves as

  lim_{n→∞} exp(−(t²/2) · σ²/(µ(β + 1))) = exp(−(t²/2) · (1 + ce^{−c} − e^{−c})/(e^c + 1)),

that is, as e^{−ηt²} with η > 0.
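For instance, taking c = 1 gives σ²/µ → 1, β → e and γ_s → 2e^{2s+e^s} + e + 1, so the left tail bound behaves as e^{−ηt²} with η = 1/(2(e + 1)) ≈ 0.13.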
The paper is organized as follows. In Section 2 we review results leading to the construction of size
biased couplings for sums of possibly dependent variables, and then apply it in Section 3 to the number Y
of isolated vertices; this construction first appeared in [6]. The proof of Theorem 1.1 is also given in Section
3.

2 Construction of size bias couplings


In this section we will review the discussion in [6] which gives a procedure for the construction of size bias couplings when Y is a sum; the method has its roots in the work of Baldi et al. [1]. The construction depends on being able to size bias a collection of nonnegative random variables in a given coordinate, as described in Definition 2.1. Letting F be the distribution of Y, first note that the characterization (8) of the size bias distribution F^s is equivalent to the specification of F^s by its Radon-Nikodym derivative

  dF^s(x) = (x/µ) dF(x).   (10)

Definition 2.1. Let A be an arbitrary index set and let X = {X_α : α ∈ A} be a collection of nonnegative random variables with finite, nonzero expectations EX_α = µ_α and joint distribution dF(x). For β ∈ A, we say that X^β = {X^β_α : α ∈ A} has the X-size bias distribution in coordinate β if X^β has joint distribution

  dF^β(x) = x_β dF(x)/µ_β.

Just as (10) is related to (8), the random vector X^β has the X-size bias distribution in coordinate β if and only if

  E[X_β f(X)] = µ_β E[f(X^β)] for all functions f for which these expectations exist.

Letting f(X) = g(X_β) for some function g one recovers (8), showing that the βth coordinate of X^β, that is, X^β_β, has the X_β-size bias distribution.
The factorization

  dF(x) = P(X ∈ dx | X_β = x) P(X_β ∈ dx)

of the joint distribution of X suggests the following way to construct X. First generate X_β, a variable with distribution P(X_β ∈ dx). If X_β = x, then generate the remaining variates {X_α, α ≠ β} with distribution P(X ∈ dx | X_β = x). Similarly, the factorization

  dF^β(x) = x_β dF(x)/µ_β = P(X ∈ dx | X_β = x) x_β P(X_β ∈ dx)/µ_β = P(X ∈ dx | X_β = x) P(X^β_β ∈ dx)   (11)

suggests that to generate X^β with distribution dF^β(x), one may first generate a variable X^β_β with the X_β-size bias distribution, then, when X^β_β = x, generate the remaining variables according to their original conditional distribution given that the βth coordinate takes on the value x.
Definition 2.1 and the following special case of a proposition from Section 2 of [6] will be applied in the subsequent constructions; the reader is referred there for the simple proof.
Proposition 2.1. Let A be an arbitrary index set, and let X = {X_α, α ∈ A} be a collection of nonnegative random variables with finite means. Let Y = Σ_{β∈A} X_β and assume µ_A = EY is finite and positive. Let X^β have the X-size biased distribution in coordinate β as in Definition 2.1. Then, if I is a random index, independent of X, taking values in A with distribution

  P(I = β) = µ_β/µ_A,  β ∈ A,

the variable Y^s = Σ_{α∈A} X^I_α has the Y-size biased distribution as in (8).

In our examples we use Proposition 2.1 and the random index I, and (11), to obtain Y^s by first generating X^I_I with the size bias distribution of X_I, then, if I = β and X^β_β = x, generating {X^β_α : α ∈ A\{β}} according to the (original) conditional distribution P(X_α, α ≠ β | X_β = x).
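
As a concrete illustration of Proposition 2.1 (our example, not from [6]), let Y be a sum of independent Bernoulli variables: size biasing coordinate β sets X_β = 1, and by independence the remaining coordinates keep their original distribution. The sketch below checks the characterization (8) by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.1, 0.5, 0.3, 0.7])    # success probabilities, mu_alpha = p_alpha
mu = p.sum()                          # mu_A = EY

def sample_pair():
    X = (rng.random(p.size) < p).astype(float)
    I = rng.choice(p.size, p=p / mu)  # random index with P(I = beta) = p_beta / mu
    Xs = X.copy()
    Xs[I] = 1.0                       # size biased Bernoulli is identically one
    return X.sum(), Xs.sum()

Y, Ys = np.array([sample_pair() for _ in range(200000)]).T
f = lambda y: y ** 2                  # any f with the expectations finite would do
print(np.mean(Y * f(Y)), mu * np.mean(f(Ys)))   # both sides of (8) should agree
```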

3 Proof of Theorem 1.1


We now present the proof of Theorem 1.1.
Proof. We first construct a coupling of Y^s, having the Y-size bias distribution, to Y. Let K be given, and let Y be the number of isolated vertices in K. To size bias Y, first recall the representation of Y as the sum (2) with d = 0. As the summands in (2) are exchangeable, the distribution of the random index I in Proposition 2.1 is uniform. Hence, choose one of the n vertices of K uniformly. If the chosen vertex, say V, is already isolated, we do nothing and set K^s = K, as the remaining variables already have their conditional distribution given that V is isolated. Otherwise obtain K^s by deleting all the edges connected to V. By Proposition 2.1, the variable Y^s counting the number of isolated vertices of K^s has the Y-size biased distribution.
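
This coupling is straightforward to simulate; the sketch below (ours, for illustration) generates the pair (Y, Y^s) and checks the identity (8) empirically with f(y) = y, that is, E[Y²] = µE[Y^s].

```python
import numpy as np

def coupled_pair(n, p, rng):
    """Return (Y, Ys) for the size bias coupling described above."""
    adj = np.triu(rng.random((n, n)) < p, k=1)
    adj = adj | adj.T
    Y = int(np.sum(adj.sum(axis=1) == 0))
    V = rng.integers(n)                    # uniformly chosen vertex
    if adj[V].sum() == 0:                  # V already isolated: K^s = K
        return Y, Y
    Ks = adj.copy()
    Ks[V, :] = Ks[:, V] = False            # delete all edges incident to V
    return Y, int(np.sum(Ks.sum(axis=1) == 0))

rng = np.random.default_rng(2)
n, p = 50, 0.04
mu = n * (1 - p) ** (n - 1)
Y, Ys = np.array([coupled_pair(n, p, rng) for _ in range(20000)]).T
assert np.all(Ys >= Y)                     # the coupling is monotone
print(np.mean(Y ** 2), mu * np.mean(Ys))   # E[Y^2] vs mu E[Y^s]
```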

To derive the needed properties of this coupling, let N(v) be the set of neighbors of v ∈ V, and T be the collection of isolated vertices of K, that is, with d(v), the degree of v, given in (1),

  N(v) = {w : X_{vw} = 1} and T = {v : d(v) = 0}.

Note that Y = |T|. Since all edges incident to the chosen V are removed in order to form K^s, any neighbor of V which had degree one thus becomes isolated, and V also becomes isolated if it was not so earlier. As all other vertices are otherwise unaffected, as far as their being isolated or not, we have

  Y^s − Y = d_1(V) + 1(d(V) ≠ 0)  where  d_1(V) = Σ_{w∈N(V)} 1(d(w) = 1).   (12)

In particular the coupling is monotone, that is, Y^s ≥ Y. Since d_1(V) ≤ d(V), (12) yields

  Y^s − Y ≤ d(V) + 1.   (13)
Now note that for real x ≠ y, the convexity of the exponential function implies

  (e^y − e^x)/(y − x) = ∫_0^1 e^{ty+(1−t)x} dt ≤ ∫_0^1 (te^y + (1−t)e^x) dt = (e^y + e^x)/2.   (14)
Using (14), that the coupling is monotone, and that Y is a function of T, for θ ≥ 0 we have

  E(e^{θY^s} − e^{θY}) ≤ (θ/2) E[(Y^s − Y)(e^{θY^s} + e^{θY})]
    = (θ/2) E[e^{θY}(Y^s − Y)(e^{θ(Y^s−Y)} + 1)]
    = (θ/2) E{e^{θY} E((Y^s − Y)(e^{θ(Y^s−Y)} + 1) | T)}.   (15)
Now using that Y^s = Y when V ∈ T, and (13), we have

  E((Y^s − Y)(e^{θ(Y^s−Y)} + 1) | T)
    ≤ E((d(V) + 1)(e^{θ(d(V)+1)} + 1) 1(V ∉ T) | T)
    ≤ e^θ E[(d(V)e^{θd(V)} + e^{θd(V)} + e^{−θ} d(V)) 1(V ∉ T) | T] + 1
    ≤ e^θ E[(2d(V)e^{θd(V)} + e^{−θ} d(V)) 1(V ∉ T) | T] + 1,   (16)

where in the final inequality we have used that 1(V ∉ T) ≤ d(V) 1(V ∉ T).
To derive a bound on the expectation in (16), first note that by conditioning we obtain

  P(d(V) = k, 1(V ∉ T) = 1 | T) = P(V ∉ T) P(d(V) = k | T, V ∉ T).   (17)

Since V is chosen independently of K, the distribution on the right hand side of (17) is that of a Binomial with number of trials equal to the number n − 1 − Y of non-isolated vertices other than V and success probability p, conditioned to be nonzero, that is,

  P(d(V) = k | T, V ∉ T) = \binom{n−1−Y}{k} p^k (1−p)^{n−1−Y−k} / (1 − (1−p)^{n−1−Y})  for 1 ≤ k ≤ n − 1 − Y,   (18)

and 0 otherwise.

Using the conditional distribution (18) and (17), and dropping the factor P(V ∉ T) in the latter to obtain an inequality, it can be easily verified that the first derivative of the conditional moment generating function of d(V) satisfies

  E(d(V)e^{θd(V)} 1(V ∉ T) | T) ≤ (n − 1 − Y)(pe^θ + 1 − p)^{n−2−Y} pe^θ / (1 − (1−p)^{n−1−Y}).

Starting with the first term of (16), by the mean value theorem applied to the function f(x) = x^{n−1−Y}, for some ξ ∈ (1 − p, 1) we have

  1 − (1−p)^{n−1−Y} = f(1) − f(1−p) = (n − 1 − Y) p ξ^{n−2−Y} ≥ (n − 1 − Y) p (1−p)^n.

Hence, recalling θ ≥ 0,

  E(d(V)e^{θd(V)} 1(V ∉ T) | T) ≤ (n − 1 − Y)(pe^θ + 1 − p)^n pe^θ / (1 − (1−p)^{n−1−Y})
    ≤ (n − 1 − Y)(pe^θ + 1 − p)^n pe^θ / ((n − 1 − Y) p (1−p)^n)
    = α_θ  where  α_θ = e^θ (1 + pe^θ/(1−p))^n.   (19)

Lastly, again applying (17) and (18) we may handle the second term in (16) by the inequality

  E(d(V) 1(V ∉ T) | T) ≤ (n − 1 − Y)p / (1 − (1−p)^{n−1−Y}) ≤ (n − 1 − Y)p / ((n − 1 − Y) p (1−p)^n) = β,   (20)

with β as in (6).
Substituting inequalities (19) and (20) into (16) yields

  E((Y^s − Y)(e^{θ(Y^s−Y)} + 1) | T) ≤ γ_θ  where  γ_θ = 2e^θ α_θ + β + 1;   (21)

note that 2e^θ α_θ = 2e^{2θ}(1 + pe^θ/(1−p))^n, so γ_θ agrees with (6).

Now, by (15) we have that

  E(e^{θY^s} − e^{θY}) ≤ (θγ_θ/2) E(e^{θY})  for all θ ≥ 0.   (22)

Letting m(θ) = E(e^{θY}) and using (8) and (22) we obtain

  m′(θ) = E(Y e^{θY}) = µE(e^{θY^s}) ≤ µ(1 + θγ_θ/2) m(θ).   (23)
Standardizing, we set

  M(θ) = E(exp(θ(Y − µ)/σ)) = e^{−θµ/σ} m(θ/σ),   (24)

and now by differentiating and applying (23), we obtain

  M′(θ) = (1/σ) e^{−θµ/σ} m′(θ/σ) − (µ/σ) e^{−θµ/σ} m(θ/σ)
    ≤ (µ/σ) e^{−θµ/σ} (1 + θγ_θ/(2σ)) m(θ/σ) − (µ/σ) e^{−θµ/σ} m(θ/σ)
    = (µθγ_θ/(2σ²)) e^{−θµ/σ} m(θ/σ) = (µθγ_θ/(2σ²)) M(θ).

Since M(0) = 1, integrating M′(s)/M(s) over [0, θ] yields the bound

  log(M(θ)) ≤ H(θ), or that M(θ) ≤ exp(H(θ)),  where  H(θ) = (µ/(2σ²)) ∫_0^θ s γ_s ds.

Hence for t ≥ 0,

  P((Y − µ)/σ ≥ t) ≤ P(exp(θ(Y − µ)/σ) ≥ e^{θt}) ≤ e^{−θt} M(θ) ≤ exp(−θt + H(θ)).

As the inequality holds for all θ ≥ 0, it holds for the θ achieving the minimal value, proving (5).
To demonstrate the left tail bound let θ < 0. Since Y^s ≥ Y and θ < 0, using (14) and (13), and recalling Y^s = Y when V ∈ T, we obtain

  E(e^{θY} − e^{θY^s}) ≤ (|θ|/2) E[(e^{θY^s} + e^{θY})(Y^s − Y)]
    ≤ |θ| E(e^{θY}(Y^s − Y))
    = |θ| E(e^{θY} E(Y^s − Y | T))
    ≤ |θ| E(e^{θY} E((d(V) + 1) 1(V ∉ T) | T)).

Now (20) yields

  E(e^{θY} − e^{θY^s}) ≤ (β + 1)|θ| E(e^{θY}),

and therefore

  m′(θ) = µE(e^{θY^s}) ≥ µ(1 + (β + 1)θ) m(θ).

Again with M(θ) as in (24),

  M′(θ) = (1/σ) e^{−θµ/σ} m′(θ/σ) − (µ/σ) e^{−θµ/σ} m(θ/σ)
    ≥ (µ/σ) e^{−θµ/σ} (1 + (β + 1)θ/σ) m(θ/σ) − (µ/σ) e^{−θµ/σ} m(θ/σ)
    = (µ(β + 1)θ/σ²) M(θ).

Dividing by M(θ) and integrating over [θ, 0] yields

  log(M(θ)) ≤ µ(β + 1)θ²/(2σ²).   (25)
The inequality in (25) implies that for all t > 0 and θ < 0,

  P((Y − µ)/σ ≤ −t) ≤ exp(θt + µ(β + 1)θ²/(2σ²)).

Taking θ = −tσ²/(µ(β + 1)) we obtain (7).


Acknowledgements: The authors would like to thank an anonymous referee for a number of helpful
comments.

References
[1] Baldi, P., Rinott, Y. and Stein, C. (1989). A normal approximation for the number of local maxima of a random function on a graph, Probability, Statistics and Mathematics, Papers in Honor of Samuel Karlin, T. W. Anderson, K. B. Athreya and D. L. Iglehart eds., Academic Press, 59-81.
[2] Barbour, A. D., Karoński, M. and Ruciński, A. (1989). A central limit theorem for decomposable random variables with applications to random graphs, J. Combinatorial Theory B, 47, 125-145.
[3] Bollobás, B. (1985). Random graphs. Academic Press Inc., London.
[4] Ghosh, S. and Goldstein, L. (2009). Concentration of measures via size biased couplings, Probab. Th. Rel. Fields, DOI: 10.1007/s00440-009-0253-3.

[5] Ghosh, S. and Goldstein, L. (2011). Applications of size biased couplings for concentration of measures, Electronic Communications in Probability, 16, 70-83.
[6] Goldstein, L. and Rinott, Y. (1996). Multivariate normal approximations by Stein’s method and size bias couplings, Journal of Applied Probability, 33, 1-17.

[7] Karoński, M. and Ruciński, A. (1987). Poisson convergence and semi-induced properties of random
graphs. Math. Proc. Cambridge Philos. Soc., 101, 291-300.
[8] Kordecki, W. (1990). Normal approximation and isolated vertices in random graphs, Random Graphs ’87, Karoński, M., Jaworski, J. and Ruciński, A. eds., John Wiley & Sons Ltd., 131-139.

[9] O’Connell, N. (1998). Some large deviation results for sparse random graphs, Probab. Th. Rel. Fields, 110, 277-285.
[10] Palka, Z. (1984). On the number of vertices of given degree in a random graph. J. Graph Theory, 8,
167–170.

[11] Raič, M. (2007). CLT related large deviation bounds based on Stein’s method, Adv. Appl. Prob., 39, 731-752.
