(2015) Bhattacharya-Pati (Supplement)

Supplement to: Adaptive Bayesian inference in the Gaussian sequence model using exponential-variance priors
Debdeep Pati
Department of Statistics, Florida State University, Tallahassee, FL,
email:debdeep@stat.fsu.edu
Anirban Bhattacharya
Department of Statistics, Texas A&M University, College Station, TX,
email:anirbanb@stat.tamu.edu
1. Proofs of results in the main document
1.1. Proof of Theorem 3.1 in main document

Proof. Observe that the $\theta_i \mid X_i, a$ are independent with
\[
\theta_i \mid X_i, a \sim N\!\left( \frac{n X_i}{n + a e^{i/a}},\ \frac{1}{n + a e^{i/a}} \right).
\]
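The posterior above is the usual normal-normal update. A minimal sketch, assuming the sequence model $X_i \mid \theta_i \sim N(\theta_i, 1/n)$ from the main document and the prior variance $\tau_i^2(a) = e^{-i/a}/a$ used later in this supplement; the function name and the example values are hypothetical.

```python
import math

def posterior_params(x_i, i, n, a):
    """Posterior mean and variance of theta_i given X_i = x_i and a.

    Assumes X_i | theta_i ~ N(theta_i, 1/n) and theta_i | a ~ N(0, exp(-i/a)/a).
    """
    prior_precision = a * math.exp(i / a)    # 1 / tau_i^2(a)
    total_precision = n + prior_precision    # data precision n plus prior precision
    mean = n * x_i / total_precision         # observation shrunk towards zero
    variance = 1.0 / total_precision
    return mean, variance

# Illustrative values only: later coordinates are shrunk more strongly.
for i in (1, 10, 50):
    print(i, posterior_params(x_i=0.5, i=i, n=100, a=4.0))
```

The shrinkage factor $n/(n + a e^{i/a})$ decreases in $i$, which is what drives the variance and bias sums in the rest of the proof.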
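By an application of Chebyshev's inequality,
\[
E_{\theta_0} \Pi_a\big(n^{\beta/(2\beta+1)} \|\theta - \theta_0\| > M_n \mid X\big)
\le M_n^{-2}\, n^{2\beta/(2\beta+1)} \left[ \sum_{i=1}^{\infty} E_{\theta_0}\{E(\theta_i \mid X_i, a) - \theta_{0i}\}^2 + \sum_{i=1}^{\infty} V(\theta_i \mid X_i, a) \right]. \tag{1}
\]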
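Now if $a = n^{1/(2\beta+1)}$, using $\int_{p+q}^{\infty} dx/\{x(x - q)\} = \log(1 + q/p)/q$ for $p, q > 0$,
\[
\sum_{i=1}^{\infty} V(\theta_i \mid X_i, a) = \sum_{i=1}^{\infty} \frac{1}{n + a e^{i/a}}
\le \int_0^{\infty} \frac{dx}{n + a e^{x/a}}
= a \int_{a+n}^{\infty} \frac{dx}{x(x - n)}
= \frac{a \log(1 + n/a)}{n}
\lesssim n^{-2\beta/(2\beta+1)} \log n.
\]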
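The sum-versus-integral comparison above can be checked numerically. A small sketch, assuming $\beta = 1$ and a few illustrative values of $n$; the helper names are hypothetical and the infinite sum is truncated once its terms are negligible.

```python
import math

def posterior_variance_sum(n, a, tol=1e-18):
    """Truncated evaluation of sum_{i>=1} 1 / (n + a * exp(i/a))."""
    total, i = 0.0, 1
    while True:
        exponent = i / a
        if exponent > 700.0:          # stay clear of floating-point overflow in exp
            break
        term = 1.0 / (n + a * math.exp(exponent))
        total += term
        if term < tol:                # remaining terms decay geometrically
            break
        i += 1
    return total

def integral_bound(n, a):
    """Closed form a * log(1 + n/a) / n of the bounding integral."""
    return a * math.log1p(n / a) / n

beta = 1.0
for n in (100, 1000, 10000):
    a = n ** (1.0 / (2.0 * beta + 1.0))
    rate = n ** (-2.0 * beta / (2.0 * beta + 1.0)) * math.log(n)
    print(n, posterior_variance_sum(n, a), integral_bound(n, a), rate)
```

Because the summand is decreasing in $i$, the sum never exceeds the integral, and the printed columns should appear in increasing order within each row.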
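Also, letting $g(X_i) = E(\theta_i \mid X_i, a)$, we can write
\[
E_{\theta_0}\{E(\theta_i \mid X_i, a) - \theta_{0i}\}^2 = V_{\theta_0}\{g(X_i)\} + [E_{\theta_0}\{g(X_i)\} - \theta_{0i}]^2
= \frac{n}{(n + a e^{i/a})^2} + \frac{a^2 e^{2i/a} \theta_{0i}^2}{(n + a e^{i/a})^2}. \tag{2}
\]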
Preprint submitted to Journal of LATEX Templates April 15, 2015

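Defining
\[
I_1 = \sum_{i=1}^{\infty} \frac{n}{(n + a e^{i/a})^2}, \qquad
I_2 = \sum_{i=1}^{\infty} \frac{a^2 e^{2i/a} \theta_{0i}^2}{(n + a e^{i/a})^2},
\]
it follows from (1) and (2) that
\[
E_{\theta_0} \Pi_a\big(n^{\beta/(2\beta+1)} \|\theta - \theta_0\| > M_n \mid X\big) \le M_n^{-2}\, n^{2\beta/(2\beta+1)} (I_1 + I_2) + M_n^{-2} \log n.
\]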

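Using $\int_{p+q}^{\infty} dx/\{x^2 (x - q)\} = \{\log(1 + q/p) - q/(p + q)\}/q^2$ for $p, q > 0$, we have
\[
I_1 \le n \int_{x=0}^{\infty} \frac{dx}{(n + a e^{x/a})^2}
= n a \int_{n+a}^{\infty} \frac{dt}{t^2 (t - n)}
= n a\, \frac{\log(1 + n/a) - n/(a + n)}{n^2}
\le n^{-2\beta/(2\beta+1)} \log n.
\]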
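The integral identity used for $I_1$ follows from a partial-fraction decomposition. A short sketch of that step, with the same $p, q > 0$ as in the text:

```latex
\[
\frac{1}{x^{2}(x-q)} = \frac{1}{q^{2}}\left(\frac{1}{x-q}-\frac{1}{x}\right) - \frac{1}{q\,x^{2}},
\qquad\text{so}\qquad
\int \frac{dx}{x^{2}(x-q)} = \frac{1}{q^{2}}\log\frac{x-q}{x} + \frac{1}{q\,x}.
\]
\[
\int_{p+q}^{\infty} \frac{dx}{x^{2}(x-q)}
= -\frac{1}{q^{2}}\log\frac{p}{p+q} - \frac{1}{q\,(p+q)}
= \frac{\log(1+q/p) - q/(p+q)}{q^{2}} .
\]
```

The antiderivative vanishes as $x \to \infty$, so only the lower limit $x = p + q$ contributes. The change of variables $t = n + a e^{x/a}$, with $dx = a\,dt/(t - n)$, then gives the expression for $I_1$ with $p = a$ and $q = n$.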
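Next,
\[
I_2 = a^2 \sum_{i=1}^{\infty} \frac{e^{2i/a} \theta_{0i}^2}{(n + a e^{i/a})^2}
\le \|\theta_0\|_\beta^2 \max_{i \ge 1} \frac{a^2 e^{2i/a}}{i^{2\beta} (n + a e^{i/a})^2}.
\]
Taking the maximum over $i \ge a$ and $i < a$ respectively, we obtain
\[
I_2 \le \|\theta_0\|_\beta^2 \max\{a^{-2\beta},\ a^2 e^2/(n + a)^2\}.
\]
Now, $a^{-2\beta} = n^{-2\beta/(2\beta+1)}$, and $a^2 e^2/(n + a)^2 \lesssim n^{-4\beta/(2\beta+1)}$. Hence $M_n^{-2} n^{2\beta/(2\beta+1)} (I_1 + I_2) + M_n^{-2} \log n \lesssim M_n^{-2} \log n$. The proof follows immediately.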
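The case split over $i \ge a$ and $i < a$ can be illustrated numerically. A minimal sketch, assuming $a = n^{1/(2\beta+1)}$ and illustrative values of $n$ and $\beta$; the function names are hypothetical.

```python
import math

def i2_weight(i, n, a, beta):
    """a^2 e^{2i/a} / (i^{2 beta} (n + a e^{i/a})^2), computed without overflow."""
    # ratio = a e^{i/a} / (n + a e^{i/a}), rewritten so that exp never overflows
    ratio = 1.0 / (1.0 + (n / a) * math.exp(-i / a))
    return ratio ** 2 / i ** (2.0 * beta)

def i2_weight_bound(n, a, beta):
    """max{a^{-2 beta}, a^2 e^2 / (n + a)^2} from the two cases i >= a and i < a."""
    return max(a ** (-2.0 * beta), (a * math.e / (n + a)) ** 2)

n, beta = 1000, 1.0
a = n ** (1.0 / (2.0 * beta + 1.0))
largest = max(i2_weight(i, n, a, beta) for i in range(1, 5001))
print(largest, i2_weight_bound(n, a, beta))
```

For $i \ge a$ the ratio is at most one and $i^{-2\beta} \le a^{-2\beta}$; for $i < a$ the factor $e^{i/a}$ is below $e$. The printed maximum should therefore sit below the bound.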
1.2. Proof of Lemma 3.1 in main document
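Proof. We shall aim to bound $\Pi_a(\|\theta - \theta_0\| < \epsilon)$ for $a \in (\epsilon^{-1/\beta}, 2\epsilon^{-1/\beta})$ in view of the fact that
\[
\Pi(\|\theta - \theta_0\| < \epsilon) \ge \int_{\epsilon^{-1/\beta}}^{2\epsilon^{-1/\beta}} \Pi_a(\|\theta - \theta_0\| < \epsilon)\, h(a)\, da. \tag{3}
\]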
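Fix $a \in (\epsilon^{-1/\beta}, 2\epsilon^{-1/\beta})$. For any $N \ge 1$,
\[
\Pi_a\big(\|\theta - \theta_0\|^2 \le \epsilon^2\big) \ge \Pi_a\Big(\sum_{i=1}^{N} (\theta_i - \theta_{0i})^2 \le \epsilon^2/2\Big)\, \Pi_a\Big(\sum_{i=N+1}^{\infty} (\theta_i - \theta_{0i})^2 \le \epsilon^2/2\Big). \tag{4}
\]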
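We first bound the second term on the right-hand side of (4). Note that $\sum_{i=N+1}^{\infty} (\theta_i - \theta_{0i})^2 \le 2 \sum_{i=N+1}^{\infty} \theta_i^2 + 2 \sum_{i=N+1}^{\infty} \theta_{0i}^2$. Now, $\sum_{i=N+1}^{\infty} \theta_{0i}^2 < N^{-2\beta} \sum_{i=N+1}^{\infty} i^{2\beta} \theta_{0i}^2 \le \|\theta_0\|_\beta^2 N^{-2\beta}$. Choosing $N \ge N_1 = (8 \|\theta_0\|_\beta^2)^{1/(2\beta)} \epsilon^{-1/\beta}$, we can ensure that $\|\theta_0\|_\beta^2 N^{-2\beta} \le \epsilon^2/8$. Therefore, for any $N \ge N_1$,
\[
\Pi_a\Big(\sum_{i=N+1}^{\infty} (\theta_i - \theta_{0i})^2 \le \epsilon^2/2\Big) \ge \Pi_a\Big(\sum_{i=N+1}^{\infty} \theta_i^2 \le \epsilon^2/8\Big). \tag{5}
\]
By an application of Markov's inequality,
\[
\Pi_a\Big(\sum_{i=N+1}^{\infty} \theta_i^2 \le \epsilon^2/8\Big) \ge 1 - \frac{8}{\epsilon^2} \sum_{i=N+1}^{\infty} E(\theta_i^2)
= 1 - \frac{8}{a \epsilon^2} \sum_{i=N+1}^{\infty} e^{-i/a}
\ge 1 - \frac{8 e^{-N/a}}{\epsilon^2} \ge \frac{1}{2}, \tag{6}
\]
provided $N \ge N_2 = 2a \log(4/\epsilon)$. We used the fact that $\sum_{i=N+1}^{\infty} e^{-i/a} \le \int_N^{\infty} e^{-x/a}\, dx = a e^{-N/a}$. We therefore have from (5) and (6) that for any $N \ge \max\{N_1, N_2\}$,
\[
\Pi_a\Big(\sum_{i=N+1}^{\infty} (\theta_i - \theta_{0i})^2 \le \epsilon^2/2\Big) \ge \frac{1}{2}. \tag{7}
\]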
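The cutoffs $N_1$ and $N_2$ and the Markov lower bound in (6) are easy to evaluate for concrete inputs. A minimal sketch, assuming illustrative values $\epsilon = 0.1$, $\beta = 1$, $\|\theta_0\|_\beta^2 = 1$ and $a = 1.5\,\epsilon^{-1/\beta}$; all names and values are hypothetical.

```python
import math

def tail_cutoffs(eps, beta, theta0_sobolev_sq, a):
    """Return (N1, N2) as defined in the proof."""
    n1 = (8.0 * theta0_sobolev_sq) ** (1.0 / (2.0 * beta)) * eps ** (-1.0 / beta)
    n2 = 2.0 * a * math.log(4.0 / eps)
    return n1, n2

def markov_lower_bound(N, eps, a):
    """1 - (8 / eps^2) * sum_{i > N} E(theta_i^2), with E(theta_i^2) = exp(-i/a) / a."""
    # Exact geometric tail: sum_{i > N} e^{-i/a} = e^{-(N+1)/a} / (1 - e^{-1/a})
    tail = math.exp(-(N + 1) / a) / (1.0 - math.exp(-1.0 / a))
    return 1.0 - 8.0 * tail / (a * eps ** 2)

eps, beta = 0.1, 1.0
a = 1.5 * eps ** (-1.0 / beta)
n1, n2 = tail_cutoffs(eps, beta, theta0_sobolev_sq=1.0, a=a)
N = math.ceil(max(n1, n2))
print(n1, n2, N, markov_lower_bound(N, eps, a))
```

For any integer $N \ge N_2$ the returned bound is at least $1/2$, since the exact geometric tail is no larger than the integral $a e^{-N/a}$ used in the text.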
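We now bound the first term on the right-hand side of (4). By Anderson's inequality (Lemma Appendix A.3),
\[
\Pi_a\Big(\sum_{i=1}^{N} (\theta_i - \theta_{0i})^2 \le \epsilon^2/2\Big) \ge e^{-\frac{1}{2} \|\theta_{0N}\|_H^2}\, \Pi_a\Big(\sum_{i=1}^{N} \theta_i^2 \le \epsilon^2/2\Big), \tag{8}
\]
where $\theta_{0N} = (\theta_{01}, \ldots, \theta_{0N})^T$ and $\|\theta_{0N}\|_H^2 = a \sum_{i=1}^{N} e^{i/a} \theta_{0i}^2$. Since $a \le 2\epsilon^{-1/\beta}$, we can bound
\[
\|\theta_{0N}\|_H^2 \le 2 \epsilon^{-1/\beta} \|\theta_0\|_\beta^2 \max_{1 \le i \le N} e^{i/a} i^{-2\beta}.
\]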
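The function $x \mapsto e^{x/a} x^{-2\beta}$ is monotonically decreasing on the interval $(0, 2\beta a)$ and monotonically increasing on $[2\beta a, \infty)$. Therefore, $\max_{1 \le i \le N} e^{i/a} i^{-2\beta} \le \max_{i \in \{1, N\}} e^{i/a} i^{-2\beta}$. We have $1/a \le \epsilon^{1/\beta} < 1$, and hence $e^{i/a} i^{-2\beta}$ evaluated at $i = 1$ can be bounded above by $e$. For any $N$ satisfying $2a \log(C_1/\epsilon) \le N \le 2a \log(C_2/\epsilon)$ for some constants $C_1 < C_2$, we have $e^{N/a} \le (C_2/\epsilon)^2$ and $N^{-2\beta} \le \{2a \log(C_1/\epsilon)\}^{-2\beta} \le \epsilon^2 \{2 \log(C_1/\epsilon)\}^{-2\beta}$ since $a > \epsilon^{-1/\beta}$. Therefore, for any such $N$, $e^{N/a} N^{-2\beta} \le C_2^2\, 2^{-2\beta} \{\log(C_1/\epsilon)\}^{-2\beta} \le C_2^2\, 2^{-2\beta}$; and hence
\[
\|\theta_{0N}\|_H^2 \le 2 \epsilon^{-1/\beta} \|\theta_0\|_\beta^2 \max\{e,\ C_2^2\, 2^{-2\beta}\} = 2 C_\beta\, \epsilon^{-1/\beta}, \tag{9}
\]
where $C_\beta = \|\theta_0\|_\beta^2 \max\{e,\ C_2^2\, 2^{-2\beta}\}$. Note that we can always choose the constant $C_1$ large enough so that the condition $N \ge \max\{N_1, N_2\}$ (necessary for (7) to hold) can be satisfied.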
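The endpoint argument for $\max_{1 \le i \le N} e^{i/a} i^{-2\beta}$ rests on the function being decreasing and then increasing. A small sketch that checks this on an integer grid, with hypothetical values of $a$, $\beta$ and $N$:

```python
import math

def weight(x, a, beta):
    """The function x -> exp(x/a) * x^(-2 beta) from the proof."""
    return math.exp(x / a) * x ** (-2.0 * beta)

def max_on_grid(N, a, beta):
    """Maximum of the weight over i = 1, ..., N."""
    return max(weight(i, a, beta) for i in range(1, N + 1))

a, beta, N = 15.0, 1.0, 111          # illustrative values; the minimiser is 2 * beta * a = 30
endpoints = max(weight(1, a, beta), weight(N, a, beta))
print(max_on_grid(N, a, beta), endpoints)
```

Since $\log\{e^{x/a} x^{-2\beta}\} = x/a - 2\beta \log x$ is convex in $x$, the maximum over any interval is attained at an endpoint, so the two printed values should coincide.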
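To bound the centered probability in (8), recall that $\theta_i^2/\tau_i^2(a) \sim \chi_1^2$, and hence $\theta_i^2$ has the following density on $(0, \infty)$:
\[
\frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{\tau_i^2(a)\, x}} \exp\left\{ -\frac{x}{2 \tau_i^2(a)} \right\}
= \frac{\sqrt{a}}{\sqrt{2\pi}}\, e^{i/(2a)}\, \frac{\exp(-a e^{i/a} x/2)}{\sqrt{x}}.
\]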
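Let $dx$ denote a shorthand notation for $dx_1 \cdots dx_N$ below. Letting $D = \sqrt{a/(2\pi)}$ and for $N \le \kappa a$ for some constant $\kappa$ to be chosen later,
\[
\begin{aligned}
\Pi_a\Big(\sum_{i=1}^{N} \theta_i^2 \le \epsilon^2/2\Big)
&= D^N e^{\frac{N(N+1)}{4a}} \int_{\sum_{j=1}^{N} x_j \le \epsilon^2/2} \prod_{i=1}^{N} \frac{\exp(-a e^{i/a} x_i/2)}{\sqrt{x_i}}\, dx \\
&\ge D^N e^{\frac{N(N+1)}{4a}} \int_{\sum_{j=1}^{N} x_j \le \epsilon^2/2} \prod_{i=1}^{N} \frac{\exp(-a e^{\kappa} x_i/2)}{\sqrt{x_i}}\, dx \\
&= D^N e^{\frac{N(N+1)}{4a}} \left(\frac{\epsilon^2}{2}\right)^{N/2} \int_{\sum_{j=1}^{N} x_j \le 1} \prod_{i=1}^{N} \frac{\exp(-a e^{\kappa} \epsilon^2 x_i/4)}{\sqrt{x_i}}\, dx \\
&\ge \left(\frac{D \epsilon}{\sqrt{2}}\right)^{N} e^{\frac{N(N+1)}{4a}} \int_{\sum_{j=1}^{N} x_j \le 1} \exp\Big(-\frac{a e^{\kappa}}{4} \sum_{i=1}^{N} x_i\Big) \prod_{i=1}^{N} x_i^{-1/2}\, dx \\
&= \left(\frac{D \epsilon}{\sqrt{2}}\right)^{N} e^{\frac{N(N+1)}{4a}}\, \frac{\Gamma(1/2)^N}{\Gamma(N/2)} \int_{t=0}^{1} \exp\Big(-\frac{a e^{\kappa}}{4}\, t\Big)\, t^{N/2-1}\, dt.
\end{aligned} \tag{10}
\]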
The inequality from the first to the second line in the above display uses that $i/a \le \kappa$ for any $1 \le i \le N$; the inequality from the third to the fourth line simply uses $\epsilon < 1$; and the last equality follows from the Dirichlet integral formula (Lemma Appendix A.4). Using $\Gamma(1/2) = \sqrt{\pi}$ and the standard inequality (see, for example, [1]) $\Gamma(\alpha) \le \sqrt{2\pi}\, e^{2}\, e^{-\alpha} \alpha^{\alpha - 1/2}$ for $\alpha > 1$, we can simplify (10) to write
\[
\Pi_a\Big(\sum_{i=1}^{N} \theta_i^2 \le \epsilon^2/2\Big) \gtrsim \sqrt{N}\, e^{\frac{N(N+1)}{4a}} \left(\frac{a \epsilon^2 e}{2N}\right)^{N/2} \int_{t=0}^{1} \exp\Big(-\frac{a e^{\kappa}}{4}\, t\Big)\, t^{N/2-1}\, dt. \tag{11}
\]
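For any $\kappa > \log 4$, we can ensure that $a e^{\kappa}/4 > 1$, and hence the integral in (11) can be bounded below by $(a e^{\kappa}/4)^{-N/2} \int_0^1 e^{-z} z^{N/2-1}\, dz \ge (2e^{-1}/N)(a e^{\kappa}/4)^{-N/2}$. Substituting in (11), we conclude, for any $N \le \kappa a$ with $\kappa > \log 4$,
\[
\Pi_a\Big(\sum_{i=1}^{N} \theta_i^2 \le \epsilon^2/2\Big) \gtrsim \frac{1}{\sqrt{N}}\, e^{\frac{N(N+1)}{4a}}\, e^{-N\kappa/2} \left(\frac{2 e \epsilon^2}{N}\right)^{N/2}. \tag{12}
\]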
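The lower bound on the integral in (11) can be sanity-checked by quadrature. A minimal sketch, writing $c$ for $a e^{\kappa}/4$ (assumed to exceed 1) and using the substitution $u = t^{N/2}$ to remove the endpoint singularity; the helper names and test values are hypothetical.

```python
import math

def lhs_integral(c, N, steps=20000):
    """Midpoint-rule value of int_0^1 exp(-c t) t^(N/2 - 1) dt.

    With u = t^(N/2) the integral equals (2/N) * int_0^1 exp(-c u^(2/N)) du,
    whose integrand is bounded on [0, 1].
    """
    h = 1.0 / steps
    total = sum(math.exp(-c * ((k + 0.5) * h) ** (2.0 / N)) for k in range(steps))
    return (2.0 / N) * h * total

def lower_bound(c, N):
    """(2 e^{-1} / N) * c^(-N/2), the bound used to pass from (11) to (12)."""
    return (2.0 * math.exp(-1.0) / N) * c ** (-N / 2.0)

for c, N in ((5.0, 4), (2.0, 3), (10.0, 8)):
    print(c, N, lhs_integral(c, N), lower_bound(c, N))
```

The bound comes from rescaling $z = ct$, restricting the range to $z \in [0, 1]$ (valid because $c > 1$) and using $e^{-z} \ge e^{-1}$ there.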
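Choosing $\kappa = 2 \log(C_2/\epsilon)$, the inequalities in (7), (9) and (12) are valid for any $N \in [2a \log(C_1/\epsilon), 2a \log(C_2/\epsilon)]$. Substituting the bounds for $N$ and the value of $\kappa$ in (12), we obtain
\[
\Pi_a\Big(\sum_{i=1}^{N} \theta_i^2 \le \epsilon^2/2\Big) \ge e^{-C \epsilon^{-1/\beta} \{\log(C'/\epsilon)\}^2},
\]
for appropriate constants $C, C'$. Substituting the bound (9) into (8) and combining with the bound in the above display,
\[
\Pi_a\Big(\sum_{i=1}^{N} (\theta_i - \theta_{0i})^2 \le \epsilon^2/2\Big) \ge e^{-C_\beta \epsilon^{-1/\beta}}\, e^{-C \epsilon^{-1/\beta} \{\log(C'/\epsilon)\}^2}.
\]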
This bound along with (7) implies that
\[
\Pi_a\big(\|\theta - \theta_0\|^2 \le \epsilon^2\big) \ge c_1\, e^{-c_2 \epsilon^{-1/\beta} \{\log(c_3/\epsilon)\}^2},
\]
for appropriate constants $c_1, c_2, c_3$ independent of $a$. Since the above inequality holds for any $a \in (\epsilon^{-1/\beta}, 2\epsilon^{-1/\beta})$, and the right-hand side does not depend on $a$, the proof follows from (3).
1.3. Proof of Lemma 3.2 in main document

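By Markov's inequality, for any $t_0 > 0$,
\[
\Pi(\theta : \|\theta\|_\beta^2 > B,\ a \le t_0 \mid X) \le B^{-1} \sum_{i=1}^{\infty} i^{2\beta} \int_{a=0}^{t_0} g(a \mid X)\, E(\theta_i^2 \mid X, a)\, da.
\]
Recall from the proof of Theorem 3.1 that
\[
E(\theta_i^2 \mid X, a) = \frac{1}{n + a e^{i/a}} + \frac{n^2 X_i^2}{(n + a e^{i/a})^2}.
\]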
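Now for $t_1 > 0$,
\[
\sum_{i=1}^{\infty} \frac{i^{2\beta}}{n + a e^{i/a}}
\le \int_0^{\infty} \frac{x^{2\beta}\, dx}{n + a e^{x/a}}
\le \frac{1}{n} \int_0^{t_1} x^{2\beta}\, dx + \int_{t_1}^{\infty} \frac{x^{2\beta}\, dx}{n + a e^{x/a}}
\le \frac{t_1^{2\beta+1}}{(2\beta + 1) n} + a^{2\beta}\, \Gamma(2\beta + 1, t_1/t_0),
\]
where $\Gamma(a, z)$ is the complementary incomplete Gamma function defined as $\Gamma(a, z) = \int_z^{\infty} x^{a-1} e^{-x}\, dx$. From [2], we know that if $t_1 \ge C t_0 \beta$ for some constant $C > 0$, then $\Gamma(2\beta + 1, t_1/t_0) \le (t_1/t_0)^{2\beta} e^{-t_1/t_0}$. Hence for $t_1 \ge C t_0 \beta$,
\[
\sum_{i=1}^{\infty} \frac{i^{2\beta}}{n + a e^{i/a}} \le \frac{t_1^{2\beta+1}}{(2\beta + 1) n} + t_1^{2\beta} e^{-t_1/t_0}.
\]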
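The incomplete Gamma term in the first display of this step comes from dropping $n$ in the denominator and rescaling. A sketch of that omitted computation, assuming $a \le t_0$ as in the range of integration over $a$:

```latex
\[
\int_{t_1}^{\infty} \frac{x^{2\beta}\,dx}{n + a e^{x/a}}
\le \frac{1}{a}\int_{t_1}^{\infty} x^{2\beta} e^{-x/a}\,dx
= a^{2\beta}\int_{t_1/a}^{\infty} u^{2\beta} e^{-u}\,du
= a^{2\beta}\,\Gamma(2\beta+1,\ t_1/a)
\le a^{2\beta}\,\Gamma(2\beta+1,\ t_1/t_0).
\]
```

The equality uses $u = x/a$, and the last inequality holds because $z \mapsto \Gamma(s, z)$ is decreasing and $t_1/a \ge t_1/t_0$. The final display of the step then also uses $a^{2\beta} \le t_0^{2\beta}$.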
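Next, observe that for $t_2 > 0$,
\[
\begin{aligned}
\sum_{i=1}^{\infty} i^{2\beta} n^2 X_i^2 \int_{a=0}^{t_0} \frac{g(a \mid X)}{(n + a e^{i/a})^2}\, da
&\le \sum_{i=1}^{t_2} i^{2\beta} X_i^2 + \sum_{i=t_2+1}^{\infty} i^{2\beta} X_i^2 e^{-i/t_0} \int_{a=0}^{t_0} \frac{e^{-i/a}}{a^2}\, g(a \mid X)\, da \\
&\le \sum_{i=1}^{t_2} i^{2\beta} X_i^2 + \sum_{i=t_2+1}^{\infty} i^{2\beta} X_i^2 e^{-i/t_0},
\end{aligned}
\]
where the last inequality follows since the function $q(x, y) = e^{-x/y}/y^2$ is bounded for $x \ge 1$, $y \ge 0$. Hence for $t_1 \ge C t_0 \beta$ and $t_2 = 2\beta t_0$,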
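\[
E_{\theta_0} \Pi(\theta : \|\theta\|_\beta^2 > B,\ a \le t_0 \mid X) \le \frac{1}{B} \left[ \frac{t_1^{2\beta+1}}{(2\beta + 1) n} + t_1^{2\beta} e^{-t_1/t_0} + \|\theta_0\|_\beta^2 + \frac{1}{n} \Big\{ \sum_{i=1}^{t_2} i^{2\beta} + \sum_{i=t_2+1}^{\infty} i^{2\beta} e^{-i/t_0} \Big\} \right]. \tag{13}
\]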
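Now since $x \mapsto x^{2\beta} e^{-x/t_0}$ is a decreasing function for $x \ge t_2$,
\[
\frac{1}{n} \Big\{ \sum_{i=1}^{t_2} i^{2\beta} + \sum_{i=t_2+1}^{\infty} i^{2\beta} e^{-i/t_0} \Big\} \le \frac{t_2^{2\beta+1}}{n (2\beta + 1)} + \frac{t_0^{2\beta+1}}{n}\, \Gamma(2\beta + 1, 2\beta).
\]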
By Lemma 4.9 of [3], for $t_0 (\log t_0)^q > 2p/D_2$ and $t_0 > e$, we have
\[
\Pi(a > t_0) \le \frac{2 C_2\, t_0^{p} \exp\{-D_2 t_0 (\log t_0)^q\}}{D_2 (\log t_0)^q}.
\]
Also, from Lemma 6.1 of [4], $K(P_{\theta_0}; t_3)$ contains $\{\theta : \|\theta - \theta_0\| \le t_3/\sqrt{2}\}$ for $0 < t_3 < 1$.
Hence from Lemma 3.1, for positive constants $c_2, c_3$,
\[
\frac{\Pi(a > t_0)}{\Pi\{K(P_{\theta_0}; t_3)\}} \le \frac{2 C_2\, t_0^{p} \exp\{-D_2 t_0 (\log t_0)^q\} / \{D_2 (\log t_0)^q\}}{\Pi(t_3^{-1/\beta} \le a \le 2 t_3^{-1/\beta})\, \exp\{-2^{1/(2\beta)} c_2\, t_3^{-1/\beta} z_0^2\}}, \tag{14}
\]
with $z_0 = \log(c_3 \sqrt{2}/t_3)$. Note that
\[
\Pi(t_3^{-1/\beta} \le a \le 2 t_3^{-1/\beta}) \ge t_3^{-1/\beta}\, C_1\, (t_3^{1/\beta})^{p} \exp\{-D_1\, t_3^{-1/\beta} \log^q(t_3^{-1/\beta})\}.
\]
Now set $t_3 = c_4 (\log n)^{q/2} n^{-\beta/(2\beta+1)}$. We can find $c_5$ such that with $t_0 = c_5 (\log n)^{2\beta \max\{2 - q/(2\beta),\ q/(2\beta)\}} n^{1/(2\beta+1)}$ the right-hand side of (14) is less than or equal to $e^{-2 n t_3^2}$. The conclusion of the lemma follows from (13).
1.4. Proof of Theorem 3.2 in main document
Proof. We use Theorem Appendix A.1 stated in the Appendix below. For any sequence of positive numbers $B_n$ such that
\[
B_n / (\log n)^{2\beta(2\beta+1) \max\{2 - q/(2\beta),\ q/(2\beta)\}} \to \infty,
\]
define the sieve $F_n$ to be $\{\theta : \|\theta\|_\beta^2 \le B_n\}$. (A.2) is verified using Lemma 3.2 stated in the main document. (A.3) and (A.1) can be respectively verified using Lemma 3.1 stated in the main document and Lemma Appendix A.2 stated in the Appendix below with $\epsilon_n = n^{-\beta/(2\beta+1)} (\log n)^t$ for any $t > \max[\beta \max\{2 - q/(2\beta),\ q/(2\beta)\},\ 2\beta/(2\beta + 1)]$.
Appendix A. Some useful results
We first state a general theorem on posterior concentration rates, which is adapted from Theorem 2.1 of [5]. To apply the theorem, we consider an equivalent i.i.d. representation of the problem, where $Y_1, Y_2, \ldots, Y_n$ are i.i.d. from model (1) in the main document with $\sigma = 1$. Such a representation was also used by [4].
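Theorem Appendix A.1. Let $n \epsilon_n^2 \to \infty$. Let $\rho$ be the Hellinger distance on the space $\{P_\theta : \theta \in \mathbb{R}^\infty\}$. Suppose $F_n \subset F$ is a sieve satisfying
\[
\log J(\epsilon_n, F_n, \rho) \le c_1 n \epsilon_n^2, \tag{A.1}
\]
\[
E_{\theta_0} \Pi(F_n^c \mid X) \to 0, \tag{A.2}
\]
and
\[
\Pi\{K(P_{\theta_0}; \epsilon_n)\} \ge c_4 \exp\{-c_2 n \epsilon_n^2\}. \tag{A.3}
\]
Then, there exists $M > 0$ such that
\[
\Pi(\theta : \rho(P_\theta, P_{\theta_0}) > M \epsilon_n \mid Y_1, Y_2, \ldots, Y_n) \xrightarrow{P_{\theta_0}^n} 0.
\]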
The following lemma is adapted from Lemma 1 in [6] (see also Lemma 5 in [7]).
Lemma Appendix A.1. If $\alpha_n \to 0$ and $n \alpha_n^2 \to \infty$ as $n \to \infty$, and if $B_n$ is a measurable set such that
\[
\frac{\Pi(B_n)}{\Pi\{K(P_{\theta_0}; \alpha_n)\}} \le e^{-2 n \alpha_n^2},
\]
then $E_{\theta_0} \Pi(B_n \mid X) \to 0$ as $n \to \infty$.
The following lemma appears as Lemma 6.4 in [4] and provides entropy estimates of Sobolev balls $\Theta_\beta(B)$.
Lemma Appendix A.2.
\[
\log N(\epsilon, \Theta_\beta(B), \|\cdot\|) \le \epsilon^{-1/\beta}\, (1/2)\, (8B)^{1/(2\beta)} \log(4 (2e)^{2\beta}).
\]
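The entropy bound is explicit, so it can be evaluated directly. A minimal sketch with a hypothetical function name and illustrative inputs:

```python
import math

def sobolev_entropy_bound(eps, beta, B):
    """Right-hand side of Lemma Appendix A.2: a bound on log N(eps, Theta_beta(B), ||.||)."""
    return (eps ** (-1.0 / beta) * 0.5 * (8.0 * B) ** (1.0 / (2.0 * beta))
            * math.log(4.0 * (2.0 * math.e) ** (2.0 * beta)))

# Halving eps multiplies the bound by 2^(1/beta).
for eps in (0.2, 0.1, 0.05):
    print(eps, sobolev_entropy_bound(eps, beta=1.0, B=1.0))
```

The $\epsilon^{-1/\beta}$ growth is what gets matched against $n \epsilon_n^2$ when verifying (A.1).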
We next state a version of Anderson's lemma from [8] which provides a sharp bound on the probability of shifted balls under multivariate Gaussian distributions in terms of the centered probability and the size of the shift.
Lemma Appendix A.3. Suppose $\xi \sim N_n(0, \Sigma)$ with $\Sigma$ p.d. and $\xi_0 \in \mathbb{R}^n$. Let $\|\xi_0\|_H^2 = \xi_0^T \Sigma^{-1} \xi_0$. Then, for any $t > 0$,
\[
P(\|\xi - \xi_0\|_2 < t) \ge e^{-\frac{1}{2} \|\xi_0\|_H^2}\, P(\|\xi\|_2 \le t/2).
\]
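In one dimension both sides of the lemma have closed forms in terms of the error function, which allows a direct check. A minimal sketch for $\xi \sim N(0, \sigma^2)$, with hypothetical helper names and a small grid of illustrative values:

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def shifted_ball_prob(xi0, t, sigma):
    """P(|xi - xi0| < t) for xi ~ N(0, sigma^2)."""
    return std_normal_cdf((xi0 + t) / sigma) - std_normal_cdf((xi0 - t) / sigma)

def anderson_lower_bound(xi0, t, sigma):
    """exp(-xi0^2 / (2 sigma^2)) * P(|xi| <= t/2), the right-hand side of the lemma."""
    centered = math.erf(t / (2.0 * sigma * math.sqrt(2.0)))
    return math.exp(-0.5 * (xi0 / sigma) ** 2) * centered

for xi0 in (0.0, 1.0, 3.0):
    for t in (0.1, 1.0):
        print(xi0, t, shifted_ball_prob(xi0, t, 1.0), anderson_lower_bound(xi0, t, 1.0))
```

Here $\|\xi_0\|_H^2$ reduces to $\xi_0^2/\sigma^2$, so the exponential factor is the price paid for recentering the ball at the origin.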
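We next state the Dirichlet integral formula (4.635 in [9]) to simplify integrals over the unit probability simplex.
Lemma Appendix A.4. Let $\psi(\cdot)$ be a Lebesgue integrable function and $\alpha_j > 0$, $j = 1, \ldots, n$. Then,
\[
\int_{\sum x_j \le 1} \psi\Big(\sum_{j=1}^{n} x_j\Big) \prod_{j=1}^{n} x_j^{\alpha_j - 1}\, dx
= \frac{\prod_{j=1}^{n} \Gamma(\alpha_j)}{\Gamma\big(\sum_{j=1}^{n} \alpha_j\big)} \int_{t=0}^{1} \psi(t)\, t^{(\sum \alpha_j) - 1}\, dt.
\]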
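The formula can be verified numerically in two dimensions. A minimal sketch, assuming the smooth test case $\psi(t) = e^{-t}$ and exponents $\alpha = (1, 2)$ rather than the singular $\alpha_j = 1/2$ used in the proof of Lemma 3.1; the function names and grid sizes are hypothetical.

```python
import math

def simplex_integral(psi, alphas, m=200):
    """Midpoint-rule value of the two-dimensional left-hand side over x1 + x2 <= 1."""
    a1, a2 = alphas
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x1 = (i + 0.5) * h
        for j in range(m - i):
            x2 = (j + 0.5) * h
            # Cells cut in half by the line x1 + x2 = 1 get half weight.
            cell_weight = 0.5 if i + j == m - 1 else 1.0
            total += cell_weight * psi(x1 + x2) * x1 ** (a1 - 1) * x2 ** (a2 - 1)
    return total * h * h

def radial_integral(psi, alphas, m=20000):
    """Right-hand side: Gamma ratio times a one-dimensional midpoint-rule integral."""
    s = sum(alphas)
    const = math.prod(math.gamma(a) for a in alphas) / math.gamma(s)
    h = 1.0 / m
    return const * h * sum(psi((k + 0.5) * h) * ((k + 0.5) * h) ** (s - 1) for k in range(m))

psi = lambda t: math.exp(-t)
print(simplex_integral(psi, (1.0, 2.0)), radial_integral(psi, (1.0, 2.0)))
```

Both printed values approximate $\tfrac{1}{2} \int_0^1 t^2 e^{-t}\, dt$, illustrating how the $N$-dimensional integral in (10) collapses to a single integral in $t$.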
References
[1] M. Abramowitz, I. Stegun, Handbook of mathematical functions: with formulas, graphs, and mathematical tables, Vol. 55, Dover Publications, 1965.
[2] P. Natalini, B. Palumbo, Inequalities for the incomplete gamma function, Math. Inequal. Appl. 3 (1) (2000) 69–77.
[3] A. van der Vaart, J. van Zanten, Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth, The Annals of Statistics 37 (5B) (2009) 2655–2675.
[4] E. Belitser, S. Ghosal, Adaptive Bayesian inference on the mean of an infinite-dimensional normal distribution, The Annals of Statistics 31 (2) (2003) 536–559.
[5] S. Ghosal, J. K. Ghosh, A. W. van der Vaart, Convergence rates of posterior distributions, The Annals of Statistics 28 (2) (2000) 500–531.
[6] S. Ghosal, A. van der Vaart, Convergence rates of posterior distributions for noniid observations, The Annals of Statistics 35 (1) (2007) 192–223.
[7] A. Barron, M. J. Schervish, L. Wasserman, The consistency of posterior distributions in nonparametric problems, The Annals of Statistics 27 (2) (1999) 536–561.
[8] A. van der Vaart, J. van Zanten, Reproducing kernel Hilbert spaces of Gaussian priors, IMS Collections 3 (2008) 200–222.
[9] I. Gradshteyn, I. Ryzhik, Tables of Integrals, Series and Products, corrected and enlarged edition, Academic Press, New York.
