
Mutual Information for the Stochastic Block Model

by the Adaptive Interpolation Method


Jean Barbier∗⋆, Chun Lam Chan† and Nicolas Macris†

∗ Laboratoire de Physique de l’École Normale Supérieure, Paris, France.


⋆ Quantitative Life Sciences, The Abdus Salam International Center for Theoretical Physics, Trieste, Italy.
† Laboratoire de Théorie des Communications, École Polytechnique Fédérale de Lausanne, Switzerland.
arXiv:1902.07273v1 [cs.IT] 19 Feb 2019

Abstract—We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.

I. INTRODUCTION

The stochastic block model (SBM) has a long history and has attracted the attention of many disciplines. It was first introduced as a model of community detection in the networks and statistics literature [1], as a problem of finding graph bisections in the theoretical computer science literature [2], and has also been proposed as a model for inhomogeneous random graphs [3, 4]. Here we adopt the community detection interpretation and motivation [5]. A planted partition of the nodes into labeled groups is hidden from an observer who is only given a random graph generated on the basis of the partition. The task of the observer is to recover the hidden partition from the graph. A simple setting that lends itself to mathematical analysis is the following. The labels of the nodes are drawn i.i.d. from a prior distribution, and the edges of the graph are placed independently between pairs of nodes with a probability that depends only on the group labels. If this probability is slightly higher (resp. lower) when nodes have the same label, the model is called assortative (resp. disassortative). Moreover, we suppose that the parameters of the prior and edge probability distributions are all known, so that we are working in the framework of Bayesian (optimal) inference. Note that the recovery task is non-trivial only when the average degree of a node is independent of its group label, so that the degrees themselves reveal no information. Much progress has been made in recent years in this simple mathematical setting and we refer to [6] for a recent comprehensive review and references.

In the limit of a large number of nodes the SBM displays interesting phase transitions for (partial) recovery of the hidden partition, and much effort has been devoted to characterizing the phase diagram information theoretically and algorithmically, and to computing the algorithmic-to-statistical gaps. In this vein a fundamental quantity is the mutual information between the hidden labels of the nodes and the observed graph. Indeed, from the asymptotic value of the mutual information per node one can compute the information theoretic thresholds of recovery.

In this paper we focus on the mutual information of the two-groups SBM with possibly asymmetric group sizes, when the expected degree of the nodes is independent of their label and diverges with the total number of nodes. We rigorously determine a single-letter variational expression for the asymptotic mutual information when the number of nodes diverges, by means of the recently developed adaptive interpolation method [7].

Single-letter variational expressions for the mutual information of the SBM are not new. First of all, they have been derived analytically in heuristic ways by methods of statistical physics, and in this context are often called replica or cavity formulas [8]. They were then rigorously derived, notably in [9, 10]. The approach in these works is indirect, in the sense that the SBM is first mapped onto a rank-one matrix factorization problem, and then the matrix factorization problem is solved. In [9] the case of two equal-size communities is considered, and the analysis relies on the fact that in this case the information theoretic phase transition is of the second order type (i.e. continuous), which allows the use of message-passing arguments. The asymmetric case is more challenging because first order (discontinuous) phase transitions can appear. In [10] this case is tackled through a Guerra-Toninelli interpolation combined with a rigorous version of the cavity method, or Aizenman-Sims-Starr scheme [11]. Note that the mutual information of rank-one matrix factorization had also been determined independently in [12] for the symmetric case, and more recently for the general case in [13, 14] using a spatial coupling method.

The proof presented here covers the asymmetric two-groups SBM and has the virtue of being completely unified. It uses a single method, namely the adaptive interpolation, is conceptually simpler, and is direct as it does not make any detour through another model. The method is a powerful evolution of the classic Guerra-Toninelli interpolation and allows one to derive tight upper and lower bounds for the mutual information, whereas the classic interpolation only yields a one-sided inequality. It has been successfully applied to a range of Bayesian inference problems, e.g. [15, 16]. Here, besides various new technical aspects, the main novelty is that we do not use Gaussian integration by parts, as is generally the case in interpolation methods. Instead, we develop a general approximate integration by parts formula, applied here to the Bernoulli random elements of the adjacency matrix of the graph. We note that related approximate integration by parts formulas have already been used in [17, 18] in the context of the Hopfield and Sherrington-Kirkpatrick models.

It would be desirable to extend the present method to the sparse regime of the SBM where the average degree of the nodes stays finite. This is much more challenging, however, and the mutual information has so far been determined only for the disassortative case [19], while the assortative case remains open. The thresholds have nonetheless been successfully determined for both cases in [20–23]. The adaptive interpolation method has been developed for the related censored block model in the sparse regime [24] and hopefully it can also be extended to the sparse SBM, which we leave for future work.

II. SETTING: DENSE ASYMMETRIC SBM

We first formulate the SBM for two communities that may be of different sizes. Suppose we have n nodes belonging to two communities, where the partition is denoted by a vector X^0 ∈ {−1, 1}^n. The labels X_i^0 are i.i.d. Bernoulli random variables with P(X_i^0 = 1) = r ∈ (0, 1/2]. The size of each community is nr and n(1 − r) up to fluctuations of O(√n). The labels X^0 are hidden and instead one is given the following random undirected graph G (equivalently, one is given an adjacency matrix). An edge between nodes i and j is present with probability P(G_ij = 1|X_i^0, X_j^0) = M_{X_i^0, X_j^0} and absent with the complementary probability, where the M_{X_i^0, X_j^0} are the four possible elements of the matrix

  M ≡ (d_n/n) ( a_n  b_n
                b_n  c_n ) .

The problem is to recover the labels X^0 from the graph G. Let us discuss the meaning of the various parameters in this matrix and their relationships. Here d_n is the average degree of a node. In order to have a non-trivial inference problem we demand that no information about the labels stems from the node degrees. This imposes the restriction

  E[deg(i)|X_i^0 = 1] = E[deg(i)|X_i^0 = −1] = (n − 1) d_n/n .

Solving this equation imposes a_n = 1 − (1 − 1/r)(1 − b_n) and c_n = 1 − (1 − b_n)/(1 − 1/r). Therefore there are only two independent parameters, namely d_n and b_n. A more convenient re-parametrization is often used [9]: p̄_n ≡ d_n/n and ∆_n ≡ d_n(1 − b_n)/n. This model is called the dense asymmetric SBM (the symmetric model corresponding to r = 1/2).

In this paper we rigorously determine the asymptotic mutual information for this problem, lim_{n→∞} n^{−1} I(X^0; G), in the dense graph regime wherein p̄_n and ∆_n satisfy:

(h1) the average degree d_n ≡ n p̄_n → ∞ as n → ∞, and d_n grows sub-linearly in n;
(h2) λ_n ≡ n ∆_n²/p̄_n → λ finite as n → ∞.

The first condition ensures that the graph is dense while the second ensures that the mutual information has a well defined non-trivial limit when n → +∞. The reader may wish to keep in mind a simple example of sequences p̄_n = λ n^{1−2θ}, ∆_n = n^{−θ}, 1/2 < θ < 1, which fulfill these conditions (these conditions are easily translated back to the matrix M).

We note that in the sparse graph version of the model one would have a finite limit for d_n, but the second condition would be the same. The analysis of the sparse case is however more difficult and is not addressed in this paper.

Instead of working with the Ising spin ±1 variables it is convenient to change the alphabet. We define X_i ≡ φ_r(X_i^0) with φ_r(1) = √((1 − r)/r) and φ_r(−1) = −√(r/(1 − r)). The hidden labels of the nodes now belong to the alphabet X ≡ {X₁ = √((1 − r)/r), X₂ = −√(r/(1 − r))} and X ∈ X^n. It will be useful to define the law P_r ≡ r δ_{X₁} + (1 − r) δ_{X₂}. An edge is then present with conditional probability

  P(G_ij = 1|X_i X_j) = p̄_n + ∆_n X_i X_j .

This can be viewed as an asymmetric binary-input binary-output channel X → G, and the inference problem is to recover the input X (or X^0) from the channel output G.
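To make the setting concrete, here is a minimal Python sketch (not part of the proof; the values of n, r, θ and λ are illustrative) that samples the dense asymmetric SBM in the (p̄_n, ∆_n) parametrization above; ∆_n is chosen as √(λ p̄_n/n) so that λ_n = n∆_n²/p̄_n equals λ exactly.

```python
import numpy as np

def sample_dense_sbm(n, r, p_bar, delta, rng):
    """Sample labels and adjacency matrix of the dense asymmetric SBM.

    Labels X_i^0 are i.i.d. with P(X_i^0 = 1) = r, mapped to
    X_i = phi_r(X_i^0) in {sqrt((1-r)/r), -sqrt(r/(1-r))}; an edge (i, j)
    is present with probability p_bar + delta * X_i * X_j.
    """
    x0 = np.where(rng.random(n) < r, 1, -1)                    # Ising labels X^0
    x = np.where(x0 == 1, np.sqrt((1 - r) / r), -np.sqrt(r / (1 - r)))
    p_edge = p_bar + delta * np.outer(x, x)                    # edge probabilities
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)          # independent coin flips, upper triangle
    g = (upper | upper.T).astype(int)                          # symmetric adjacency matrix, no self-loops
    return x0, x, g

rng = np.random.default_rng(0)
n, r, lam, theta = 2000, 0.3, 2.0, 0.75                        # illustrative choices
p_bar = n ** (1.0 - 2.0 * theta)                                # p_bar -> 0 while n * p_bar -> infinity (h1)
delta = np.sqrt(lam * p_bar / n)                                # so that n * delta**2 / p_bar = lam (h2)
x0, x, g = sample_dense_sbm(n, r, p_bar, delta, rng)
print("empirical mean degree:", g.sum(axis=1).mean(), "  target (n-1)*p_bar:", (n - 1) * p_bar)
```

Since E_{P_r}[X] = 0, the expected degree does not depend on the label, and the printed empirical mean degree should be close to (n − 1)p̄_n in both communities.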
We now formulate our results, which provide a single-letter variational formula for the asymptotic mutual information. Let Z ∼ N(0, 1) and X ∼ P_r independently, and set for q > 0:

  Ψ(q, λ) ≡ λ/4 + q²/(4λ) − E ln Σ_{x∈X} P_r(x) e^{√q Z x + q X x − q x²/2} .

The so-called replica formula conjectures the identity

  lim_{n→∞} (1/n) I(X^0; G) = min_{q∈[0,∞)} Ψ(q, λ) .        (1)

We prove that it is correct, namely:

Theorem 1 (Upper bound). For the SBM under concern,

  lim sup_{n→∞} (1/n) I(X^0; G) ≤ min_{q∈[0,∞)} Ψ(q, λ) .

Theorem 2 (Lower bound). For the SBM under concern,

  lim inf_{n→∞} (1/n) I(X^0; G) ≥ min_{q∈[0,∞)} Ψ(q, λ) .

Remark 1: As should be clear from the proof, the minimization in (1) can be restricted to q ∈ [0, λ(1 − r)/r] without changing the result.

Remark 2: From (1) one can derive the information theoretic phase transition thresholds. Let r_* ≡ (1 − 1/√3)/2. For r ∈ [r_*, 1/2] there is a continuous phase transition at λ_c = 1, while for r ∈ ]0, r_*[ the phase transition becomes discontinuous. An information theoretic-to-algorithmic gap occurs in the second situation, as discussed in detail in [10].

These theorems were first obtained for the symmetric case r = 1/2 in [9] by a mapping of the model onto a rank-one matrix estimation problem via an application of Lindeberg's theorem. The asymmetric case was treated in [10], again based on this mapping. Here we propose a self-contained direct method (i.e., we do not map the model onto another one) using the adaptive interpolation method [7]. A technical limitation of interpolation methods has often been the need to use Gaussian integration by parts. We by-pass this limitation using an (approximate) integration by parts formula for the edge binary variables G_ij ∈ {0, 1}.
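The variational formula (1) is easy to evaluate numerically. The sketch below (ours; parameter values are illustrative) computes Ψ(q, λ) for the two-point prior P_r using Gauss–Hermite quadrature for the expectation over Z and an exact sum over X ∼ P_r, then minimizes over a grid of q restricted to [0, λ(1 − r)/r] as allowed by Remark 1.

```python
import numpy as np

def psi(q, lam, r, n_quad=80):
    """Replica potential Psi(q, lambda) for the two-point prior P_r."""
    xs = np.array([np.sqrt((1 - r) / r), -np.sqrt(r / (1 - r))])   # alphabet X
    ps = np.array([r, 1 - r])                                      # prior weights P_r
    t, w = np.polynomial.hermite.hermgauss(n_quad)                 # nodes/weights for weight exp(-t^2)
    z = np.sqrt(2.0) * t                                           # E_Z f(Z) = sum_i w_i f(sqrt(2) t_i) / sqrt(pi)
    # exponent sqrt(q) Z x + q X x - q x^2 / 2 on a (Z-node, X-value, x-value) grid
    expo = (np.sqrt(q) * z[:, None, None] * xs[None, None, :]
            + q * xs[None, :, None] * xs[None, None, :]
            - 0.5 * q * xs[None, None, :] ** 2)
    log_sum = np.log(np.einsum('k,ijk->ij', ps, np.exp(expo)))     # ln sum_x P_r(x) e^{...}
    e_log = np.einsum('i,j,ij->', w / np.sqrt(np.pi), ps, log_sum) # expectation over Z and X
    return lam / 4 + q ** 2 / (4 * lam) - e_log

lam, r = 2.0, 0.3                                                  # illustrative values
qs = np.linspace(0.0, lam * (1 - r) / r, 400)
vals = np.array([psi(q, lam, r) for q in qs])
print("replica prediction for lim (1/n) I(X^0; G):", vals.min())
print("minimizer q*:", qs[vals.argmin()])
```

Scanning λ at fixed r and tracking where the global minimizer departs from q = 0 gives a numerical handle on the thresholds discussed in Remark 2.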
Before we formulate the adaptive interpolation, let us set up more explicitly the quantities that we compute. Using Bayes' rule, the posterior distribution of the SBM can be written as

  P(x|G) = Z(G)^{−1} e^{−H_SBM(x;G)} Π_{i=1}^n P_r(x_i) ,

  H_SBM(x;G) ≡ − Σ_{i<j} [ G_ij ln(1 + x_i x_j ∆_n/p̄_n) + (1 − G_ij) ln(1 − x_i x_j ∆_n/(1 − p̄_n)) ] .

We use the statistical mechanics terminology and therefore call this posterior distribution the Gibbs distribution, the normalizing factor Z(G) ≡ Σ_{x∈X^n} e^{−H_SBM(x;G)} Π_{i=1}^n P_r(x_i) is the partition function, and H_SBM is the Hamiltonian. A straightforward computation, using the scaling hypotheses (h1) and (h2), gives (of course I(X^0; G) = I(X; G))

  (1/n) I(X; G) = −(1/n) E_X E_{G|X} ln Z(G) + λ_n/4 + o_n(1)

using E_{P_r}[X²] = 1, and where lim_{n→∞} o_n(1) = 0. Thus the problem boils down to computing minus the expected log-partition function, or expected free energy, in the limit n → +∞. This will be achieved via an interpolation towards the log-partition function corresponding to an uncoupled scalar Gaussian channel where observations of the hidden labels are

  Y_i = √q X_i + Z_i ,   1 ≤ i ≤ n,        (2)

with Z_i ∼ N(0, 1) i.i.d. Gaussian random variables and q > 0 the signal-to-noise ratio (SNR). An important feature of our technique is the freedom to adapt a suitable interpolation path to the problem at hand. This is explained in the next section.
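Before setting up the interpolation, it may help to look at the decoupled scalar problem (2) in isolation. The following sketch (ours, with illustrative q and r) computes the posterior mean ⟨x⟩ for the two-point prior P_r and Monte Carlo estimates of the scalar overlap E[X⟨x⟩] and of the minimum mean-square error E[(X − ⟨x⟩)²].

```python
import numpy as np

def posterior_mean(y, q, r):
    """Posterior mean <x> for the scalar channel Y = sqrt(q) X + Z with X ~ P_r."""
    x1, x2 = np.sqrt((1 - r) / r), -np.sqrt(r / (1 - r))
    # P(x | Y) is proportional to P_r(x) * exp(sqrt(q) Y x - q x^2 / 2)
    w1 = r * np.exp(np.sqrt(q) * y * x1 - 0.5 * q * x1 ** 2)
    w2 = (1 - r) * np.exp(np.sqrt(q) * y * x2 - 0.5 * q * x2 ** 2)
    return (w1 * x1 + w2 * x2) / (w1 + w2)

rng = np.random.default_rng(1)
q, r, samples = 1.5, 0.3, 200_000                      # illustrative SNR, prior bias, sample size
x1, x2 = np.sqrt((1 - r) / r), -np.sqrt(r / (1 - r))
X = np.where(rng.random(samples) < r, x1, x2)           # hidden scalar labels, E[X^2] = 1
Y = np.sqrt(q) * X + rng.standard_normal(samples)       # observations of the channel (2)
m = posterior_mean(Y, q, r)
print("scalar overlap  E[X <x>]     ~", np.mean(X * m))
print("scalar MMSE     E[(X-<x>)^2] ~", np.mean((X - m) ** 2))
```

The two printed quantities should approximately sum to E[X²] = 1, since E[X⟨x⟩] = E[⟨x⟩²] by the tower property; the n-dimensional analogue of this overlap is the quantity Q entering the sum rule below.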
III. ADAPTIVE PATH INTERPOLATION

We design an interpolating model parametrized by t ∈ [0, 1] and ε ≥ 0 such that at t = ε = 0 we recover the original SBM, while at t = 1 we have a decoupled channel (2). In between, for t ∈ (0, 1), the model is a mixture of the SBM with parameters (p̄_n, √(1−t) ∆_n) and the extra decoupled Gaussian observations (2) with the SNR replaced by q → R(t,ε) ≡ ε + ∫_0^t ds q(s,ε) with q(s,ε) ≥ 0. We constrain ε ∈ [s_n, 2s_n], where s_n → 0^+ as n → +∞ at an appropriate rate to be fixed later on. The interpolating Hamiltonian is then defined to be

  H_{t,ε}(x; G, Y) ≡ H_{SBM;t}(x; G) + H_{dec;t,ε}(x; Y)

where

  H_{SBM;t}(x; G) ≡ − Σ_{i<j} [ G_ij ln(1 + x_i x_j √(1−t) ∆_n/p̄_n) + (1 − G_ij) ln(1 − x_i x_j √(1−t) ∆_n/(1 − p̄_n)) ] ,
  H_{dec;t,ε}(x; Y) ≡ − Σ_{i=1}^n ( √(R(t,ε)) Y_i x_i − R(t,ε) x_i²/2 ) ,

and the Gibbs-bracket (i.e. the expectation w.r.t. the posterior distribution) for the interpolating model is

  ⟨A⟩_{t,ε} ≡ Z_{t,ε}(G, Y)^{−1} Σ_{x∈X^n} A(x) e^{−H_{t,ε}(x;G,Y)} Π_i P_r(x_i)

with Z_{t,ε}(G, Y) ≡ Σ_{x∈X^n} e^{−H_{t,ε}(x;G,Y)} Π_{i=1}^n P_r(x_i). The free energy for a given graph G = G(X) (that depends on the ground truth partition) is

  F_{t,ε}(G, Y) = F_{t,ε} ≡ −(1/n) ln Z_{t,ε}(G, Y) ,        (3)

and its expectation is f_{t,ε} ≡ E_X E_{G|X} E_{Y|X} F_{t,ε} = E_X E_{G|X} E_Z F_{t,ε}.
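For very small n the interpolating partition function can be computed by brute force, which is a convenient way to get a feel for the model. The sketch below is ours and purely illustrative: R(t,ε) is given a fixed stand-in value instead of being built from an interpolation path, and p̄_n, ∆_n are fixed numbers rather than vanishing sequences.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, r, t, eps = 8, 0.3, 0.5, 0.1                       # tiny instance, illustrative values
p_bar, delta = 0.3, 0.05                              # fixed "dense" parameters for the demo
R = eps + 0.4 * t                                     # stand-in for R(t, eps) = eps + integral of q(s, eps)
x1, x2 = np.sqrt((1 - r) / r), -np.sqrt(r / (1 - r))

# Ground truth X, graph G at interpolation time t, Gaussian observations Y of SNR R
X = np.where(rng.random(n) < r, x1, x2)
p_edge = p_bar + np.sqrt(1 - t) * delta * np.outer(X, X)
G = np.triu((rng.random((n, n)) < p_edge).astype(int), k=1)
Y = np.sqrt(R) * X + rng.standard_normal(n)
iu = np.triu_indices(n, k=1)

def hamiltonian(x):
    """Interpolating Hamiltonian H_{t,eps}(x; G, Y) = H_{SBM;t}(x; G) + H_{dec;t,eps}(x; Y)."""
    xx = np.outer(x, x)[iu]
    h_sbm = -np.sum(G[iu] * np.log(1 + xx * np.sqrt(1 - t) * delta / p_bar)
                    + (1 - G[iu]) * np.log(1 - xx * np.sqrt(1 - t) * delta / (1 - p_bar)))
    h_dec = -np.sum(np.sqrt(R) * Y * x - R * x ** 2 / 2)
    return h_sbm + h_dec

Z, q_acc = 0.0, 0.0
for config in itertools.product([x1, x2], repeat=n):   # exhaustive sum over X^n (2^n terms)
    x = np.array(config)
    prior = np.prod(np.where(x == x1, r, 1 - r))       # prod_i P_r(x_i)
    weight = np.exp(-hamiltonian(x)) * prior
    Z += weight
    q_acc += weight * np.mean(X * x)                   # overlap Q(X, x)
print("free energy F_{t,eps} = -(1/n) ln Z_{t,eps} =", -np.log(Z) / n)
print("Gibbs average of the overlap <Q>_{t,eps}    =", q_acc / Z)
```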
One can recognize that, by design,

  f_{0,0} = (1/n) I(X; G) − λ_n/4 + o_n(1) ,
  f_{1,ε} = Ψ(R(1,ε), λ_n) − λ_n/4 − R(1,ε)²/(4λ_n) .

Therefore, from f_{0,0} = (f_{0,0} − f_{0,ε}) + f_{1,ε} − ∫_0^1 dt (df_{t,ε}/dt),

  (1/n) I(X; G) = (f_{0,0} − f_{0,ε}) + Ψ(R(1,ε), λ_n) − R(1,ε)²/(4λ_n) − ∫_0^1 dt (df_{t,ε}/dt) + o_n(1)

where o_n(1) → 0 uniformly in t, ε and R. In section IV we outline the non-trivial steps involved in the computation of df_{t,ε}/dt. This eventually leads to the following sum rule:

  (1/n) I(X; G) = Ψ(R(1,ε), λ_n) + R_1 − ∫_0^1 dt R_2(t)/(4λ_n) − R_3        (4)

where, denoting Q(X, x) = Q ≡ (1/n) Σ_{i=1}^n X_i x_i the overlap,

  R_1 ≡ (1/(4λ_n)) { ∫_0^1 dt q(t,ε)² − ( ∫_0^1 dt q(t,ε) )² } ≥ 0 ,
  R_2 ≡ E⟨(λ_n Q − q(t,ε))²⟩_{t,ε} ≥ 0 ,
  R_3 ≡ (ε/(4λ_n)) (2R(1,ε) − ε) − (1/2) ∫_0^ε dε' E⟨Q⟩_{0,ε'} + o_n(1) .

Here and later on the joint expectation w.r.t. all quenched variables, i.e. X, G and Y (or equivalently Z), is denoted by E. As ε will eventually tend to zero, R_3 will be small. The term R_2, on the other hand, is more challenging.

A. The upper bound: Proof of Theorem 1

Set ε = 0 and q(t,ε) = q^* ≡ argmin_{q∈[0,∞)} Ψ(q, λ). Then we have R_1 = 0, R_3 = o_n(1). Since R_2 ≥ 0, (4) implies

  (1/n) I(X; G) ≤ Ψ(q^*, λ_n) + o_n(1) .

Taking the lim sup yields the bound (recalling λ_n → λ).

B. The lower bound: Proof of Theorem 2

The basic idea is to "remove" R_2 from (4) by adapting q(t,ε); then taking the limit n → ∞ and ε → 0^+ will provide the desired bound, since R_1 ≥ 0 and R_3 → 0 will disappear. To implement this idea we first decompose R_2 into

  R_2 = (λ_n E⟨Q⟩_{t,ε} − q(t,ε))² + λ_n² E⟨(Q − E⟨Q⟩_{t,ε})²⟩_{t,ε}        (5)

and address each part with the following two lemmas.

Lemma 3. For every ε ∈ [0, 1] and t ∈ [0, 1] there exists a (unique) bounded solution R_n^*(t,ε) = ε + ∫_0^t ds q_n^*(s,ε) to

  dR/dt (t,ε) = λ_n E⟨Q⟩_{t,ε}   with   R(0,ε) = ε .        (6)

It satisfies q_n^*(t,ε) = λ_n E⟨Q⟩_{t,ε} ∈ [0, λ_n(1−r)/r] and dR_n^*/dε (t,ε) ≥ 1.

Proof. Let F_n(t, R(t,ε)) ≡ λ_n E⟨Q⟩_{t,ε}. Equation (6) is thus a first-order differential equation. Also note that

  dF_n/dR (t, R(t,ε)) = (λ_n/n) Σ_{i,j=1}^n E[(⟨x_i x_j⟩_{t,ε} − ⟨x_i⟩_{t,ε} ⟨x_j⟩_{t,ε})²]

where dF_n/dR is the derivative w.r.t. its second argument. Therefore F_n is bounded in [0, λ_n(1−r)/r] and for finite n it is differentiable w.r.t. its second argument with bounded derivative. The Cauchy-Lipschitz theorem then implies that (6) admits a unique global solution over t ∈ [0, 1]. Finally Liouville's formula gives¹

  dR_n^*/dε (t,ε) = exp{ ∫_0^t dt' (dF_n/dR)(t', R_n^*(t',ε)) } .        (7)

The non-negativity of dF_n/dR then implies dR_n^*/dε ≥ 1.

¹ Liouville's formula is obtained as follows. Differentiating both sides of (6) w.r.t. ε yields (d/dt)(dR/dε) = (dF_n/dR)(dR/dε). Therefore we have (d/dt) ln(dR/dε) = dF_n/dR, and solving this differential equation using R(0,ε) = ε gives (7).

The following crucial lemma has been shown for various Bayesian inference problems [7, 16]. The usual proofs can be adapted to the present case and we refer to [25] for details.

Lemma 4. Let R be the solution R_n^* in Lemma 3. Then for any bounded positive sequence s_n there exists a sequence C_n(r, λ_n) > 0 converging to a constant and such that

  (1/s_n) ∫_{s_n}^{2s_n} dε E⟨(Q − E⟨Q⟩_{t,ε})²⟩_{t,ε} ≤ C_n(r, λ_n)/(s_n n^{1/6}) .

Now we average (4) over a small interval ε ∈ [s_n, 2s_n] and set R to the solution R_n^* of (6) in Lemma 3; therefore q_n^*(t,ε) = λ_n E⟨Q⟩_{t,ε}. This choice cancels the first term of R_2 in the decomposition (5). The second term in (5) is then upper bounded using Lemma 4. Finally R_1 ≥ 0. Combining all these observations we reach directly

  (1/n) I(X; G) ≥ (1/s_n) ∫_{s_n}^{2s_n} dε [Ψ(R_n^*(1,ε), λ_n) − R_3] − C_n λ_n/(4 s_n n^{1/6})        (8)

where we used Fubini's theorem to switch the t and ε integrals when using Lemma 4. Using that q_n^* takes values in [0, λ_n(1−r)/r] and ε ∈ [s_n, 2s_n], R_3 has a bound uniform in ε:

  |R_3| ≤ s_n²/λ_n + s_n (1−r)/r + o_n(1) .

Therefore the average over ε of R_3 above is o_n(1) when s_n → 0^+. As R_n^*(1,ε) ∈ [s_n, 2s_n + λ_n(1−r)/r] ⊂ [0, ∞) we have

  (1/s_n) ∫_{s_n}^{2s_n} dε Ψ(R_n^*(1,ε), λ_n) ≥ min_{q∈[0,∞)} Ψ(q, λ_n) .

Setting s_n = n^{−θ} with θ ∈ (0, 1/6) ensures that the extra terms on the r.h.s. of (8) are small. Then taking the lim inf and using λ_n → λ we finally reach the desired bound.

IV. THE FUNDAMENTAL SUM RULE

A. Decomposing the sum rule

Here we sketch the derivation of the fundamental sum rule (4). The derivative of the averaged free energy can be decomposed into three terms: df_{t,ε}/dt = D_1 + D_2 + D_3 where

  D_1 ≡ E_X E_{Y|X} ∫ dG F_{t,ε} (d/dt) P_t(G|X) ,
  D_2 ≡ E_X E_{G|X} ∫ dY F_{t,ε} (d/dt) P_t(Y|X) ,
  D_3 ≡ (1/n) E⟨ (d/dt) H_{dec;t,ε} ⟩_{t,ε} + (1/n) E⟨ (d/dt) H_{SBM;t} ⟩_{t,ε} .

Here P_t(G|X) and P_t(Y|X) are the transition kernels for the "channels" X → G and X → Y at time t ∈ [0, 1]. The sum rule (4) is easily obtained using the following lemma:

Lemma 5. We have df_{t,ε}/dt = D_1 + D_2 + D_3 where

  D_1 = (λ_n/4) E⟨Q²⟩_{t,ε} + O(1/n) + O(∆_n/p̄_n) + O(p̄_n) ,
  D_2 = −(1/2) q(t,ε) E⟨Q⟩_{t,ε} ,
  D_3 = 0 .

The proof of the equality for D_1 requires an approximate integration by parts formula w.r.t. the Bernoulli variables G_ij. This forms the main new technical aspect of this work and is explained in section IV-C. The other identities are shown by the standard techniques of the interpolation method for Bayesian inference and we will be more brief. These are i) Gaussian integration by parts, namely E[Z g(Z)] = E[g'(Z)] for Z ∼ N(0, 1) and any reasonable function g; and ii) the Nishimori identity².

² The Nishimori identity is a direct consequence of the Bayes formula. It states E⟨g(x, X)⟩_{t,ε} = E⟨g(x, x')⟩_{t,ε} with X the ground truth and x, x' two replicas drawn independently from the product posterior measure. In E⟨g(x, x')⟩_{t,ε} the bracket is the expectation w.r.t. this joint product measure. The function g can also explicitly depend on G and Y (but not on Z).

B. Computation of D_3 and D_2

Let us start with the first term of D_3:

  (1/n) E⟨ (d/dt) H_{dec;t,ε} ⟩_{t,ε} = −(1/n) Σ_i E⟨ (q(t,ε)/(2√(R(t,ε)))) Y_i x_i − (q(t,ε)/2) x_i² ⟩_{t,ε}
    = −(1/n) Σ_i E[ (q(t,ε)/(2√(R(t,ε)))) (√(R(t,ε)) X_i + Z_i) X_i − (q(t,ε)/2) X_i² ] = 0

where for the second equality we first replaced x_i by the ground truth X_i thanks to the Nishimori identity, then replaced Y_i = √(R(t,ε)) X_i + Z_i, and finally used the independence of X_i and the zero mean noise Z_i. The second term of D_3 is obtained by similar manipulations, using the identity

  E_{G_ij|X_i X_j} G_ij = p̄_n + √(1−t) ∆_n X_i X_j .

Now, D_2 = E_X E_{G|X} ∫ dY F_{t,ε} (d/dt) P_t(Y|X) is equal to

  Σ_{i=1}^n E_X E_{G|X} E_{Y|X} [ (Y_i − √(R(t,ε)) X_i) (q(t,ε) X_i)/(2√(R(t,ε))) F_{t,ε} ]
    = (q(t,ε)/(2√(R(t,ε)))) Σ_i E_X E_{G|X} E_Z [Z_i X_i F_{t,ε}]
    = (q(t,ε)/(2√(R(t,ε)))) Σ_i E_X E_{G|X} E_Z [X_i dF_{t,ε}/dZ_i]
    = −(q(t,ε)/(2n)) Σ_i E[X_i ⟨x_i⟩_{t,ε}] = −(1/2) q(t,ε) E⟨Q⟩_{t,ε} .
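Both standard ingredients i) and ii) are easy to check numerically. The Monte Carlo sketch below (ours, with illustrative parameters) verifies Gaussian integration by parts with g = tanh, and the Nishimori identity in the scalar form it takes on the Gaussian channel (2) with the two-point prior P_r: taking g(x, X) = xX in footnote 2 gives E[X⟨x⟩] = E[⟨x⟩²].

```python
import numpy as np

rng = np.random.default_rng(3)
M = 400_000                                             # Monte Carlo sample size (illustrative)

# i) Gaussian integration by parts: E[Z g(Z)] = E[g'(Z)] for Z ~ N(0,1), here g = tanh
Z = rng.standard_normal(M)
print("E[Z tanh Z]     ~", np.mean(Z * np.tanh(Z)))
print("E[1 - tanh^2 Z] ~", np.mean(1.0 - np.tanh(Z) ** 2))

# ii) Nishimori identity on the scalar channel Y = sqrt(q) X + Z with X ~ P_r:
#     with g(x, X) = x X it reads E[X <x>] = E[<x> <x'>] = E[<x>^2]
q, r = 1.5, 0.3
x1, x2 = np.sqrt((1 - r) / r), -np.sqrt(r / (1 - r))
X = np.where(rng.random(M) < r, x1, x2)
Y = np.sqrt(q) * X + rng.standard_normal(M)
w1 = r * np.exp(np.sqrt(q) * Y * x1 - 0.5 * q * x1 ** 2)         # posterior weight of x1
w2 = (1 - r) * np.exp(np.sqrt(q) * Y * x2 - 0.5 * q * x2 ** 2)   # posterior weight of x2
m = (w1 * x1 + w2 * x2) / (w1 + w2)                              # posterior mean <x>
print("E[X <x>] ~", np.mean(X * m))
print("E[<x>^2] ~", np.mean(m ** 2))
```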
C. Computation of D_1 by approximate integration by parts

The proof involves a technical calculation. First, evaluating the time-derivative (d/dt) P_t(G|X), it is straightforward to show that D_1 = (∆_n/(2√(1−t))) (D_1^(a) + D_1^(b)) with

  D_1^(a) ≡ Σ_{i<j} E_{∼G_ij} [ X_i X_j (E_{G_ij|X} F_{t,ε} − E_{G_ij|X}[G_ij F_{t,ε}]) / (1 − E_{G_ij|X} G_ij) ] ,
  D_1^(b) ≡ − Σ_{i<j} E_{∼G_ij} [ X_i X_j E_{G_ij|X}[G_ij F_{t,ε}] / E_{G_ij|X} G_ij ] ,

where E_{∼G_ij} ≡ E_X E_{Y|X} E_{G\G_ij|X}. Both of these formulas involve the term E_{G_ij|X}[G_ij F_{t,ε}], which is evaluated by an approximate integration by parts formula for the Bernoulli variables G_ij. This is formula (12) given in the appendix.

• Evaluating D_1^(b): Using (12) and λ_n = n∆_n²/p̄_n = O(1) we get

  D_1^(b) = (∆_n √(1−t)/(n p̄_n)) Σ_{i<j} E[X_i X_j ⟨x_i x_j⟩_{t,ε}] − Σ_{i<j} E_{∼G_ij}[X_i X_j F_{t,ε}(G_ij = 0)] + O((1−t)/p̄_n) .

• Evaluating D_1^(a): Using again (12) this term becomes

  D_1^(a) = Σ_{i<j} E_{∼G_ij} [ X_i X_j (E_{G_ij|X} F_{t,ε} − F_{t,ε}(G_ij = 0)) / (1 − E_{G_ij|X} G_ij) ]        (9)
    + (∆_n √(1−t)/(n p̄_n)) Σ_{i<j} E[ (E_{G_ij|X} G_ij / (1 − E_{G_ij|X} G_ij)) X_i X_j ⟨x_i x_j⟩_{t,ε} ]
    + Σ_{i<j} E_{∼G_ij}[X_i X_j F_{t,ε}(G_ij = 0)] + O(√(1−t)) .

The difference of free energies F_{t,ε}(G_ij = 1) − F_{t,ε}(G_ij = 0) when changing only a single G_ij is equal to

  −(1/n) ⟨ ln [ (1 + (∆_n/p̄_n) √(1−t) x_i x_j) / (1 − (∆_n/(1−p̄_n)) √(1−t) x_i x_j) ] ⟩_{t,ε; G_ij=0}        (10)

where ⟨−⟩_{t,ε;G_ij=0} means that the Gibbs-bracket corresponds to the Hamiltonian H_{t,ε} in which we set G_ij = 0. Therefore

  E_{G_ij|X_i X_j} F_{t,ε} − F_{t,ε}(G_ij = 0)
    = P_t(G_ij = 1|X_i X_j) (F_{t,ε}(G_ij = 1) − F_{t,ε}(G_ij = 0))
    = O(√(1−t) ∆_n/n)

using (10) and P_t(G_ij = 1|X_i X_j) = p̄_n + √(1−t) ∆_n X_i X_j. Therefore the first term of (9) is of order O(√(1−t) n ∆_n) = O(√(1−t) p̄_n/∆_n). In the same way we get that the second term of (9) is O(√(1−t) p̄_n/∆_n) as well. Also, by the scaling hypotheses (h1), (h2) we have 1 ≫ p̄_n ≫ ∆_n. These remarks imply that (9) is simply

  D_1^(a) = Σ_{i<j} E_{∼G_ij}[X_i X_j F_{t,ε}(G_ij = 0)] + O(√(1−t) p̄_n/∆_n) .

Replacing these results in D_1 = (∆_n/(2√(1−t))) (D_1^(a) + D_1^(b)) and using λ_n ≡ n∆_n²/p̄_n and Q ≡ (1/n) Σ_{i=1}^n X_i x_i ends the proof.

APPENDIX

The following general formula follows from a Taylor expansion with Lagrange remainder. We refer to [25] for the details.

Lemma 6 (Approximate integration by parts). Let g(U) be a C⁴ function of a random variable U such that for k = 1, 2, 3, 4 we have sup_U |d^k g(U)/dU^k| ≤ C_k for constants C_k ≥ 0. Suppose that the first four moments of U are finite. Then

  | E[U g(U)] − g(0) E[U] − E[g'(U)] E[U²] |        (11)
    ≤ C_2 ( |E[U³]|/2 + E[U] E[U²] ) + C_3 ( E[U⁴]/24 + E[U²]²/2 ) + (C_4/6) |E[U³]| E[U²] .

In order to apply this lemma to the SBM, consider U = G_ij and g(U) = F_{t,ε}(G_ij), the free energy (3) seen as a function of G_ij (all other variables being fixed). For the expectation we take E = E_{G_ij|X_i X_j}. Replacing in (11), simple algebra yields

  E_{G_ij|X_i X_j}[G_ij F_{t,ε}] = E_{G_ij|X_i X_j}[G_ij] F_{t,ε}(G_ij = 0)
    − (∆_n √(1−t)/(n p̄_n)) E_{G_ij|X_i X_j} ⟨x_i x_j⟩_{t,ε} + O((λ_n/n²) √(1−t)) .        (12)

ACKNOWLEDGMENTS

Jean Barbier acknowledges support from the Fondation CFM pour la Recherche-ENS.

REFERENCES

[1] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[2] T. N. Bui, S. Chaudhuri, F. T. Leighton, and M. Sipser, "Graph bisection algorithms with good average case behavior," in 25th Annual Symposium FOCS, 1984, pp. 181–192.
[3] B. Söderberg, "General formalism for inhomogeneous random graphs," Phys. Rev. E, vol. 66, p. 066121, 2002.
[4] B. Bollobás, S. Janson, and O. Riordan, "The phase transition in inhomogeneous random graphs," Random Structures & Algorithms, 2007.
[5] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[6] E. Abbe, "Community detection and stochastic block models: Recent developments," Journal of Machine Learning Research, vol. 18, 2018.
[7] J. Barbier and N. Macris, "The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference," Probability Theory and Related Fields, Oct 2018.
[8] T. Lesieur, F. Krzakala, and L. Zdeborová, "Phase transitions in sparse PCA," in 2015 IEEE ISIT, June 2015, pp. 1635–1639.
[9] Y. Deshpande, E. Abbe, and A. Montanari, "Asymptotic mutual information for the balanced binary stochastic block model," Information and Inference: A Journal of the IMA, vol. 6, no. 2, pp. 125–170, 2017.
[10] M. Lelarge and L. Miolane, "Fundamental limits of symmetric low-rank matrix estimation," Probability Theory and Related Fields, Apr 2018.
[11] M. Aizenman, R. Sims, and S. L. Starr, "Extended variational principle for the Sherrington-Kirkpatrick spin-glass model," Physical Review B, vol. 68, no. 21, p. 214403, 2003.
[12] S. B. Korada and N. Macris, "Exact solution of the gauge symmetric p-spin glass model on a complete graph," Journal of Statistical Physics, vol. 136, no. 2, pp. 205–230, 2009.
[13] J. Barbier, M. Dia, N. Macris, F. Krzakala, T. Lesieur, and L. Zdeborová, "Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula," in Advances in NeurIPS 29, 2016, pp. 424–432.
[14] J. Barbier, M. Dia, N. Macris, F. Krzakala, and L. Zdeborová, "Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method." [Online]. Available: http://arxiv.org/abs/1812.02537
[15] J. Barbier, N. Macris, and L. Miolane, "The Layered Structure of Tensor Estimation and its Mutual Information," in 55th Annual Allerton Conference on Communication, Control, and Computing, 2017.
[16] J. Barbier, F. Krzakala, N. Macris, L. Miolane, and L. Zdeborová, "Optimal errors and phase transitions in high-dimensional generalized linear models," Proc. of the 31st COLT, vol. 75, pp. 728–731, 2018.
[17] M. Talagrand, Spin Glasses: A Challenge for Mathematicians - Cavity and Mean Field Models. Cambridge University Press, 2004.
[18] P. Carmona and Y. Hu, "Universality in Sherrington-Kirkpatrick's spin glass model," Annales de l'Institut Henri Poincaré (B) Probability and Statistics, vol. 42, no. 2, pp. 215–222, 2006.
[19] A. Coja-Oghlan, F. Krzakala, W. Perkins, and L. Zdeborová, "Information-theoretic thresholds from the cavity method," Advances in Mathematics, vol. 333, pp. 694–795, 2018.
[20] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications," Phys. Rev. E, vol. 84, no. 6, p. 066106, 2011.
[21] E. Mossel, J. Neeman, and A. Sly, "A proof of the block model threshold conjecture," Combinatorica, vol. 38, no. 3, pp. 665–708, Jun 2018. [Online]. Available: https://doi.org/10.1007/s00493-016-3238-8
[22] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, "Spectral redemption: clustering sparse networks," PNAS, vol. 110, no. 52, pp. 20935–20940, 2013.
[23] C. Bordenave, M. Lelarge, and L. Massoulié, "Non-backtracking spectrum of random graphs: Community detection and non-regular Ramanujan graphs," in 56th Annual Symposium FOCS, 2015, pp. 1347–1357.
[24] J. Barbier, C. L. Chan, and N. Macris, "Adaptive path interpolation for sparse systems: Application to a simple censored block model," in IEEE ISIT, 2018, pp. 1879–1883.
[25] ——. [Online]. Available: https://drive.google.com/file/d/1MFXck4KWiWnzS1dffl7WfGtBoWDxbt_B
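As a closing numerical aside (ours, not part of the proof), the bound (11) of Lemma 6 in the appendix can be checked exactly for a Bernoulli variable U, which is the situation relevant to U = G_ij: for U ∼ Bernoulli(p) all moments equal p and both sides of (11) reduce to finite sums. Here g = cos, so every C_k can be taken equal to 1, and p plays the role of p̄_n.

```python
import numpy as np

def lemma6_sides(p, g, dg, C2, C3, C4):
    """Exact evaluation of both sides of (11) for U ~ Bernoulli(p)."""
    m1 = m2 = m3 = m4 = p                               # E[U^k] = p for k = 1, 2, 3, 4
    lhs = abs(p * g(1.0) - g(0.0) * m1 - ((1 - p) * dg(0.0) + p * dg(1.0)) * m2)
    rhs = (C2 * (abs(m3) / 2 + m1 * m2)
           + C3 * (m4 / 24 + m2 ** 2 / 2)
           + C4 / 6 * abs(m3) * m2)
    return lhs, rhs

g, dg = np.cos, lambda u: -np.sin(u)                    # all derivatives bounded by 1
for p in [0.2, 0.05, 0.01, 0.001]:                      # p plays the role of p_bar -> 0
    lhs, rhs = lemma6_sides(p, g, dg, 1.0, 1.0, 1.0)
    print(f"p = {p:6.3f}   |lhs of (11)| = {lhs:.3e}   bound = {rhs:.3e}")
```

In the SBM application the constants C_k come from the derivatives of the free energy (3) with respect to a single G_ij, as in (10), which is what makes the remainder in (12) small.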
