universal, reaching down to even the smallest brains. Here, we measure the activity of 50+ neurons
in C. elegans, and analyze the data by building the maximum entropy model that matches the
mean activity and pairwise correlations among these neurons. To capture the graded nature of the
cells’ responses, we assign each cell multiple states. These models, which are equivalent to a family
of Potts glasses, successfully predict higher statistical structure in the network. In addition, these
models exhibit signatures of collective behavior: the state of single cells can be predicted from the
state of the rest of the network; the network, despite being sparse in a way similar to the structural
connectome, distributes its response globally when locally perturbed; the distribution over network
states has multiple local maxima, as in models for memory; and the parameters that describe the
real network are close to a critical surface in this family of models.
FIG. 1: Schematics of data acquisition and processing. (a) Examples of the raw images acquired through the 10× (scale bar equals 100 µm) and 40× (scale bar equals 10 µm) objectives. The body of the nematode is outlined with light green curves. (b) The intensities of the nuclei-localized fluorescent protein tags—the calcium-sensitive GCaMP and the control fluorophore RFP—are measured as functions of time. Photobleaching occurs on a longer time scale than the intracellular calcium dynamics, which allows us to perform photobleaching correction by dividing the raw signal by its exponential fit, resulting in the signals of panel (c). (d) The normalized ratio of the photobleaching-corrected intensities, f, is a proxy for the calcium concentration in each neuron's nucleus (dark grey). As described in the text, this signal is discretized using the denoised time derivative ḟ; we use three states, marked as red, blue, and black after smoothing (lightly offset for ease of visualization). (e) The time derivative ḟ, extracted using total-variation regularized differentiation.
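The photobleaching correction described in panels (b)–(c) amounts to dividing each raw trace by an exponential fit; a minimal sketch, assuming traces sampled on a common time grid (function and variable names are ours, not from the original pipeline):

```python
import numpy as np
from scipy.optimize import curve_fit

def photobleach_correct(t, raw):
    """Divide a fluorescence trace by its exponential fit.

    Sketch of the correction in Fig. 1b-c: photobleaching is slow
    compared to calcium dynamics, so an exponential fit captures the
    bleaching trend and the ratio leaves the fast signal behind.
    """
    def expdecay(t, a, tau, c):
        return a * np.exp(-t / tau) + c

    # Initial guesses: amplitude, slow decay constant, offset.
    p0 = (raw.max() - raw.min(), t[-1], raw.min())
    popt, _ = curve_fit(expdecay, t, raw, p0=p0, maxfev=10000)
    return raw / expdecay(t, *popt)
```

In the actual analysis this would be applied separately to the GCaMP and RFP traces of each neuron before forming the normalized ratio f.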
experiments in C. elegans using the maximum entropy methods that have been so successful in other contexts. Experiments are evolving constantly, and in particular we expect that recording times will increase significantly in the near future. To give ourselves the best chance of saying something meaningful, we focus on sub-populations of up to fifty neurons, in immobilized worms where signals are most reliable. We find that, while details differ, the same sorts of models, which match mean activity and pairwise correlations, are successful in describing this very different network. In particular, the models that we learn from the data share topological similarity with the known structural connectome, allow us to predict the activity of individual cells from the state of the rest of the network, and seem to be near a critical surface in their parameter space.

II. DATA ACQUISITION AND PROCESSING

Following methods described previously [7, 8], nematodes Caenorhabditis elegans were genetically engineered to express two fluorescent proteins in all of their neurons, with tags that cause them to be localized to the nuclei of these cells. One of these proteins, GCaMP6s, fluoresces in the green with an intensity that depends on the surrounding calcium concentration, which follows the electrical activity of the cell and in many cases is the proximal signal for transmission across the synapses to other cells [32]. The second protein, RFP, fluoresces in the red and serves as a position indicator of the nuclei as well as a control for changes in the visibility of the nuclei during the course of the experiment. Parallel control experiments were done on worms engineered to express GFP and RFP, neither of which should be sensitive to electrical activity. Although our ultimate goal is to understand neural dynamics in the freely moving animal, as a first step we study worms that are immobilized with polystyrene beads, to reduce motion-induced artifacts [33].

As described in Ref. [7], the fluorescence is excited using lasers. A spinning disk confocal microscope and a high-speed, high-sensitivity Scientific CMOS (sCMOS) camera record red- and green-channel fluorescent images of the head of the worm at a rate of 6 brain-volumes per second at a magnification of 40×; a second imaging path records the position and posture of the worm at a magnification of 10×, which are used in the tracking of the neurons across different time frames. The raw data thus are essentially movies, and by using a custom machine-learning approach—Neuron Registration Vector Encoding [8]—we are able to reduce the data to the green and red intensities for each neuron i, $I_i^g(t)$ and $I_i^r(t)$. As indicated in Fig. 1b, the fluorescence intensity un-
FIG. 3: Discretization of the empirically observed fluorescence signals. (a) Heatmap of the normalized fluorescence ratio between
photobleaching-corrected GCaMP fluorescence intensity and RFP fluorescence intensity, f , for each neuron as a function of
time. (b) Heatmap of the neuronal activity after discretization based on time derivatives of f . Green corresponds to a state of
“rising”, red “falling”, and white “flat”.
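In code, the derivative-threshold discretization summarized in Fig. 3 might look like the following sketch; the threshold constant and the integer state encoding are illustrative assumptions, not the original pipeline:

```python
import numpy as np

def discretize(deriv, sigma_n, tau_f, k=5.0):
    """Map a denoised time derivative onto three Potts states.

    The state is "rise" (+1) or "fall" (-1) when the derivative
    exceeds k * sigma_n / tau_f in magnitude, and "flat" (0)
    otherwise, following the thresholding described in the text.
    """
    thresh = k * sigma_n / tau_f
    states = np.zeros(len(deriv), dtype=int)
    states[deriv > thresh] = 1    # rise
    states[deriv < -thresh] = -1  # fall
    return states
```

As in the text, the constant k would be tuned so that a GFP control worm, pushed through the same pipeline, shows essentially zero pairwise mutual information.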
of the trace in Fig. 1d. After the smooth derivative u is estimated, we discretize the smooth estimate of the signal, Au, into three states of “rise,” “fall,” and “flat,” depending on whether the derivative u exceeds a constant multiple of σ_n/τ_f, the expected standard deviation of the smooth derivative extracted from pure white noise. The constant is chosen to be 5, such that the GFP control worm has almost all pairwise mutual information equal to zero after going through the same data processing pipeline. An example of the raw fluorescence and final discretized signals is shown in Fig. 3.

III. MAXIMUM ENTROPY MODEL

After preprocessing, the state of each neuron is described by a Potts variable σ_i, and the state of the entire network is {σ_i}. As in previous work on a wide range of biological systems [11, 14, 17, 36–38], we use a maximum entropy approach to generate relatively simple approximations to the distribution of states, P({σ_i}), and then ask how accurate these models are in making predictions about higher order structure in the network activity.

The maximum entropy approach begins by choosing some set of observables, O_µ({σ_i}), over the states of the system, and we insist that any model we write down for P({σ_i}) match the expectation values for these observables that we find in the data,

\[ \sum_{\{\sigma_i\}} P(\{\sigma_i\})\, O_\mu(\{\sigma_i\}) = \left\langle O_\mu(\{\sigma_i\}) \right\rangle_{\rm expt} . \tag{3} \]

Among the infinitely many distributions consistent with these constraints, we choose the one that has the largest possible entropy, and hence no structure beyond what is needed to satisfy the constraints in Eq. (3). The formal solution to this problem is

\[ P(\{\sigma_i\}) = \frac{1}{Z} \exp\left[ -\sum_\mu \lambda_\mu\, O_\mu(\{\sigma_i\}) \right] , \tag{4} \]

where the coupling constants λ_µ must be set to satisfy Eq. (3), and the partition function Z as usual enforces normalization.

Following the original application of maximum entropy methods to neural activity [11], we choose as observables the mean activity of each cell, and the correlations between pairs of cells. With neural activity described by three states, “correlations” could mean a whole matrix or tensor of joint probabilities for two cells to be in particular states. We will see that models which match this tensor have too many parameters to be inferred reliably from the data sets we have available, and so we take a simpler view in which “correlation” measures the probability that two neurons are in the same state. Equation (4) then becomes

\[ P(\sigma) = \frac{1}{Z}\, e^{-H(\sigma)} , \tag{5} \]

with the effective Hamiltonian

\[ H(\sigma) = -\frac{1}{2}\sum_{i \neq j} J_{ij}\, \delta_{\sigma_i \sigma_j} - \sum_i \sum_{r=1}^{p-1} h_i^r\, \delta_{\sigma_i r} . \tag{6} \]

The number of states is p = 3, corresponding to “rise,” “fall,” and “flat” as defined above. The parameters are the pairwise interactions J_ij and the local fields h_i^r, and these must be set to match the experimental values of the correlations

\[ c_{ij} \equiv \langle \delta_{\sigma_i \sigma_j} \rangle = \frac{1}{T}\sum_{t=1}^{T} \delta_{\sigma_i(t)\,\sigma_j(t)} , \tag{7} \]
and the magnetizations

\[ m_i^r \equiv \langle \delta_{\sigma_i r} \rangle = \frac{1}{T}\sum_{t=1}^{T} \delta_{\sigma_i(t)\, r} . \tag{8} \]

Note that the local field for the “flat” state, h_i^p, is set to zero by convention. In addition, the interaction J_ij can be non-zero for any pair of neurons i and j regardless of the positions of the neurons (both physical and in the structural connectome), i.e. the equivalent Potts model does not have a pre-defined spatial structure.

The model parameters are learned using coordinate descent and Markov chain Monte Carlo (MCMC) sampling [39–41]. In particular, we initialize all parameters at zero. For each optimization step, we calculate the model predictions c_ij and m_i^r by alternating between MCMC sampling with 10^4 MC sweeps and histogram sampling to speed up the estimation. Then, we choose a single parameter from the set {J_ij, h_i^r} to update, such that the increase of the likelihood of the data is maximized [39]. We repeat the observable estimation and parameter update steps until the model reproduces the constraints within the experimental errors, which we estimate from variations across random halves of the data. This training procedure leaves part of the interaction matrix J_ij zero, while the model is able to reproduce the magnetization m_i^r and the pairwise correlation c_ij within the experimental errors (Fig. 4).

Because of the large temporal correlations in the data, the number of independent samples in the recording is small compared to the number of parameters. This makes us worry about overfitting, which we test by randomly selecting 5/6 of the data as a training set, inferring the maximum entropy model from this training set, and then comparing the log-likelihoods of both the training data and the test data with respect to the inferred model. No signs of overfitting are found for subgroups of up to N = 50 neurons, as indicated by the fact that the difference of the log-likelihoods is zero within error bars (Fig. 5; details in Appendix A). This is not true if we try to match the full tensor correlations (Appendix B), which is why we restrict ourselves to the simpler model.

FIG. 4: Model construction: learning the maximum entropy model from data. (a) Connected pairwise correlation matrix, C_ij, measured for a subgroup of 50 neurons. (b) The inferred interaction matrix, J_ij. (c) Probability of neuron i in state r, for the same group of 50 neurons as panel (a). (d) The inferred local field, h_i^r. (e) Model reproduces pairwise correlation (unconnected) within variation throughout the experiment. Error bars are extrapolated from bootstrapping random halves of the data. (f) Same as panel (e), but for mean neuron activity m_i^r.

FIG. 5: Top: No signs of overfitting are observed for models of up to N = 50 neurons, measured by the difference of per-neuron log-likelihood, l_test − l_train, of the maximum entropy model for training sets (consisting of 5/6 of the data) and test sets. Clusters around N = 10, 15, 20, . . . , 50 represent randomly chosen subgroups of N neurons. Error bars are the standard deviation across 10 random partitions of training and test samples. The dashed lines show the expected per-neuron log-likelihood difference and its standard deviation calculated through perturbation methods (see Appendix A). Bottom: The difference between log likelihood of the training data and of the test data is greater than 0 (the red line) within error bars for maximum entropy models on N = 10, 20, . . . , 50 neurons with pairwise correlation tensor constraint (see Appendix B), which suggests that this model does not generalize well.

IV. DOES THE MODEL WORK?

The maximum entropy model has many appealing features, not least its mathematical equivalence to statistical physics problems for which we have some intuition. But this does not mean that this model gives an accurate description of the real network. Here we test several predictions. Since we used only the mean activity and pairwise correlations in constructing our model, the first nontrivial test is to predict correlations among triplets of neurons,

\[ C_{ijk} = \sum_{r=1}^{p} \left\langle \left(\delta_{\sigma_i r} - \langle \delta_{\sigma_i r}\rangle\right)\left(\delta_{\sigma_j r} - \langle \delta_{\sigma_j r}\rangle\right)\left(\delta_{\sigma_k r} - \langle \delta_{\sigma_k r}\rangle\right) \right\rangle . \tag{9} \]
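Given the discretized states, Eq. (9) is a short computation; a sketch, assuming the states are stored as a (T, N) array of Potts labels in {0, ..., p−1} (the array layout is an illustrative assumption):

```python
import numpy as np

def three_point(states, i, j, k, p=3):
    """Connected three-point correlation C_ijk of Eq. (9).

    `states` is a (T, N) array of Potts states; the average is an
    empirical mean over time, as in the text.
    """
    C = 0.0
    for r in range(p):
        di = (states[:, i] == r).astype(float)
        dj = (states[:, j] == r).astype(float)
        dk = (states[:, k] == r).astype(float)
        C += np.mean((di - di.mean()) * (dj - dj.mean()) * (dk - dk.mean()))
    return C
```

Looping this over all triplets of a 50-neuron subgroup gives the 19800 data points compared against the model prediction in Fig. 6a.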
More subtly, since we used only the probability of two neurons being in the same state, we can try to predict the full matrix of pairwise correlations,

\[ C_{ij}^{rs} \equiv \langle \delta_{\sigma_i r}\, \delta_{\sigma_j s} \rangle - \langle \delta_{\sigma_i r}\rangle \langle \delta_{\sigma_j s}\rangle ; \tag{10} \]

note that the trace of this matrix is what we used in building the model. Scatter plots of observed vs predicted values for C_ijk and C_ij^rs are shown in Fig. 6a and c. In parts b and d of that figure we pool the data, comparing the root-mean-square differences between our predictions and mean observations (model error) with errors in the measurements themselves. Although not perfect, model errors are always within 1.5× the measurement errors, over the full dynamic range of our predictions.

FIG. 6: Model validation: the model predicts unconstrained higher order correlations of the data. Panel (a) shows the comparison between model prediction and data for the connected three-point correlation C_ijk for a representative group of N = 50 neurons. All 19800 possible triplets are plotted with blue dots. Error bars are generated by bootstrapping random halves of the data, and are shown for 20 uniformly spaced random triplets in red. Panel (b) shows the error of the three-point function ∆C_ijk as a function of the connected three-point function C_ijk, binned by its value predicted by the model. The red curve is the difference between data and model prediction. The blue curve is the standard error of the mean of C_ijk over the course of the experiment, extracted by bootstrapping random halves of the experiment. Panels (c, d) are the same as panels (a, b), but for the connected two-point correlation tensor C_ij^rs.

Turning to more global properties of the system, we consider the probability of k neurons being in the same state, defined as

\[ P(k) \equiv \left\langle \sum_{r=1}^{p} I\!\left( \sum_{i=1}^{N} \delta_{\sigma_i r} = k \right) \right\rangle , \tag{11} \]

where I is the indicator function. It is useful to compute this distribution not just from the data, but also from synthetic data in which we break correlations among neurons by shifting each cell's sequence of states by an independent random time. We see in Fig. 7a that the real distribution is very different from what we would see with independent neurons, so that in particular the tails provide a signature of correlations. These data agree very well with the distributions predicted by the model.

Our model assigns an “energy” to every possible state of the network [Eq. (6)], which sets the probability of that state according to the Boltzmann distribution. Because our samples are limited, we cannot test whether the energies of individual states are correct, but we can ask whether the distribution of these assigned energies across the real states taken on by the network agrees with what is predicted from the model. Figure 7b compares these distributions, shown cumulatively, and we see that there is very good overlap between theory and experiment across ∼ 90% of the density, with the data having a slightly fatter tail than predicted. The good agreement extends over a range of ∆E ∼ 20 in energy, corresponding to predicted probabilities that range over a factor of exp(∆E) ∼ 10^8.

The maximum entropy model gives the probability for the entire network to be in a given state, which means that we can also compute the conditional probabilities for the state of one neuron given the state of all the other neurons in the network. Testing whether we get this right seems a very direct test of the idea that activity in the network is collective. This conditional probability can be written as

\[ P(\sigma_i \,|\, \{\sigma_{j\neq i}\}) \propto \exp\left[ \sum_{r=1}^{p-1} g_i^r\, \delta_{\sigma_i r} \right] , \tag{12} \]

where the effective fields are combinations of the local field h_i^r and each cell's interaction with the rest of the network,

\[ g_i^r = h_i^r + \sum_{j \neq i}^{N} J_{ij}\left( \delta_{\sigma_j r} - \delta_{\sigma_j p} \right) . \tag{13} \]

Then the probabilities for the states of neuron i are set
by

\[ \frac{P(\sigma_i = r)}{P(\sigma_i = p)} = e^{g_i^r} , \tag{14} \]

where the last state p is a reference. In Figure 7c and d we test these predictions. In practice we walk through the data, and at each moment in time, for each cell, we compute the effective fields. We then find all moments where the effective field falls into a small bin, and compute the ratio of probabilities for the states of the one cell, collecting the data as shown. The agreement is excellent, except at extreme values of the field, which are sampled only very rarely in the data. We note the agreement extends over a dynamic range of roughly two decades in the probability ratios.

A. Energy landscape

Maximum entropy models are equivalent to Boltzmann distributions and thus define an energy landscape over the states of the system, as shown schematically in Fig. 8a. In our case, as in other neural systems, the relevant models have interactions with varying signs, allowing the development of frustration and hence a landscape with multiple local minima. These local minima are states of high probability, and serve to divide the large space of possible states into basins. It is natural to ask how many of these basins are supported in subnetworks of different sizes.

To search for energy minima, we performed quenches from initial conditions corresponding to the states observed in the experiment, as described in [14]. Briefly, at each update, we change the state of one neuron such that the decrease of energy is maximized, and we terminate this procedure when no single spin flip will decrease the energy.

FIG. 8: Energy landscape of the inferred maximum entropy model. (a) Schematic of the energy landscape with local minima α, β and the corresponding basins Ω_α, Ω_β. Colored in light blue is the metabasin formed at the given energy threshold, ∆E. (b) Typical distribution of the values of the energy minima and the barriers of a maximum entropy model on N = 30 neurons. The global energy minimum, E_0, is subtracted from the energy, E. (c) The number of energy minima increases sub-exponentially as the number of neurons included in the model increases. Error bars are the standard deviation of 10 different subgroups of N neurons. (d) The rank-frequency plot for the frequency of visiting each basin matches well between data and model for a typical subgroup of 40 neurons. (e) The number of metabasins, grouped according to the energy barrier, diverges when the energy threshold ∆E approaches 1 from above.
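The quench procedure described above (greedy single-neuron updates until no state change lowers the energy of Eq. (6)) can be sketched as follows; the array layout for J and h is an illustrative assumption, with the “flat” field column fixed at zero:

```python
import numpy as np

def quench(J, h, sigma):
    """Greedy quench to a local minimum of the Hamiltonian in Eq. (6).

    At each update we apply the single-neuron state change that lowers
    the energy the most, stopping when no change helps. J is a
    symmetric (N, N) coupling matrix with zero diagonal; h is an
    (N, p) field matrix whose last ("flat") column is zero.
    """
    p = h.shape[1]

    def energy(s):
        same = s[:, None] == s[None, :]
        return -0.5 * np.sum(J * same) - h[np.arange(len(s)), s].sum()

    sigma = sigma.copy()
    e = energy(sigma)
    while True:
        best_de, best_i, best_r = 0.0, None, None
        for i in range(len(sigma)):
            for r in range(p):
                if r == sigma[i]:
                    continue
                trial = sigma.copy()
                trial[i] = r
                de = energy(trial) - e
                if de < best_de:
                    best_de, best_i, best_r = de, i, r
        if best_i is None:  # no single change lowers the energy
            return sigma, e
        sigma[best_i] = best_r
        e += best_de
```

Running this from every observed network state, and counting the distinct endpoints, gives the basin statistics of Fig. 8c-d.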
there is a transition at ∆E ≈ 1.2 from single to multiple metabasins for all N = 10, 20, and 30. Since the dynamics of the real system do not correspond to a simple walk on the energy landscape (Appendix C and Fig. 12), we cannot conclude that this is a true dynamical transition, but it does suggest that the state space is organized in ways that are similar to what is seen in systems with such transitions.

B. Criticality

Maximum entropy models define probability distributions that are equivalent to equilibrium statistical physics problems. As these systems become large, we know that the parameter space separates into distinct phases, separated by critical surfaces. In several biological systems that have been analyzed there are signs that these critical surfaces are not far from the operating points of the real networks, although the interpretation of this result remains controversial [21]. Here we ask simply whether the same pattern emerges in C. elegans.

One natural slice through the parameter space of models corresponds to changing the effective temperature of the system, effectively scaling all terms in the log probability up and down uniformly. Concretely, we replace H(σ) → H(σ)/T in Eq. (5). We monitor the heat capacity of the system, as we would in thermodynamics; here the natural interpretation is of the heat capacity as being proportional to the variance of the log probability, so it measures the dynamic range of probabilities that can be represented by the network. Results are shown in Fig. 9, for randomly chosen subsets of N = 10, 20, ..., 50 neurons. A peak in heat capacity often signals a critical point, and here we see that the maximum of the heat capacity approaches the operational temperature T_0 = 1 from below as N becomes larger, suggesting that the full network is near to criticality.

C. Network topology

The worm C. elegans is special in part because it is the only organism in which we know (essentially) the full pattern of connectivity among neurons. Our models also have a “connectome,” since only a small fraction of the possible pairs of neurons are linked by a nonzero value of J_ij. The current state of our experiments is such that we
As shown in Fig. 11, the response of the network to the local perturbation is distributed throughout the network for both clamping and ablation. However, clamping leads to much larger D_KLs, suggesting that the network is more sensitive to clamping, and perhaps robust against (limited) ablation. Interestingly, this result echoes the experimental observation that C. elegans locomotion is easily disturbed through optogenetic manipulation of single neurons [44, 45], while ablation of single neurons has limited effect on the worms' ability to perform different patterns of locomotion [46–48], although further experimental investigation is needed to test our hypotheses on network response.

VI. DISCUSSION

Soon it should be possible to record the activity of the entire nervous system of C. elegans as it engages in reasonably natural behaviors. As these experiments evolve, we would like to be in a position to ask questions about collective phenomena in this small neural network, perhaps discovering aspects of these phenomena which are shared with larger systems, or even (one might hope) universal. We start modestly, guided by the state of the data.

We have built maximum entropy models for groups of up to N = 50 cells, matching the mean activity and pairwise correlations in these subnetworks. Perhaps our most important result is that these models work, providing successful quantitative predictions for many higher order statistical structures in the network activity. This parallels what has been seen in systems where the neurons generate action potentials, but the C. elegans network operates in a very different regime. The success of pairwise models in this new context adds urgency to the question of when and why these models should work, and when we might expect them to fail.

Beyond the fact that the models make successful quantitative predictions, we find other similarities with analyses of vertebrate neural networks. The probability distributions that we infer have multiple peaks, corresponding to a rough energy landscape, and the parameters of these models appear close to a critical surface. In addition, we have shown that the inferred model is sparse, and has topological properties similar to those of the structural connectome. Nevertheless, global response is observed when the modeled network is perturbed locally, in a way similar to experimental observations.

With the next generation of experiments, we hope to extend our analysis in four ways. First, longer recordings will allow construction of meaningful models for larger groups of neurons. If coupled with higher signal-to-noise ratios, it should also be possible to make a more refined description of the continuous signals relevant to C. elegans neurons, rather than having to compress our description down to a small number of discrete states. Second, registration and identification of the observed neurons will make it possible to compare the anatomical connections between neurons with the pattern of interactions in our probabilistic models. Being able to identify neurons across multiple worms will also allow us to address the degree of reproducibility across individuals, and perhaps extend the effective size of data sets by averaging. Third, optogenetic tools will allow local perturbation of the neural network experimentally, which can be compared directly with the theoretical predictions in §V.D above. Finally, improvements in experimental methods will enable construction of maximum entropy models for freely moving worms, with which we can map the relation between the collective behavior identified in the neuronal
FIG. 11: Local perturbation of the neural network leads to global response. (a, b) For a typical group of N = 50 neurons, the inferred interaction matrix J is sparse. Here, the neuron indices i and j are sorted based on m_i^flat, as in Fig. 4. (c, d) When neuron k is clamped to a constant voltage, the Kullback-Leibler divergence (in bits) of the marginal distribution of states for neuron i is distributed throughout the network. (e, f) When neuron k is ablated, the D_KL is also distributed throughout the network, but is smaller than in response to clamping.

Acknowledgement

We thank F Beroz, AN Linder, L Meshulam, JP Nguyen, M Scholz, NS Wingreen, and B Xu for many helpful discussions. Work supported in part by the National Science Foundation through the Center for the Physics of Biological Function (PHY–1734030), the Center for the Science of Information (CCF–0939370), and PHY–1607612; and by the Simons Collaboration on the Global Brain.
11
Author Contributions Now, let us assume that a set of true underlying param-
eters, {g ∗ }, exists for the system we study, which leads to
XC and WB performed the analyses and the simula- a true expectation value be fi∗ = fi (g ∗ ). However, we are
tions. FR and AML designed and carried out the ex- only given finite number of observations, σ 1 , σ 2 , . . . , σ T ,
periments. All authors contributed to the manuscript from which we construct a maximum entropy model, i.e.
preparation. infer the parameters {ĝ} by maximizing the likelihood of
the data. Our hope is that the difference between the
true parameters and the inferred parameters is small, in
Appendix A: Perturbation methods for overfitting which case we can approximate the inferred parameters
analysis using a linear approximation
m T2 m T
" !# " !#
1
X 1 X 0 X 1 X
Ltest − Ltrain = − log Z(ĝ) − ĝi φti − − log Z(ĝ) − ĝi φt
i=1
T2 0 i=1
T1 t=1 i
t =1
! (24)
m T1 T1 T2
!
X
gi∗ −
X 1 X 1 X 1 X 0
= χ
eij φt − fj∗ φti − φti .
i=1 j
T1 t=1 j T1 t=1 T2 0
t =1
For simplicity of notation, let us write ference [Eq. (24)], have expectation values
T1 T2
(1) 1 X (2) 1 X 1
αi = φt − fi∗ , αi = φt − fi∗ .(25) (1)
hαi i = 0 , hαi
((1) (1)
αj i = χij . (26)
T1 t=1 i T2 t=1 i T1
(1) (2)
By the Central Limit Theorem, αi and αi are Gaus- In addition, because we assume that the training data
sian variables. Terms that appear in the likelihood dif- and the test data are independent, the cross-covariance
12
between the training data and the test data is Appendix B: Maximum entropy model with the
pairwise correlation tensor constraint
((1) (2)
hαi αj i = 0. (27)
To fully describe the pairwise correlation between neu-
rons with p = 3 states, the equal-state pairwise correla-
Combining all the above expressions, we obtain the
tion cij = hδσi σj i is not enough; rather, we should con-
expectation value of the likelihood difference [Eq. (24)],
strain the pairwise correlation tensor, defined as
m
DX E crs
ij ≡ hδσi r δσj s i . (31)
(1) (1) (2)
X
hLtest − Ltrain i = gi∗ − eij αj αi − αi
χ
i=1 j Here, we constrain the pairwise correlation tensor crs ij to-
m X
m gether with the local magnetization mri ≡ hδσi r i. No-
(1) (1) tice that for each pair of neurons (i, j), the number of
X
=− eij hαi αj i
χ
i=1 j=1
constraints are p2 + 2p = 15, but these constraints P r are
m X m related through normalization requirements, r mi = 1
1 X
and
P rs
c = m r
, which leads to only 7 independent
=− χ
eij χij s ij i
T1 i=1 j=1
variables for each pair of neurons. Because of this non-
$$\langle L^*_{\rm test} - L^*_{\rm train} \rangle = -\frac{m}{T_1} \qquad (28)$$

Note that the difference of likelihoods is related only to the number of parameters in our model and the number of independent samples in the training data. Similarly, we can evaluate the variance of the likelihood difference to be

$$\langle (L^*_{\rm test} - L^*_{\rm train})^2 \rangle = \left( \frac{1}{T_1} + \frac{1}{T_2} \right) \sum_{i,k} g_i g_k \chi_{ik} + \frac{1}{T_1^2}\,(m^2 + 2m) + \frac{m}{T_1 T_2} \qquad (29)$$

using Wick's theorem for multivariate Gaussian variables and the chain rule of partial derivatives.

In order to test whether perturbation theory can be applied to the maximum entropy model learned from the real data, we estimate the number of independent samples using $N_{\rm ind.\,sample} \sim T/\tau$, where $T$ is the length of the experiment and $\tau$ is the correlation time. The correlation time is extracted as the decay exponent of the overlap function, defined to be

$$q(\Delta t) = \Bigl\langle \frac{1}{N} \sum_{i=1}^{N} \delta_{\sigma_i(t),\,\sigma_i(t+\Delta t)} \Bigr\rangle_t \qquad (30)$$

Because these constraints are not all independent, choosing which variables to constrain is a problem of gauge fixing. Here, we choose the gauge where we constrain the local magnetization $m_i^r$ for the states "rise" and "fall", and the pairwise correlations $c_{ij}^r \equiv c_{ij}^{rr}$; in this gauge the parameters can be compared meaningfully to the equal-state maximum entropy model above. The corresponding maximum entropy model has the form

$$P(\sigma) \propto \exp\left( -\frac{1}{2} \sum_{r=1}^{3} \sum_{i \neq j} J_{ij}^{r}\, \delta_{\sigma_i, r}\, \delta_{\sigma_j, r} - \sum_{r=1}^{2} \sum_{i} h_i^{r}\, \delta_{\sigma_i, r} \right) \qquad (32)$$

Note that the equivalence between constraining the equal-state correlation for each state and constraining the full pairwise correlation tensor holds only for the case of $p = 3$. For $p > 3$ states, one needs to choose more constraints to fix the gauge, and it is not obvious which variables to fix.

We train the maximum entropy model with the tensor constraint [Eq. (32)] with the same procedure as the model with the equal-state correlation constraint, described in the main text. The model is able to reproduce the constraints with a sparse interaction tensor $J$. However, as shown in the bottom panel of Fig. 5, the difference between $l_{\rm train}$, the per-neuron log likelihood of the training data (a randomly chosen 5/6 of all the data), and $l_{\rm test}$, the per-neuron log likelihood of the test data, is greater than zero within error bars. This indicates that the maximum entropy model with the tensor constraint overfits for all $N = 10, 20, \ldots, 50$.
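The overlap-function estimate of the number of independent samples above can be sketched numerically. The following is our own illustration (not the analysis code used for the paper), assuming the discretized activity is stored as a $T \times N$ integer array of Potts states; the synthetic check uses independent three-state Markov chains, for which $q(\Delta t)$ decays exponentially with a known rate:

```python
import numpy as np

def overlap(states, max_lag):
    """Overlap function q(dt) of Eq. (30): mean fraction of neurons whose
    discrete state is unchanged after a lag of dt frames."""
    T = states.shape[0]
    return np.array([(states[: T - dt] == states[dt:]).mean()
                     for dt in range(max_lag + 1)])

def correlation_time(q):
    """Decay constant tau (in frames) from a log-linear fit of q(dt) - q_inf."""
    q_inf = q[len(q) // 2 :].mean()           # long-lag plateau of q
    y = q - q_inf
    keep = y > 0.05 * y[0]                    # fit only the early, clean decay
    lags = np.arange(len(q))[keep]
    slope, _ = np.polyfit(lags, np.log(y[keep]), 1)
    return -1.0 / slope

# Synthetic check: N independent 3-state chains that switch state with
# probability r per frame, so q(dt) = 1/3 + (2/3)(1 - 3r/2)^dt exactly.
rng = np.random.default_rng(0)
T, N, r = 20000, 20, 0.1
states = np.empty((T, N), dtype=int)
states[0] = rng.integers(3, size=N)
for t in range(1, T):
    switch = rng.random(N) < r
    jump = rng.integers(1, 3, size=N)         # hop to one of the other two states
    states[t] = np.where(switch, (states[t - 1] + jump) % 3, states[t - 1])

q = overlap(states, max_lag=200)
tau = correlation_time(q)                     # theory: -1/ln(1 - 3r/2) ~ 6.2 frames
n_ind = T / tau                               # rough number of independent samples
```

On these synthetic chains the fitted $\tau$ approximately recovers the exact value $-1/\ln(1 - 3r/2)$; for real recordings the choice of fitting window for the exponential decay is a judgment call.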
The mean occupancy time is defined as the average time a trajectory spends in a basin before escaping to another basin. For equilibrium dynamics, the mean occupancy time is determined by the height of energy barriers according to transition state theory, or by considering random walks on the energy landscape, which gives the relation $\tau \sim -p^2/[2e \ln(P_\alpha)]$, where $p = 3$ is the number of Potts states and $P_\alpha$ is the fraction of time the system visits basin $\alpha$. As shown in Figure 12, the mean occupancy time $\langle \tau_\alpha^{\rm MC} \rangle$ found in the Monte Carlo simulation can be predicted by this simple approximation. In contrast, the empirical neural dynamics deviates from the equilibrium dynamics, as we might have expected. The dependence between $\langle \tau_\alpha^{\rm data} \rangle$ and $P_\alpha^{\rm data}$ is weak; a linear fit gives $\langle \tau_\alpha^{\rm data} \rangle \approx (P_\alpha^{\rm data})^{0.5 \pm 0.027}$.

FIG. 12: Equilibrium dynamics of the inferred pairwise maximum entropy model fails to capture the neural dynamics of C. elegans. The mean occupancy time of each basin of the energy landscape, $\langle \tau_\alpha \rangle$, is plotted against the fraction of time the system visits the basin, $P_\alpha$. For 10 subgroups of $N = 10$ (in dots) and $N = 20$ (in asterisks) neurons, the empirical dynamics exhibits a weak power-law relation between $\langle \tau_\alpha \rangle$ and $P_\alpha$. The striped patterns are artifacts due to finite sample size. In contrast, equilibrium dynamics extracted from a Monte Carlo simulation following detailed balance shows an inverse logarithmic relation between $\langle \tau_\alpha \rangle$ and $P_\alpha$, which can be explained by random walks on the energy landscape. Error bars of the data are extracted from random halves of the data; error bars of the Monte Carlo simulation are calculated using the correlation time and standard deviation of the observables.
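The occupancy statistics compared in Fig. 12 reduce to simple trajectory bookkeeping. As a minimal sketch (our own illustration, using a toy basin-label sequence rather than the paper's data), one can compute $\langle \tau_\alpha \rangle$ and $P_\alpha$ from a trajectory, together with the random-walk prediction:

```python
import numpy as np
from itertools import groupby

def occupancy_stats(basins):
    """Per basin label: mean dwell time <tau_alpha> (in frames) and the
    fraction of time P_alpha the trajectory spends in that basin."""
    T = len(basins)
    dwell = {}
    for label, run in groupby(basins):
        dwell.setdefault(label, []).append(sum(1 for _ in run))
    return {a: (float(np.mean(ls)), sum(ls) / T) for a, ls in dwell.items()}

def tau_random_walk(P_alpha, p=3):
    """Equilibrium random-walk prediction: tau ~ -p^2 / (2e ln P_alpha)."""
    return -p**2 / (2 * np.e * np.log(P_alpha))

# Toy trajectory: basin 0 occupied in runs of 3 frames, basin 1 in runs of 1.
stats = occupancy_stats([0, 0, 0, 1] * 100)
```

Comparing the empirical `occupancy_stats` against `tau_random_walk` on a log-log plot reproduces the kind of test shown in Figure 12.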