You are on page 1of 20

MNRAS 000, 000–000 (2022) Preprint 13 April 2022 Compiled using MNRAS LATEX style file v3.

Bayesian Control Variates for optimal covariance estimation with pairs of


simulations and surrogates
Nicolas Chartier1,2 and Benjamin D. Wandelt 2,3
1 Laboratoire de Physique de l’École Normale Supérieure, ENS, Universite PSL, CNRS, Sorbonne Université, Université Paris Cité, F-75005 Paris, France
2 Sorbonne Université, CNRS, UMR 7095, Institut d’Astrophysique de Paris, 98 bis bd Arago, 75014 Paris, France
3 Center for Computational Astrophysics, Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA
arXiv:2204.03070v2 [astro-ph.CO] 12 Apr 2022

Accepted XXX. Received YYY; in original form ZZZ

ABSTRACT
Predictions of the mean and covariance matrix of summary statistics are critical for confronting cosmological theories with
observations, not least for likelihood approximations and parameter inference. The price to pay for accurate estimates is
the extreme cost of running 𝑁-body and hydrodynamics simulations. Approximate solvers, or surrogates, greatly reduce the
computational cost but can introduce significant biases, for example in the non-linear regime of cosmic structure growth. We
propose "CARPool Bayes", an approach to solve the inference problem for both the means and covariances using a combination
of simulations and surrogates. Our framework allows incorporating prior information for the mean and covariance. We derive
closed-form solutions for Maximum A Posteriori covariance estimates that are efficient Bayesian shrinkage estimators, guarantee
positive semi-definiteness, and can optionally leverage analytical covariance approximations. We discuss choices of the prior
and propose a simple procedure for obtaining optimal prior hyperparameter values with a small set of test simulations. We test
our method by estimating the covariances of clustering statistics of GADGET-III 𝑁-body simulations at redshift 𝑧 = 0.5 using
surrogates from a 100-1000× faster particle-mesh code. Taking the sample covariance from 15,000 simulations as the truth, and
using an empirical Bayes prior with diagonal blocks, our estimator produces nearly identical Fisher matrix contours for ΛCDM
parameters using only 15 simulations of the non-linear dark matter power spectrum. In this case the number of simulations is so
small that the sample covariance would be degenerate. We show cases where even with a naïve prior our method still improves
the estimate. Our framework is applicable to a wide range of cosmological and astrophysical problems where fast surrogates are
available.
Key words: large-scale structure, cosmological simulations, 𝑁-body, covariance

1 INTRODUCTION in terms of redshift, sky area, volume, etc,... We then use the sam-
ples to compute the unbiased and positive definite sample covariance
To study the large-scale structure of the universe and cosmic growth
estimator, but getting a reliable estimate requires many realizations,
history in the era of data-driven cosmology, one needs to accurately
especially if we need the precision matrix for the estimation of pa-
model the statistical properties of observables in order to infer cosmo-
rameter confidence bounds.
logical parameters constraints from surveys. The covariance matrix
𝚺 of a summary statistics vector, such as the matter power spectrum To reduce the computational cost of generating simulation samples
across different wave-numbers, and most importantly its inverse — various parallel, distributed-memory 𝑁-body solvers have been de-
the precision matrix—are paramount to extracting low-dimensional veloped sometimes with GPU-acceleration ( Springel (2005) (GAD-
summaries, building inference frameworks or computing likelihood GET), Ishiyama et al. (2009) (GreeM), Warren (2013) (2HOT),
approximations from mock catalogues (Heavens et al. (2000); Eifler, Harnois-Déraps et al. (2013) ( CUBEP3 M), Garrison (2019) (Aba-
T. et al. (2009); Takahashi et al. (2009); Harnois-Déraps et al. (2012); cus), Habib et al. (2016) (HACC), Potter et al. (2017) (PKDGRAV3),
Dodelson & Schneider (2013); Harnois-Déraps & Pen (2013); Taylor Yu et al. (2018) and Cheng et al. (2020) (CUBE)). Relying solely
& Joachimi (2014); Percival et al. (2014); Blot et al. (2014); Joachimi on massively parallel computing to tackle next-generation obser-
& Taylor (2014); Alsing & Wandelt (2018); Harnois-Déraps et al. vational datasets appears impractical given our time, memory, and
(2019); Hikage et al. (2020); Wadekar et al. (2020); Giocoli et al. energy resources since thousands of simulations are needed to pro-
(2021)). duce sufficiently accurate cosmological parameter constraints (see
The most trusted yet costly method to compute the covariance for instance Blot et al. (2016)).
matrix of large-scale structure clustering statistics consists in gener- For this reason, cosmologists have been searching for alternatives
ating mock realizations of survey observables with intensive 𝑁-body to running a large number of 𝑁-body simulations for a particular
simulations – or even hydrodynamical simulations for certain appli- cosmological model.
cations – that mimic the conditions of observational data sampling On the theoretical side, analytical computations give covariance

© 2022 The Authors


2 Nicolas Chartier and Benjamin D. Wandelt
matrices that have little or no Monte Carlo noise but approximate speed and memory gains. As a consequence, parameters constraints
and only valid for some assumptions on the data model. Such com- derived from surrogates only do not match the reliability and accu-
putations typically exploit the Gaussian limit and/or deviations from racy needed for upcoming surveys. For experiments, see the studies
Gaussianity of the covariance (Philcox & Eisenstein 2019; Li et al. by Lippich et al. (2019), Blot et al. (2019) and Colavincenzo et al.
2019; Philcox et al. 2020) or stem from perturbation theory (Mo- (2019), where statistical biases in parameters estimation using co-
hammed & Seljak 2014; Mohammed et al. 2017). For reviews of variance matrices from surrogates range up to 10 − 20% higher than
methods motivated by theoretical predictions, refer to Bernardeau with covariances computed from full 𝑁-body solvers.
et al. (2002) and Desjacques et al. (2018). Another approach is to attempt to reduce the number of needed
On the computational side, researchers have developed approx- simulations by modifying the statistical estimator of the covariance
imate solvers which are much faster than full 𝑁-body or hydro- matrix. Numerous studies have been encouraging the use of new
dynamical codes at the cost of coarser computations and simplifi- methods in order to deal with future surveys large data sets: covari-
cations that reduce the overall accuracy with respect to intensive ance tapering in Paz & Sánchez (2015) who demonstrated the ability
solutions, especially at small scales. An important part of these ap- to reduce the confidence intervals of parameters without adding bias,
proximate solvers use Lagrangian Perturbation Theory (LPT) within fitting a theoretical model with mock samples (Pearson & Samushia
a low-fidelity Particle-Mesh (PM) framework: Tassev et al. (2013) 2016), jackknife resampling for the covariance (Escoffier et al. 2016;
(COLA), Tassev et al. (2015) (sCOLA) implemented by Leclercq Favole et al. 2020), reducing the number of simulations by using
et al. (2020), Feng et al. (2016) (FastPM) available in a distributed both theoretical and simulated covariances (Hall & Taylor 2019),
version by Modi et al. (2020), White et al. (2014) (QPM), and Kitaura combining an empirical covariance with a simple target via (non-
et al. (2014) (PATCHY), to name a few. Methods based on low order )linear shrinkage (Pope & Szapudi 2008; Joachimi 2017). As hinted
LPT predictions provide numerous fast structure formation statis- at above, precision matrix estimation is the elephant in the room
tics for cosmology: Scoccimarro & Sheth (2002) (PTHalos), Tassev when it comes to undesirable effects – parameters shifts... – of poor
& Zaldarriaga (2012) and Monaco et al. (2013) building upon the conditioning onto parameter constraints. Among the recent papers
work of Taffoni et al. (2002) (PINOCCHIO), or Chuang et al. (2015) that deal with these limits and means to overcome (some of) them, the
(EZmocks). reader can refer to Taylor et al. (2013) who show how the accuracy
An increasingly popular approach, based upon optimization, is of the precision matrix impacts parameters constraints in the case
to construct mathematical models–emulators– that directly predict of Gaussian-distributed weak lensing power spectra, the precision
summary statistics for specific cosmologies and parameters and of matrix expansion method from Friedrich & Eifler (2018), Sellentin
which the free-parameters were previously determined through train- & Heavens (2018) who show the limit of a Gaussian likelihood to
ing with a specific loss function and, most importantly, simulation derive parameter constraints, the Appendix B of Philcox et al. (2021)
suites covering an appropriate range of the space of the upcoming that details parameter shifts stemming from a noisy covariance esti-
input data (McClintock et al. 2019a; Zhai et al. 2019; McClintock mate, Percival et al. (2021) who choose a specific covariance prior
et al. 2019b; DeRose et al. 2019; Lucie-Smith et al. 2019; Kasim et al. in a Bayesian framework, and also the Dark Energy Survey (DES)
2020; Angulo et al. 2020; Alsing et al. 2020; Rogers & Peiris 2021; Year 3 results from Friedrich et al. (2021).
Pedersen et al. 2021). A large proportion of the underlying mathemat- Variance reduction methods allow to exploit the accuracy of 𝑁-
ical models of emulators are trained Neural Networks architectures body solvers while dramatically lowering the number of required
(Lucie-Smith et al. 2020; Remy et al. 2020; Alves de Oliveira et al. samples to compute robust moments estimators. Smith et al. (2021),
2020; Villaescusa-Navarro et al. 2021a; Spurio Mancini et al. 2022) for example, combined different lines of sight in redshift space and
that produce summary statistics, and some have been specifically lowered the variance of the quadrupole estimator of the two-point
designed to output matter density fields from input initial conditions, clustering statistic by more than one third.
or even snapshots of low-resolution 𝑁-body simulations with par- Pontzen et al. (2016), Angulo & Pontzen (2016), and Villaescusa-
ticles positions and velocities (He et al. 2019; Dai & Seljak 2020; Navarro et al. (2018) discuss variance reduction with simulation pairs
Kodi Ramanah et al. 2020). Recently, Modi et al. (2021) proposed having special initial conditions. The technique allows to estimate
a solution to the inverse problem of estimating the initial density the mean of statistics such as the power spectrum, the monopole
field of the Early Universe : they combine a differentiable 𝑁-body and quadrupole of the redshift-space correlation functions or the
solver with a Recurrent Neural Network architecture (RNN) to build halo mass function faster by a factor of more than 50. The induced
a tractable inference scheme. Also, Hassan et al. (2021) (HIFLOW) bias, however, on certain higher-order 𝑁-point functions renders the
trained an emulator and are able to produce 2D neutral Hydrogen method not adapted to covariance estimation.
maps conditioned on cosmology. In Chartier et al. (2021) and Chartier & Wandelt (2021) (CWAV20
As a consequence of the growing enthusiasm for Machine Learn- and CW21 from now on), we developed the Convergence Accel-
ing solvers we have seen the production of massive simulation suites – eration by Regression and Pooling (CARPool) method, a general
Garrison et al. (2018), Villaescusa-Navarro et al. (2020), Villaescusa- approach to reducing the number of simulations needed for low vari-
Navarro et al. (2021c) and Villaescusa-Navarro et al. (2021b) – that ance and explicitly unbiased estimates of clustering statistics mo-
more and more often aim specifically at providing ways to train vari- ments. CWAV20 demonstrated a dramatic reduction of the number
ous emulators and models. Any trained model suffers from two main of simulations required to estimate the mean of a given statistic by ex-
drawbacks: namely the need for many training simulations and the ploiting the variance reduction principle known as control variates.
subsequent limitation of the model to generalize by the parameter The key idea is to combine a small number of costly simulations
range of the training set; and the absence of guarantee for unbiased- with a large number of correlated surrogates. Very recently, Ding
ness of the predictions with respect to full 𝑁-body or hydrodynamical et al. (2022) tested the CARPool principle to estimate the mean of
outputs. the two-point and three-point clustering statistics of halos, in order
All the fast solvers described above – which we will refer to col- to prepare the high-resolution simulations needed for the Dark En-
lectively as surrogates – trade the accuracy of full 𝑁-body mocks, ergy spectroscopic Instrument (DESI). By pairing AbacusSummit
especially in the non-linear regime at small scales, for computational suite (Maksimova et al. 2021) simulations with FastPM approxima-

MNRAS 000, 000–000 (2022)


Bayesian Control Variates for optimal covariance estimation with pairs of simulations and surrogates 3
tions, they found ≈ 100 times smaller variances with CARPool at To get an accurate estimate of the confidence bounds for the pa-
scales 𝑘 ≤ 0.3 ℎMpc−1 than with high-resolution simulations alone. rameters requires an accurate estimate of the covariance matrix 𝚺𝒔𝒔 .
Additionally, the extension of the method to different cosmologies Using the standard sample covariance estimator (or maximum like-
(one or very few simulations of the cosmologies of interest paired lihood estimator) we would expect to need thousands of simula-
with a "primary cosmology" as the surrogate) resulted in an increase tions costing O (107 ) CPU hours, much like in the Quijote suite
of the effective volume by ≈ 20 times. In CW21, we extended the (Villaescusa-Navarro et al. 2020).
principle to covariance estimation by applying the variance reduc- But we have at our disposal a much faster surrogate solver that
tion approach to individual elements of a symmetric matrix, and we uses approximations from a Lagrangian fluid description of the dark
assessed the covariance estimates by deriving cosmological parame- matter field to produce fast but unfortunately biased approximations
ters confidence intervals with the Fisher matrix (using the precision). of this power spectrum. In this paper we show how to leverage these
With this straightforward approach we found significant improvement fast surrogates to obtain accurate estimates of the means and covari-
in many cases, but a definite drawback was that positive-definiteness ance of the summary statistics while reducing the required number
of the covariance estimate is not guaranteed. The main reason for of simulations by orders of magnitude.
this was because the covariance matrix was treated as a first order Figure 1 illustrates the take-home message of this work. It shows
statistic for the methods in CWAV20 to be directly applicable. the predicted marginal confidence regions of ΛCDM cosmological
In this paper, to circumvent this drawback, we frame the problem parameters 1 computed using different estimates of the power spec-
as a Bayesian inference of simulation means and covariances when a trum covariance. The case labeled "Truth" uses the standard Max-
(typically small) set of pairs of simulations and surrogates are avail- imum Likelihood Estimate (MLE) of the covariance matrix from
able in addition to a (typically large) set of unpaired fast surrogates. 15,000 full simulations. This "Truth" case is hardly visible because
We derive closed-form Maximum A Posteriori (MAP) estimators of the contours are nearly perfectly overlapped by the "CARPool Bayes"
the covariance of the simulation statistics that incorporate the infor- case that uses only 15 simulations in combination with fast surro-
mation brought by the surrogates and the prior, and test the estimates gates (noted as 10 + 5 simulations, the second-term being the number
by comparing the resulting confidence bounds for a ΛCDM cosmol- of test simulations used to set a prior hyperparameter; see discussion
ogy with the true bounds. The results in this paper are very general in section 3.5). This is one of the Bayesian covariance estimators
and can apply to any summary statistics from simulations. For this we develop in this paper. These two cases are to be compared with
reason, we motivate the study with an introductory example in sec- the "ML (sims only)" case showing the standard MLE of the covari-
tion 2 before explaining the notations and derivations in section 3. ance matrix from 200 simulations but without surrogates. The case
We show several example applications to large scale structure statis- labeled "ML (surr. only)" illustrates that relying on the surrogates
tics in section 4 and we conclude and discuss the implications of our alone results in biased estimates of the size and orientation of the
work in section 5. contours. 2
Figure 1 emphasizes the potential of the Bayesian formulation of
the CARPool approach that we develop in detail in the following.
Readers mostly interested in applications and numerical examples
2 ILLUSTRATIVE EXAMPLE can skip to section 4.
Imagine having a simulation code to compute the evolution of col-
lisionless dark matter particles in an expanding ΛCDM universe,
within a simulation volume mimicking the observational conditions
of some future survey. We would like to ask: "What amount of in- 3 BAYESIAN INFERENCE OF COVARIANCE FROM
formation the clustering statistics of the large-scale structure carry SIMULATION-SURROGATE PAIRS
about the cosmological parameters? By which amount will we be We wish to estimate the covariance matrix of the summary statistics
able to constrain cosmological parameters with said statistics?" Let’s 𝒔, dim(𝒔) = 𝑝 𝑠 from accurate, expensive simulations. We also have
say we try with the two-point correlation function in Fourier space, access to a fast surrogate solver, 𝒓, dim(𝒓) = 𝑝 𝑟 , which we would
i.e., the (dark matter) power spectrum. For each of 𝑛𝑠 runs, with dif- not rely on alone. Inspired by CARPool, we build estimators to
ferent random seeds for the initial conditions, labelled 𝑖, 1 ≤ 𝑖 ≤ 𝑛𝑠 , exploit both simulation and surrogate statistics, with the main goal
the output is the vector 𝒔𝑖 of 𝑝 𝑠 = 158 power spectrum bins up of reducing the number of intensive simulations we have to run.
to 𝑘 max ≈ 1.0 ℎMpc−1 . We will introduce the detailed notation in
section 3.1.
Under the hypothesis that the observable is sampled from a Mul-
tivariate Normal (MVN) distribution and that the covariance matrix 3.1 Definitions and notations
does not depend on the parameters, the Fisher matrix for 𝑑 parameters
With simulation summary statistics samples 𝒔𝑖 , 𝑖 = 1, . . . , 𝑛𝑠 the
is the symmetric matrix of size (𝑑, 𝑑)
standard approach to estimating the covariance matrix 𝚺𝒔𝒔 is to
𝜕 𝝁(𝜽) 𝑻 −1 𝜕 𝝁(𝜽)
   
F𝑖 𝑗 = 𝚺𝒚𝒚 ; (1)
𝜕𝜃 𝑖 𝜕𝜃 𝑗
1 We use 𝑛spec as the spectral index not to induce confusion with the number
hence the importance of having an accurate estimate of the covariance
of simulations 𝑛𝑠 used in the paper.
matrix and its inverse, the precision matrix. Then, for a parameter 𝜃 𝑖 , 2 We correct the bias of the precision matrix computed by inverting the
the Cramér-Rao inequality gives the lower-bound, marginalized over
standard sample covariance matrix estimator in equation (1) with the so-
the remaining parameters, for the variance of an unbiased estimator called "Hartlap factor" (see section 3.6.1 for a reminder) when using sample
of 𝜃 𝑖 : covariances, i.e. for "ML (sims only)", "ML (surr. only)" and also for the truth
h i even if the correction is small. We do not use any correction when using the
𝜎𝜃2𝑖 ≥ F −1 . (2) "CARPool Bayes" estimate, a point which we discuss in section 3.6.2.
𝑖𝑖

MNRAS 000, 000–000 (2022)


4 Nicolas Chartier and Benjamin D. Wandelt

Figure 1. Illustrating the power of Bayesian control variates using the confidence contours of the cosmological parameters computed using the Fisher matrix
based on the estimated matter power spectrum covariance matrix. The "truth" designates the confidence regions (black) from the sample covariance matrix of
15,000 𝑁 -body simulations, and the parameter means are set to the ΛCDM model used in the simulations. The contours are overlapped nearly perfectly by the
light blue when the covariance in the Fisher matrix is computed using only 15 simulations with our CARPool Bayes MAP estimator (10 simulations and 5 for
setting a prior hyperparameter, see section 3.5). The sample covariance (ML) estimator based on many more simulations than ours gives less accurate contours.
Contours based on 3100 COLA surrogates alone are rotated and too small showing that the surrogates alone are inaccurate. Detailed discussion in the text and
in section 4.

compute eigenvalues will dominate the precision matrix and impact parameter
parameter constraints (Taylor et al. (2013), Blot et al. (2016)).
𝑛𝑠
𝛾 ∑︁ Now we add surrogates. The goal is to build a Bayesian model
𝚺𝒔𝒔 =
b (𝒔𝑖 − 𝒔¯) (𝒔𝑖 − 𝒔¯)𝑻 (3) for the covariance of the simulations but including whatever in-
𝑛𝑠
𝑖=1 formation is provided by the surrogates. The set of surrogates 𝒓 𝑗 ,
𝑛𝑠
1 ∑︁ 𝑗 = 1, . . . , 𝑛𝑠 + 𝑛𝑟 comprises 𝑛𝑠 samples that are paired with the
𝒔¯ = 𝒔𝑖 , simulations, i.e., they were computed using the same random num-
𝑛𝑠
𝑖=1 bers, and 𝑛𝑟 additional unpaired surrogates. We combine pairs of
simulations and surrogates into a single vector
the Maximum-Likelihood (ML) estimator given a Multivariate Nor-
mal (MVN) likelihood function when 𝛾 = 1. To get an unbiased
estimator, we use Bessel’s correction factor 𝛾 = 𝑛𝑠 /(𝑛𝑠 − 1) in the
ML estimator for the covariance. Equation (3) needs many samples
to provide a high-quality estimate: as a matter of fact, the conver- 𝑻
gence of the smallest eigenvalues is slow (Bai & Yin 1993) and these 𝒙 ≡ 𝒔, 𝒓 (4)

MNRAS 000, 000–000 (2022)


Bayesian Control Variates for optimal covariance estimation with pairs of simulations and surrogates 5
which implies a block matrix structure for the mean and covariance simulation-surrogate pairs, is
− 2 ln L ({𝒙}, {𝒓 ∗ }|𝚺) = (𝑛𝑠 + 𝑛𝑟 ) ln [det (𝚺)]
𝑻  
𝝁 ≡ E [𝒙] = 𝝁 𝒔 , 𝝁𝒓 (7)
  𝑛𝑠 𝑛𝑟
𝚺 𝚺𝒔𝒓
𝚺 ≡ 𝒔𝒔
∑︁ ∑︁
. + 𝒙𝑻𝑖 𝚺−1 𝒙 𝑖 + 𝒙 ∗𝑖 𝑻 𝚺−1 𝒙 ∗𝑖 + 𝑐 𝑓 ,
𝚺𝒓 𝒔 𝚺𝒓𝒓
𝑖=1 𝑖=1
Following the standard notation, we will denote the Schur comple- where 𝑐 𝑓 is the remaining constant of the likelihood for the full model
ment as including 𝒙 and 𝒙 ∗ . Treating the simulations 𝒔∗ in 𝒙 ∗ as unobserved,
−1
(𝚺/𝚺𝑟𝑟 ) ≡ 𝚺 𝑠𝑠 − 𝚺 𝑠𝑟 𝚺𝑟𝑟 𝚺𝑟 𝑠 . (5) latent variables we use the Expectation Maximization (EM) approach
(Dempster et al. 1977). While EM is typically an iterative algorithm
S +𝑝 designates the space of symmetric positive-definite matrices, that can be slow to converge, we show in Appendix A that we can
which is a subset of R 𝑝 ( 𝑝+1)/2 . find the Maximum Likelihood (ML) estimators of the mean and of
For the 𝑛𝑟 unpaired surrogates 𝒓 ∗ we introduce the unobserved the covariance from simulations and surrogates in closed-form by
(and in fact non-existent) corresponding simulations 𝒔∗ as latent computing the fixed point of the EM iterations. These are
variables and then treat them as missing data. Again we combine −1
into a vector 𝒙 ∗ ≡ (𝒔∗ , 𝒓 ∗ )𝑻 giving 𝑩
b= b 𝚺𝒔𝒓 b
𝚺𝒓𝒓 (8)
 
𝒔 1 , . . . , 𝒔 𝑛𝑠 , 𝒔∗1 , . . . , 𝒔∗𝑛𝑟 𝒔 |𝒓 = 𝒔 + 𝑩 𝒓 − 𝒓
𝝁d b ★ (9)
𝒓 1 , . . . , 𝒓 𝑛𝑠 , 𝒓 ∗1 , . . . , 𝒓 ∗𝑛𝑟 . ML
𝚺𝒔𝒔
b = (𝚺/𝚺
œ 𝑟𝑟 ) + 𝑩
bb ★ b𝑻
𝚺𝒓𝒓 𝑩 (10)
| {z } | {z }  
𝒙 𝒙∗ ★ b𝑻 ,
=b𝚺𝒔𝒔 + 𝑩
b b 𝚺𝒓𝒓 𝑩
𝚺𝒓𝒓 − b
We will also distinguish the empirical counterparts of the surrogate
moments according to whether they use all the 𝒓 available or just the where b𝚺𝒔𝒔 is the sample covariance from equation (3) using simu-
paired ones, i.e., lations only. We provide a proof in Appendix A that as long as the
ML
𝚺𝒔𝒔
covariance of the surrogate is positive definite the ML estimate b
𝒓, b
𝚺𝒓𝒓 −→ estimated from the unpaired set only; 3
is guaranteed to be positive (semi-)definite .
★ ★ As we will show in section 4, this solution improves the estimated
𝒓 ,b
𝚺𝒓𝒓 −→ estimated from both the paired and unpaired sets.
simulation covariance significantly with respect to the ML covariance
For instance, computed from simulations only, Eq. (3). But the key ingredient for
𝑠 +𝑛𝑟
𝑛∑︁ many applications is the precision matrix: computing optimal data
1
𝒓★ = 𝒓𝑗 , combinations, least square estimators and optimal filtering. We will
𝑛 𝑠 + 𝑛𝑟 see that the dramatically underestimated smallest eigenvalues of the
𝑗=1
ML estimate of the covariance are critical.
where we do not differentiate the paired and unpaired surrogates for
Fortunately, the Bayesian approach allows us to include priors
simplicity (𝒓 𝑗 = 𝒓 ∗𝑗−𝑛𝑠 if 𝑗 ≥ 𝑛𝑠 + 1).
amounting to a form of regularization, as we will show now.
We recall the well-known result that the best prediction 𝒔b∗ for any
𝒔∗ given 𝒓 ∗ with no constraints (i.e we do not restrict the problem to
the class of linear estimators) under the square loss of residuals co- 3.3 Inclusion of a Prior Information and Maximum A
incides, when under a MVN distribution, with the linear regression: Posteriori (MAP) solutions

P (𝒔∗ |𝒓 ∗ , 𝚺) = 𝑀𝑉 𝑁 ( 𝝁 𝒔∗ |𝒓 ∗ , 𝚺 𝒔∗ |𝒓 ∗ ) (6) A convenient prior to choose for the block covariance 𝚺, with 𝑃 ≡
−1 ∗ 𝑝 𝑠 + 𝑝 𝑟 , is the Inverse-Wishart (W −1 ) prior with hyperparameters
𝒔b∗ = 𝝁 𝒔∗ |𝒓 ∗ = 𝚺 𝑠𝑟 𝚺𝑟𝑟 (𝒓 − 𝝁𝒓 ) + 𝝁 𝒔 + , the scale matrix, and 𝜈, the number of degrees of freedom.
𝚿 ∈ S𝑃
𝚺 𝒔∗ |𝒓 ∗ = (𝚺/𝚺𝑟𝑟 ) With 𝑛 𝑝 ≡ 𝜈 + 𝑃 + 1 then
The regression matrix of 𝒔 given 𝒓 will appear from now on as det(𝚿) 𝜈/2 1 −1
W −1 (𝚺|𝚿, 𝜈) = 𝜈
det(𝚺) −𝑛 𝑝 /2 𝑒 − 2 tr(𝚿𝚺 ) (11)
−1 𝑃( 2)
2𝜈 𝑃/2 Γ
𝑩 ≡ 𝚺 𝑠𝑟 𝚺𝑟𝑟
 
𝚿𝒔𝒔 𝚿𝒔𝒓
For legibility, and without loss of generality, we will write all 𝚿≡ ,
𝚿𝒓 𝒔 𝚿𝒓𝒓
random vectors as zero-mean in the derivations such that for any
sample 𝑖 where Γ 𝑃 is the multivariate Gamma function. W −1 (𝚺|𝚿, 𝜈) has
𝒙𝑖 ← 𝒙𝑖 − 𝝁𝒙 . mode 𝚿/𝑛 𝑝 for 𝑛 𝑝 > 2𝑃. Its mean 𝚿/(𝑛 𝑝 − (2𝑃 + 2)) exists if
𝑛 𝑝 > 2𝑃 + 2. In our problem, for any prior P (𝚺), the mode of the
The final equations serving as numerical recipes will include the
means explicitly.
With these notations, we now turn to inferring the simulation block 3 Anderson (1957) derived the same ML estimator by integrating out the 𝒔 ∗
of the covariance 𝚺𝒔𝒔 with the help of surrogates, given (multiple in Eq. (7) to obtain the marginal likelihood for the observed samples only
realisations of) 𝒙 and 𝒙 ∗ . 𝑛𝑠
∑︁
− 2 ln L ( {𝒙 }, {𝒓 ∗ } |𝚺) = 𝑛𝑠 ln [det (𝚺) ] + 𝒙𝑻 −1
 
𝑖 𝚺 𝒙𝑖
𝑖=1
𝑛𝑟
3.2 Maximum-likelihood solution with surrogates ∑︁
+ 𝑛𝑟 ln [det (𝚺𝒓𝒓 ) ] + 𝒓 ∗𝑗 𝑻 𝚺𝒓𝒓 −1 𝒓 ∗𝑗 + 𝑐𝑚 ,
In a Gaussian model, the log-likelihood of 𝑛𝑠 independent and 𝑗=1

identically distributed (iid) samples of 𝒙 and 𝑛𝑟 iid samples 𝒙 ∗ of with 𝑐𝑚 the remaining constant of the model with missing 𝒔 ∗ .

MNRAS 000, 000–000 (2022)


6 Nicolas Chartier and Benjamin D. Wandelt
posterior distribution is located at the Maximum A Posteriori (MAP)
estimate ★ +𝚿
MAP (𝑛𝑠 + 𝑛𝑟 )b
𝚺𝒓𝒓 𝒓𝒓
𝚺𝒓𝒓 = (19)
= argmax L ({𝒙}, {𝒓 ∗ }|𝚺) × P (𝚺)
MAP
  b
𝚺𝒔𝒔
b (12) 𝑛 𝑠 + 𝑛𝑟 + 𝜈 − 𝑝 𝑠 + 𝑝 𝑟 + 1
+
𝚺𝒔𝒔 ∈S𝑃  𝑛𝑠
∑︁ 
In order to get a 𝚺𝒔𝒔 MAP estimate, we chose to study two ap- 𝑩
bMAP = 𝚿𝒔𝒓 + (𝒔𝑖 − 𝝁 𝒔 )(𝒓 𝑖 − 𝝁𝒓 )𝑻 (20)
proaches: solving the MAP either for the whole 𝚺 matrix and W −1 𝑖=1
 𝑛𝑠  −1
prior (section 3.3.1), or for the "regression" parameters used to infer ∑︁
the 𝚺𝒔𝒔 block, which amounts to dealing with the problem solved in × 𝚿𝒓𝒓 + (𝒓 𝑖 − 𝝁𝒓 )(𝒓 𝑖 − 𝝁𝒓 )𝑻
Anderson (1957) and reparametrizing the W −1 prior (section 3.3.2). 𝑖=1
   𝑻
𝚺𝒔 |𝒓 + 𝚿𝒔 |𝒓 + 𝑩
𝑛𝑠 b bMAP − 𝑩𝚿 𝚿𝒓𝒓 𝑩bMAP − 𝑩𝚿
𝚺𝒔MAP
b
|𝒓 = (21)
3.3.1 MAP with prior on the block covariance 𝚺 𝜈 + 𝑛𝑠 + 2𝑝𝑠 + 1
MAP MAP MAP b𝑻
We take P (𝚺) = W −1 (𝚺|𝚿, 𝜈). The derivation of the MAP esti- 𝚺𝒔𝒔 = 𝚺𝒔 |𝒓 + 𝑩MAP 𝚺𝒓𝒓 𝑩MAP ,
b b b b (22)
mator for the "full" covariance, in this case, bears similarity to the
well-known proof that the Inverse-Wishart distribution is a conjugate where both 𝑩
bMAP – rewritten explicitly as found in the derivation – and
  𝑻
prior for the covariance matrix under a MVN likelihood (where 𝚿 𝚺 = 1/𝑛𝑠 𝑛𝑠 𝒔 𝑗 − E
Í
𝝁MAP –intervening in b
E
𝒔 |𝒓 𝒔 |𝒓 𝝁MAP 𝒔 𝑗 − E
𝑗=1
𝝁MAP –
𝒔 |𝒓 𝒔 |𝒓
becomes an additional factor of 𝚺−1 in the trace factorization of the estimators are identical to section 3.3.1. And we have written
log-likelihood). In the absence of additional unpaired surrogates in 𝚿𝒔 |𝒓 = 𝚿𝒔𝒔 − 𝚿𝒔𝒓 𝚿𝒓𝒓 −1 𝚿𝒓 𝒔 . We have dropped the 𝑃 notation here
equation (7), the MAP estimator for 𝚺 with the prior of equation (11) since the reparametrization of the likelihood and prior in terms of
would match the classical result the regression matrices instead of 𝚺 makes 𝑝 𝑠 and 𝑝 𝑟 appear sep-
 𝚫 𝚫 arately. As expected, the MAP estimator for 𝚺𝒔𝒔 in this approach

𝚺+𝚿
𝑛𝑠 b 𝚺 𝚺𝒔𝒓
𝚺𝚫 = ≡ b𝒔𝒔
b b
b
𝚫 𝚫 (13) differs from the one derived in section 3.3.1 since the prior is not
𝑛𝑠 + 𝑛 𝑝 𝚺𝒓 𝒔 b𝚺𝒓𝒓
parameterization-invariant4 .
The unpaired surrogate samples, in our case, can be used in addi-
𝚫:
𝚺𝒔𝒔
tion to the standard b
3.4 Choice of the prior parameter 𝚿
★ +𝚿 𝚺𝒓𝒓 + (𝑛𝑠 + 𝑛 𝑝 )b 𝚫
𝚺𝒓𝒓
MAP (𝑛𝑠 + 𝑛𝑟 )b
𝚺𝒓𝒓 𝒓𝒓 𝑛𝑟 b How should we choose the form of the parameter matrix 𝚿? From
𝚺𝒓𝒓
b = = (14)
𝑛 𝑠 + 𝑛𝑟 + 𝑛 𝑝 𝑛 𝑠 + 𝑛𝑟 + 𝑛 𝑝 now on, we consider that the surrogate and simulation summary
𝚫 b𝚫 −1 statistics have the same dimension 𝑝 𝑠 = 𝑝 𝑟 , as this will be the case
𝑩 𝚺𝒔𝒓
bMAP = b 𝚺𝒓𝒓 (15) in section 4. In the context of an Inverse-Wishart distribution, 𝚿 must
 
bMAP 𝒓★ − 𝒓
𝝁MAP = 𝒔 + 𝑩
E (16) be a 2𝑝 𝑠 × 2𝑝 𝑠 symmetric positive-definite matrix.
𝒔 |𝒓
Two generic choices we will present in the following with 1) blocks
MAP b𝑻
b MAP
𝚺𝒔𝒔 𝚺𝒔𝚫|𝒓 + 𝑩
=b 𝚺𝒓𝒓
bMAP b 𝑩MAP (17) that are proportional to the identity matrix (the "identity" prior) or
𝚫

MAP

𝚫 b𝑻 2) blocks that are diagonal matrices (the "diagonal" prior). In both
𝚺𝒔𝒔
=b +𝑩bMAP b 𝚺𝒓𝒓 −b𝚺𝒓𝒓 𝑩MAP , cases, the coefficients and covariances are estimated based on the
Notice that in the absence of a prior, equation (17) reduces to equa- simulation-surrogate pairs. Readers familiar with shrinkage estima-
tion (10) and to the standard result b 𝚫 with no unpaired surrogates.
𝚺𝒔𝒔 tors may recognize these as popular shrinkage targets (other common
Priors for the simulation and surrogate means could be trivially targets appear in Table 2 from Schäfer & Strimmer (2005)). We will
included as derived in Appendix A in CWAV20. find that 𝚿 appears in our estimators in an analogous way. For other
Note that a simple limit of these equations exist for the case when particular applications, more tailored choices are of course possible.
the surrogate covariance is known exactly, This may be the case when an approximate theoretical model for
the covariances is available. As we will see in the numerical exper-
Δ
 
MAP,𝚺𝒓𝒓 𝚫 b𝑇 . iments in section 4, even the choice of a "diagonal" prior performs
𝚺𝒔𝒔
b =b𝚺𝒔𝒔 +B bMAP 𝚺𝑟𝑟 − 𝚺 b𝑟𝑟 B MAP (18)
well and avoids the overfitting observed in the ML estimator as long
In Appendix A2 we show that this result can be obtained by taking as 𝑛 𝑝 is chosen using the simple procedure described in section 3.5.
the limit of equation (17) for infinite number of surrogates. In this The "identity" prior demonstrated improvement over the sample co-
case no unpaired surrogates need to be generated which can lead to variance of simulations for a much higher 𝑛𝑠 than the "diagonal"
significant savings when the computational expense for generating a one, thus we will only present in section 4 computations with the
large number of unpaired surrogates is not negligible compared to "diagonal" prior. We briefly describe the priors below.
the simulation cost. In addition, any residual error in the estimate due
to a limited number of surrogates is eliminated.
3.4.1 "Identity" prior

3.3.2 MAP with prior on the regression parameters A common form adopted as a target for shrinkage estimates of co-
variance matrices is the "identity" prior: the auto-covariance of sim-
A different approach is to solve the MAP for the parameters that allow ulations and surrogates are proportional to identity matrices and the
to estimate 𝚺𝒔𝒔 = 𝚺𝒔 |𝒓 + 𝑩𝚺𝒓𝒓 𝑩𝑻 , that is to say we use a prior for
the joint distribution P (𝑩, 𝚺𝒔 |𝒓 , 𝚺𝒓𝒓 ) which is a reparametrization
4
of the 𝑝 𝑠 ( 𝑝 𝑠 + 1)/2 + 𝑝 𝑟 ( 𝑝 𝑟 + 1)/2 + 𝑝 𝑠 𝑝 𝑟 parameters of P (𝚺). For We know that for two random vectors 𝒙 and 𝒚 with 𝒚 = ℎ (𝒙), if ℎ is differ-
that, we need the properties of the blocks of a covariance sampled entiable, then for probability distributions P𝑦 (𝒚) = P𝑥 (𝒙) × det 𝑱 ℎ−1 (𝒚)
from a W −1 (𝚺|𝚿, 𝜈) distribution. A quick outline of the derivation where 𝑱 is the Jacobian matrix. So under a reparametrization, the two distri-
appears in Appendix B. With 𝑩𝚿 ≡ 𝚿𝒔𝒓 𝚿𝒓𝒓 −1 we get butions have no reason to peak at the same coordinates.

MNRAS 000, 000–000 (2022)


Bayesian Control Variates for optimal covariance estimation with pairs of simulations and surrogates 7
MAP
cross-covariance a diagonal matrix such that the correlation in each Consider the estimate b 𝚺𝒔𝒔 (𝑛 𝑝 ) as a function of 𝑛 𝑝 . This can be
bin equals to 𝜌𝜎𝑠 𝜎𝑟 , with 𝜌 ∈ [0, 1[. computed with the same 𝑛𝑠 simulations, 𝚿 prior and 𝑛𝑟 surrogates.
Then an optimal 𝑛 𝑝 can be computed by evaluating the MVN likeli-
hood L ({𝒔 𝑡𝑒𝑠𝑡 }|𝚺𝒔𝒔 (𝑛 𝑝 )) which plays the role of a utility function.
© 𝜎𝑠
2 0 𝜌𝜎𝑟 𝜎𝑠 0 ª We find the 𝑛 𝑝 that maximizes the likelihood on the test data5
­ .. .. ®
­
­ . . ®
® In our tests, we allow 𝑛 𝑝 ∈ È1, 4 ∗ 𝑝 𝑠 + 1É, the upper bound being
the smallest integer for which the Inverse-Wishart distribution is
𝚿id ≡ 0 𝜎𝑠2 0
­ ®
­ 𝜌𝜎𝑟 𝜎𝑠 ®®
­ (23) normalizable. While low 𝑛 𝑝 values correspond to an improper prior
­
­ 𝜎𝑟2 0 ®
® we find in our numerical experiments that the likelihood rises quickly
­
­ 𝑻
𝚿𝒔𝒓 ..
®
® for small values of 𝑛 𝑝 , with corresponding improvements to the MAP
­ . ®
­ ® covariance estimates. Then a plateau is reached, with a shallow peak
« 0 2
𝜎𝑟 ¬ or plateau and a slow decrease as 𝑛 𝑝 increases. Within the shallow
We require 𝜌 < 1 for this matrix to be positive definite since peak the covariance estimates are robust to the precise value of 𝑛 𝑝
𝑝𝑠 and we advise choosing small values once the shallow regions is
det(𝚿𝑖𝑑 ) = 𝜎𝑠2 𝜎𝑟2 (1 − 𝜌 2 ) . This choice of 𝚿 is very simple
reached. We interpret this preference for low values as being due to
but still serves as a "regularizer" of the estimators from section 3.3. the fact that for the simple, generic priors we used (block covariance
We adopt an empirical Bayes approach, where we estimate 𝜌 and the with diagonal blocks, see section 3.4) and for the summary statistics
variances 𝜎𝑟2 and 𝜎𝑠2 directly from the simulation-surrogate pairs. at hand a minimum of regularization by the prior is nearly optimal
The estimated variance of 𝑦 𝑖 = 𝑠𝑖 or 𝑟 𝑖 , 1 ≤ 𝑖 ≤ 𝑝 𝑠 , is 𝜎𝑦2𝑖 = when 𝑛𝑠 is small. If specifically motivated prior matrices are available
1 Í 𝑛𝑠 𝑦 2
𝑛𝑠 −1 𝑗=1 𝑖, 𝑗
− 𝑦 𝑗 and the estimated covariance between 𝑠𝑖 and larger 𝑛 𝑝 could perhaps become advantageous.
𝑟 𝑖 is 𝜌𝑖 𝜎𝑟𝑖 𝜎𝑠𝑖 = 𝑛𝑠1−1 𝑛𝑗=1
Í 𝑠  
𝑠𝑖, 𝑗 − 𝑠𝑖 𝑟 𝑖, 𝑗 − 𝑟 𝑖 . In equation (23), We present a summary of the estimation process, for the case of
the block covariance estimation of section 3.3.1, in Algorithm 1. We
each of the parameters 𝜎𝑠 , 𝜎𝑟 and 𝜌𝜎𝑟 𝜎𝑠 is the average of the 𝑝 𝑠
obtained nearly identical results treating 𝑛 𝑝 as a hyperparameter and
corresponding quantities, indexed by 𝑖.
introducing a (Jeffreys) scale prior for it before maximization.
Our numerical experiments with dark matter clustering statistics
strongly preferred the "diagonal" prior we discuss next.
Algorithm 1: MAP Estimator for 𝚺𝒔𝒔 given 𝑛 𝑝
Input: A collection {𝒙 𝑖 ≡ (𝒔𝑖 , 𝒓 𝑖 )} , 𝑖 ∈ È1, 𝑛𝑠 É of paired
3.4.2 "Diagonal" prior
simulation and surrogate statistics;
n o a large number of
A natural choice to regularize the Maximum-Likelihood estimate for unpaired surrogate samples 𝒓 ∗𝑗 , 𝑗 ∈ È1, 𝑛𝑟 É; a small
the covariance with simulations and surrogates is to use the estimated
number 𝑛𝑡𝑒𝑠𝑡 of simulation statistics; a block prior 𝚿;
diagonal elements of 𝚺𝒔𝒔 , 𝚺𝒔𝒓 and 𝚺𝒓𝒓 . 𝑠
a set N 𝑝 of "prior weights" 𝑛 𝑘𝑝 , 𝑘 ≤ 𝑐𝑎𝑟𝑑 (N 𝑝 ).
/* Here we compute the "loss" on a single test
2
©𝜎𝑠1 0 𝜌1 𝜎𝑟1 𝜎𝑠1 0 ª
®
simulations set for simplification, but 𝐾 -fold
cross-validation is also an option. */
­ .. .. ®
­
­ . . ® 1 for 𝑛 𝑘𝑝 ∈ N 𝑝 do

𝚿emp ≡ 0 0
®
𝜎𝑠2𝑝𝑠
MAP
Compute b 𝚺𝒔𝒔 (𝑛 𝑘𝑝 ) using equations (14) to (17).
­
­ 𝜌 𝑝𝑠 𝜎𝑟 𝑝𝑠 𝜎𝑠 𝑝𝑠 ®® 2
­ 
0 Compute the MVN likelihood L {𝒔}𝑡𝑒𝑠𝑡 |𝚺𝒔𝒔 (𝑛 𝑝 ) .
® 3
𝜎𝑟21
­ ®
­
𝑻 end
®
4
𝚿𝒔𝒓
­ ®
­ .. 
Determine 𝑛★𝑝 = argmax N𝑝 L {𝒔}𝑡𝑒𝑠𝑡 |𝚺𝒔𝒔 (𝑛 𝑝 )
. ®
­
­ ® 5

0
®
𝜎𝑟2𝑝𝑟 6 return b 𝚺MAP (𝑛★ ); E
𝒔𝒔 𝝁MAP (𝑛★ )
« ¬ 𝑝 𝒔 |𝒓 𝑝
(24)
The computation of each 𝜎𝑠2𝑖 , 𝜎𝑟2𝑖 and 𝜌𝑖 𝜎𝑠𝑖 𝜎𝑟𝑖 is the same as from
the "identity" prior above.
While having a very simple structure, we can see this prior as a 3.6 Correction factor for the precision
more adapted correction of the eigenvalues of the block matrix 𝚺
To compute confidence bounds of the cosmological parameters in
based on the data, whereas 𝚿id adds the same amount of correction
the context of a likelihood-analysis, we need to invert the covari-
on all the eigenvalues, regardless of the statistics at hand.
ance matrix estimate. We briefly explain the correction used for the
standard sample covariance.
3.5 (Cross-)validation to choose the prior hyperparameter 𝑛 𝑝
The hyperparameter 𝜈 (through 𝑛 𝑝 = 𝜈 + 𝑝 𝑠 + 𝑝 𝑟 + 1) in equation 3.6.1 Classical result for the sample covariance
(11) will be seen to determine the weight attributed to the prior in the
MAP
It is well-known that taking the inverse of the bias-corrected version
closed-form solutions for b 𝚺𝒔𝒔 . For different statistics, and in terms
of the Maximum-Likelihood estimator from equation (10), i.e 𝛾b 𝚺𝒔𝒔
of the maximum number of simulations 𝑛𝑠 one is able to run, varying
where 𝛾 ≡ 𝑛𝑠 /(𝑛𝑠 − 1), results in a biased estimator of the precision
𝑛 𝑝 via 𝜈 can significantly impact the quality of the covariance, as we
will discuss in section 4.
We propose retaining a small set {𝒔 𝑡𝑒𝑠𝑡 } of test simulations such that 5 We compared this to using 𝐾 -fold cross-validation but found no signif-
𝑛𝑠 = 𝑛cov test cov
𝑠 + 𝑛 𝑠 , where 𝑛 𝑠 plays the role of the 𝑛 𝑠 of the paired set icant impact on the determination of the optimal 𝑛 𝑝 comparatively to just
in equations (13) to (22). evaluating the likelihood once without splitting the data.

MNRAS 000, 000–000 (2022)


8 Nicolas Chartier and Benjamin D. Wandelt
matrix (Kaufman 1967; Hartlap et al. 2007). For data sampled from L-PICOLA developed by Howlett et al. (2015), with a coarser force
a MVN, an unbiased estimator of the precision is cola = 512.
mesh grid size of 𝑁 𝑚

b𝒔𝒔 = 𝑛𝑠 − 𝑝 𝑠 − 2 b
𝑷 𝚺𝒔𝒔
−1
(25) 4.1.3 Post-processing of snapshots
𝑛𝑠 − 1
We chose, for this study, to include what in the cosmology lit- To extract the summary statistics from our L-PICOLA snapshots, we
erature is referred to as the "Hartlap factor" to the inverse of the used the exact same code modules and parameters used to compute
bias-corrected sample covariance of simulations summary statistics the clustering statistics available in the Quijote data outputs. There-
(including the truth using 15,000 simulations). fore, the simulation and surrogate summary statistics have the same
dimension 𝑝 𝑠 = 𝑝 𝑟 . We transform the snapshots in density contrast
fields with the Cloud-In-Cell (CiC) algorithm. For the matter power
3.6.2 For the MAP estimates spectra and the correlation functions, we used the Python3 module
Pylians3 7 , For the bispectra, the results of which appearing in Ap-
Our MAP estimates derived in section 3.3.1 is constructed to ensure pendix C3, the post-processing code is pySpectrum8 . More details
that the result will be a symmetric positive semi-definite matrix. As can be found in CWAV20.
a consequence, we lose formal unbiasedness but gain dramatically
improved estimates according to multiple criteria, as discussed in
section 4. If unbiasedness of the covariance estimate is important the 4.2 The CARPool Bayes estimator and results on clustering
method in CW21can be used. statistics
The following tests of the Bayesian covariance estimation approach in
this paper use the sample covariance matrix with all the simulations
4 NUMERICAL EXPERIMENTS ON ΛCDM SIMULATION we have (𝑛𝑡𝑟 𝑢𝑡 ℎ = 15, 000) as the "truth" to compare with other
𝑠
STATISTICS estimates. Within the main part of this paper we only present a subset
4.1 Simulation and surrogate data of the estimators that gave the best match in terms of parameters
constraints with respect to the truth.
The simulation and surrogate solvers we use are identical to those In particular, we use the MAP estimator from section 3.3.1 with
CW21 and CWAV20. We recall the main points here for conve- the "diagonal" empirical Bayes prior 𝚿emp , equation (24), estimated
nience. For more details please refer to CWAV20. The solvers evolve on the paired set of 𝑛𝑐𝑜𝑣 simulations and surrogates. All our MAP
𝑠
Np = 5123 Cold Dark Matter (CDM) particles in a box volume of covariance estimates with simulations and surrogates use the optimal
 3
1000 ℎ−1 Mpc . The simulation-surrogate sample pairs take the 𝑛 𝑝 determined through the process described in Algorithm 1 with
a small number of test simulations. We display the total number
same Second-order Lagrangian perturbation theory (2LPT) initial
of simulations used for each covariance matrix estimate as 𝑛𝑠 =
conditions at starting redshift 𝑧 𝑖 = 127.0.
𝑛cov test
𝑠 + 𝑛𝑠 .
In the following, we will refer to this approach as the "CARPool
4.1.1 𝑁-body solver Bayes" estimator.
We find that the alternative estimator written in terms of the regres-
We downloaded the 𝑁-body snapshots clustering statistics from the sion parameters, section 3.3.2, performs comparatively poorer than
publicly available Quijote simulation suite6 (Villaescusa-Navarro the CARPool Bayes estimator. We show an example on the power
et al. 2020). The solver for all the simulations is the TreePM code spectrum covariance and discuss the reasons for this in Appendix
GADGET-III built upon the previous version GADGET-II by Springel C2. Briefly summarized, this estimator requires using a proper prior
(2005). The force mesh grid size to solve the comoving Poisson equa- and therefore affords us less flexibility in choosing the weight of the
tion at each timestep is Nm = 1024. In the following, we will use prior. It therefore tends to give covariance estimates that are more
the sample covariance of all 15,000 available realizations of the fidu- sensitive to the choice of the prior parameters.
cial cosmology as the simulation "truth", or more precisely the best The plan for the remainder of this sections is as follows: we will
covariance estimate we have access to. first present the power spectrum results in more details that were
already partially described in section 2.
4.1.2 Surrogate solver Then, we turn to the real space 2-point correlation function. This
is an interesting case because it illustrates the power of limiting the
We generate the fast surrogate samples with The COmoving La- range of the estimator to the set of all positive definite covariance
grangian Acceleration (COLA) method from Tassev et al. (2013) (see matrices, a feature of the Bayesian version of CARPool. The unbiased
also Leclercq et al. (2020)), which allows generating approximate CARPool approach to the covariance matrix in CW21 failed to yield
gravitational 𝑁-body outputs using a smaller number of timesteps positive-definite covariance estimates for this application in spite of a
than our simulation code. The principle of COLA is to add residual significant reduction of variance for the covariance matrix individual
displacements, computed with a Particle-Mesh (PM) 𝑁-body solver, elements.
to the trajectory given by analytical LPT approximations (usually For a complete comparison with CW21, we also computed results
first- or second-order). Izard et al. (2016) proposed tests of the accu- on the bispectrum covariance matrix. Since these show similar, large
racy and computational cost of COLA against 𝑁-body simulations at improvement over the CW21 approach as for the power spectrum,
different redshifts and with different timestepping parameters. Like we relegate details to Appendix C3.
in CWAV20 and CW21, we used the parallel MPI implementation
7 https://github.com/franciscovillaescusa/Pylians3
6 https://quijote-simulations.readthedocs.io/en/latest/ 8 Available at https://github.com/changhoonhahn/pySpectrum

MNRAS 000, 000–000 (2022)


Bayesian Control Variates for optimal covariance estimation with pairs of simulations and surrogates 9
4.2.1 Matter power spectrum covariance
The matter power spectrum [𝑀 𝑝𝑐3 ], at wave number 𝑘 [ℎMpc−1 ],
under the conditions of homogeneity and isotropy (cosmological
principle), is the average in 3D Fourier space of |𝛿 (𝑘) | 2 , 𝑘 ∈
[𝑘 − Δ𝑘/2, 𝑘 + Δ𝑘/2], where 𝛿(𝒙) is the matter density contrast in
real space. For each snapshot, we compute 𝛿(𝒙) on a square grid
of size 1024 with the Cloud-In-Cell (CIC) algorithm. The follow-
ing analysis is for 𝑘 ∈ 8.900 × 10−3 , 1.0 ℎMpc−1 . We have then


𝑝 𝑠 = 158 linearly space bins. Note that the power spectrum is not
compressed unlike in CWAV20 and CW21, making the covariance
estimation tasks more difficult.
Figure 1 shows that using only 𝑛𝑠 = 10 + 5 simulations with paired
surrogates and an additional set of surrogate samples, we get confi-
dence bounds for the cosmological parameters which are very close
to the ones given by the "true" sample covariance using 15,000 sim-
ulations (for 𝑝 𝑠 = 158). This result is all the more encouraging that
with only 10 simulations we would get a sample covariance of rank
at most 9. In other words, we can see the small set of simulations in Figure 2. Illustration of step 5 of Algorithm 1 for the matter power spectrum
the "CARPool Bayes " estimate a correction to the eigenvalues and example: for fixed 𝑛𝑠 = 𝑛𝑠cov + 𝑛𝑠test and fixed prior 𝚿emp , we compute the
eigenvectors of the precision matrix computed from a biased but cor- MAP
𝚺𝒔𝒔
likelihood of b (𝑛 𝑝 ) on test simulation samples.
related surrogate. We also show in Appendix C1 the relatively small
gain, in terms of closeness of the parameters confidence contours to
the truth, of running 𝑛𝑠 = 40 + 10 simulations for comparison. other estimates and avoids the characteristic underestimation of small
Here we examine the procedure to determine the best 𝑛 𝑝 for a eigenvalues for covariance matrices estimated from a small number
given 𝑛𝑠 and 𝚿 in Figure 2. There are several points to notice here: of samples.
(i) For 𝑛 𝑝 ≈ 1, especially for 𝑛𝑠 ≥ 𝑝 𝑠 + 1 (when the sample
covariance can be full-rank), the likelihood on test data rapidly in-
4.2.2 Matter correlation function covariance
creases. It shows for this case that a minimum of "regularization"
brought by the prior greatly improves the estimate of 𝚺𝒔𝒔 . The example of the two-point matter correlation function 𝜉 (𝒓) for 𝒓 ∈
(ii) Around the empirical 𝑛★𝑝 , the likelihood is rather flat and [5.0, 160.0] ℎ−1 Mpc (𝑝 𝑠 = 159) is of particular interest in our study.
slowly decreases when 𝑛 𝑝 > 𝑛★𝑝 . In other words, once a certain With the variance reduction approach in CW21 for the covariance
threshold of "improvement" is reached with 𝑛 𝑝 , misestimating 𝑛 𝑝 matrix, we found no improvement over the standard estimator. While
does not radically worsen the estimate of 𝚺𝒔𝒔 . unbiased and strongly reducing the errors of all individual elements
of the covariance matrix the resulting matrix failed to be positive-
In Figure 3, we visualize the estimated covariance matrices (top
definite. This means that no estimate for the precision matrix could
row) and their inverse (bottom row). For the "CARPool Bayes" esti-
be obtained, as would be required to derive Fisher matrices or for a
mate with the prior 𝚿emp , i.e. our "headline" estimate with 𝑛𝑠 = 10+5
likelihood approximation to derive parameter constraints.
that gives the confidence bounds in Figure 1, we notice some structure
As we can observe in Figure 5, the structure of the covariance
in the covariance due to the small number of simulations. The close-
is particular, with a band of high-magnitude covariances around the
ness to the truth of the "CARPool Bayes" covariance with very few
diagonal of variances. As a result, the precision estimate based on
simulations is particularly visible for the structure of the precision
the standard sample covariance estimator is very noisy for 𝑛𝑠 =
matrix. It can be seen that at low 𝑘, where the correlation between
200, which we compare with our estimate including surrogates from
surrogates and simulations is particularly high, the CARPool Bayes
Algorithm 1, with 𝑛𝑠 = 160 + 20. Looking at the precision matrices
estimate (and the Maximum Likelihood estimate without the prior)
(bottom row) would indicate a significantly better recovery of the
is significantly less noisy than the standard estimator even though it
structure of the true precision.
uses an order of magnitude less simulations.
In terms of cosmological parameter forecast constraints, as shown
In Figure 4, we compare the covariance estimates to the large-
in Figure 6 , we get a slight improvement with respect to the sample
sample "truth" in the spectral domain, showing the eigenvalues and
covariance matrix (and the precision including the Hartlap factor),
the co-diagonalization coefficients 9 .
but not nearly as large as for the matter power spectrum. The CARPool
At the top, we show the ordered eigenvalues ratio of each matrix. A
Bayes estimate with 𝑛𝑠 = 160 + 20 produces bounds for Ω𝑚 , 𝑛𝑠 and
ratio far from 1, and especially close to zero for the smallest eigenval-
𝜎8 that are closer to the truth than with the sample covariance with
ues as for the standard sample covariance, indicates a very poor con-
𝑛𝑠 = 180. But the confidence regions for Ω𝑏 and ℎ are not improved.
ditioning of the matrix. At the bottom, we see the co-diagonalization
Similarly to the previous section, in Figure 7, the "CARPool Bayes"
coefficients. A horizontal line at 1 would indicate that the matrices
estimator raises up the smallest eigenvalues – as well as the smallest
are identical. The CARPool Bayes estimate clearly outperforms the
"co-diagonalization" coefficient – contrarily to the ML solutions with
and without surrogates where they are close to 0.
9 For 𝑨 and 𝑩 two 𝑝 × 𝑝 real symmetric matrices, if 𝑨 is positive definite, The wide band of correlations visible in Figure 5 indicate that
then there exists a matrix 𝑴 such that 𝑴 𝑻 𝑨𝑴 = 𝑰 𝑝 and 𝑴 𝑻 𝑩𝑴 = 𝑫 with our choice of "diagonal" prior is not optimal for this case. Choosing
𝑫 = diag(𝑑1 , . . . , 𝑑 𝑝 ). We call the 𝑑𝑖 "co-diagonalization coefficients." This a prior with a more gradual falloff of correlation from the diagonal
is a simplified statement from theorem A9.9 in Muirhead (1982). If D = 1 𝑝 would likely produce better results. Figure 8 indicates that for various
then A = B. number of simulations 𝑛𝑠 , the CARPool Bayes estimator for the

MNRAS 000, 000–000 (2022)


10 Nicolas Chartier and Benjamin D. Wandelt

Figure 3. Matter power spectrum covariance estimates


√︂ (top row) and their inverse (bottom row). We show the covariances as correlation matrices with the
 
normalization 𝑫−1 b
𝚺𝑫−1 with the diagonal 𝑫 = diag b𝚺 , and the precision matrices below are the inverses of these correlation matrices. Columns from left to
right show the standard sample covariance estimate from 200 simulations; the reference covariance from 15,000 simulations; the Maximum Likelihood estimate
using the combination of surrogates and 200 simulations (section 3.2); and the "CARPool Bayes" estimate with a "diagonal" prior, combining 𝑛𝑠 = 10 + 5
GADGET simulations with surrogates.

matter correlation function covariance consistently prefers low 𝑛 𝑝 vious approaches such as the sample covariance or the first-order
(i.e. prior weight) values with the "diagonal" prior from section CARPool approach in CW21 according to multiple criteria. These
3.4.2. improvements are particularly noticeable for the inverse covariance
In summary, the application to the matter correlation function, or precision matrix required for many applications such as comput-
demonstrates that the CARPool Bayes estimator is guaranteed to ing Fisher matrices, or for the Gaussian likelihood approximations
produce positive definite matrices. It visibly improves the structure frequently used for parameter estimation.
of the precision matrix (Figure 5) and the relative errors of the Our Bayesian approach combines estimations for the both the mean
small eigenvalues (Figure 7). This translates into some, but not all, (through the well known regression 𝝁 𝒔 |𝒓 , equation 16) and the covari-
parameter confidence bounds being closer to the truth than for the ance of simulation summary statistics using surrogates. In this paper
sample covariance based on 180 simulations. we focused on showing the results for the simulation covariance esti-
mates 𝚺𝒔𝒔 since this is the first time that the control variate approach
has been cast in a Bayesian framework for covariance estimation.
Our Bayesian approach used a multivariate Gaussian model for the
5 DISCUSSION AND CONCLUSION
simulations and surrogates and includes a conjugate Inverse-Wishart
We consider the problem of estimating the covariance matrices of distribution prior for the covariance matrix. In the generic case we
cosmological summary statistics within a Bayesian framework, when found a "diagonal" prior on the block covariance of simulation and
paired simulations and surrogates are available. surrogate summary statistics, whose diagonal elements were eval-
This study constitutes an extension of the CARPool principle, uated on simulation-surrogate pairs, section 3.4.2, to give excellent
presented in CWAV20 and applied to covariance matrices in CW21. results, especially for the matter power spectrum and bispectrum. We
Our method improves on the latter work by solving a Maximum A obtain the same confidence bounds as with the true covariance of the
Posterior optimization directly in the space of symmetric positive power spectrum with 𝑝 𝑠 = 158 bins up to 𝑘 ≈ 1.0 ℎMpc−1 with only
semi-definite matrices and allows introducing priors in analogy to 𝑛𝑠 = 10 + 5 simulations. In this case, we can think of the actual 10
frequentist shrinkage estimators. We prove that our approach, dubbed simulations of the covariance estimate as correcting the eigenspec-
CARPool Bayes, guarantees positive definite estimates, for the price trum of the well-converged covariance of the correlated surrogate
of abandoning the guarantee of unbiasedness of individual covariance that incorporates many samples.
matrix elements provided by the first order estimator described in The same outstanding gain appears for the bispectrum, as we show
CW21. in Appendix C3 for two triangle configurations. This demonstrates
By casting CWAV20 in a Bayesian framework we provide a new the superiority of the CARPool Bayes approach over CW21.
solution to covariance estimation with simulations and surrogates. Regarding the 2-point matter correlation function in real space,
We demonstrate that this estimator can strongly improve over pre- we do get positive-definite estimates by construction—this is not

MNRAS 000, 000–000 (2022)


Bayesian Control Variates for optimal covariance estimation with pairs of simulations and surrogates 11
5.1 Generating samples from the posterior
As an alternative to focusing on closed-form point-estimates of the
covariance by taking the Maximum A Posterior (MAP) of the pos-
terior distribution we could have considered generating samples of
the simulation covariance matrix from the posterior. This is possible
using a Gibbs sampling approach where we explicitly include the
missing simulations 𝒔∗ as latent variables. We briefly sketch the ap-
proach here: first draw 𝚺 from a conditional Inverse-Wishart for pos-
itive (semi-)definite covariance matrices given the data augmented
by the latest 𝒔∗ sample. Since the augmented data is a complete
set of simulation-surrogate pairs the 𝚺 sample would therefore be
guaranteed to be positive (semi-)definite. Simply extracting the sim-
ulation auto-correlation block from 𝚺 would produce samples from
the marginal posterior for 𝚺𝒔𝒔 .
While samples from the marginal posterior would potentially be
useful to propagate the uncertainty in the estimates, or to study
other posterior summaries such as the posterior mean, we do not
explore this approach further, for two reasons: one is computational
cost though that is perhaps tolerable for summaries with moderate
dimension (i.e., up to O (100)); the other is that we would like to
obtain a point estimate for the covariance that we can use in other
contexts, without worrying if the Monte Carlo estimate, e.g., of the
posterior mean of the signal covariance, has converged to sufficient
accuracy.
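As an illustration of the data-augmentation scheme sketched above, a minimal Gibbs sampler could alternate between drawing the missing simulation summaries from their Gaussian conditional and drawing 𝚺 from its Inverse-Wishart conditional given the augmented pairs. The Python sketch below assumes summaries ordered as [simulation, surrogate], a mean that is taken as known for simplicity, a prior (𝚿, 𝜈), and standard conjugate updates; it is a schematic outline under these assumptions, not the sampler used for the results of this paper.

import numpy as np
from scipy.stats import invwishart

def gibbs_covariance(x_pairs, r_only, mu, Psi, nu, n_steps=1000, rng=None):
    """Schematic Gibbs sampler for the joint covariance Sigma.

    x_pairs : (n_s, 2*p_s) complete [simulation, surrogate] pairs
    r_only  : (n_r, p_s)   surrogate-only samples (their s* are latent)
    mu      : (2*p_s,)     mean of the concatenated vector, assumed known here
    Psi, nu : Inverse-Wishart prior parameters
    """
    rng = np.random.default_rng(rng)
    p = r_only.shape[1]
    Sigma = np.cov(x_pairs, rowvar=False)      # crude initialization
    samples = []
    for _ in range(n_steps):
        # (i) draw the missing s* from the Gaussian conditional s* | r*, Sigma
        Srr_inv = np.linalg.inv(Sigma[p:, p:])
        B = Sigma[:p, p:] @ Srr_inv
        cond_cov = Sigma[:p, :p] - B @ Sigma[p:, :p]      # Schur complement Sigma/Sigma_rr
        cond_mean = mu[:p] + (r_only - mu[p:]) @ B.T
        s_star = cond_mean + rng.multivariate_normal(np.zeros(p), cond_cov, size=len(r_only))
        # (ii) draw Sigma from its Inverse-Wishart conditional given the augmented data
        x_aug = np.vstack([x_pairs, np.hstack([s_star, r_only])])
        resid = x_aug - mu
        scale = Psi + resid.T @ resid
        Sigma = invwishart.rvs(df=nu + len(x_aug), scale=scale, random_state=rng)
        samples.append(Sigma[:p, :p])          # marginal posterior draws of Sigma_ss
    return np.array(samples)

Extracting the top-left block of each draw yields samples from the marginal posterior of 𝚺𝒔𝒔, which could be used to propagate covariance uncertainty as discussed above.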

Figure 4. Comparison of the CARPool Bayes covariance estimate (section 3.3.1), the standard ML estimator, and the ML estimator combining simulations and surrogates (section 3.2) with the large-sample "truth" in the spectral domain. We show ordered eigenvalue ratios at the top and co-diagonalisation coefficients at the bottom. The CARPool Bayes estimator avoids the characteristic underestimation of small eigenvalues for covariance matrices estimated from a small number of samples. See discussion in the text.

5.2 Potential for future applications in cosmology and beyond

Our numerical experiments demonstrate the capability of running fewer intensive simulations in order to get theoretical predictions of the means and covariances of observables for next-generation surveys. Many additional applications of these techniques remain to be explored. The free choice of what to use as surrogates makes our methods very broadly applicable.

Some surrogates might be useful because they are nearly free computationally. A case in point would be Eulerian linear perturbation theory for the power spectrum applied to the initial conditions of an 𝑁-body simulation. In this case each simulation comes with the paired surrogate for free (since the initial conditions are necessary to run the simulation in the first place) and its expectation and covariance can be computed analytically nearly for free. It could be argued that such automatic surrogates ought to be exploited systematically when predicting commonly used clustering statistics from simulations. A very similar application of this idea to a non-perturbative statistic would be to the computation of halo number functions: apply the Press-Schechter approach to the initial conditions as a surrogate for the mass function for a given realisation. In this example, the classical Press-Schechter formula provides the expectation of the surrogate and would reduce the variance in the number function for the largest (and rarest) clusters in the simulations, thus increasing the effective volume of the simulations.

In other cases, the surrogates may consist of costly simulations that have already been run at a different set of parameters. In this case it may be possible to "update" the means and covariances from the previous simulation set to a new set of parameters by pairing a small number of the existing old simulations (now surrogates) with the same number of new simulations.

The availability of perturbative results and analytical estimates, the increasing need for accurate simulations to analyze current and upcoming data sets in all subfields of cosmology, and the vast parameter space to explore with cosmological simulations make it likely that the concepts described here will continue to find powerful applications. We look forward to seeing the cosmological advances that CARPool will enable.
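As a schematic example of the "nearly free" surrogate idea above, the sketch below (Python/NumPy) pairs each simulation summary, e.g. a measured power spectrum, with a surrogate evaluated from the same initial conditions, e.g. the linear-theory prediction, and combines them with a simple per-bin control-variate coefficient in the spirit of the first-order CARPool estimator of CWAV20. The per-bin coefficient and the assumption of an analytically known surrogate mean are simplifications for illustration; they are not the estimators derived in this paper.

import numpy as np

def carpool_mean_estimate(sim_stats, surr_stats, surr_mean, beta=None):
    """First-order control-variate combination, applied bin by bin.

    sim_stats, surr_stats : (n_pairs, p_s) paired summaries (e.g. P(k)),
                            surrogate evaluated on the same initial conditions
    surr_mean             : (p_s,) surrogate expectation, assumed known
                            analytically (e.g. linear theory)
    beta                  : per-bin coefficients; if None, use the standard
                            variance-minimizing cov(s, r)/var(r) estimate.
    """
    if beta is None:
        s_c = sim_stats - sim_stats.mean(axis=0)
        r_c = surr_stats - surr_stats.mean(axis=0)
        beta = (s_c * r_c).mean(axis=0) / r_c.var(axis=0)
    x = sim_stats - beta * (surr_stats - surr_mean)   # reduced-variance samples
    return x.mean(axis=0), x

In the covariance application developed here, the combination happens at the level of the block covariance 𝚺 rather than bin by bin, but the pairing of each simulation with a surrogate built from the same initial conditions is identical.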


Figure 5. Matter correlation function covariance estimates (top) and their inverse (bottom), shown similarly to Figure 3. The "CARPool Bayes" estimate uses 𝑛𝑠 = 160 + 20 GADGET simulations.

ACKNOWLEDGEMENTS

We warmly thank Ethan Anderes and Francisco Villaescusa-Navarro for stimulating discussion and feedback. N.C. acknowledges funding from LabEx ENS-ICFP (PSL). B.D.W. acknowledges support by the ANR BIG4 project, grant ANR-16-CE23-0002 of the French Agence Nationale de la Recherche; and the Labex ILP (reference ANR-10-LABX-63) part of the Idex SUPER, and received financial state aid managed by the Agence Nationale de la Recherche, as part of the programme Investissements d’avenir under the reference ANR-11-IDEX-0004-02. The Flatiron Institute is supported by the Simons Foundation. This work has made use of the Infinity Cluster hosted by Institut d’Astrophysique de Paris.

DATA AVAILABILITY

The data samples underlying this article are available through globus.org, and instructions to reproduce the summary statistics from snapshots can be found at https://github.com/franciscovillaescusa/Quijote-simulations. Additionally, a Python3 package with code examples and documentation is provided at https://github.com/CompiledAtBirth/pyCARPool to experiment with CARPool.

APPENDIX A: DERIVATION OF ESTIMATORS AND PROOF OF POSITIVE DEFINITENESS USING EXPECTATION MAXIMIZATION

A1 Expectation Maximization

In this section, we aim at showing the equivalence of the results given by the Expectation-Maximization algorithm – which naturally comes to mind in the presence of missing samples – and the simple result from the Maximum-Likelihood and Maximum A Posteriori problems formulated in sections 3.2 and 3.3.1.

A1.1 Iterative algorithm

The Expectation-Maximization (EM) algorithm (Dempster et al. 1977) is an iterative technique to maximize the likelihood (or posterior) in the presence of missing data. Briefly, it works by casting the problem as a sequence of simpler optimization problems. Each iteration consists of two steps: the E-step, which removes the missing data from the log-likelihood by taking its expectation with respect to the missing data, assuming the current iterates are the true values of the parameters; and the M-step, which updates the parameters by finding the values that maximize the expected log-likelihood from the E-step. We focus in this appendix on the covariance estimation; including the solution for the estimators of the mean is straightforward and we give the result in the main text.

We recall Eq. (7) here for convenience as a starting point:

−2 ln L({𝒙}, {𝒙∗}|𝚺) = (𝑛𝑠 + 𝑛𝑟) ln[det(2𝜋𝚺)] + ∑_{𝑖=1}^{𝑛𝑠} 𝒙𝑖^𝑇 𝚺^{−1} 𝒙𝑖 + ∑_{𝑗=1}^{𝑛𝑟} 𝒙𝑗^{∗𝑇} 𝚺^{−1} 𝒙𝑗^{∗} .   (A1)


Figure 6. Confidence contours of the cosmological parameters computed using the Fisher matrix based on the estimated matter correlation function covariance
matrix. The estimators which we compare are the same as in Figure 1.

E-step. Consider the conditional expectation of the log-likelihood over the missing data 𝒔∗ given the (observed) data and the covariance at the 𝑘-th step, 𝚺[𝑘]:

−2 E_{𝒔∗|𝒓∗}[ ln L({𝒙}, {𝒙∗}|𝚺[𝑘]) ] = (𝑛𝑠 + 𝑛𝑟) ln[det(𝚺[𝑘])] + ∑_{𝑖=1}^{𝑛𝑠} 𝒙𝑖^𝑇 𝚺[𝑘]^{−1} 𝒙𝑖 + E_{𝒔∗|𝒓∗}[ ∑_{𝑗=1}^{𝑛𝑟} 𝒙𝑗^{∗𝑇} 𝚺[𝑘]^{−1} 𝒙𝑗^{∗} ] + 𝑐.

Using linearity of expectation we can look at each summand of the last term on the RHS:

E[ 𝒙𝑖^{∗𝑇} 𝚺[𝑘]^{−1} 𝒙𝑖^{∗} ] = tr( 𝚺[𝑘]^{−1} E[ 𝒙𝑖^{∗} 𝒙𝑖^{∗𝑇} ] ).   (A2)

Define

𝑨𝑖 ≡ E[ 𝒙𝑖^{∗} 𝒙𝑖^{∗𝑇} ] = [ 𝑨𝑖,𝒔𝒔  𝑨𝑖,𝒔𝒓 ; 𝑨𝑖,𝒔𝒓^𝑇  𝑨𝑖,𝒓𝒓 ].   (A3)

Then

𝑨𝑖,𝒔𝒔 = (𝚺/𝚺𝒓𝒓) + 𝑩 𝒓𝑖^{∗} 𝒓𝑖^{∗𝑇} 𝑩^𝑇,   (A4)
𝑨𝑖,𝒔𝒓 = 𝑩 𝒓𝑖^{∗} 𝒓𝑖^{∗𝑇},   (A5)
𝑨𝑖,𝒓𝒓 = 𝒓𝑖^{∗} 𝒓𝑖^{∗𝑇}.   (A6)

We stress that equations (A4), (A5) and (A6) depend on 𝑘 because we use 𝚺[𝑘] as 𝚺.

Writing

𝑛𝑠 𝚺̂ = ∑_{𝑖=1}^{𝑛𝑠} 𝒙𝑖 𝒙𝑖^𝑇

and

𝑛𝑟 𝑨[𝑘] = ∑_{𝑖=1}^{𝑛𝑟} 𝑨𝑖[𝑘],

we find the expected log-likelihood

−2 E_{𝒔∗|𝒓∗}[ ln L({𝒙}, {𝒙∗}|𝚺[𝑘]) ] = (𝑛𝑠 + 𝑛𝑟) ln[det(𝚺[𝑘])] + tr[ 𝚺[𝑘]^{−1} ( 𝑛𝑠 𝚺̂ + 𝑛𝑟 𝑨[𝑘] ) ] + 𝑐.   (A7)

M-step. Maximizing the expected log-likelihood, Eq. (A7), to find the next value of the parameter is now trivial:

𝚺[𝑘 + 1] = ( 𝑛𝑠 𝚺̂ + 𝑛𝑟 𝑨[𝑘] ) / ( 𝑛𝑠 + 𝑛𝑟 ).   (A8)

A1.2 Inclusion of an Inverse-Wishart prior for 𝚺

The generalization to maximizing the posterior for 𝚺 with a conjugate prior taking the Inverse-Wishart form is immediate. Taking 𝚿 to be the parameter of the prior, 𝑃 = 2 dim(𝒔), and 𝜈 > 𝑃 − 1 the number of degrees of freedom, this modifies the EM update, Eq. (A8), to

𝚺[𝑘 + 1] = ( 𝑛𝑠 𝚺̂ + 𝑛𝑟 𝑨[𝑘] + 𝚿 ) / ( 𝑛𝑠 + 𝑛𝑟 + (𝜈 + 𝑃 + 1) ).   (A9)

When 𝑛𝑠 ≈ 𝑃, the Maximum A Posteriori (MAP) estimator is quite different from the ML estimator.

A1.3 Proof that EM iterations conserve positive (semi-)definiteness of 𝚺

To prove the positive definiteness of the estimated covariance matrix, we recall the following very useful characterization of positive semi-definite (psd) matrices using the Schur complement.

Lemma (e.g., Gallier (2011)): Let 𝑴22 be positive definite, 𝑴22 > 0. Then

𝑴 = [ 𝑴11  𝑴12 ; 𝑴12^𝑇  𝑴22 ] ≥ 0   (A10)

if and only if (𝑴/𝑴22) ≥ 0.

We wish to show that, as long as we have enough surrogates that the covariance matrix estimated from them is positive definite, if we initialize 𝚺[0] such that (𝚺[0]/𝚺𝒓𝒓[0]) ≥ 0 then 𝚺[𝑘] ≥ 0 throughout the EM iteration and therefore also for the fixed point. This follows directly from the Lemma, as follows.

At step 𝑘 of the EM iteration, assume 𝚺[𝑘] is such that (𝚺[𝑘]/𝚺𝒓𝒓[𝑘]) ≥ 0. By assumption we always have enough surrogates that 𝑨𝒓𝒓 > 0. Therefore 𝑨𝒓𝒓 is invertible and we have that

(𝑨/𝑨𝒓𝒓) = (𝚺[𝑘]/𝚺𝒓𝒓[𝑘]) ≥ 0   (A11)

by assumption. This implies 𝑨 ≥ 0 by the Lemma. The sum of two psd matrices is itself psd, and since 𝚺̂ is manifestly psd, this guarantees that 𝚺[𝑘 + 1] ≥ 0. The "only if" direction of the Lemma guarantees that (𝚺[𝑘 + 1]/𝚺𝒓𝒓[𝑘 + 1]) ≥ 0 at the next iteration. Therefore, 𝚺[𝑘] ≥ 0 for all 𝑘 ≥ 0 by induction.

A1.4 The Maximum Likelihood and A Posteriori solutions as Fixed Point of the EM iterations

While the iterations are computationally very light, since we have closed-form solutions for the iterative updates (sections 3.2 and 3.3), we can do even better by deriving a closed-form solution directly for the iterative fixed point and thus demonstrate the equivalence with EM. Solving 𝚺[𝑘 + 1] = 𝚺[𝑘] ≡ 𝚺̂^MAP by combining equation (A8) with equations (A4), (A5) and (A6) gives

𝚺̂𝒓𝒓^EM = [ 𝑛𝑟 𝑨𝒓𝒓 + (𝑛𝑠 + 𝑛𝑝) 𝚺̂𝒓𝒓^Δ ] / ( 𝑛𝑠 + 𝑛𝑟 + 𝑛𝑝 ),   (A12)

𝚺̂𝒔𝒓^EM = (𝑛𝑠 + 𝑛𝑝) 𝚺̂𝒔𝒓^Δ   (A13)
    × [ (𝑛𝑠 + 𝑛𝑟 + 𝑛𝑝) 1_{𝑝𝑠} − [𝚺̂𝒓𝒓^EM]^{−1} 𝑛𝑟 𝑨𝒓𝒓 ]^{−1},   (A14)

𝑩̂^EM = 𝚺̂𝒔𝒓^EM [𝚺̂𝒓𝒓^EM]^{−1},   (A15)

𝚺̂𝒔𝒔^EM = [ (𝑛𝑠 + 𝑛𝑝) 𝚺̂𝒔𝒔^Δ + 𝑛𝑟 𝑩̂^EM ( 𝑨𝒓𝒓 − 𝚺̂𝒓𝒓^EM ) 𝑩̂^{EM 𝑇} ] / ( 𝑛𝑠 + 𝑛𝑝 ).   (A16)

Equation (A16) is equivalent to equation (17), even though it looks more complicated. This is because solving for the EM introduces only the covariance of the unpaired surrogates, 𝑨𝒓𝒓, and not the covariance of the paired and unpaired surrogates, 𝚺̂𝒓𝒓^⋆.

Figure 7. Same as Figure 4 for the matter correlation function.

Figure 8. Same plot as in Figure 2 for the matter correlation function, still with 𝚿emp.
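For concreteness, a direct NumPy transcription of the EM update of equations (A4)–(A6) with the Inverse-Wishart prior of Eq. (A9) might look as follows; variable names, the initialization and the stopping rule are illustrative, and in practice the closed-form fixed point above makes the loop unnecessary. The sketch also checks at each step that the Schur complement 𝚺[𝑘]/𝚺𝒓𝒓[𝑘] remains positive semi-definite, in line with the argument of section A1.3.

import numpy as np

def em_covariance(x_pairs, r_only, Psi, nu, n_iter=200, tol=1e-10):
    """EM iteration for the joint covariance with missing simulation blocks.

    x_pairs : (n_s, 2*p) complete [simulation, surrogate] pairs (mean removed)
    r_only  : (n_r, p)   surrogate-only samples (mean removed)
    Psi, nu : Inverse-Wishart prior parameters; P = 2*p
    """
    n_s, P = x_pairs.shape
    p = P // 2
    n_r = len(r_only)
    S_hat = x_pairs.T @ x_pairs                 # n_s * Sigma_hat
    A_rr = r_only.T @ r_only                    # n_r * A_rr
    Sigma = (S_hat + Psi) / (n_s + nu + P + 1)  # prior-regularized initialization
    for _ in range(n_iter):
        Srr_inv = np.linalg.inv(Sigma[p:, p:])
        B = Sigma[:p, p:] @ Srr_inv             # regression matrix Sigma_sr Sigma_rr^-1
        schur = Sigma[:p, :p] - B @ Sigma[p:, :p]
        assert np.all(np.linalg.eigvalsh(schur) > -1e-12)   # psd check, cf. (A10)-(A11)
        # E-step: n_r * A[k], built block by block from (A4)-(A6)
        A = np.zeros((P, P))
        A[p:, p:] = A_rr
        A[:p, p:] = B @ A_rr
        A[p:, :p] = A[:p, p:].T
        A[:p, :p] = n_r * schur + B @ A_rr @ B.T
        # M-step with the Inverse-Wishart prior, Eq. (A9)
        Sigma_new = (S_hat + A + Psi) / (n_s + n_r + nu + P + 1)
        if np.max(np.abs(Sigma_new - Sigma)) < tol:
            Sigma = Sigma_new
            break
        Sigma = Sigma_new
    return Sigma[:p, :p]                        # simulation block Sigma_ss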

A2 Case when the surrogate covariance is known

We can rewrite the simulation summary statistics covariance from section 3.3.1 as

𝚺̂𝒔𝒔^MAP = 𝚺̂𝒔𝒔^Δ + 𝑩̂^MAP ( [𝑛𝑟 / (𝑛𝑟 + 𝑛𝑠 + 𝑛𝑝)] 𝚺̂𝒓𝒓^⋆ − 𝚺̂𝒓𝒓^Δ ) 𝑩̂^{MAP 𝑇},   (A17)

which is strictly equivalent to equation (17). The case when a theoretical covariance 𝚺𝒓𝒓 for the surrogates is available obtains directly from the limit of Eq. (A17) as 𝑛𝑟 → ∞:

𝚺𝒔𝒔^{MAP,𝚺𝒓𝒓} = lim_{𝑛𝑟→∞} 𝚺̂𝒔𝒔^MAP = 𝚺̂𝒔𝒔^Δ + 𝑩̂^MAP ( 𝚺𝒓𝒓 − 𝚺̂𝒓𝒓^Δ ) 𝑩̂^{MAP 𝑇}.   (A18)

APPENDIX B: MAP DERIVATION (REGRESSION PARAMETERS)

This section presents the derivation of the closed-form solutions for the covariance in section 3.3.2. We can extend the derivation of Anderson (1957) by including an Inverse-Wishart prior with parameters 𝚿 and 𝜈. Under the hypothesis that the block covariance 𝚺 of simulation and surrogate summary statistics is drawn from an Inverse-Wishart distribution (equation (11)), the following properties hold true:

(i) 𝚺𝒓𝒓 ⫫ 𝚺𝒓𝒓^{−1} 𝚺𝒓𝒔 = 𝑩^𝑇.
(ii) 𝚺𝒓𝒓 ⫫ 𝚺𝒔|𝒓.
(iii) 𝚺𝒓𝒓 ∼ W^{−1}(𝚿𝒓𝒓, 𝜈 − 𝑝𝑠).
(iv) 𝚺𝒔|𝒓 ∼ W^{−1}(𝚿𝒔|𝒓, 𝜈) with 𝚿𝒔|𝒓 ≡ (𝚿/𝚿𝒓𝒓).
(v) 𝑩^𝑇 | 𝚺𝒔|𝒓 ∼ MN(𝚿𝒓𝒓^{−1} 𝚿𝒓𝒔, 𝚺𝒔|𝒓 ⊗ 𝚿𝒓𝒓^{−1}), where ⊗ is the Kronecker product and MN designates the matrix normal distribution.

This is particularly convenient for our problem and we can extend Anderson's result straightforwardly to a Maximum A Posteriori (MAP) estimate. In particular, we can re-parametrize the distribution

P(𝚺) = P(𝑩^𝑇 | 𝚺𝒔|𝒓) P(𝚺𝒔|𝒓) P(𝚺𝒓𝒓).   (B1)

Let us index the unpaired surrogate samples as 𝒓𝑖 with 𝑖 = 1, . . . , 𝑛𝑟, and the surrogate samples that are part of the pairs 𝒙𝑖, 𝑖 = 1, . . . , 𝑛𝑠, as 𝒓𝑖 with 𝑖 = 𝑛𝑟 + 1, . . . , 𝑛𝑟 + 𝑛𝑠. We factorize the likelihood as in Anderson (1957), that is to say

L({𝒙}, {𝒓∗}|𝚺) = ∏_{𝑖=1}^{𝑛𝑠} P(𝒙𝑖 | 𝝁, 𝚺) ∏_{𝑗=1}^{𝑛𝑟} P(𝒓𝑗 | 𝝁𝒓, 𝚺𝒓𝒓) = ∏_{𝑖=1}^{𝑛𝑠+𝑛𝑟} P(𝒓𝑖 | 𝝁𝒓, 𝚺𝒓𝒓) ∏_{𝑖=1}^{𝑛𝑠} P(𝒔𝑖 | 𝝁𝒔𝑖|𝒓𝑖, 𝚺𝒔|𝒓).   (B2)

The right-hand side depends separately on 𝚺𝒓𝒓, 𝚺𝒔|𝒓 and 𝑩^𝑇 (through 𝝁𝒔|𝒓), as does the prior, so we can solve the MAP problem from equation (12). The natural logarithm of the posterior distribution is then

−2 ln P(𝚺 | {𝒙}, {𝒓∗}) = (𝑛𝑠 + 𝑛𝑟 + 𝜈 − 𝑝𝑠 + 𝑝𝑟 + 1) ln[det(𝚺𝒓𝒓)] + (𝑛𝑠 + 𝜈 + 2𝑝𝑠 + 1) ln[det(𝚺𝒔|𝒓)]
    + Tr[ ( ∑_{𝑖=1}^{𝑛𝑠+𝑛𝑟} (𝒓𝑖 − 𝝁𝒓)(𝒓𝑖 − 𝝁𝒓)^𝑇 + 𝚿𝒓𝒓 ) 𝚺𝒓𝒓^{−1} ]
    + Tr[ ( ∑_{𝑗=1}^{𝑛𝑠} (𝒔𝑗 − 𝝁𝒔|𝒓)(𝒔𝑗 − 𝝁𝒔|𝒓)^𝑇 + 𝚿𝒔|𝒓 + (𝑩^𝑇 − 𝚪^𝑇)^𝑇 𝚿𝒓𝒓 (𝑩^𝑇 − 𝚪^𝑇) ) 𝚺𝒔|𝒓^{−1} ].   (B3)

We then solve successively ∂ ln[P(𝚺 | {𝒙}, {𝒓∗})] / ∂𝛼 = 0 for 𝛼 ∈ {𝚺𝒓𝒓, 𝑩^𝑇, 𝚺𝒔|𝒓}; after a bit of derivation and linear algebra, we find the solutions from section 3.3.2, i.e. equations (19), (20) and (21), which allow us to compute 𝚺̂𝒔𝒔^MAP from equation (22).

APPENDIX C: SOME ADDITIONAL RESULTS

C1 Relative gain for the power spectrum

We simply show the confidence bounds for the ΛCDM parameters using the power spectrum covariance matrix, this time with more simulations for the CARPool Bayes covariance, i.e. 𝑛𝑠 = 40 + 10 versus 𝑛𝑠 = 10 + 5 in section 2. Figure C1 shows CARPool Bayes marginal bounds even closer to the truth than in Figure 1, at the price of running 50 simulations in total instead of 15. This demonstrates that the relative gain of running more simulations is small for the covariance matrix when the simulation and surrogate summary statistics are well correlated.

C2 MAP on the regression parameters

We chose to present one particular example of the MAP estimate from section 3.3.2 on the power spectrum, which showed the most successful results with the "block" parametrization from section 3.3.1 with only 𝑛𝑠 = 10 + 5 simulations. We fix 𝜈 = 2𝑝𝑠 + 2 in this case and do not consider it a free parameter, nor do we allow it to define an improper prior, i.e. we do not allow 𝜈 ≤ 2𝑝𝑠 − 1. This corresponds to the lowest integer for which the expectation of the Inverse-Wishart exists. In Figure C2, the marginal confidence bounds are much wider than the truth for both 𝑛𝑠 = 10 and 𝑛𝑠 = 160 for the CARPool Bayes estimator (this time the "regression" framework from section 3.3.2). Since the MAP on the regression parameters does not allow for an improper prior, the estimator of the simulation covariance puts too much weight on the naïve diagonal empirical Bayes prior we use (section 3.4.2). For future studies, we can explore whether a "smarter" prior, for instance a model covariance computed from theoretical approximations to parametrize the Inverse-Wishart distribution, can significantly improve the two CARPool Bayes estimators from sections 3.3.1 and 3.3.2.

C3 Results from the bispectrum

Here, we directly present the confidence bounds for the ΛCDM parameters found using various covariance estimators of the bispectrum. The motivation here is to demonstrate the improvement over CW21 for the same summary statistics. The first summary statistic we test is the set of squeezed isosceles triangles, that is to say the bispectra computed for 𝑘1 = 𝑘2 and in ascending order of the ratio 𝑘3/𝑘1 ≤ 0.20 (𝑝𝑠 = 98 in this case).

Figure C3 demonstrates that we get parameter constraints much more representative of the truth with 𝑛𝑠 = 20 + 10 simulations than with the sample covariance using 𝑛𝑠 = 110 simulations. The CARPool Bayes estimator is the one from section 3.3.1 using the empirical Bayes prior from section 3.4.2.

Then, we take a look at the reduced bispectrum of equilateral triangles with 𝑘1 = 𝑘2 = 𝑘3 varying up to 𝑘max = 0.75 ℎMpc^{−1} (𝑝𝑠 = 40). In Figure C4, we observe that the CARPool Bayes estimator gives parameter marginal contours almost identical to the truth with only 𝑛𝑠 = 10 + 5 simulations, while the sample covariance of simulations uses 𝑛𝑠 = 100 simulations.
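For reference, the confidence contours reported in Figures C1–C4 (and in the main text) are Fisher-matrix forecasts built from an estimated covariance. A minimal sketch of that final step is given below (Python/NumPy); the derivative matrix of the mean statistic with respect to the cosmological parameters is assumed to be available, and the optional Hartlap et al. (2007) debiasing factor is included only as an example of a finite-sample correction; whether and how such corrections are applied to each estimator follows the main text, not this sketch.

import numpy as np

def fisher_forecast(cov_hat, dmu_dtheta, n_samples=None):
    """Fisher matrix and 1-sigma marginal errors from an estimated covariance.

    cov_hat    : (p_s, p_s) estimated covariance of the summary statistic
    dmu_dtheta : (p_s, n_params) derivatives of the mean statistic with
                 respect to the cosmological parameters (assumed available)
    n_samples  : if given, apply the Hartlap et al. (2007) debiasing factor
                 to the inverted covariance.
    """
    p_s = cov_hat.shape[0]
    prec = np.linalg.inv(cov_hat)
    if n_samples is not None:
        prec *= (n_samples - p_s - 2) / (n_samples - 1)    # Hartlap factor
    fisher = dmu_dtheta.T @ prec @ dmu_dtheta
    marg_sigma = np.sqrt(np.diag(np.linalg.inv(fisher)))   # marginalized 1-sigma bounds
    return fisher, marg_sigma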


Figure C1. Fisher confidence contours of the cosmological parameters based on the estimated covariance matrix of the matter power spectrum. The estimators which we compare are the same as in Figure 1, except that we now have 𝑛𝑠 = 40 + 10 simulations for CARPool Bayes (empirical Bayes prior on the block covariance).

Figure C2. Confidence contours of the cosmological parameters computed using the Fisher matrix based on the estimated matter power spectrum covariance matrix. The "CARPool Bayes" estimates follow the computations of section 3.3.2, where the prior, still the empirical Bayes one from section 3.4.2, is parametrized given the regression parameters. We stress that this is the only Figure in the paper that shows a computation of the "regression" MAP from section 3.3.2.

Figure C3. Confidence contours of the cosmological parameters computed using the Fisher matrix based on the estimated matter bispectrum covariance matrix, for a set of squeezed isosceles triangles. The estimators result from the same computations as in Figure 1.

Figure C4. Confidence contours of the cosmological parameters computed using the Fisher matrix based on the estimated matter bispectrum covariance matrix, for a set of equilateral triangles. The estimators we compare are the same as in Figure 1.

REFERENCES

Alsing J., Wandelt B., 2018, MNRAS, 476, L60
Alsing J., et al., 2020, ApJS, 249, 5
Alves de Oliveira R., Li Y., Villaescusa-Navarro F., Ho S., Spergel D. N., 2020, arXiv e-prints, p. arXiv:2012.00240
Anderson T. W., 1957, Journal of the American Statistical Association, 52, 200
Angulo R. E., Pontzen A., 2016, MNRAS, 462, L1
Angulo R. E., Zennaro M., Contreras S., Aricò G., Pellejero-Ibañez M., Stücker J., 2020, arXiv e-prints, p. arXiv:2004.06245
Bai Z. D., Yin Y. Q., 1993, The Annals of Probability, 21, 1275
Bernardeau F., Colombi S., Gaztanaga E., Scoccimarro R., 2002, Phys. Rept., 367, 1
Blot L., Corasaniti P. S., Alimi J.-M., Reverdy V., Rasera Y., 2014, Monthly Notices of the Royal Astronomical Society, 446, 1756
Blot L., Corasaniti P. S., Amendola L., Kitching T. D., 2016, MNRAS, 458, 4462
Blot L., et al., 2019, MNRAS, 485, 2806
Chartier N., Wandelt B. D., 2021, Monthly Notices of the Royal Astronomical Society, 509, 2220
Chartier N., Wandelt B., Akrami Y., Villaescusa-Navarro F., 2021, Monthly Notices of the Royal Astronomical Society, 503, 1897
Cheng S., Yu H.-R., Inman D., Liao Q., Wu Q., Lin J., 2020, arXiv e-prints, p. arXiv:2003.03931
Chuang C.-H., Kitaura F.-S., Prada F., Zhao C., Yepes G., 2015, MNRAS, 446, 2621
Colavincenzo M., et al., 2019, MNRAS, 482, 4883
Dai B., Seljak U., 2020, arXiv e-prints, p. arXiv:2010.02926
DeRose J., et al., 2019, The Astrophysical Journal, 875, 69
Dempster A. P., Laird N. M., Rubin D. B., 1977, Journal of the Royal Statistical Society. Series B (Methodological), 39, 1
Desjacques V., Jeong D., Schmidt F., 2018, Phys. Rept., 733, 1
Ding Z., et al., 2022, arXiv e-prints, p. arXiv:2202.06074
Dodelson S., Schneider M. D., 2013, Phys. Rev. D, 88, 063537
Eifler T., Schneider P., Hartlap J., 2009, A&A, 502, 721
Escoffier S., et al., 2016, arXiv e-prints, p. arXiv:1606.00233
Favole G., Granett B. R., Silva Lafaurie J., Sapone D., 2020, arXiv e-prints, p. arXiv:2004.13436
Feng Y., Chu M.-Y., Seljak U., McDonald P., 2016, MNRAS, 463, 2273
Friedrich O., Eifler T., 2018, MNRAS, 473, 4150
Friedrich O., et al., 2021, MNRAS,
Gallier J., 2011, Geometric Methods and Applications. Springer New York, doi:10.1007/978-1-4419-9961-0, https://doi.org/10.1007%2F978-1-4419-9961-0
Garrison L., 2019, PhD thesis, University Of Washington
Garrison L. H., Eisenstein D. J., Ferrer D., Tinker J. L., Pinto P. A., Weinberg D. H., 2018, ApJS, 236, 43
Giocoli C., et al., 2021, A&A, 653, A19
Habib S., et al., 2016, New Astron., 42, 49
Hall A., Taylor A., 2019, MNRAS, 483, 189
Harnois-Déraps J., Pen U.-L., 2013, MNRAS, 431, 3349
Harnois-Déraps J., Vafaei S., Van Waerbeke L., 2012, MNRAS, 426, 1262
Harnois-Déraps J., Pen U.-L., Iliev I. T., Merz H., Emberson J. D., Desjacques V., 2013, MNRAS, 436, 540
Harnois-Déraps J., Giblin B., Joachimi B., 2019, A&A, 631, A160
Hartlap J., Simon P., Schneider P., 2007, A&A, 464, 399
Hassan S., et al., 2021, arXiv e-prints, p. arXiv:2110.02983
He S., Li Y., Feng Y., Ho S., Ravanbakhsh S., Chen W., Póczos B., 2019, Proceedings of the National Academy of Sciences, 116, 13825–13832
Heavens A. F., Jimenez R., Lahav O., 2000, MNRAS, 317, 965
Hikage C., Takahashi R., Koyama K., 2020, Phys. Rev. D, 102, 083514
Howlett C., Manera M., Percival W., 2015, Astronomy and Computing, 12, 109–126
Ishiyama T., Fukushige T., Makino J., 2009, PASJ, 61, 1319
Izard A., Crocce M., Fosalba P., 2016, Monthly Notices of the Royal Astronomical Society, 459, 2327–2341
Joachimi B., 2017, MNRAS, 466, L83
Joachimi B., Taylor A., 2014, Statistical Challenges in 21st Century Cosmology, 306, 99
Kasim M. F., et al., 2020, arXiv e-prints, p. arXiv:2001.08055
Kaufman G. M., 1967, Center for Operations Research and Econometrics Report no. 6710. Catholic University of Louvain. Heverlee, Belgium.
Kitaura F. S., Yepes G., Prada F., 2014, MNRAS, 439, L21
Kodi Ramanah D., Charnock T., Villaescusa-Navarro F., Wandelt B. D., 2020, Monthly Notices of the Royal Astronomical Society, 495, 4227–4236
Leclercq F., Faure B., Lavaux G., Wandelt B. D., Jaffe A. H., Heavens A. F., Percival W. J., 2020, A&A, 639, A91
Li Y., Singh S., Yu B., Feng Y., Seljak U., 2019, J. Cosmology Astropart. Phys., 2019, 016
Lippich M., et al., 2019, MNRAS, 482, 1786
Lucie-Smith L., Peiris H. V., Pontzen A., 2019, MNRAS, 490, 331
Lucie-Smith L., Peiris H. V., Pontzen A., Nord B., Thiyagalingam J., 2020, arXiv e-prints, p. arXiv:2011.10577
Maksimova N. A., Garrison L. H., Eisenstein D. J., Hadzhiyska B., Bose S., Satterthwaite T. P., 2021, MNRAS, 508, 4017
McClintock T., et al., 2019a, arXiv e-prints, p. arXiv:1907.13167
McClintock T., et al., 2019b, The Astrophysical Journal, 872, 53
Modi C., Lanusse F., Seljak U., 2020, arXiv e-prints, p. arXiv:2010.11847
Modi C., Lanusse F., Seljak U., Spergel D. N., Perreault-Levasseur L., 2021, arXiv e-prints, p. arXiv:2104.12864
Mohammed I., Seljak U., 2014, Mon. Not. Roy. Astron. Soc., 445, 3382
Mohammed I., Seljak U., Vlah Z., 2017, MNRAS, 466, 780
Monaco P., Sefusatti E., Borgani S., Crocce M., Fosalba P., Sheth R. K., Theuns T., 2013, MNRAS, 433, 2389
Muirhead R. J., 1982, Aspects of Multivariate Statistical Theory. John Wiley & Sons, Inc
Paz D. J., Sánchez A. G., 2015, Monthly Notices of the Royal Astronomical Society, 454, 4326
Pearson D. W., Samushia L., 2016, MNRAS, 457, 993
Pedersen C., Font-Ribera A., Rogers K. K., McDonald P., Peiris H. V., Pontzen A., Slosar A., 2021, J. Cosmology Astropart. Phys., 2021, 033
Percival W. J., et al., 2014, MNRAS, 439, 2531
Percival W. J., Friedrich O., Sellentin E., Heavens A., 2021, arXiv e-prints, p. arXiv:2108.10402
Philcox O. H. E., Eisenstein D. J., 2019, MNRAS, 490, 5931
Philcox O. H. E., Eisenstein D. J., O’Connell R., Wiegand A., 2020, MNRAS, 491, 3290
Philcox O. H. E., Ivanov M. M., Zaldarriaga M., Simonović M., Schmittfull M., 2021, Phys. Rev. D, 103, 043508
Pontzen A., Slosar A., Roth N., Peiris H. V., 2016, Phys. Rev. D, 93, 103519
Pope A. C., Szapudi I., 2008, MNRAS, 389, 766
Potter D., Stadel J., Teyssier R., 2017, Computational Astrophysics and Cosmology, 4, 2
Remy B., Lanusse F., Ramzi Z., Liu J., Jeffrey N., Starck J.-L., 2020, arXiv e-prints, p. arXiv:2011.08271
Rogers K. K., Peiris H. V., 2021, Phys. Rev. D, 103, 043526
Schäfer J., Strimmer K., 2005, Statistical applications in genetics and molecular biology, 4, Article32
Scoccimarro R., Sheth R. K., 2002, MNRAS, 329, 629
Sellentin E., Heavens A. F., 2018, MNRAS, 473, 2355
Smith A., de Mattia A., Burtin E., Chuang C.-H., Zhao C., 2021, MNRAS, 500, 259
Springel V., 2005, MNRAS, 364, 1105
Spurio Mancini A., Piras D., Alsing J., Joachimi B., Hobson M. P., 2022, MNRAS, 511, 1771
Taffoni G., Monaco P., Theuns T., 2002, Monthly Notices of the Royal Astronomical Society, 333, 623
Takahashi R., et al., 2009, ApJ, 700, 479
Tassev S., Zaldarriaga M., 2012, J. Cosmology Astropart. Phys., 2012, 013
Tassev S., Zaldarriaga M., Eisenstein D. J., 2013, Journal of Cosmology and Astroparticle Physics, 2013, 036–036
Tassev S., Eisenstein D. J., Wandelt B. D., Zaldarriaga M., 2015, arXiv e-prints, p. arXiv:1502.07751
Taylor A., Joachimi B., 2014, MNRAS, 442, 2728
Taylor A., Joachimi B., Kitching T., 2013, MNRAS, 432, 1928
Villaescusa-Navarro F., et al., 2018, The Astrophysical Journal, 867, 137
Villaescusa-Navarro F., et al., 2020, ApJS, 250, 2
Villaescusa-Navarro F., et al., 2021a, arXiv e-prints, p. arXiv:2109.09747
Villaescusa-Navarro F., et al., 2021b, arXiv e-prints, p. arXiv:2109.10915
Villaescusa-Navarro F., et al., 2021c, ApJ, 915, 71
Wadekar D., Ivanov M. M., Scoccimarro R., 2020, Phys. Rev. D, 102, 123521
Warren M. S., 2013, arXiv e-prints, p. arXiv:1310.4502
White M., Tinker J. L., McBride C. K., 2014, MNRAS, 437, 2594
Yu H.-R., Pen U.-L., Wang X., 2018, ApJS, 237, 24
Zhai Z., et al., 2019, The Astrophysical Journal, 874, 95

This paper has been typeset from a TEX/LATEX file prepared by the author.