Abstract

Predictions of critical phenomena such as weather, wildfires and epidemics are often founded on models described by Partial Differential Equations (PDEs). However, simulations that capture the full range of spatio-temporal scales in such PDEs are often prohibitively expensive. Consequently, coarse-grained simulations that employ heuristics and empirical closure terms are frequently utilized as an alternative. We propose a novel and systematic approach for identifying closures in under-resolved PDEs using Multi-Agent Reinforcement Learning (MARL). The MARL formulation incorporates inductive bias and exploits locality by deploying a central policy represented efficiently by Convolutional Neural Networks (CNN). We demonstrate the capabilities and limitations of MARL through numerical solutions of the advection equation and the Burgers' equation. Our results show accurate predictions for in- and out-of-distribution test cases as well as a significant speedup compared to resolving all scales.

¹ETH Zurich, Switzerland  ²Harvard SEAS, United States. Correspondence to: Petros Koumoutsakos <petros@seas.harvard.edu>.

1. Introduction

Simulations of critical phenomena such as climate, ocean dynamics and epidemics have become essential for decision-making, and their veracity, reliability, and energy demands have great impact on our society. Many of these simulations are based on models described by PDEs expressing system dynamics that span multiple spatio-temporal scales. Examples include turbulence (Wilcox, 1988), neuroscience (Dura-Bernal et al., 2019), climate (Council, 2012) and ocean dynamics (Mahadevan, 2016). Today, we benefit from decades of remarkable efforts in the development of numerical methods, algorithms, software, and hardware, and witness simulation frontiers that were unimaginable even a few years ago (Hey, 2009). Large-scale simulations can resolve all spatio-temporal scales, but these often address only idealized systems, and their computational cost prevents experimentation and uncertainty quantification. On the other hand, reduced order and coarse-grained models are fast, but limited by the linearization of complex system dynamics, while their associated closures are often based on heuristics (Peng et al., 2021).

We propose to complement coarse-grained simulations with closures in the form of policies that are discovered by a multi-agent reinforcement learning framework. A central policy that is based on CNNs and uses locality as an inductive bias corrects the error of the numerical discretization and improves the overall accuracy. This approach is inspired by recent work in Reinforcement Learning (RL) for image reconstruction (Furuta et al., 2019). The agents learn and act locally on the grid, in a manner that is reminiscent of the numerical discretizations of PDEs based on Taylor series approximations. Hence, each agent only gets information from its spatial neighborhood. Together with a central policy, this results in efficient training, as each agent needs to process only a limited amount of observations. After training, the framework is capable of accurate predictions in both in- and out-of-distribution test cases. We remark that the actions taken by the agents are highly correlated with the numerical errors and can be viewed as an implicit correction to the effective convolution performed via the coarse-grained discretization of the PDEs (Bergdorf et al., 2005).

2. Related Work

The main contribution of this work is a MARL algorithm for the discovery of closures for coarse-grained PDEs. The algorithm provides an automated process for the correction of the errors of the associated numerical discretization. We show that the proposed method is able to capture scales beyond the training regime and provides a potent method for solving PDEs with high accuracy and limited resolution.
Machine Learning and Partial Differential Equations: In recent years, there has been significant interest in learning the solution of PDEs using Neural Networks. Techniques such as PINNs (Raissi et al., 2019; Karniadakis et al., 2021), DeepONet (Lu et al., 2021), the Fourier Neural Operator (Li et al., 2020), NOMAD (Seidman et al., 2022), Clifford Neural Layers (Brandstetter et al., 2023) and an invertible formulation (Kaltenbach et al., 2023) have shown promising results for both forward and inverse problems. However, there are concerns about their accuracy and related computational cost, especially for low-dimensional problems (Karnakov et al., 2022). These methods aim to substitute numerical discretizations with neural networks, in contrast to our RL framework, which aims to complement them. Moreover, their loss function is required to be differentiable, which is not necessary for the stochastic formulation of the RL reward.

Reinforcement Learning: Our work is based upon recent advances in RL (Sutton & Barto, 2018), including PPO (Schulman et al., 2017) and the Remember and Forget Experience Replay (ReF-ER) algorithm (Novati et al., 2019). It is related to other methods that employ multi-agent learning (Novati et al., 2021; Albrecht et al., 2023; Wen et al., 2022; Bae & Koumoutsakos, 2022; Yang & Wang, 2020) for closures, except that here we use as reward not global data (such as the energy spectrum) but the local discretization error as quantified by fine-grained simulations. The present approach is designed to solve various PDEs using a central policy, and it is related to similar work for image reconstruction (Furuta et al., 2019). We train a central policy for all agents, in contrast to other methods where agents are trained on decoupled subproblems (Freed et al., 2021; Albrecht et al., 2023; Sutton et al., 2023).

Closure Modeling: The development of machine learning methods for discovering closures for under-resolved PDEs has gained attention in recent years. Current approaches are mostly tailored to specific applications such as turbulence modeling (Ling et al., 2016; Durbin, 2018; Novati et al., 2021; Bae & Koumoutsakos, 2022) and use data such as energy spectra and drag coefficients of the flow in order to train the RL policies. In (Lippe et al., 2023), a more general framework based on diffusion models is used to improve the solutions of Neural Operators for temporal problems using a multi-step refinement process. Their training is based on supervised learning, in contrast to the present RL approach, which additionally complements existing numerical methods instead of neural network based surrogates (Li et al., 2020; Gupta & Brandstetter, 2022).

Inductive Bias: The incorporation of prior knowledge regarding the physical processes described by the PDEs into machine learning algorithms is critical for their training in the Small Data regime and for increasing the accuracy during extrapolative predictions (Goyal & Bengio, 2022). One way to achieve this is by shaping the loss function appropriately (Karniadakis et al., 2021; Kaltenbach & Koutsourelakis, 2020; Yin et al., 2021; Wang et al., 2021; 2022), or by incorporating parameterized mappings that are based on known constraints (Greydanus et al., 2019; Kaltenbach & Koutsourelakis, 2021; Cranmer et al., 2020). Our RL framework incorporates locality and is thus consistent with numerical discretizations that rely on local Taylor-series-based approximations. The incorporation of inductive bias in RL has also been attempted by focusing on the beginning of the training phase (Uchendu et al., 2023; Walke et al., 2023) in order to shorten the exploration phase.

3. Methodology

We consider a time-dependent, parametric, non-linear PDE defined on a regular domain Ω. The solution of the PDE depends on its initial conditions (ICs) as well as its input parameters C ∈ C. The PDE is discretized on a spatiotemporal grid and the resulting discrete set of equations is solved using numerical methods.

In turn, the number of the deployed computational elements and the structure of the PDE determine whether all of its scales have been resolved or whether the discretization amounts to a coarse-grained representation of the PDE. In the first case, the fine grid simulation (FGS) provides the discretized solution ψ, whereas in the coarse grid simulation (CGS) the resulting approximation is denoted by ψ̃.¹ The RL policy can improve the accuracy of the solution ψ̃ by introducing an appropriate forcing term in the right-hand side of the CGS. For this purpose, FGSs of the PDE are used as training episodes and serve as the ground truth to facilitate a reward signal. The CGS and FGS employed herein are introduced in the next section. The proposed MARL framework is introduced in Section 3.2.

¹In the following, all variables with the ˜ refer to the coarse-grid description.

3.1. Coarse and Fine Grid Simulation

We consider a FGS of a spatiotemporal PDE on a Cartesian 3D grid with temporal resolution ∆t for N_T temporal steps and spatial resolution ∆x, ∆y, ∆z, resulting in N_x, N_y, N_z discretization points. The CGS entails a coarser spatial discretization ∆x̃ = d∆x, ∆ỹ = d∆y, ∆z̃ = d∆z, as well as a coarser temporal discretization ∆t̃ = d_t ∆t. Here, d is the spatial and d_t the temporal scaling factor. Consequently, at a time-step ñ, we define the discretized solution function of the CGS as ψ̃^{ñ} ∈ Ψ̃ := ℝ^{k×Ñ_x×Ñ_y×Ñ_z}, with k being the number of solution variables. The corresponding solution function of the FGS at time-step n_f is ψ^{n_f} ∈ Ψ := ℝ^{k×N_x×N_y×N_z}. The discretized solution
function of the CGS can thus be described as a subsampled version of the FGS solution function, and the subsampling operator S : Ψ → Ψ̃ connects the two.

The time-stepping operator of the CGS, G : Ψ̃ × C̃ → Ψ̃, leads to the update rule

    ψ̃^{ñ+1} = G(ψ̃^{ñ}, C̃^{ñ}).    (1)

Similarly, we define the time-stepping operator of the FGS as F : Ψ × C → Ψ. In the case of the CGS, C̃ ∈ C̃ is an adapted version of C, which for instance involves evaluating a parameter function using the coarse grid.

To simplify the notation, we set n := ñ = d_t n_f in the following. Hence, we apply F d_t times in order to describe the same time instant with the same index n in both CGS and FGS. The update rule of the FGS is referred to as

    ψ^{n+1} = F^{d_t}(ψ^n, C^n).    (2)

In line with the experiments performed in Section 4, and to further simplify the notation, we drop the third spatial dimension in the following presentation.
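For illustration, the following minimal Python sketch implements the subsampling operator S as a strided restriction and advances the FGS by d_t fine steps for every single CGS step, as in Eqs. (1)-(2). The toy periodic-shift dynamic merely stands in for the actual discretization schemes of Table 3; it is not the implementation used for the experiments.

```python
# Minimal sketch of the CGS/FGS bookkeeping: the subsampling operator S, one coarse
# step G, and d_t fine steps F covering the same time interval. The periodic shift
# below is a placeholder dynamic, not the schemes of Table 3.
import numpy as np

d, dt_factor = 4, 4                       # spatial (d) and temporal (d_t) scaling factors

def S(psi_fine):
    """Subsampling operator S: Psi -> Psi_tilde (strided restriction)."""
    return psi_fine[:, ::d, ::d]

def fgs_step(psi):                        # stands in for F: advect by one fine cell
    return np.roll(psi, 1, axis=-1)

def cgs_step(psi_tilde):                  # stands in for G: advect by one coarse cell
    return np.roll(psi_tilde, 1, axis=-1)

k, Nx, Ny = 1, 256, 256
psi = np.random.rand(k, Nx, Ny)           # psi^n on the fine grid
psi_tilde = S(psi)                        # psi_tilde^n on the coarse grid

for _ in range(dt_factor):                # Eq. (2): apply F d_t times ...
    psi = fgs_step(psi)
psi_tilde = cgs_step(psi_tilde)           # Eq. (1): ... while G advances once

# For this idealized dynamic the coarse step commutes with subsampling, so the
# discrepancy (and hence the needed closure) vanishes.
print(np.abs(psi_tilde - S(psi)).max())   # 0.0
```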
3.2. RL Environment

To encourage learning a policy that represents the non-resolved spatio-temporal scales, the reward is based on the difference between the CGS and FGS at time step n. In more detail, we define a local reward R^n ∈ ℝ^{Ñ_x×Ñ_y} inspired by the reward proposed for image reconstruction in (Furuta et al., 2019):

    R^n_{ij} = (1/k) Σ_{w=1}^{k} ( [ψ̃^n − S(ψ^n)]²_{wij} − [ψ̃^n − A^n − S(ψ^n)]²_{wij} ).    (4)

Here, the square [·]² is computed per matrix entry. We note that the reward function therefore returns a matrix that gives a scalar reward for each discretization point of the CGS. If the action A^n brings the discretized solution function of the CGS, ψ̃^n, closer to the subsampled discretized solution function of the FGS, S(ψ^n), the reward is positive, and vice versa. In case A^n corresponds to the zero matrix, the reward is the zero matrix as well.

During the training process, the objective is to find the optimal policy π* that maximizes the mean expected reward per discretization point, given the discount rate γ and the observations:

    π* = argmax_π E_π [ Σ_{n=0}^{∞} γ^n r̄^n ].    (5)
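A minimal sketch of the local reward of Eq. (4), and of the mean reward per discretization point that enters Eq. (5), is given below; array shapes follow the notation of Section 3.1, and the random fields are for illustration only.

```python
# Sketch of the local reward of Eq. (4) and the mean reward per discretization point.
import numpy as np

def local_reward(psi_tilde, action, psi_fine_sub):
    """R^n_ij = 1/k * sum_w ([psi_tilde - S(psi)]^2 - [psi_tilde - A - S(psi)]^2)_wij."""
    before = (psi_tilde - psi_fine_sub) ** 2
    after = (psi_tilde - action - psi_fine_sub) ** 2
    return (before - after).mean(axis=0)          # average over the k solution variables

k, Nx_t, Ny_t = 1, 64, 64
psi_fine_sub = np.random.rand(k, Nx_t, Ny_t)      # S(psi^n), the subsampled FGS field
psi_tilde = psi_fine_sub + 0.1 * np.random.randn(k, Nx_t, Ny_t)   # imperfect CGS field

perfect_action = psi_tilde - psi_fine_sub         # removes the discrepancy entirely
zero_action = np.zeros_like(psi_tilde)

R_perfect = local_reward(psi_tilde, perfect_action, psi_fine_sub)  # >= 0 everywhere
R_zero = local_reward(psi_tilde, zero_action, psi_fine_sub)        # identically zero

r_bar = R_perfect.mean()                          # mean reward per discretization point
print(R_zero.max(), r_bar > 0)                    # 0.0 True
```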
We parametrize the local policies using neural networks. However, since training this many individual neural nets can become computationally expensive, we parametrize all the agents together using one fully convolutional network (FCN) (Long et al., 2015).

3.4. Parallelizing Local Policies with a FCN

All local policies are parametrized using one FCN such that one forward pass through the FCN computes the forward pass for all the agents at once. This approach enforces the aforementioned locality and the receptive field of the FCN corresponds to the spatial neighborhood that an agent at a given discretization point can observe.

We define the collection of policies in the FCN as

    Π_FCN : O → A.    (8)
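An illustrative FCN policy in this spirit is sketched below. The layer sizes are placeholders (the actual hyperparameters are listed in Table 5), and the per-point Gaussian parameterization of the action distribution is an assumption made here for illustration; the key point is that a single forward pass yields action-distribution parameters for every CGS grid point, with the receptive field of the stacked convolutions defining each agent's neighborhood.

```python
# Illustrative FCN policy (layer sizes and the Gaussian action head are placeholders).
import torch
import torch.nn as nn

class FCNPolicy(nn.Module):
    def __init__(self, in_channels=1, hidden=32, action_channels=1):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Per-point Gaussian policy: mean and log-std for each action channel.
        self.mu = nn.Conv2d(hidden, action_channels, kernel_size=3, padding=1)
        self.log_std = nn.Conv2d(hidden, action_channels, kernel_size=3, padding=1)

    def forward(self, obs):                    # obs: (batch, k, N_x_tilde, N_y_tilde)
        h = self.backbone(obs)
        return self.mu(h), self.log_std(h)     # each: (batch, action_ch, N_x_t, N_y_t)

policy = FCNPolicy()
obs = torch.randn(1, 1, 64, 64)                # one CGS snapshot as observation
mu, log_std = policy(obs)
action = torch.distributions.Normal(mu, log_std.exp()).sample()
print(action.shape)                            # torch.Size([1, 1, 64, 64])
```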
The global value function is trained on the MSE loss

    L_V(O^n, G^n) = ||V^FCN(O^n) − G^n||²₂,

where G^n ∈ ℝ^{Ñ_x×Ñ_y} represents the actual global return observed by interaction with the environment and is computed as G^n = Σ_{i=n}^{N} γ^{i−n} R^i. Here, N represents the length of the respective trajectory.

We have provided an overview of the modified PPO algorithm in Appendix A together with further details regarding the local version of the PPO objective L_{π_ij}(O_ij, A_ij). Our implementation of the adapted PPO algorithm is based on the single agent PPO algorithm of the Tianshou framework (Weng et al., 2022).
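The per-point return used as the regression target for the global value network can be accumulated with a simple backward recursion, as in the following sketch of the formula above.

```python
# Sketch of the per-point discounted return G^n = sum_{i=n}^{N} gamma^(i-n) R^i.
import numpy as np

def returns(rewards, gamma):
    """rewards: array of shape (N, N_x_tilde, N_y_tilde); returns G with the same shape."""
    G = np.zeros_like(rewards)
    running = np.zeros_like(rewards[0])
    for n in reversed(range(rewards.shape[0])):   # backward recursion G^n = R^n + gamma*G^(n+1)
        running = rewards[n] + gamma * running
        G[n] = running
    return G

rewards = np.random.randn(100, 64, 64)            # one trajectory of local rewards R^n
G = returns(rewards, gamma=0.99)
# The value-network loss above is then the MSE between V^FCN(O^n) and G^n.
print(G.shape)                                    # (100, 64, 64)
```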
3.6. Neural Network Architecture
[Figure 3: panels of relative error and error reduction versus simulation step for CNN-MARL, ACGS and CGS (training regime marked), and a violin plot of the number of steps until the 1% error threshold is reached.]
Figure 3. Results of CNN-MARL applied to the advection equation. The relative MAEs are computed over 100 simulations with respect
to (w.r.t.) the FGS. The shaded regions correspond to the respective standard deviations. The violin plot on the right shows the number of
simulation steps until the relative MAE w.r.t. the FGS reaches the threshold of 1%.
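For reference, the relative MAE reported in the figures can be computed as sketched below; the exact normalization is not stated in this excerpt, so dividing by the mean magnitude of the subsampled FGS field is an assumption made for illustration.

```python
# Sketch of a relative mean absolute error (MAE) w.r.t. the FGS; the normalization by the
# mean magnitude of the reference field is an assumption, not taken from the paper.
import numpy as np

def relative_mae(psi_tilde, psi_fine_sub):
    """Both arguments share the CGS shape; psi_fine_sub is the subsampled FGS field."""
    return np.abs(psi_tilde - psi_fine_sub).mean() / np.abs(psi_fine_sub).mean()

reference = np.random.rand(1, 64, 64) + 1.0
prediction = reference + 0.01 * np.random.randn(*reference.shape)
print(f"{100 * relative_mae(prediction, reference):.2f}%")   # a small relative error, in percent
```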
[Figure 5: image panels and a scatter plot of the taken actions against the approximated perfect actions (both axes spanning roughly −0.01 to 0.01).]

Figure 5. Visual comparison of the input concentration ψ̃^n, taken actions A^n and the corresponding numerical approximation of the optimal action, from left to right. The figure on the right-hand side qualitatively shows the linear relation between approximated optimal actions and taken actions.
[Figure 6: panels of relative error and error reduction versus simulation step for CNN-MARL and CGS (training regime marked), and a violin plot of the number of steps until the 10% error threshold is reached.]
Figure 6. Results of CNN-MARL applied to the Burgers’ equation. The relative MAEs are computed over 100 simulations with respect to
the FGS. The shaded regions correspond to the respective standard deviations. The violin plot on the right shows the number of simulation
steps until the relative MAE w.r.t. the FGS reaches the threshold of 10%.
Broader Impact Statement

In this paper, we have presented a MARL framework for systematic identification of closure models in coarse-grained PDEs. The anticipated impact pertains primarily to the scientific community, whereas the use in real-life applications would require additional adjustments and expert input.

References

Albrecht, S. V., Christianos, F., and Schäfer, L. Multi-agent reinforcement learning: Foundations and modern approaches. Massachusetts Institute of Technology: Cambridge, MA, USA, 2023.

Bae, H. J. and Koumoutsakos, P. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nature Communications, 13(1):1443, 2022.

Bergdorf, M., Cottet, G.-H., and Koumoutsakos, P. Multilevel adaptive particle methods for convection-diffusion equations. Multiscale Modeling & Simulation, 4(1):328–357, 2005.

Brandstetter, J., Berg, R. v. d., Welling, M., and Gupta, J. K. Clifford neural layers for pde modeling. ICLR, 2023.

Council, N. R. A National Strategy for Advancing Climate Modeling. The National Academies Press, 2012.

Courant, R., Friedrichs, K., and Lewy, H. Über die partiellen differenzengleichungen der mathematischen physik. Mathematische Annalen, 100(1):32–74, December 1928. doi: 10.1007/BF01448839. URL http://dx.doi.org/10.1007/BF01448839.

Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., and Ho, S. Lagrangian neural networks. arXiv preprint arXiv:2003.04630, 2020.

Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.

Dura-Bernal, S., Suter, B. A., Gleeson, P., Cantarelli, M., Quintana, A., Rodriguez, F., Kedziora, D. J., Chadderdon, G. L., Kerr, C. C., Neymotin, S. A., et al. Netpyne, a tool for data-driven multiscale modeling of brain circuits. Elife, 8:e44494, 2019.

Durbin, P. A. Some recent developments in turbulence closure modeling. Annual Review of Fluid Mechanics, 50:77–103, 2018.

Freed, B., Kapoor, A., Abraham, I., Schneider, J., and Choset, H. Learning cooperative multi-agent policies with partial reward decoupling. IEEE Robotics and Automation Letters, 7(2):890–897, 2021.

Furuta, R., Inoue, N., and Yamasaki, T. Pixelrl: Fully convolutional network with reinforcement learning for image processing, 2019.

Goyal, A. and Bengio, Y. Inductive biases for deep learning of higher-level cognition. Proceedings of the Royal Society A, 478(2266):20210068, 2022.

Greydanus, S., Dzamba, M., and Yosinski, J. Hamiltonian neural networks. Advances in neural information processing systems, 32, 2019.

Gupta, J. K. and Brandstetter, J. Towards multi-spatiotemporal-scale generalized pde modeling. arXiv preprint arXiv:2209.15616, 2022.

Hey, T. The fourth paradigm. United States of America, 2009.

Kaltenbach, S. and Koutsourelakis, P.-S. Incorporating physical constraints in a deep probabilistic machine learning framework for coarse-graining dynamical systems. Journal of Computational Physics, 419:109673, 2020.

Kaltenbach, S. and Koutsourelakis, P.-S. Physics-aware, probabilistic model order reduction with guaranteed stability. ICLR, 2021.

Kaltenbach, S., Perdikaris, P., and Koutsourelakis, P.-S. Semi-supervised invertible neural operators for bayesian inverse problems. Computational Mechanics, pp. 1–20, 2023.

Karnakov, P., Litvinov, S., and Koumoutsakos, P. Optimizing a discrete loss (odil) to solve forward and inverse problems for partial differential equations using machine learning tools. arXiv preprint arXiv:2205.04611, 2022.

Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.

Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.

Ling, J., Kurzawski, A., and Templeton, J. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.

Lippe, P., Veeling, B. S., Perdikaris, P., Turner, R. E., and Brandstetter, J. Pde-refiner: Achieving accurate long rollouts with neural pde solvers. arXiv preprint arXiv:2308.05732, 2023.

Long, J., Shelhamer, E., and Darrell, T. Fully convolutional networks for semantic segmentation, 2015.

Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature machine intelligence, 3(3):218–229, 2021.

Mahadevan, A. The impact of submesoscale physics on primary productivity of plankton. Annual review of marine science, 8:161–184, 2016.

Novati, G., Mahadevan, L., and Koumoutsakos, P. Controlled gliding and perching through deep-reinforcement-learning. Physical Review Fluids, 4(9):093902, 2019.

Novati, G., de Laroussilhe, H. L., and Koumoutsakos, P. Automating turbulence modelling by multi-agent reinforcement learning. Nature Machine Intelligence, 3(1):87–96, 2021.

Peng, G. C., Alber, M., Buganza Tepole, A., Cannon, W. R., De, S., Dura-Bernal, S., Garikipati, K., Karniadakis, G., Lytton, W. W., Perdikaris, P., et al. Multiscale modeling meets machine learning: What can we learn? Archives of Computational Methods in Engineering, 28:1017–1037, 2021.

Quarteroni, A. and Valli, A. Numerical approximation of partial differential equations, volume 23. Springer Science & Business Media, 2008.

Raissi, M., Perdikaris, P., and Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.

Rossinelli, D., Hejazialhosseini, B., Hadjidoukas, P., Bekas, C., Curioni, A., Bertsch, A., Futral, S., Schmidt, S. J., Adams, N. A., and Koumoutsakos, P. 11 PFLOP/s simulations of cloud cavitation collapse. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '13, pp. 3:1–3:13, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2378-9. doi: 10.1145/2503210.2504565. URL http://doi.acm.org/10.1145/2503210.2504565.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017. URL http://arxiv.org/abs/1707.06347.

Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. High-dimensional continuous control using generalized advantage estimation, 2018.

Seidman, J., Kissas, G., Perdikaris, P., and Pappas, G. J. Nomad: Nonlinear manifold decoders for operator learning. Advances in Neural Information Processing Systems, 35:5601–5613, 2022.

Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction. MIT press, 2018.

Sutton, R. S., Machado, M. C., Holland, G. Z., Szepesvari, D., Timbers, F., Tanner, B., and White, A. Reward-respecting subtasks for model-based reinforcement learning. Artificial Intelligence, 324:104001, 2023.

Uchendu, I., Xiao, T., Lu, Y., Zhu, B., Yan, M., Simon, J., Bennice, M., Fu, C., Ma, C., Jiao, J., et al. Jump-start reinforcement learning. In International Conference on Machine Learning, pp. 34556–34583. PMLR, 2023.

Walke, H. R., Yang, J. H., Yu, A., Kumar, A., Orbik, J., Singh, A., and Levine, S. Don't start from scratch: Leveraging prior data to automate robotic reinforcement learning. In Conference on Robot Learning, pp. 1652–1662. PMLR, 2023.

Wang, S., Wang, H., and Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed deeponets. Science advances, 7(40):eabi8605, 2021.

Wang, S., Sankaran, S., and Perdikaris, P. Respecting causality is all you need for training physics-informed neural networks. arXiv preprint arXiv:2203.07404, 2022.

Wen, M., Kuba, J., Lin, R., Zhang, W., Wen, Y., Wang, J., and Yang, Y. Multi-agent reinforcement learning is a sequence modeling problem. Advances in Neural Information Processing Systems, 35:16509–16521, 2022.

Weng, J., Chen, H., Yan, D., You, K., Duburcq, A., Zhang, M., Su, Y., Su, H., and Zhu, J. Tianshou: A highly modularized deep reinforcement learning library. Journal of Machine Learning Research, 23(267):1–6, 2022. URL http://jmlr.org/papers/v23/21-1127.html.

Wilcox, D. C. Multiscale model for turbulent flows. AIAA journal, 26(11):1311–1320, 1988.

Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.

Yang, Y. and Wang, J. An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv preprint arXiv:2011.00583, 2020.

Yin, Y., Le Guen, V., Dona, J., de Bézenac, E., Ayed, I., Thome, N., and Gallinari, P. Augmenting physical models with deep networks for complex dynamics forecasting. Journal of Statistical Mechanics: Theory and Experiment, 2021(12):124012, 2021.

Zhang, K., Zuo, W., Gu, S., and Zhang, L. Learning deep cnn denoiser prior for image restoration, 2017.
    L_V(O^n, G^n, ϕ) = ||V_ϕ^FCN(O^n) − G^n||²₂.
We note that this notation contains the parameters ϕ, which parameterize the underlying neural network.
The objective for the global policy is defined as
    L_Π(O^n, A^n, θ_k, θ, β) := (1 / (Ñ_x · Ñ_y)) Σ_{i,j=1}^{Ñ_x, Ñ_y} [ L_{π_ij}(O^n_ij, A^n_ij, θ_k, θ) − β H(π_{ij,θ}) ],

where H(π_{ij,θ}) is the entropy of the local policy and L_{π_ij} is the standard single-agent PPO-Clip objective

    L_{π_ij}(o, a, θ_k, θ) = min( (π_{ij,θ}(a|o) / π_{ij,θ_k}(a|o)) · Adv^{π_{ij,θ_k}}(o, a),  clip(π_{ij,θ}(a|o) / π_{ij,θ_k}(a|o), 1 − ϵ, 1 + ϵ) · Adv^{π_{ij,θ_k}}(o, a) ).
The advantage estimates Adv^{π_{ij,θ_k}} are computed with generalized advantage estimation (GAE) (Schulman et al., 2018) using the output of the global value network V_ϕ^FCN.
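A per-grid-point version of GAE consistent with this description is sketched below; the discount γ and the GAE parameter λ are illustrative values, not the ones used in the experiments.

```python
# Sketch of generalized advantage estimation (GAE) applied per CGS grid point,
# using per-point value estimates; gamma and lam are illustrative values.
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """rewards: (N, H, W) local rewards; values: (N+1, H, W) per-point value estimates."""
    adv = np.zeros_like(rewards)
    running = np.zeros_like(rewards[0])
    for n in reversed(range(rewards.shape[0])):
        delta = rewards[n] + gamma * values[n + 1] - values[n]   # per-point TD residual
        running = delta + gamma * lam * running
        adv[n] = running
    return adv

rewards = np.random.randn(100, 64, 64)
values = np.random.randn(101, 64, 64)
advantages = gae(rewards, values)
print(advantages.shape)                            # (100, 64, 64)
```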
The resulting adapted PPO algorithm is presented in Algorithm 1. Major differences compared to the original PPO algorithm
are vectorized versions of the value network loss and PPO-Clip objective, as well as a summation over all the discretization
points of the domain before performing an update step.
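The vectorized PPO-Clip objective can be written compactly by evaluating the standard clipped loss at every discretization point and averaging over the grid, as in the following sketch (entropy bonus omitted for brevity; this is an illustration, not the Tianshou-based implementation).

```python
# Sketch of the vectorized PPO-Clip objective: the single-agent clipped loss evaluated
# at every discretization point and averaged over the grid (entropy bonus omitted).
import torch

def clipped_policy_loss(logp_new, logp_old, adv, eps=0.2):
    """All tensors have shape (batch, N_x_tilde, N_y_tilde); returns a scalar loss."""
    ratio = torch.exp(logp_new - logp_old)               # pi_theta(a|o) / pi_theta_k(a|o)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
    per_point = torch.minimum(unclipped, clipped)        # local PPO-Clip objective
    return -per_point.mean()                             # average over all agents, negated to minimize

logp_new = torch.randn(8, 64, 64, requires_grad=True)
logp_old = logp_new.detach() + 0.01 * torch.randn(8, 64, 64)
adv = torch.randn(8, 64, 64)
loss = clipped_policy_loss(logp_new, logp_old, adv)
loss.backward()                                          # gradients flow into logp_new
print(float(loss))
```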
B. Additional Results
B.1. Advection Equation
Figure 7. ψ^0 is sampled from the MNIST test set. The velocity field is sampled from D_Vortex^Test (see Appendix C.2.1). Here, the IC of the concentration comes from a different distribution than the one used for training.
Figure 8. ψ^0 is a sample from the F-MNIST test set. The velocity field is sampled from D_Vortex^Train (see Appendix C.2.2). Note that the IC of the concentration comes from a different distribution than the one used for training.
Figure 10. Velocity magnitude plotted for a 100-step roll-out. The set-up is the same as in Figure 9, and the example again shows that CNN-MARL leads to qualitatively different dynamics during a roll-out that are closer to those of the FGS.
Figure 11. Visualizations of the evolution of the reward metric averaged over the agents and episode length during training on the advection
equation.
Figure 12. Visualizations of the evolution of the reward metric averaged over the agents and episode length during training on the Burgers’
equation.
Table 3. Numerical values of the simulation parameters used for the advection CGS and FGS. Note that ∆t̃ is chosen to guarantee that the CFL condition in the CGS is fulfilled.
Ω                  [0, 1] × [0, 1]
Ñ_x, Ñ_y           64
d, d_t             4

Resulting other parameters:
N_x, N_y           d · Ñ_x, d · Ñ_y = 256
∆x, ∆y             1/N_x, 1/N_y ≈ 0.0039
∆x̃, ∆ỹ             1/Ñ_x, 1/Ñ_y ≈ 0.0156
∆t̃                 0.9 · min(∆x̃, ∆ỹ) ≈ 0.0141
∆t                 ∆t̃/d_t ≈ 0.0035

Discretization schemes:
FGS, Space         Central difference
FGS, Time          Fourth-order Runge-Kutta
CGS, Space         Upwind
CGS, Time          Forward Euler
Table 4. Numerical values of the simulation parameters used for the Burgers' CGS and FGS. Again, ∆t̃ is chosen to guarantee that the CFL condition in the CGS is fulfilled.
Ω                  [0, 1] × [0, 1]
Ñ_x, Ñ_y           30
d, d_t             5, 10

Resulting other parameters:
N_x, N_y           d · Ñ_x, d · Ñ_y = 150
∆x, ∆y             1/N_x, 1/N_y ≈ 0.0067
∆x̃, ∆ỹ             1/Ñ_x, 1/Ñ_y ≈ 0.0333
∆t̃                 0.9 · min(∆x̃, ∆ỹ) = 0.03
∆t                 ∆t̃/d_t = 0.003

Discretization schemes:
FGS, Space         Upwind
FGS, Time          Forward Euler
CGS, Space         Upwind
CGS, Time          Forward Euler
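The derived quantities in Tables 3 and 4 follow directly from the listed inputs, as the short sketch below reproduces; the CFL-style bound assumes a unit-magnitude advection velocity, which is a reading of the captions rather than a statement from the tables.

```python
# Sketch reproducing the derived parameters of Tables 3 and 4; the safety factor 0.9
# and the unit-velocity CFL-style bound reflect our reading of the table captions.
def derived_parameters(N_tilde, d, dt_factor, safety=0.9):
    N = d * N_tilde                      # fine-grid resolution
    dx, dx_tilde = 1.0 / N, 1.0 / N_tilde
    dt_tilde = safety * dx_tilde         # coarse step from the CFL-style bound
    dt = dt_tilde / dt_factor            # fine step
    return N, dx, dx_tilde, dt_tilde, dt

print(derived_parameters(64, 4, 4))      # advection set-up: N = 256, dt_tilde ~ 0.0141
print(derived_parameters(30, 5, 10))     # Burgers' set-up:  N = 150, dt_tilde = 0.03
```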
Table 5. Hyperparameters of each of the convolutional layers of the neural network architecture used for the advection equation experiment
and the resulting receptive field (RF) at each layer. For the Burgers’ equation experiment the architecture is simply adapted by setting the
number of in channels of Conv2D 1 to 2 and the number of out channels of Conv2D π to 4.
Furthermore, we define the velocity components of a translational velocity field as u^TL, v^TL ∈ ℝ. To generate a random incompressible velocity field, we sample 1 to 4 values of k from the set {1, ..., 6}. For each k, we also sample a sign_k uniformly from the set {−1, 1} in order to randomize the vortex directions. For an additional translation term, we sample u^TL, v^TL independently from uniform(−1, 1). We then initialize the velocity field to

    u_ij := u^TL + Σ_k sign_k · u^{TG,k}_ij,    (22)
    v_ij := v^TL + Σ_k sign_k · v^{TG,k}_ij.    (23)

In the further discussion, we will refer to this distribution of vortices as D_Vortex^Test.
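A sketch of this sampling procedure is given below. The per-mode vortex fields u^{TG,k}, v^{TG,k} are defined earlier in the appendix and are not part of this excerpt, so the Taylor-Green-type form used here is an assumption that merely keeps the sampled field divergence-free; only the sampling of the modes, signs, and translation follows Eqs. (22)-(23).

```python
# Sketch of the random velocity-field construction of Eqs. (22)-(23).
# ASSUMPTION: u^{TG,k}, v^{TG,k} are modeled as Taylor-Green-type vortices on the
# unit square; the exact mode shape is not given in this excerpt.
import numpy as np

rng = np.random.default_rng(0)

def sample_velocity_field(N=64):
    x = np.linspace(0.0, 1.0, N, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    u_tl, v_tl = rng.uniform(-1.0, 1.0, size=2)          # translational component
    u, v = np.full_like(X, u_tl), np.full_like(X, v_tl)
    ks = rng.choice(np.arange(1, 7), size=rng.integers(1, 5), replace=False)
    for k in ks:
        sign = rng.choice([-1.0, 1.0])                    # randomize vortex direction
        u += sign * np.sin(2 * np.pi * k * X) * np.cos(2 * np.pi * k * Y)
        v -= sign * np.cos(2 * np.pi * k * X) * np.sin(2 * np.pi * k * Y)
    return u, v

u, v = sample_velocity_field()
print(u.shape, v.shape)                                   # (64, 64) (64, 64)
```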
Table 6. Runtime of one simulation step in ms of the different simulations averaged over 500 steps.
Here, M is the numerical approximation of the PDE and ϵ̃^n is the truncation error.
We refer to the numerical approximations of the derivatives in the CGS as T^FE for the forward Euler scheme and D^UW for the upwind scheme. We obtain
where C̃^n := (ũ^n, ṽ^n). This update rule would theoretically find the exact solution. It involves a coarse step followed by an additive correction after each step. We can define ϕ̃^{n+1} := ψ̃^{n+1} + ∆t̃ ϵ̃^n to bring the equation into the same form as seen in the definition of the RL environment,

    ϕ̃^{n+1} = G(ϕ̃^n − ∆t̃ ϵ̃^{n−1}, C̃^n),

which illustrates that the optimal action at step n would be the previous truncation error times the time increment,

    A^{n*} = ∆t̃ ϵ̃^{n−1}.    (33)
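In practice, this relation suggests estimating the "perfect" action by advancing the subsampled FGS state with one coarse step and comparing it to the subsampled FGS state at the next coarse time; the mismatch approximates ∆t̃ ϵ̃^n, i.e., the optimal action at the following step. The sketch below illustrates this reading; the coarse stepper is a placeholder for the upwind/forward-Euler scheme G.

```python
# Sketch of a numerical approximation of the "perfect" action suggested by Eq. (33):
# dt_tilde * eps_tilde^n ~= S(psi^(n+1)) - G(S(psi^n), C_tilde^n).
# The coarse stepper below is a placeholder, not the upwind/forward-Euler scheme.
import numpy as np

def approx_perfect_action(psi_sub_now, psi_sub_next, coarse_step, C_tilde):
    """Return an estimate of dt_tilde * truncation error, with the shape of the CGS field."""
    return psi_sub_next - coarse_step(psi_sub_now, C_tilde)

def coarse_step(psi_tilde, C_tilde):           # placeholder for G
    return psi_tilde

S_psi_n = np.random.rand(1, 64, 64)                          # S(psi^n)
S_psi_np1 = S_psi_n + 1e-3 * np.random.randn(1, 64, 64)      # S(psi^(n+1))
A_star = approx_perfect_action(S_psi_n, S_psi_np1, coarse_step, C_tilde=None)
print(A_star.shape)                            # (1, 64, 64)
```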