Robust data driven discovery of a seismic wave equation

Geophys. J. Int. (2024) 236, 537–546 https://doi.org/10.
1093/gji/ggad446
Advance Access publication 2023 November 15
GJI General Geophysical Methods
Robust data driven discovery of a seismic wave equation
Shijun Cheng and Tariq Alkhalifah

Division of Physical Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia.
E-mail: sjcheng.academic@gmail.com
Accepted 2023 November 1. Received 2023 September 24; in original form 2023 April 29
SUMMARY
Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Despite the fact that our physical observations can often be described by derived physical laws,
such as the wave equation, in many cases, we observe data that do not match the laws or have
not been described physically yet. Therefore recently, a branch of machine learning has been
devoted to the discovery of physical laws from data. We test this approach for discovering the
wave equation from the observed spatial-temporal wavefields. The algorithm first pre-trains
a neural network (NN) in a supervised fashion to establish the mapping between the spatial-
temporal locations (x, y, z, t) and the observation displacement wavefield function u(x, y, z,
t). The trained NN serves to generate metadata and provide the time and spatial derivatives
of the wavefield (e.g. utt and uxx ) by automatic differentiation. Then, a preliminary library
of potential terms for the wave equation is optimized from an overcomplete library by using
a genetic algorithm. We, then, use a physics-informed information criterion to evaluate the
precision and parsimony of potential equations in the preliminary library and determine the
best structure of the wave equation. Finally, we train the ‘physics-informed’ neural network to
identify the corresponding coefficients of each functional term. Examples in discovering the
2-D acoustic wave equation validate the feasibility and effectiveness of our implementation.
We also verify the robustness of this method by testing it on noisy and sparsely acquired
wavefield data.
Key words: Machine learning; Non-linear differential equations; Wave propagation.
can we discover a sufficiently precise wave equation directly from

1 I N T RO D U C T I O N
observed data without relying on physical laws?
Seismic wave equations play a critical role in seismology to con- Benefiting from the recent advances in machine learning (ML)
strain and define wave propagation in a specific medium. Having an and data-processing capabilities, the dawn of this question maybe
accurate equation contributes to our understanding of wave propa- in the horizon. More recently, data-driven discovery methods
gation. Seismologists previously followed some basic physical laws have been developed to identify the underlying partial differen-
to propose numerous wave equations (Kjartansson 1979; Thomsen tial equations (PDEs) of physical problems. Specifically, data-
1986; Alkhalifah 2000; Zhu & Harris 2014; Cheng et al. 2021). The driven discovery methods are based on constructing a library
presented equations provide the central ingredient to describe and composed of candidate functional terms, and then using vari-
model seismic wave propagation in the Earth’s interior. However, ous optimization algorithms to select the most appropriate com-
we can not ignore that many these equations are derived based on bination of candidate terms, generating the general form of the
some approximations and assumptions. Moreover, wave propaga- equations. In terms of library construction approaches, current
tion in certain media may involve elusive and complex mechanisms methods are mainly divided into closed and expandable library
(Biot 1955; Dvorkin & Nur 1993; Ba et al. 2008, 2017). Hence, methods.
the resulting governing equations might not accurately describe the Closed library methods, which are the most widely used, first
wavefield, such as in attenuating media, as they inherently depend build an overcomplete library and then use sparse regression meth-
on the underlying assumptions (Hao & Greenhalgh 2021). In this ods to extract the dominating candidate function terms. Lasso (Scha-
case, a natural solution is to seek a new accurate mathematical equa- effer 2017), sequential threshold ridge regression (Rudy et al. 2017),
tion based on existing knowledge and physical principles to replace and SINDy (Brunton et al. 2016) are the leading representative
the original wave equation. Actually, it’s undeniable that this pro- sparse regression methods for discovering PDEs from the closed
cess can be difficult and time consuming. Then, we ask ourselves: construction library. Although sparse regression methods remain

C The Author(s) 2023. Published by Oxford University Press on behalf of The Royal Astronomical Society. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
537
538 S. Cheng and T. Alkhalifah
equation, a physics-informed neural network (PINN) (Raissi et al.

2019), which is initialized by the trained network, is trained to iden-
tify the corresponding coefficients for every term in the discovered
wave equation. We present the discovery of the 2-D acoustic equa-
tion to demonstrate the potential of D-WE, and its robustness in
handling noisy and sparse data.
2 METHOD
2.1 Problem description

In this work, we consider the general form of a seismic wave equa-
Figure 1. Structure of deep fully connected backpropagation NN. tion consisting of:

u T = f (u ); [ξi ]i=1,···,n (1)
highly efficient, the fact is that they are limited to a complete candi- with

date library given beforehand. It is difficult to provide an overcom-
plete candidate library that must contain true PDE terms, especially (u ) = u, u x , u y , u z , u x x , u yy , u zz , · · · , (2)
in a setting in which we lack prior knowledge. The growing size of where uT denotes different orders of derivatives of displacement
candidate libraries, meanwhile, leads to the increasing difficulty of u with respect to time t, for example first (ut ) or second (utt );
sparsifying them, which means that we may discover a wrong PDE. (u) refers to the candidate library composed of potential func-
Furthermore, in the process of constructing a candidate library, the tional terms, in which the subscripts represent different orders
derivatives of the observation data are often obtained by using finite of derivatives in space; [ξ i ]i = 1, · · ·, n denotes the vector of coeffi-
difference, polynomial interpolation and other methods. This is not cients with size n of the candidates in the library and f( · ) is a
robust in the face of data acquired on irregular grids and maybe function parametrizing a wave equation with possible contributing
exposed to noise. terms.
In contrast, expandable library methods have a stronger abil- For a specific displacement wavefield data, denoted as u(xi , yj , zk ,
ity to identify PDEs with complicated structures than the closed tl ), i = 1, · · ·, Nx , j = 1, · · ·, Ny , k = 1, · · ·, Nz , and l = 1, · · ·, Nt ,
library methods. This is because expandable library methods do eq. (1) can be expressed as
not require a pre-determined overcomplete library. They just re- ⎡ ⎤
quires a randomly generated incomplete initial library, which will u T (x 1 , y1 , z 1 , t1 )
⎢ u T (x 2 , y1 , z 1 , t1 ) ⎥
evolve to produce unlimited combinations through the introduc- ⎢ ⎥
⎣ ... ⎦
tion of crossover and mutation operations of genetic algorithms
(GA; Maslyaev et al. 2019). However, processing noisy and sparse u T x N x , y N y , z N z , t Nt
⎡ ⎤
data still remains a challenge. With the rapid development of ML, u (x 1 , y1 , z 1 , t1 ) u x (x 1 , y1 , z 1 , t1 ) · · · ⎡ ⎤
⎢ u (x 2 , y1 , z 1 , t1 ) ξ1
some researchers (Chen et al. 2021; Xu et al. 2022) utilized NN u x (x 2 , y1 , z 1 , t1 ) . . .⎥
=⎢ ⎣
⎥ ⎣. . .⎦ .
⎦
functional representation to calculate derivatives through automatic
... ... ... ξn
differentiation. Compared to conventional numerical methods, the u x N x , y N y , z N z , t Nt u x x N x , y N y , z N z , t Nt . . .
calculation of derivatives provided by NN is more stable and robust (3)
to noisy data (Xu et al. 2022). Meanwhile, NNs can operate in mesh
free environments when computing derivatives via automatic dif- We can see that the linear system in eq. (3) has Nx · Ny · Nz ·
ferentiation. However, automatic differentiation is relatively costly, Nt equations and n unknown coefficients. In most cases, Nx · Ny
but for discovering an equation, we believe the accuracy is worth · Nz · Nt n holds, which implies eq. (3) is an overdetermined
the cost. system. Fortunately, a parsimonious form exists in most seismic
In this work, we adapt a new data-driven method to discover the wave equations, thus, just a few functional terms have non-zero
wave equation, named D-WE, which combines a neural network coefficients.
(NN) and GA. In D-WE, a fully connected deep NN is first trained, The objective of a data-driven discovery of a wave equation is
where the input to the network is spatial-temporal locations (e.g. to identify the closed form of f( · ), that is, find the linear com-
x, y, z, t), and the corresponding output is the observed pressure bination of functional terms in (u) and corresponding vector of
wavefield u(x, y, z, t). After training the network, we can produce coefficients [ξ i ]i = 1, · · ·, n . Actually, these two problems are mutu-
metadata and compute time and spatial derivatives of u(x, y, z, t). ally dependent. On the one hand, once we identify which coeffi-
Subsequently, a digital coding is created to define the form of the cients are non-zero from the coefficients [ξ i ]i = 1, · · ·, n , the structure
underlying wave equation, including the left-hand side (LHS) and of the correct equation is also determined. On the other hand, if
right-hand side (RHS). The special encoding corresponds to the we discover the structure of the equation, then how to obtain the
genomes for the combination of some candidate function terms. non-zero coefficients becomes a simple problem of solving a linear
Crossover and mutation are used to expand the diversity of the equation.
library, that is, increasing the search scope of candidate wave equa-
tions. To determine the proper wave equation from the numerous
2.2 The neural network
candidate equations, we use the physics-informed information cri-
terion (PIC) (Xu et al. 2022), which simultaneously considers the As mentioned, the problem of discovering the seismic wave equa-
evaluation of parsimony and precision, resulting in an equation with tion is given by eq. (3). However, how can we obtain the derivative
physical interpretability. After discovering the structure of the wave terms on the LHS and RHS when only the displacement wavefield is
Discovery of wave equation 539

Figure 2. The workflow of the genetic algorithm.
Figure 3. An illustration of the process of crossover and mutation.
Figure 4. An illustration of moving horizon.
known? A feasible option is to use finite difference and other numer- derivatives, which is usually difficult to guarantee, especially for
ical methods to calculate time and spatial derivatives. Although nu- field data.
merical methods are efficient, the observations need to be on a reg- In contrast, NNs have proven their robustness in representing
ular grid and have high signal-to-noise ratio for accurate numerical noisy data (Xu et al. 2022). Hence, an alternative solution is that we
Table 1. The evolution process of the preliminary library when the left-hand side of the equation is the
first-order time derivative.
Number of Genome and Fitness
generations translation
1 Genome:[1]{[1, 1], [0, 1, 3]} 135.62
Translation: ut = ξ1 (u 2x + u 2z ) + ξ2 (uu x u x x x + uu z u z z z )
20 Genome:[1]{[1, 1, 2], [0, 2, 2], [0, 1, 1], [1, 1]} 110.37
Translation: ut = ξ1 (u 2x u x x + u 2z u zz ) + ξ2 (uu 2x x + uu 2zz )
+ξ3 (uu 2x + uu 2z ) + ξ4 (u 2x + u 2z )
40 Genome:[1]{[0, 0], [0, 2], [0, 1, 1], [1, 1]} 47.27
Translation:ut = ξ 1 u2 + ξ 2 (uuxx + uuzz )
+ξ3 (uu 2x + uu 2z ) + ξ4 (u 2x + u 2z )
60 Genome:[1]{[0, 0], [0, 2], [0, 1, 1], [1, 1]} 47.27
+ξ3 (uu 2x + uu 2z ) + ξ4 (u 2x + u 2z )

80 Genome:[1]{[0, 0], [0, 2], [0, 1, 1], [1, 1]} 47.27
+ξ3 (uu 2x + uu 2z ) + ξ4 (u 2x + u 2z )
100 Genome:[1]{[0, 0], [0, 2], [0, 1, 1], [1, 1]} 47.27
+ξ3 (uu 2x + uu 2z ) + ξ4 (u 2x + u 2z )
Table 2. The evolution process of the preliminary library when the left-hand side of the equation is the second-order
time derivative.
Number of Genome and Fitness
generations translation
1 Genome:[2]{[2], [0, 0, 2], [0, 1, 1]} 47880.75
Translation:utt = ξ 1 (uxx + uzz ) + ξ 2 (u2 uxx + u2 uzz )
+ξ3 (uu 2x + uu 2z )
20 Genome:[2]{[2], [0, 0, 2], [0, 1, 3]} 47748.91
+ξ 3 (uux uxxx + uuz uzzz )
40 Genome:[2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]} 47730.40
+ξ 3 (uux uxx + uuz uzz ) + ξ 4 (uux uxxx + uuz uzzz )
60 Genome:[2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]} 47730.40
80 Genome:[2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]} 47730.40
100 Genome:[2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]} 47730.40
Table 3. The potential wave equations and the corresponding PIC when the Table 4. The potential wave equations and the corresponding PIC when the
left-hand side of the equation is the first-order time derivative. left-hand side of the equation is the second-order time derivative.
Potential wave equation PIC Potential wave equation PIC
ut = ξ 1 u2 + ξ 2 (uuxx + uuzz ) 0.028099
u t = ξ1 u 2 + ξ2 (uu x x + uu zz ) + ξ2 (u 2x + u 2z ) 0.0093211 utt = ξ 1 (uxx + uzz ) 0.000187
ut = ξ 1 (uuxx + uuzz ) 0.0075959 utt = ξ 1 (u2 uxx + u2 uzz ) 0.02694
u t = ξ1 u 2 + ξ4 (u 2x + u 2z ) 0.011005 utt = ξ 2 (uux uxxx + uuz uzzz ) 0.035408
u t = ξ1 u 2 + ξ2 (uu x x + uu zz ) + ξ3 (uu 2x + uu 2z ) + ξ4 (u 2x + u 2z ) 0.011924 utt = ξ 1 (u2 uxx + u2 uzz ) + ξ 2 (uux uxxx + uuz uzzz ) 0.040359
utt = ξ 1 (uux uxx + uuz uzz ) + ξ 2 (uux uxxx + uuz uzzz ) 0.050408
Table 5. Test on discovery of a 2-D acoustic wave equation with varying gene, to represent different order of derivatives. For example,
subsets of the total observations. ⎧
⎪
⎪ 0⇔u
Volume of data Discovered equation Error ⎨
1 ⇔ u x or u y or u z or u t
Gene: . (5)
100% utt = 3.99(uxx + uzz ) 0.25% ⎪
⎪ 2 ⇔ u x x or u yy or u zz or u tt
⎩
60% utt = 3.999(uxx + uzz ) 0.025% 3 ⇔ u x x x or u y y y or u z z z
20% utt = 3.989(uxx + uzz ) 0.28%
5% utt = 3.992(uxx + uzz ) 0.2% Here, number 0 represents displacement/pressure wavefield u; num-
1% utt = 3.941(uxx + uzz ) 1.48% ber 1, 2, and 3 are used to encode the first, second, and third order
0.5% utt = 3.969(uxx + uzz ) 0.78% spatial derivatives of the displacement/pressure wavefield, respec-
tively. Also, we use numbers 1 and 2 to represent first and second
order time derivatives of the displacement/pressure wavefield, re-
spectively. Then, we combine some genes to form gene modules,
can skilfully use automatic differentiation of the NN to calculate the which can be utilized to define functional terms. For example, the
derivatives during the process of backpropagation. For this purpose, functional terms in the LHS are represented as
we only need to train a deep fully connected backpropagation NN
[1] ⇔ u t
(shown in Fig. 1) using the following loss function Gene module: , (6)
[2] ⇔ u tt

N Ny
x Nz N t while the functional terms in the RHS have the form
1
L(θ ) =
N x N y N z Nt i=1 j=1 k=1 l=1 [2] ⇔ u x x + u yy + u zz
Gene module: . (7)
2 [0, 3] ⇔ uu x x x + uu y y y + uu z z z
u ( xi , y j , z k , tl ) − NN( xi , y j , z k , tl ; θ ) , (4)
It is noted that for most wave equations, the LHS of the equation is
where θ denotes the NN trainable parameters, and the corresponding expressed as a first- or second-order time derivative. Hence, we only
inputs to the network are the spatial-temporal locations (xi , yj , zk , consider these two cases here. Meanwhile, we have to emphasize
tl ), to approximate the displacement wavefield u(xi , yj , zk , tl ) and its that we consider the symmetry of the wave equation in the RHS of
derivatives. In real applications, the wavefield (i.e. labels) are given the equation. For example, the gene module [2], which is different
by the recorded data. However, for this analysis of the approach, we from gene 2, not only represents the second-order spatial derivatives
will simulate the data. of the displacement/pressure wavefield with respect to spatial vari-
Since the inputs to the network are mutually independent po- ables x, but also signifies the second-order spatial derivatives with
sition coordinates, the typical requirement that the observed data respect to the spatial variables y and z. The spatial derivatives of the
must be collected on a regular grid is not need here. Another ad- displacement/pressure wavefield with respect to these three spatial
vantage of this implementation is that the trained NN can be further variables are combined through addition. Additionally, when a gene
used to predict the wavefield at certain spatial locations where we module has multiple genes, it represents the multiplication of the
do not have observations. The predicted wavefields are referred as corresponding genes, that is, [0, 3] denotes uuxxx + uuyyy + uuzzz .
metadata, which can further expand the data volume to us, thereby The combination of gene modules is regarded as the genome, which
assisting in the discovery of the wave equation. Furthermore, this includes the LHS and RHS terms in the potential wave equation. We
trained NN will serve as the initialization for the PINN, signifi- assume the LHS of wave equation only includes the derivatives with
cantly reducing the training cost required to accurately determine respect to t, and the LRS of wave equation consists of the derivatives
the equation structure using PIC and identify the coefficients. We with respect to spatial variables x, y, and z, which applies to many
will elaborate on this in detail later on. PDEs. Hence, in the case of the LHS of the equation given by the
first-order time derivative, we can use the following digitization to
translate the corresponding wave equation:
Genome: [1]{[2], [0, 3]} ⇔ u t = u x x + u yy + u zz + uu x x x + uu y y y
2.3 Genetic algorithm
+uu z z z . (8)
GA is a type of optimization algorithm inspired by the process of nat-
When the LHS of the equation is given by the second-order time
ural selection. Since directly solving eq. (3) is a non-deterministic
derivative, we can represent the wave equation as follows:
polynomial time (NP) hard problem with unlimited combinations
(Xu et al. 2022), we utilize GA to search for an optimal preliminary Genome: [2]{[2], [0, 3]} ⇔ u tt = u x x + u yy + u zz + uu x x x + uu y y y
candidate functional terms from unlimited combinations, which
+uu z z z . (9)
converts the problem to a finite-dimensional problem. The GA does
this by creating a population of candidate solutions, each of which These genomes are presented shown as examples. Similar digitiza-
represents a different set of functional terms. The fitness of each tion and encoding can be obtained analogously. We note that the
candidate term is evaluated based on its ability to accurately model gene modules here are connected by addition. Moreover, we have
the wave behaviour of the proposed equation. In our method, GA not digitized the coefficients, such as velocity and density, which
uses a series of operations, including translation, crossover, muta- are commonly present in the wave equation. Our current focus is
tion, and selection to evolve the population of candidate solutions discovering the wave equation without prior knowledge of the spe-
over multiple generations. The workflow of GA is presented in cific values of these coefficients. The values of the coefficients will
Fig. 2. In the following, we illustrate the steps in detail. be determined using PINN after discovering the equation’s struc-
Firstly, we introduce a principle of translation to digitize the ture, as will be introduced in Section 2.5. In eqs (8) and (9), we
structure of potential seismic wave equation to the corresponding have set the coefficients of the functional terms uxx + uyy + uzz and
genome. Specifically, we first use numbers, which is defined as the uuxxx + uuyyy + uuzzz to 1 for the sake of illustration. However, the

Figure 5. Comparison of wavefield snapshots simulated by the accurate acoustic wave equation and the discovered equation with different data volumes. (a)
Ground truth comes from accurate acoustic wave equation. Panels (b), (d), and (f) correspond to the discovered equations derived from 60%, 5%, and 1%
volume data, respectively, and their differences from the ground truth are plotted in (c), (e), and (g), respectively.
Table 6. Test on discovery of a 2-D acoustic wave equation from data with more concise. Conversely, as decreases, the equation exhibits a
different noise level. more complex structure.
Noise level Discovered equation Error Once we obtain the fitness of all potential candidate equations,
we can select the genomes that better describes the wave propaga-
25% utt = 3.985(uxx + uzz ) 0.38%
tion system. In our case, the best half of the children are selected as
50% utt = 3.973(uxx + uzz ) 0.68%
100% utt = 3.97(uxx + uzz ) 0.75%
the next generation of parents, and all others genomes are replaced
200% utt = 3.904(uxx + uzz ) 2.4% by new random genomes. The process of crossover, mutation, and
300% utt = 3.76(uxx + uzz ) 6% selection is repeated for the new generation. When a certain prede-
400% utt = 3.517(uxx + uzz ) 12.08% fined iteration is reached, the preliminary library with a few terms in
the last generation is reserved. For this preliminary library, the com-
binations of all candidate functional terms are countable, which is
useful to evaluate each combination to further determine the equa-
coefficients can take on arbitrary values, and this does not impact tion. To do this, in the next section, we will use the PIC algorithm
our ability to discover the equation’s structure. (Xu et al. 2022) to discover the accurate structure of the wave
Subsequently, crossover and mutation are conducted under a cer- equation from the preliminary library.
tain probability to obtain next generation candidates. Cross-over
means swapping parts of gene modules of two genomes to gener-
ate their children (see Fig. 3a). Following the crossover, mutation
produces new genes, containing add, delete, and order genes (see
2.4 Physics-informed information criterion
Fig. 3b). It should be emphasized that crossover and mutation are
only applied to the RHS of the equation, whereas the LHS searches The PIC algorithm involves two types of measurements: redun-
for the time derivative order. This is reasonable for most wave equa- dancy and physical losses. Redundancy loss is used to measure the
tions. parsimony of the proposed equation and is based on the idea that
After mutation, we need to measure the quality of the genome the coefficients of redundant terms are unstable when applied to the
and then perform the selection process. The measurement index is observed data on moving windows of a given time step (Lejarza &
computed by a fitness function as follows: Baldea 2022). Therefore, we can utilize this technique, namely the
moving horizon, to calculate the average variation in coefficients
1 2
F= equ L − equ iR ξi + · len (genome), (10) for each combination to obtain the redundancy loss. As shown in
N the Fig. 4, the smooth wavefield snapshots generated by the NN
where N denotes all observation samples, equL denotes the LHS at different times are divided into Nh overlapping horizons Ti (i =
functional terms of the candidate wave equation, equ iR represent i 1, · · ·, Nh ). The Ti is denoted as the wavefield snapshots within a

th function term in the RHS, and the corresponding coefficients ξ i time range, such as [tmin + it, 12 (tmin + tmax ) + it ], where tmin

are calculated using singular value decomposition (SVD). It is worth and tmax represent the the minimum and maximum of the time do-
emphasizing that in this section and the next section, we are utilizing main of the generated snapshots, respectively, and t denotes the
the SVD method to calculate the coefficients corresponding to each length of horizons. For a candidate combination j (i.e. potential
function term. Although it may not achieve absolute precision, it can wave equation), the corresponding vector of coefficients ξ ij in hori-
be relied upon. To avoid redundancy in the discovery equation, we zon Ti can be obtained solving equation equ i,L j − ξ ij · equ i,R j = 0,
use an l0 penalty on the number of terms in the discovered equation. where equ i,L j and equ i,R j are the values of the LHS and RHS terms
Here, len(genome) denotes the length of the genome, and is a for a potential wave equation j in horizon Ti , respectively. For
hyperparameter. In general, as increases, the equation becomes the combination j , when we obtain all the coefficient vectors in

Figure 6. Comparison of wavefield snapshots simulated by the accurate acoustic wave equation and the discovered equation with different noise levels. (a)
Ground truth comes from accurate acoustic wave equation. Panels (b), (e), and (h) are the noisy wavefield data with noise levels of 25%, 100%, and 300%,
respectively, which are obtained by adding noise to the ground truth. Panels (c), (f), and (i) are obtained by solving the discover equations with noise levels of
25%, 100%, and 300%, respectively. Panels (d), (g), and (j) are the corresponding differences with the ground truth.
overlapping horizons Ti , i = 1, · · ·, Nh , we can calculate the corre-

sponding redundancy loss as follows:
Nterm
1 σ j,k
Lr ( j ) = , (11)
Nterm k=1
μ j,k
where Nterm denotes the number of terms, σ j, k and μj, k represent

the standard deviation and mean, respectively, of the Nh different
coefficients over the overlapping horizons corresponding to the kth
function term in the candidate combination j . As can be appreci-
ated in eq. (11), the accurate terms are stable in the moving horizons,
that is, the coefficients have small standard deviations, resulting in
a small redundancy loss. In contrast, the coefficients of redundant
terms exhibit a large degree of variation in different horizons, due
to the need to compensate for errors caused by noise, which could
be different in different horizons.
Figure 7. Schematic diagram of observation system, where the triangle The physical loss, which is based on PINN (Raissi et al. 2019),
represents the receivers, and the star represents the position of the centre of is presented to evaluate the accuracy of the discovered wave equa-
the isotropic Gaussian function used to initialize the wavefield. tion. Here, we need first train a PINN, which maintains the same
architecture as NN shown in Fig. 1 and is initialized by the NN
Table 7. Test on discovery of 2-D acoustic wave equation from limited parameters θ, while the loss function has the form of
observations.
LPINN (θ) = λd MSEd + λ p MSE p (12)
Noise level Discovered equation Error
0% utt = 3.921(uxx + uzz ) 1.98% with the data loss
25% utt = 3.878(uxx + uzz ) 3.05%
Ny
50% utt = 3.706(uxx + uzz ) 7.35% 1
N x Nz t N
75% ut = −1.899 × 10−5 (uux uxxx + uuz uzzz ) MSEd =

N x N y N z Nt i=1 j=1 k=1 l=1
2
u ( xi , y j , z k , tl ) − PINN( xi , y j , z k , tl ; θ) (13)
and the PDE loss (14). However, this solution is not exact, especially for noisy data.

Ny

Nz

Nt
In contrast, PINN provides a reliable framework to identify the
Nx
1 coefficients. Hence, we use the PINN to obtain the values of the co-
MSE p =
N x N y N z Nt i=1 j=1 k=1 l=1 efficients, which is also initialized by the previously trained network
2 NN (Fig. 1), while the loss function is reset to

equ L ( xi , y j , z k , tl ) − ξ · equ R ( xi , y j , z k , tl ; θ ) , (14)
Ny

x N Nzt N
1
where λd and λp are hyperparameters, which control the contribution L(θ, ξ ) =
N x N y N z Nt i=1 j=1 k=1 l=1
of data and PDE losses to the total loss, respectively. Here, the data
loss comes from the average squared error (MSE) between the
2
equ L ( xi , y j , z k , tl ) − ξ · equ R ( xi , y j , z k , tl ; θ, ξ ) .
observed data and the predicted one from PINN, whereas the PDE

loss is obtained by measuring the MSE between the LHS equ L (19)

and RHS ξ · equ R terms of the potential wave equation, which is
Here, the coefficients ξ are not the output of PINN, and thus, we de-
calculated on the metadata ( xi , y j , z k , tl ) generated from the NN.
fine it as additional trainable parameters of PINN, which is updated
It should be emphasized that the coefficients ξ are deduced by along with network parameters θ to minimize the loss function. We

computing equ L − ξ · equ R = 0 during each training process. initialize the values for the trainable coefficients ξ from solving the

After training the PINN, the physical loss for the potential wave
equation (i.e. candidate combination j ) can be calculated as fol- system of linear eq. (3). Although initially it may not be accurate,
lows: it can help PINN converge faster than starting with random initial-
⎛ ⎞1 izations. Once we identify the exact coefficients, we can combine it
2
⎜ 1
N x N y N z Nt 2 ⎟
with the corresponding functional terms to obtain the general form
L p ( j ) = ⎜
⎝ uˆ PINN ( xi , y j , z k , tl ) − uˆ NN ( xi , y j , z k , tl ) ⎟
⎠ ,
N x N y N z Nt i=1 j=1 k=1 l=1 of the discovered wave equation.
(15)
where uˆ PINN NN
and uˆ refer to the normalized output of the metadata 3 NUMERICAL EXAMPLES
predicted by PINN and NN, respectively, which is determined by: To verify the feasibility and effectiveness of D-WE, we present an
u − u min
PINN example in discovering the 2-D acoustic wave equation:
uˆ PINN = , (16)
u max − u min u tt = v 2 (u x x + u zz ) , (20)
u NN − u min
uˆ NN = , (17) where we assume that the body force is absent, and v denotes veloc-
u max − u min ity. We consider wave propagation in a homogeneous medium and
where umax and umin denote the maximum and minimum of the obser- utilizes finite differences (FD) to generate the dataset. The medium,
vation data, respectively. The utilization of such a form of physical which we assume has a velocity 2 km s−1 , is discretized along 101
loss is based on the following fact: when physical constraints are gridpoints in both x and z directions with a grid spacing of 10 m. We
collect 121 snapshots of the pressure wavefield with a time interval
consistent with the data, the predicted results will exhibit significant of 2 ms from zero to 0.24 s. The wavefield is initiated by an isotropic
improvements (see eq. 12). However, if the physical constraints and Gaussian function at the centre of the model given by
data are not parallel, the performance of PINN will decrease. As a
u (i, j, 0) = exp −0.2 ∗ (i − 51)2 + ( j − 51)2 , i, j = 1, · · · , 101, (21)
result, if the underlying wave equation can effectively describe the
wavefield data, the predicted results of the trained PINN is closer to at time zero. In this test, the NN has three hidden layers with 50
the NN’s output, which is relatively accurate. Hence, the physical neurons in each layer. Refer to Xu et al. (2022), the activation
loss will be very small. function is set as a sine function. The maximum population size of
For each candidate combination j , the PIC is obtained by mul- genomes is 400, the maximum number of generations is taken as
tiplying the calculated redundancy and physical losses as follows: 100. The NN and PINN are trained by using an Adam optimizer
PIC( j ) = Lr ( j ) · L p ( j ). (18) (Kingma & Ba 2014). The hyperparameters (eq. 10), λd , and λp
(eq. 12) are set to 10−6 , 1, and 0.01, respectively.
It is worth noting that the PIC is not performed for all possible We first provide an example to illustrate the process of our method
combinations in the preliminary potential library, as calculating in discovering the acoustic wave equation from observed pressure
physical loss is time-consuming. Since the computational cost for wavefields. We randomly select 20% subsets from the complete
redundancy loss is cheap, we first derive all redundancy loss and volume of observed pressure wavefields to train the NN and then
select top Nb combinations with smaller redundancy loss. Follow- utilize them to discover the equation. The NN is trained for 30 000
ing that, we perform the PINN training on the Nb combinations and iterations. We simultaneously consider cases where the LHS of the
further combine redundancy and physical losses to present PIC. equation is given by a first-order and a second-order time derivatives.
Afterwards, we will discover the correct structure of wave equa- We generate the initial library on the LRS of the equation as {[0, 1,
tion with the smallest PIC. 3], [1, 1]}, corresponding to the form uu x u x x x + uu z u z z z + u 2x + u 2z .
By utilizing the GA (as illustrated in Fig. 2), the initial library will
evolve to produce a overcomplete library, including a lot of candi-
2.5 Identifying coefficients
date functional terms. In our case, we limit the number of candidate
Although we assume that the general structure of the wave equa- functional terms to 400, which constitutes the maximum population
tion has been obtained, we still need to determine the coefficients. size of genomes. We list the optimal genomes at some generations,
Certainly, we can obtain the coefficients by directly solving a typ- where Tables 1 and 2 correspond to the equations with first-order
ically overdetermined system of linear eq. (3). For example, we and second-order of time derivatives on the LHS, respectively. The
use the SVD method to calculate the coefficients in eqs (10) and first column represents the number of generation of evolution, the
second column indicates the optimal genome and the correspond- Furthermore, we demonstrate the robustness of D-WE to noisy
ing translated form of the potential equation, and the final column data, which is presented in Table 6. Here, Gaussian noise is added
represents their corresponding fitness scores. to the clean data u to obtain the noisy data u˜ = u + η · std (u ) ·
From Tables 1 and 2, we can see that as with the progress in evo- N (0, 1), where N(0, 1) denotes the standard normal distribution
lution, the optimal genome tends to stabilize. For example, when with mean 0 and standard deviation of 1, and η is the noise level.
the LHS of the equation is a first-order of time derivative, the opti- The results prove that D-WE is reasonably robust to high levels of
mal genome is {[0, 0], [0, 2], [0, 1, 1], [1, 1]}, while when the the noise. Surprisingly, D-WE still accurately discovers the structure
LHS of the equation corresponds to a second-order of derivative, of the equation for data with strong noise (e.g. 300% and 400%
the optimal genome is {[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}. How- noise level), and limited data. Here, we also numerically solve the
ever, if we are to stop here and choose the equation form based discovered equations at noise levels 25%, 100%, and 300%. Fig. 6
on fitness scores, it would be {[0, 0], [0, 2], [0, 1, 1], [1, 1]}. presents a comparison between the generated wavefield snapshots
Certainly, this does not match the accurate form of the acoustic and their corresponding ground truth (Fig. 6a). We can see that
wave equation. Therefore, as stated earlier, we consider the opti- our method yields highly accurate equations for observation data
mal genome from the GA at the maximum number of generations with low noise levels (Figs 6b–d). As the noise level rises, the
as a preliminary library. The combinations from this preliminary wavefield snapshots simulated by the discovered equations show
library can be countable. We can select arbitrary terms to form a increased signal leakage compared to the ground truth, but the

potential structure of the wave equation and then utilize a more accuracy is still commendable. For observation data with strong
accurate PIC metrics to determine the exact structure of the equa- noise, such as in Fig. 6(h), the wave front is nearly obscured by
tion from all combinations. Tables 3 and 4 display the potential wave noise and the continuity is significantly disrupted. However, even in
equations with the lowest 5 PIC metrics, corresponding to the first- this case, the discovered equation still yields comparable wavefield
order and second-order time derivatives on the LHS, respectively. snapshots to the exact acoustic wave equations, as indicated in
By comparing Tables 3 and 4, we can see that the potential wave Figs 6(i) and (j).
equation in the form of utt = ξ 1 (uxx + uzz ) has the lowest PIC values, To consider more realistic observation systems, we place all
and as a result, is ultimately identified as the discovered equation receivers on the gridpoints of the model boundary, also, use the
structure. isotropic Gaussian function at the centre of the upper surface of the
Compared with the form of the acoustic wave equation, it is model to initialize the wavefield, which is illustrated in Fig. 7. In
demonstrated that the equation structure, which is directly discov- Table 7, we present the results of the discovered wave equation un-
ered from the observed pressure wavefield, is consistent with the der this observation system with different noise levels. Remarkably,
corresponding acoustic wave equation. Once we establish the accu- even under such restricted observation conditions, our method is
rate structure of the equation, we further utilize PINN to optimize still able to accurately identify the structure of the equation and
the coefficient ξ 1 for the terms (uxx + uzz ). Ultimately, we obtain the maintain robustness to some extent of noise. However, it should be
coefficient for the equation as 3.989 km2 s−2 , which has a negligi- acknowledged that, compared with random observation systems, the
ble error with the true value v2 = 4 km2 s−2 . Thus, the discovered restricted boundary observation can lead to the reduced robustness
general form of the equation is utt = 3.989(uxx + uzz ). It is worth of our method in identifying equation coefficients in the presence
emphasizing that apart from the training time of the NN, the en- of noise. Moreover, when the noise level is high (e.g. noise level =
tire discovery process is highly efficient, costing only a total of 75%), our method may discover an incorrect wave equation. This
198 s. is a predictable outcome given the highly constrained nature of the
We then validate the performance of D-WE for discovering the observation points.
acoustic wave equation on sparse observations. We randomly select
a subset of the complete gridpoint measurements of pressure wave-
fields, including 60%, 20%, 5%, 1%, and 0.5% of all gridpoints, and
compare results with the discovered form from the complete pres-
4 C O N C LU S I O N S
sure wavefield points. The results are shown in Table 5, where the
error
represents relative error between coefficients, which is defined We explored a new methodology to discover wave equations in a
as ξi − ξitr ue /ξitr ue in percent. As we can see in the table, D-WE data-driven way. This novel implementation, dubbed D-WE, is pro-
enables the discovery of the structure of the acoustic wave equa- posed to directly discover a wave equation from spatial-temporal
tion, even for extremely sparse data (e.g. 0.5% of all gridpoints). seismic wavefield observations. D-WE consists of two major com-
In most cases, the identified coefficients are almost exactly close to ponents: the NN and the GA. The NN accepts spatial-temporal
true ones, that is, the square of velocity (in this example equals 4), locations as inputs of the network to approximate the observation
and only for extremely sparse data, the accuracy of the coefficients displacement or pressure wavefields, which is used for calculating
is slightly reduced, which is acceptable. We use FD algorithms time and spatial derivatives and producing metadata. On the other
to numerically solve discovered equations derived from 60%, 5%, hand, GA serves to generate an expandable candidate functional
and 1% and volume data, respectively, and compare the resulting terms library to address the problem of an incomplete initial library.
wavefields with those obtained from simulating accurate acoustic The best wave equation is determined from the candidate library by
wave equation, as depicted in Fig. 5. As seen in Figs 5(b)–(e), it utilizing a physics-informed information criterion. The correspond-
is evident that the wavefield snapshots derived from the discovered ing coefficients of each term in the optimal form is identified by
equation exhibits a remarkably close resemblance to those obtained PINN, which is initialized by the NN. Test on the discovery of the 2-
from accurate acoustic wave equation (Fig. 5a) used in generating D acoustic wave equation demonstrates that D-WE can identify the
the observations. Furthermore, even for extremely sparse data, the correct wave equation and is robust to sparse and noisy wavefield
discovered equation demonstrates a high degree of accuracy in sim- data. This conclusion will hopefully pave the way to us utilizing
ulating seismic wave propagation (see Figs 5f and g), with only this approach for the discovery of more exotic wave equations that
negligible differences. describe wave propagation more inline with our observations.
AC K N OW L E D G M E N T S Dvorkin, J. & Nur, A., 1993. Dynamic poroelasticity: a unified model with
the squirt and the biot mechanisms, Geophysics, 58(4), 524–533.
The authors thank KAUST and the DeepWave sponsors for support- Hao, Q. & Greenhalgh, S., 2021. Nearly constant q models of the gener-
ing this research and granting permission to publish it. We thank alized standard linear solid type and the corresponding wave equations,
the editor Bertrand Rouet-Leduc and an anonymous reviewer for Geophysics, 86(4), T239–T260.
their valuable suggestions that led to many improvements in the Kingma, D.P. & Ba, J., 2014. Adam: A method for stochastic optimization,
manuscript. We also thank Hao Xu for his valuable comments and arXiv preprint arXiv:1412.6980.
suggestions. Kjartansson, E., 1979. Constant q-wave propagation and attenuation, J. geo-
phys. Res., 84(B9), 4737–4748.
Lejarza, F. & Baldea, M., 2022. Data-driven discovery of the governing
equations of dynamical systems via moving horizon optimization, Sci.
D ATA AVA I L A B I L I T Y
Rep., 12(1), 1–15.
Data associated with this research are available and can be obtained Maslyaev, M., Hvatov, A. & Kalyuzhnaya, A., 2019. Data-driven PDE
by contacting the corresponding author. discovery with evolutionary approach, In Computational Science–
ICCS 2019: 19th International Conference, Faro, Portugal, Springer
International Publishing. Proceedings, Part V 19, pp. 635–641. doi:
10.48550/arXiv.1903.08011.
REFERENCES

Raissi, M., Perdikaris, P. & Karniadakis, G.E., 2019. Physics-informed neu-
Alkhalifah, T., 2000. An acoustic wave equation for anisotropic media, ral networks: a deep learning framework for solving forward and inverse
Geophysics, 65(4), 1239–1250. problems involving nonlinear partial differential equations, J. Comput.
Ba, J., Nie, J.-X., Cao, H. & Yang, H.-Z., 2008. Mesoscopic fluid Phys., 378, 686–707.
flow simulation in double-porosity rocks, Geophys. Res. Lett., 35(4), Rudy, S.H., Brunton, S.L., Proctor, J.L. & Kutz, J.N., 2017. Data-driven
doi:10.1029/2007GL032429. discovery of partial differential equations, Sci. Adv., 3(4), e1602614,
Ba, J., Xu, W., Fu, L.-Y., Carcione, J. M. & Zhang, L., 2017. Rock anelas- doi:10.1126/sciadv.1602614.
ticity due to patchy saturation and fabric heterogeneity: a double double- Schaeffer, H., 2017. Learning partial differential equations via data
porosity model of wave propagation, J. geophys. Res., 122(3), 1949–1976. discovery and sparse optimization, Proc. R. Soc., A, 473(2197),
Biot, M. A., 1955. Theory of elasticity and consolidation for a porous doi:10.1098/rspa.2016.0446.
anisotropic solid, J. appl. Phys., 26(2), 182–185. Thomsen, L., 1986. Weak elastic anisotropy, Geophysics, 51(10), 1954–
Brunton, S. L., Proctor, J. L. & Kutz, J. N., 2016. Discovering governing 1966.
equations from data by sparse identification of nonlinear dynamical sys- Xu, H., Zeng, J. & Zhang, D., 2022. Discovery of partial differential equa-
tems, Proc. Natl. Acad. Sci., 113(15), 3932–3937. tions from highly noisy and sparse data with physics-informed informa-
Chen, Z., Liu, Y. & Sun, H., 2021. Physics-informed learning of governing tion criterion, Research, 6, doi:10.34133/research.0147.
equations from scarce data, Nat. Commun., 12(1), 1–13. Zhu, T. & Harris, J.M., 2014. Modeling acoustic wave propagation in
Cheng, S., Mao, W., Zhang, Q. & Xu, Q., 2021. Wave propagation in the heterogeneous attenuating media using decoupled fractional laplacians,
poro-viscoelastic orthorhombic two-phase media: plane-wave theory and Geophysics, 79(3), T105–T116.
wavefield simulation, J. geophys. Int., 227(1), 99–122.

C The Author(s) 2023. Published by Oxford University Press on behalf of The Royal Astronomical Society. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Robust data driven discovery of a seismic wave equation

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Robust data driven discovery of a seismic wave equation

Uploaded by

Copyright:

Available Formats

Geophys. J. Int. (2024) 236, 537–546 https://doi.org/10.

Robust data driven discovery of a seismic wave equation

Shijun Cheng and Tariq Alkhalifah

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

can we discover a sufficiently precise wave equation directly from

equation, a physics-informed neural network (PINN) (Raissi et al.

2.1 Problem description

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Figure 3. An illustration of the process of crossover and mutation.

Figure 4. An illustration of moving horizon.

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

overlapping horizons Ti , i = 1, · · ·, Nh , we can calculate the corre-

where Nterm denotes the number of terms, σ j, k and μj, k represent

75% ut = −1.899 × 10−5 (uux uxxx + uuz uzzz ) MSEd =

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

Downloaded from https://academic.oup.com/gji/article/236/1/537/7424129 by guest on 02 April 2024

You might also like