1093/gji/ggad446
Advance Access publication 2023 November 15
GJI General Geophysical Methods
Accepted 2023 November 1. Received 2023 September 24; in original form 2023 April 29
SUMMARY
© The Author(s) 2023. Published by Oxford University Press on behalf of The Royal Astronomical Society. This is an Open Access
article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
538 S. Cheng and T. Alkhalifah
2 METHOD
known? A feasible option is to use finite difference and other numerical methods to calculate time and spatial derivatives. Although numerical methods are efficient, the observations need to be on a regular grid and have high signal-to-noise ratio for accurate numerical derivatives, which is usually difficult to guarantee, especially for field data.
In contrast, NNs have proven their robustness in representing noisy data (Xu et al. 2022). Hence, an alternative solution is that we
Table 1. The evolution process of the preliminary library when the left-hand side of the equation is the first-order time derivative.
Number of generations    Genome and translation    Fitness
1     Genome: [1]{[1, 1], [0, 1, 3]}    135.62
      Translation: $u_t = \xi_1 (u_x^2 + u_z^2) + \xi_2 (u u_x u_{xxx} + u u_z u_{zzz})$
20    Genome: [1]{[1, 1, 2], [0, 2, 2], [0, 1, 1], [1, 1]}    110.37
      Translation: $u_t = \xi_1 (u_x^2 u_{xx} + u_z^2 u_{zz}) + \xi_2 (u u_{xx}^2 + u u_{zz}^2) + \xi_3 (u u_x^2 + u u_z^2) + \xi_4 (u_x^2 + u_z^2)$
40    Genome: [1]{[0, 0], [0, 2], [0, 1, 1], [1, 1]}    47.27
      Translation: $u_t = \xi_1 u^2 + \xi_2 (u u_{xx} + u u_{zz}) + \xi_3 (u u_x^2 + u u_z^2) + \xi_4 (u_x^2 + u_z^2)$
60    Genome: [1]{[0, 0], [0, 2], [0, 1, 1], [1, 1]}    47.27
      Translation: $u_t = \xi_1 u^2 + \xi_2 (u u_{xx} + u u_{zz}) + \xi_3 (u u_x^2 + u u_z^2) + \xi_4 (u_x^2 + u_z^2)$
Table 2. The evolution process of the preliminary library when the left-hand side of the equation is the second-order time derivative.
Number of generations    Genome and translation    Fitness
1     Genome: [2]{[2], [0, 0, 2], [0, 1, 1]}    47880.75
      Translation: $u_{tt} = \xi_1 (u_{xx} + u_{zz}) + \xi_2 (u^2 u_{xx} + u^2 u_{zz}) + \xi_3 (u u_x^2 + u u_z^2)$
20    Genome: [2]{[2], [0, 0, 2], [0, 1, 3]}    47748.91
      Translation: $u_{tt} = \xi_1 (u_{xx} + u_{zz}) + \xi_2 (u^2 u_{xx} + u^2 u_{zz}) + \xi_3 (u u_x u_{xxx} + u u_z u_{zzz})$
40    Genome: [2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}    47730.40
      Translation: $u_{tt} = \xi_1 (u_{xx} + u_{zz}) + \xi_2 (u^2 u_{xx} + u^2 u_{zz}) + \xi_3 (u u_x u_{xx} + u u_z u_{zz}) + \xi_4 (u u_x u_{xxx} + u u_z u_{zzz})$
60    Genome: [2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}    47730.40
      Translation: $u_{tt} = \xi_1 (u_{xx} + u_{zz}) + \xi_2 (u^2 u_{xx} + u^2 u_{zz}) + \xi_3 (u u_x u_{xx} + u u_z u_{zz}) + \xi_4 (u u_x u_{xxx} + u u_z u_{zzz})$
80    Genome: [2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}    47730.40
      Translation: $u_{tt} = \xi_1 (u_{xx} + u_{zz}) + \xi_2 (u^2 u_{xx} + u^2 u_{zz}) + \xi_3 (u u_x u_{xx} + u u_z u_{zz}) + \xi_4 (u u_x u_{xxx} + u u_z u_{zzz})$
100   Genome: [2]{[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}    47730.40
      Translation: $u_{tt} = \xi_1 (u_{xx} + u_{zz}) + \xi_2 (u^2 u_{xx} + u^2 u_{zz}) + \xi_3 (u u_x u_{xx} + u u_z u_{zz}) + \xi_4 (u u_x u_{xxx} + u u_z u_{zzz})$
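The Fitness column of Tables 1 and 2 is the score of eq. (10): the mean squared residual of a least-squares fit plus a penalty on the genome length. A minimal sketch with NumPy is given below, assuming the LHS and RHS terms have already been evaluated on N samples. `np.linalg.lstsq` (an SVD-based solver) stands in for the SVD solve; the function name, the penalty-weight name `eps` and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def fitness(lhs, rhs_terms, genome_length, eps=1e-6):
    """Mean squared residual of lhs ~ rhs_terms @ xi, plus eps * genome length.

    lhs: (N,) values of the LHS term; rhs_terms: (N, M) values of the M RHS terms.
    Returns the fitness score and the fitted coefficients xi.
    """
    # np.linalg.lstsq solves the overdetermined system with an SVD-based routine.
    xi, *_ = np.linalg.lstsq(rhs_terms, lhs, rcond=None)
    residual = lhs - rhs_terms @ xi
    return float(np.mean(residual**2) + eps * genome_length), xi


# Synthetic stand-in data (not from the paper): u_tt = 4 * laplacian exactly.
rng = np.random.default_rng(0)
laplacian = rng.normal(size=1000)
unrelated = rng.normal(size=1000)
u_tt = 4.0 * laplacian

score_true, xi_true = fitness(u_tt, laplacian[:, None], genome_length=1)
score_wrong, _ = fitness(u_tt, unrelated[:, None], genome_length=1)
print(xi_true, score_true, score_wrong)
```

A lower score is better: the candidate built from the correct term leaves only the length penalty, while the unrelated term leaves a large residual.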
Table 3. The potential wave equations and the corresponding PIC when the left-hand side of the equation is the first-order time derivative.
Potential wave equation    PIC
$u_t = \xi_1 u^2 + \xi_2 (u u_{xx} + u u_{zz})$    0.028099
$u_t = \xi_1 u^2 + \xi_2 (u u_{xx} + u u_{zz}) + \xi_2 (u_x^2 + u_z^2)$    0.0093211
$u_t = \xi_1 (u u_{xx} + u u_{zz})$    0.0075959
$u_t = \xi_1 u^2 + \xi_4 (u_x^2 + u_z^2)$    0.011005
$u_t = \xi_1 u^2 + \xi_2 (u u_{xx} + u u_{zz}) + \xi_3 (u u_x^2 + u u_z^2) + \xi_4 (u_x^2 + u_z^2)$    0.011924

Table 4. The potential wave equations and the corresponding PIC when the left-hand side of the equation is the second-order time derivative.
Potential wave equation    PIC
$u_{tt} = \xi_1 (u_{xx} + u_{zz})$    0.000187
$u_{tt} = \xi_1 (u^2 u_{xx} + u^2 u_{zz})$    0.02694
$u_{tt} = \xi_2 (u u_x u_{xxx} + u u_z u_{zzz})$    0.035408
$u_{tt} = \xi_1 (u^2 u_{xx} + u^2 u_{zz}) + \xi_2 (u u_x u_{xxx} + u u_z u_{zzz})$    0.040359
$u_{tt} = \xi_1 (u u_x u_{xx} + u u_z u_{zz}) + \xi_2 (u u_x u_{xxx} + u u_z u_{zzz})$    0.050408
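The PIC values in Tables 3 and 4 are the product of a redundancy loss and a physical loss, and the candidate with the smallest PIC is retained. The sketch below illustrates only the moving-horizon redundancy loss, under stated assumptions: each horizon is an overlapping window covering half of the time samples, coefficients are fitted per horizon by least squares, and the "average variation" is taken as the mean coefficient of variation (an assumed measure, since the exact formula is not given in this section). The physical loss would come from PINN training, so it appears here only as a placeholder number; all names and data are hypothetical.

```python
import numpy as np


def redundancy_loss(lhs, rhs_terms, n_horizons=6):
    """Average coefficient variation over overlapping time horizons.

    lhs: (T,) LHS values ordered in time; rhs_terms: (T, M) RHS term values.
    Assumption: each horizon spans half of the samples and is shifted by a fixed step.
    """
    n_samples = lhs.shape[0]
    half = n_samples // 2
    step = (n_samples - half) // (n_horizons - 1)
    coeffs = []
    for i in range(n_horizons):
        window = slice(i * step, i * step + half)
        xi, *_ = np.linalg.lstsq(rhs_terms[window], lhs[window], rcond=None)
        coeffs.append(xi)
    coeffs = np.array(coeffs)  # shape (n_horizons, M)
    # Assumed measure of "variation": std / |mean| per term, averaged over terms.
    variation = coeffs.std(axis=0) / (np.abs(coeffs.mean(axis=0)) + 1e-12)
    return float(variation.mean())


def pic(redundancy, physical):
    """PIC(j) = L_r(j) * L_p(j), as in eq. (18)."""
    return redundancy * physical


# Synthetic stand-in data (not from the paper): u_tt = 4 * laplacian + small noise.
rng = np.random.default_rng(0)
laplacian = rng.normal(size=1000)
spurious = rng.normal(size=1000)  # a redundant term unrelated to u_tt
u_tt = 4.0 * laplacian + 0.1 * rng.normal(size=1000)

loss_sparse = redundancy_loss(u_tt, laplacian[:, None])
loss_redundant = redundancy_loss(u_tt, np.column_stack([laplacian, spurious]))

physical_placeholder = 1.0  # hypothetical stand-in for a PINN-derived physical loss
print(pic(loss_sparse, physical_placeholder), pic(loss_redundant, physical_placeholder))
```

The coefficient of the spurious term fluctuates around zero from one horizon to the next, so the candidate that includes it receives the larger redundancy loss, which is the behaviour the moving-horizon idea relies on.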
Discovery of wave equation 541
Table 5. Test on discovery of a 2-D acoustic wave equation with varying subsets of the total observations.
Volume of data    Discovered equation    Error
100%    $u_{tt} = 3.99(u_{xx} + u_{zz})$     0.25%
60%     $u_{tt} = 3.999(u_{xx} + u_{zz})$    0.025%
20%     $u_{tt} = 3.989(u_{xx} + u_{zz})$    0.28%
5%      $u_{tt} = 3.992(u_{xx} + u_{zz})$    0.2%
1%      $u_{tt} = 3.941(u_{xx} + u_{zz})$    1.48%
0.5%    $u_{tt} = 3.969(u_{xx} + u_{zz})$    0.78%

can skilfully use automatic differentiation of the NN to calculate the derivatives during the process of backpropagation. For this purpose, we only need to train a deep fully connected backpropagation NN (shown in Fig. 1) using the following loss function

gene, to represent different orders of derivatives. For example,
Gene: $\begin{cases} 0 \Leftrightarrow u \\ 1 \Leftrightarrow u_x \text{ or } u_y \text{ or } u_z \text{ or } u_t \\ 2 \Leftrightarrow u_{xx} \text{ or } u_{yy} \text{ or } u_{zz} \text{ or } u_{tt} \\ 3 \Leftrightarrow u_{xxx} \text{ or } u_{yyy} \text{ or } u_{zzz} \end{cases}$ (5)
Here, number 0 represents the displacement/pressure wavefield u; numbers 1, 2, and 3 are used to encode the first, second, and third order spatial derivatives of the displacement/pressure wavefield, respectively. Also, we use numbers 1 and 2 to represent the first and second order time derivatives of the displacement/pressure wavefield, respectively. Then, we combine some genes to form gene modules, which can be utilized to define functional terms. For example, the functional terms in the LHS are represented as
Gene module: $\begin{cases} [1] \Leftrightarrow u_t \\ [2] \Leftrightarrow u_{tt} \end{cases}$ (6)
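The gene encoding of eqs (5) and (6) can be made concrete with a small decoder. The sketch below assumes a 2-D medium with spatial axes x and z, treats each gene module as a product of genes, and sums the module over the spatial axes, which reproduces the translations listed in Tables 1 and 2. The function names and the `xi1`, `xi2` labels are hypothetical and are not taken from the original implementation.

```python
"""Decode a genome such as [2]{[2]} into a readable equation string."""

SPATIAL_AXES = ("x", "z")  # assumption: 2-D medium, as in the tables


def gene_to_symbol(order: int, axis: str) -> str:
    """Gene 0 is the wavefield u; gene n > 0 is its n-th derivative along `axis`."""
    return "u" if order == 0 else "u_" + axis * order


def module_to_term(module: list[int]) -> str:
    """A gene module is a product of genes, summed over the spatial axes."""
    per_axis = []
    for axis in SPATIAL_AXES:
        factors = [gene_to_symbol(g, axis) for g in module]
        per_axis.append("*".join(factors))
    # A module made only of zeros has no derivative, so every axis gives the
    # same product; keep a single copy in that case.
    unique = list(dict.fromkeys(per_axis))
    return unique[0] if len(unique) == 1 else "(" + " + ".join(unique) + ")"


def decode_genome(lhs: int, rhs: list[list[int]]) -> str:
    """lhs is the time-derivative order (1 or 2); rhs is a list of gene modules."""
    lhs_symbol = "u_" + "t" * lhs
    terms = [f"xi{k}*{module_to_term(m)}" for k, m in enumerate(rhs, start=1)]
    return f"{lhs_symbol} = " + " + ".join(terms)


print(decode_genome(2, [[2]]))
print(decode_genome(1, [[0, 0], [0, 2]]))
```

For instance, the module [1, 1] decodes to the product of two first derivatives per axis, matching the $u_x^2 + u_z^2$ term in Table 1.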
Table 6. Test on discovery of a 2-D acoustic wave equation from data with different noise levels.
Noise level    Discovered equation    Error
25%     $u_{tt} = 3.985(u_{xx} + u_{zz})$    0.38%
50%     $u_{tt} = 3.973(u_{xx} + u_{zz})$    0.68%
100%    $u_{tt} = 3.97(u_{xx} + u_{zz})$     0.75%
200%    $u_{tt} = 3.904(u_{xx} + u_{zz})$    2.4%
300%    $u_{tt} = 3.76(u_{xx} + u_{zz})$     6%
400%    $u_{tt} = 3.517(u_{xx} + u_{zz})$    12.08%

coefficients can take on arbitrary values, and this does not impact our ability to discover the equation's structure.
Subsequently, crossover and mutation are conducted under a certain probability to obtain the next generation of candidates. Crossover means swapping parts of the gene modules of two genomes to generate their children (see Fig. 3a). Following the crossover, mutation produces new genes through add, delete, and order operations (see Fig. 3b). It should be emphasized that crossover and mutation are only applied to the RHS of the equation, whereas the LHS searches for the time derivative order. This is reasonable for most wave equations.
After mutation, we need to measure the quality of the genome and then perform the selection process. The measurement index is computed by a fitness function as follows:
$F = \frac{1}{N} \left\| \mathrm{equ}_L - \sum_i \mathrm{equ}_R^i \xi_i \right\|^2 + \epsilon \cdot \mathrm{len}(\mathrm{genome})$, (10)
where $N$ denotes the number of observation samples, $\mathrm{equ}_L$ denotes the LHS functional term of the candidate wave equation, $\mathrm{equ}_R^i$ represents the $i$th functional term in the RHS, and the corresponding coefficients $\xi_i$ are calculated using singular value decomposition (SVD). It is worth emphasizing that in this section and the next section, we utilize the SVD method to calculate the coefficients corresponding to each functional term. Although it may not achieve absolute precision, it can be relied upon. To avoid redundancy in the discovered equation, we use an $l_0$ penalty on the number of terms in the discovered equation. Here, len(genome) denotes the length of the genome, and $\epsilon$ is a hyperparameter. In general, as $\epsilon$ increases, the equation becomes more concise. Conversely, as $\epsilon$ decreases, the equation exhibits a more complex structure.
Once we obtain the fitness of all potential candidate equations, we can select the genomes that best describe the wave propagation system. In our case, the best half of the children are selected as the next generation of parents, and all other genomes are replaced by new random genomes. The process of crossover, mutation, and selection is repeated for the new generation. When a predefined number of iterations is reached, the preliminary library with a few terms in the last generation is retained. For this preliminary library, the combinations of all candidate functional terms are countable, which makes it feasible to evaluate each combination to further determine the equation. To do this, in the next section, we will use the PIC algorithm (Xu et al. 2022) to discover the accurate structure of the wave equation from the preliminary library.

2.4 Physics-informed information criterion

The PIC algorithm involves two types of measurements: redundancy and physical losses. Redundancy loss is used to measure the parsimony of the proposed equation and is based on the idea that the coefficients of redundant terms are unstable when applied to the observed data on moving windows of a given time step (Lejarza & Baldea 2022). Therefore, we can utilize this technique, namely the moving horizon, to calculate the average variation in coefficients for each combination to obtain the redundancy loss. As shown in Fig. 4, the smooth wavefield snapshots generated by the NN at different times are divided into $N_h$ overlapping horizons $T_i$ ($i = 1, \cdots, N_h$). $T_i$ denotes the wavefield snapshots within a time range, such as $[t_{\min} + i\Delta t, \frac{1}{2}(t_{\min} + t_{\max}) + i\Delta t]$, where $t_{\min}$ and $t_{\max}$ represent the minimum and maximum of the time domain of the generated snapshots, respectively, and $\Delta t$ denotes the length of horizons. For a candidate combination $j$ (i.e. potential wave equation), the corresponding vector of coefficients $\xi_i^j$ in horizon $T_i$ can be obtained by solving the equation $\mathrm{equ}_L^{i,j} - \xi_i^j \cdot \mathrm{equ}_R^{i,j} = 0$, where $\mathrm{equ}_L^{i,j}$ and $\mathrm{equ}_R^{i,j}$ are the values of the LHS and RHS terms for a potential wave equation $j$ in horizon $T_i$, respectively. For the combination $j$, when we obtain all the coefficient vectors in
and the PDE loss
$\mathrm{MSE}_p = \frac{1}{N_x N_y N_z N_t} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \sum_{k=1}^{N_z} \sum_{l=1}^{N_t} \left[ \mathrm{equ}_L(x_i, y_j, z_k, t_l) - \xi \cdot \mathrm{equ}_R(x_i, y_j, z_k, t_l; \theta) \right]^2$, (14)
where $\lambda_d$ and $\lambda_p$ are hyperparameters, which control the contribution of data and PDE losses to the total loss, respectively. Here, the data loss comes from the mean squared error (MSE) between the observed data and the predicted one from PINN, whereas the PDE loss is obtained by measuring the MSE between the LHS $\mathrm{equ}_L$ and RHS $\xi \cdot \mathrm{equ}_R$ terms of the potential wave equation, which is calculated on the metadata $(x_i, y_j, z_k, t_l)$ generated from the NN. It should be emphasized that the coefficients $\xi$ are deduced by computing $\mathrm{equ}_L - \xi \cdot \mathrm{equ}_R = 0$ during each training process.
(15)
where $\hat{u}^{\mathrm{PINN}}$ and $\hat{u}^{\mathrm{NN}}$ refer to the normalized output of the metadata predicted by PINN and NN, respectively, which are determined by:
$\hat{u}^{\mathrm{PINN}} = \frac{u^{\mathrm{PINN}} - u_{\min}}{u_{\max} - u_{\min}}$, (16)
$\hat{u}^{\mathrm{NN}} = \frac{u^{\mathrm{NN}} - u_{\min}}{u_{\max} - u_{\min}}$, (17)
where $u_{\max}$ and $u_{\min}$ denote the maximum and minimum of the observation data, respectively. The utilization of such a form of physical loss is based on the following fact: when physical constraints are consistent with the data, the predicted results will exhibit significant improvements (see eq. 12). However, if the physical constraints and data are not consistent, the performance of PINN will decrease. As a result, if the underlying wave equation can effectively describe the wavefield data, the predicted results of the trained PINN are closer to the NN's output, which is relatively accurate. Hence, the physical loss will be very small.
For each candidate combination $j$, the PIC is obtained by multiplying the calculated redundancy and physical losses as follows:
$\mathrm{PIC}(j) = L_r(j) \cdot L_p(j)$. (18)
It is worth noting that the PIC is not computed for all possible combinations in the preliminary potential library, as calculating the physical loss is time-consuming. Since the redundancy loss is cheap to compute, we first derive all redundancy losses and select the top $N_b$ combinations with the smallest redundancy loss. Following that, we perform the PINN training on the $N_b$ combinations and further combine redundancy and physical losses to obtain the PIC. Afterwards, we select the wave equation structure with the smallest PIC as the correct one.

2.5 Identifying coefficients

Although we assume that the general structure of the wave equation has been obtained, we still need to determine the coefficients. Certainly, we can obtain the coefficients by directly solving a typically overdetermined system of linear equations (eq. 3). For example, we use the SVD method to calculate the coefficients in eqs (10) and (14). However, this solution is not exact, especially for noisy data. In contrast, PINN provides a reliable framework to identify the coefficients. Hence, we use the PINN to obtain the values of the coefficients, which is also initialized by the previously trained network NN (Fig. 1), while the loss function is reset to
$L(\theta, \xi) = \frac{1}{N_x N_y N_z N_t} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \sum_{k=1}^{N_z} \sum_{l=1}^{N_t} \left[ \mathrm{equ}_L(x_i, y_j, z_k, t_l) - \xi \cdot \mathrm{equ}_R(x_i, y_j, z_k, t_l; \theta, \xi) \right]^2$. (19)
Here, the coefficients $\xi$ are not the output of PINN, and thus, we define them as additional trainable parameters of PINN, which are updated along with network parameters $\theta$ to minimize the loss function. We initialize the values for the trainable coefficients $\xi$ from solving the

3 NUMERICAL EXAMPLES

To verify the feasibility and effectiveness of D-WE, we present an example in discovering the 2-D acoustic wave equation:
$u_{tt} = v^2 (u_{xx} + u_{zz})$, (20)
where we assume that the body force is absent, and $v$ denotes velocity. We consider wave propagation in a homogeneous medium and utilize finite differences (FD) to generate the dataset. The medium, which we assume has a velocity of 2 km s$^{-1}$, is discretized along 101 gridpoints in both x and z directions with a grid spacing of 10 m. We collect 121 snapshots of the pressure wavefield with a time interval of 2 ms from zero to 0.24 s. The wavefield is initiated by an isotropic Gaussian function at the centre of the model given by
$u(i, j, 0) = \exp\left\{ -0.2 \left[ (i - 51)^2 + (j - 51)^2 \right] \right\}, \quad i, j = 1, \cdots, 101$, (21)
at time zero. In this test, the NN has three hidden layers with 50 neurons in each layer. Following Xu et al. (2022), the activation function is set as a sine function. The maximum population size of genomes is 400, and the maximum number of generations is 100. The NN and PINN are trained using an Adam optimizer (Kingma & Ba 2014). The hyperparameters $\epsilon$ (eq. 10), $\lambda_d$, and $\lambda_p$ (eq. 12) are set to $10^{-6}$, 1, and 0.01, respectively.
We first provide an example to illustrate the process of our method in discovering the acoustic wave equation from observed pressure wavefields. We randomly select a 20% subset from the complete volume of observed pressure wavefields to train the NN and then utilize it to discover the equation. The NN is trained for 30 000 iterations. We simultaneously consider cases where the LHS of the equation is given by a first-order or a second-order time derivative. We generate the initial library on the RHS of the equation as {[0, 1, 3], [1, 1]}, corresponding to the form $u u_x u_{xxx} + u u_z u_{zzz} + u_x^2 + u_z^2$. By utilizing the GA (as illustrated in Fig. 2), the initial library will evolve to produce an overcomplete library, including many candidate functional terms. In our case, we limit the number of candidate functional terms to 400, which constitutes the maximum population size of genomes. We list the optimal genomes at some generations, where Tables 1 and 2 correspond to the equations with first-order and second-order time derivatives on the LHS, respectively. The first column represents the number of generations of evolution, the
second column indicates the optimal genome and the corresponding translated form of the potential equation, and the final column represents the corresponding fitness scores.
From Tables 1 and 2, we can see that as the evolution progresses, the optimal genome tends to stabilize. For example, when the LHS of the equation is a first-order time derivative, the optimal genome is {[0, 0], [0, 2], [0, 1, 1], [1, 1]}, while when the LHS of the equation corresponds to a second-order time derivative, the optimal genome is {[2], [0, 0, 2], [0, 1, 2], [0, 1, 3]}. However, if we were to stop here and choose the equation form based on fitness scores, it would be {[0, 0], [0, 2], [0, 1, 1], [1, 1]}. Certainly, this does not match the accurate form of the acoustic wave equation. Therefore, as stated earlier, we consider the optimal genome from the GA at the maximum number of generations as a preliminary library. The combinations from this preliminary library are countable. We can select arbitrary terms to form a

Furthermore, we demonstrate the robustness of D-WE to noisy data, which is presented in Table 6. Here, Gaussian noise is added to the clean data $u$ to obtain the noisy data $\tilde{u} = u + \eta \cdot \mathrm{std}(u) \cdot N(0, 1)$, where $N(0, 1)$ denotes the standard normal distribution with mean 0 and standard deviation 1, and $\eta$ is the noise level. The results show that D-WE is reasonably robust to high levels of noise. Surprisingly, D-WE still accurately discovers the structure of the equation for data with strong noise (e.g. 300% and 400% noise level), and limited data. Here, we also numerically solve the discovered equations at noise levels 25%, 100%, and 300%. Fig. 6 presents a comparison between the generated wavefield snapshots and their corresponding ground truth (Fig. 6a). We can see that our method yields highly accurate equations for observation data with low noise levels (Figs 6b–d). As the noise level rises, the wavefield snapshots simulated by the discovered equations show increased signal leakage compared to the ground truth, but the
ACKNOWLEDGMENTS

The authors thank KAUST and the DeepWave sponsors for supporting this research and granting permission to publish it. We thank the editor Bertrand Rouet-Leduc and an anonymous reviewer for their valuable suggestions that led to many improvements in the manuscript. We also thank Hao Xu for his valuable comments and suggestions.

DATA AVAILABILITY

Data associated with this research are available and can be obtained by contacting the corresponding author.

REFERENCES

Dvorkin, J. & Nur, A., 1993. Dynamic poroelasticity: a unified model with the squirt and the biot mechanisms, Geophysics, 58(4), 524–533.
Hao, Q. & Greenhalgh, S., 2021. Nearly constant q models of the generalized standard linear solid type and the corresponding wave equations, Geophysics, 86(4), T239–T260.
Kingma, D.P. & Ba, J., 2014. Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
Kjartansson, E., 1979. Constant q-wave propagation and attenuation, J. geophys. Res., 84(B9), 4737–4748.
Lejarza, F. & Baldea, M., 2022. Data-driven discovery of the governing equations of dynamical systems via moving horizon optimization, Sci. Rep., 12(1), 1–15.
Maslyaev, M., Hvatov, A. & Kalyuzhnaya, A., 2019. Data-driven PDE discovery with evolutionary approach, In Computational Science–ICCS 2019: 19th International Conference, Faro, Portugal, Springer International Publishing. Proceedings, Part V 19, pp. 635–641. doi: 10.48550/arXiv.1903.08011.