Stochastic Simulations
29 January 2018
Cover Illustration: An example of stochastic simulation for part of a reservoir with two
rock types.
Suggested Citation: Madsen, L. J.; Ossiander, M. E.; Peszynska, M.; Bromhal, G.;
Harbert, W. Risk Reduction of CO2 Storage with Stochastic Simulations; NETL-TRS-1-
2018; NETL Technical Report Series; U.S. Department of Energy, National Energy
Technology Laboratory: Albany, OR, 2018; p 60.
NETL Contacts:
Grant Bromhal, Principal Investigator
Angela Goodman, Technical Portfolio Lead
David Alman, Executive Director, Acting, Research & Innovation Center
Risk Reduction of CO2 Storage with Stochastic Simulations
Table of Contents
EXECUTIVE SUMMARY
1. INTRODUCTION
2. METHODS
3. OBSERVATIONS
4. CONCLUSIONS
5. REFERENCE
APPENDIX V: CASE STUDY: DATA FROM SEISMIC INVERSIONS AND ITS USE WITH PCA-BASED TOOLS
List of Figures
Figure 1: Overview of PCA-based tools.
Figure 2: From left to right, results of PCA, KPCA, and OPCA.
Figure 3: Conceptual example of how OPCA transforms a noisy dataset into an (almost) binary one.
Figure 4: Diagram showing how the various PCA-based tools can be used together.
Figure 5: Dimensions of the porosity dataset.
Figure 6: Diagram showing how CoKPCA and OPCA were used to create channeled stochastic simulations of porosity and permeability.
Acknowledgments
This work was completed as part of National Energy Technology Laboratory (NETL) research
for the U.S. Department of Energy’s (DOE) Pacific Coast Carbon Storage Initiative. The authors
wish to acknowledge Cindy Powell and Karl Schroeder (NETL Office of Research and
Development) for guidance, direction, and support.
EXECUTIVE SUMMARY
The purpose of this project was to create an efficient and flexible tool, based on principal component analysis (PCA), for the generation of differentiable realizations of porosity and permeability fields. Of particular interest were data with significant connectivity between patches of the same rock type; such data are called "binary". The efficiency requirement called for the so-called kernel PCA, and the binary images were obtained with a method called optimization-based PCA (OPCA), for which an interpretation is provided. The ability to honor given data at particular locations is also incorporated. The tool worked very well and efficiently for two-dimensional (2-D) simulations. Furthermore, the tool was applied to a three-dimensional (3-D) dataset, and it was determined that conditioning can be used to maintain the connectivity between vertical layers.
Since kernel-based tools need snapshots, several tools that generate snapshots were also created.
These tools are based on PCA or on novel filtering techniques.
The next steps for this project include new techniques to combine PCA with upscaling, testing
models with more data, expanding OPCA and filtering, and extending techniques to work more
appropriately with non-Gaussian data.
1. INTRODUCTION
The technology of carbon dioxide (CO2) storage in the subsurface raises various questions
concerning the safety of injection and of permanent storage. The decision makers as well as the
general public want to be informed of risk involved with CO2 storage, and in particular of risk
involving leakage of CO2 from underground reservoirs. One way to estimate this risk is to
produce computer simulations of the processes involved and use them to predict or estimate
corresponding risks. In order to be realistic, such simulations need to be based on models built
from adequate data. The big concern is that the structure and properties of most reservoirs are poorly known because of the very nature and challenges of subsurface interrogation. Therefore, there is uncertainty involved with the process itself, in large part due to the lack of detailed data about the subsurface.
Stochastic simulation is a common technique used in computer simulations when data is poorly known: one uses multiple guesses, or stochastic realizations, of the data, solves the problem with each realization, and then analyzes the results to understand how the variability of the quantities of interest depends on the probability distribution of the data. This is the general framework of Monte Carlo techniques. The key question is how to provide stochastic realizations that capture, in as few realizations as possible, the nature of the data.
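The Monte Carlo framework just described can be sketched in a few lines. The following Python sketch uses a trivial stand-in input field and forward model (the names sample_field and quantity_of_interest are hypothetical; in this project the realizations come from the PCA-based tools and the forward model would be a reservoir flow simulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_field(n=100):
    """Draw one stochastic realization of an uncertain input field
    (here simply i.i.d. lognormal values, as a stand-in for a
    PCA-based porosity/permeability realization)."""
    return np.exp(rng.normal(size=n))

def quantity_of_interest(field):
    """Placeholder forward model: in practice this would be a flow
    simulation returning, e.g., a leakage-related quantity."""
    return field.mean()

# Monte Carlo: propagate input uncertainty to the quantity of interest.
samples = [quantity_of_interest(sample_field()) for _ in range(500)]
print(np.mean(samples), np.std(samples))
```

The spread of `samples` then estimates the variability of the quantity of interest induced by the uncertainty in the data.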
In this project, the National Energy Technology Laboratory (NETL) partners were interested in
stochastic simulations of the porosity and permeability values in a reservoir. This study proposed
to develop tools based on state-of-the-art techniques from the family of PCA (principal component analysis), also known as Karhunen-Loève expansions. PCA is able to provide stochastic realizations that capture the "essence" of the variability of the underlying porosity or permeability field. The technique is well known and has gained interest in the last 15 years thanks to deep numerical analysis results associated with PCA representations, as well as to efficient techniques for solving, e.g., reservoir simulation problems with these representations. The difficulties include the complexity of PCA as well as the fact that PCA produces only smooth Gaussian fields and is therefore, in general, unable to provide realizations of binary media. In
this project, tools (i.e. kernel principal component analysis, KPCA) were provided based on the
so-called kernel PCA that deal with the challenge of complexity. This work relies on theory
presented in current literature as well as on the use of data, to which this study further applied
conditioning to maintain connectivity; the corresponding tool is called connectivity KPCA
(CoKPCA). The binary media tool is called optimization based PCA (OPCA) and can work
alone to transform any realization into a binary one. Guidelines and additional tools were also provided in case there is not enough hard data; these tools form a collection called SnapPCA.
2. METHODS
This section provides a first overview of the PCA-based tools, including the basic PCA and KPCA, the subsequent developments OPCA and CoKPCA, and SnapPCA. The use of
data that was provided for testing is also addressed. Technical details are provided in Appendices
I, II, and III.
Figure 1: Overview of PCA-based tools.
An example that illustrates the results of PCA, KPCA, and OPCA is provided below.
Any stochastic simulation has to begin with some knowledge about the probability distribution
of the underlying data. For Gaussian fields, the distribution is entirely determined by its mean
and covariance. This project was interested in porosity phi(x) and permeability K(x) data, which are assumed to vary with position x. Thus, the mean m(x) also depends on the position x. It is assumed, therefore, that:
a) m(x) and the spatial covariance Cov(x,y) for the porosity field are known either
analytically or experimentally
In practice, the following information is required to construct stochastic realizations:
b) Desired spatial resolution of the results, ND = Nx × Ny × Nz. Here Nx is the number of grid points in the x direction, and Ny and Nz are defined analogously. Note that most work in the literature to date uses Nz = 1.
c) Desired number of realizations, Nr.
d) Desired degree of variability, N, which equals the number of terms of PCA included in the realizations. Small N means only large-scale features are maintained; large N means small variations are included. N < ND, and in practice it is sufficient to use N much smaller than ND.
In addition, there may be some Nc spatial points where the data is known, and:
e) The locations and values of phi(x) at these conditioning points are given.
Finally, if channeled data is of interest, one has to decide on the threshold so that multiple rock types can be defined. For the needs of OPCA, a user has to make precise:
f) How the thresholds will be determined.
PCA needs the assumptions above (a–d) to proceed. (An analytical model of Gaussian
covariance is implemented, which is easily adapted in 2D to represent anisotropy).
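The pipeline implied by requirements a)-d) can be illustrated as follows. This Python/numpy sketch is not the report's MATLAB tool Task22_PCA itself; it simply builds a Gaussian covariance model on a small 2-D grid (all parameter values are illustrative), decomposes it with SVD, and draws one realization from the truncated Karhunen-Loève expansion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Grid (small ND = Nx * Ny for illustration) and Gaussian covariance model
nx, ny = 20, 20
x, y = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny))
pts = np.column_stack([x.ravel(), y.ravel()])           # ND x 2 positions
sigma2, eta = 1.0, 0.2                                  # variance, correlation length
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = sigma2 * np.exp(-(d / eta) ** 2)                    # Gaussian covariogram

# Spectral decomposition via SVD (for this SPD matrix, singular values
# coincide with eigenvalues)
U, s, _ = np.linalg.svd(C)

# Keep the N dominant terms capturing 95% of the total eigenvalue "energy"
energy = np.cumsum(s) / np.sum(s)
N = int(np.searchsorted(energy, 0.95) + 1)

# Truncated KL expansion: K = m + sum_p sqrt(lambda_p) Q_p Y_p
m = 0.25 * np.ones(nx * ny)                             # constant mean (illustrative)
Y = rng.standard_normal(N)
K = m + U[:, :N] @ (np.sqrt(s[:N]) * Y)
print(N, K.reshape(ny, nx).shape)
```

Drawing Nr realizations amounts to repeating the last step with fresh Gaussian vectors Y.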
As mentioned, the complexity of PCA is of the order O(ND³) with storage requirements of O(ND²); thus PCA is not feasible for resolutions far beyond 100 × 100 × 1. In this case, KPCA can be used. KPCA bypasses the difficulty of the large complexity of PCA. Details and examples are provided in Appendix I. The tool developed is called Task22_PCA.
KPCA requires snapshots (training images) which have the same resolution Nx × Ny × Nz as the desired field. Thus, also required are:
g) Ns snapshots, which are sample realizations of the given field obtained either from some geostatistical tool based on a given analytical model of covariance, as subsets of the original dataset, or by SnapPCA (see below). In practice one wants Ns to be significantly smaller than ND, but not too small, so that enough variability is maintained, since one must have N < Ns.
KPCA requires g), c), and d). The spatial resolution b) is automatically deduced from the snapshots. See Appendix I for technical details on KPCA, in particular how the dimension reduction (model reduction) is performed and how the complexity is reduced to O(Ns³).
CoKPCA allows realizations to be built which honor given hard data at the locations determined in e). In principle, conditioning can be combined with plain PCA as in Ossiander et al. (2014); here, only the KPCA-based tool is provided. CoKPCA requires g), c), d), and e).
See Appendix I for discussion of CoKPCA.
The tools PCA, KPCA, OPCA, and CoKPCA can be used together as shown in Figure 4.
The MATLAB script Task22_CoKPCA realizes KPCA and CoKPCA.
OPCA tools transform data produced by PCA or KPCA into binary images, i.e., images in which every point/pixel is associated with one of two possible rock types. See Appendix II for discussion of OPCA.
OPCA takes any realization and, based on the information encoded in f), transforms it into a binary image. See Figure 3 for an example of the successive transformations of a smooth, noisy field into an (almost) binary one.
Figure 3: Conceptual example of how OPCA transforms a noisy dataset into an (almost) binary one.
Figure 4: Diagram showing how the various PCA-based tools can be used together.
3. OBSERVATIONS
Originally, this project focused on stochastic simulation tools that now comprise CoKPCA and
on methods to create binary images which preserve connectivity. After some reflection and
experiments with a nonlinear version of KPCA, it was determined that OPCA is a much simpler
and more effective tool than nonlinear KPCA.
As predicted by the theory, KPCA (and CoKPCA) work very well. However, their success requires the availability of snapshots; it then depends on both the quality of the snapshots and the ability to predict the probability distribution from a given set of snapshots.
The use of PCA is consistent with the theory of Gaussian fields. However, datasets are frequently not Gaussian. The need to extend this work to non-Gaussian fields arose when the tools were applied to a real dataset. An overview of this case study is shown in Figure 5; it is discussed in detail in Appendix V.
In the late stage of this project a large porosity dataset was furnished by William Harbert. This dataset was used to test the PCA-based tools. In particular, layers of this 51 × 51 × 1,500 dataset were extracted and used as snapshots for KPCA; see Figure 5 below. Further, 3 × 3 data points were used as conditioning points with CoKPCA to obtain realizations that maintain the desired connectivity.
Since permeability data was needed, the well-known Carman-Kozeny (C-K) model was used to calculate Kperm(x) from phi(x) (see Figure 6). However, since the dataset contained a mixture of background shale with patches of sandstone, it was clear that a different C-K model would be more appropriate for each of the two rock types. Thus, a rock-type field r(x) was created using OPCA with a predefined threshold, and Kperm(x) was then calculated at each x as a function of r(x) and phi(x).
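A minimal sketch of this rock-type-dependent Carman-Kozeny step follows; the prefactor values below are purely illustrative assumptions, not the ones used in the study, and the function name is hypothetical:

```python
import numpy as np

def carman_kozeny(phi, c):
    """Carman-Kozeny-type relation: permeability scales like
    phi^3 / (1 - phi)^2, with a rock-type-dependent prefactor c."""
    return c * phi ** 3 / (1.0 - phi) ** 2

# Hypothetical prefactors, one per rock type identified by OPCA
c_shale, c_sand = 1e-18, 1e-12       # illustrative values only

phi = np.array([0.05, 0.08, 0.25, 0.30])   # porosity field (flattened)
r = np.array([0, 0, 1, 1])                 # OPCA rock type: 0=shale, 1=sandstone

c = np.where(r == 1, c_sand, c_shale)
Kperm = carman_kozeny(phi, c)
print(Kperm)
```

Each point thus receives a permeability consistent with both its porosity phi(x) and its rock type r(x).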
Figure 6: Diagram showing how CoKPCA and OPCA were used to create channeled stochastic simulations of porosity and permeability.
The results of this case study were very promising and are detailed in Appendix V. Further work
is needed and is discussed in Conclusions.
4. CONCLUSIONS
The tools created can be used to generate well-distributed, smooth realizations of the porosity and permeability fields. While the use of PCA is not new, the implementation of CoKPCA (conditional kernel PCA) is new. This study originally planned to use a different technique than OPCA to create binary images. However, it was determined that OPCA as described is the simplest and most robust option.
Furthermore, this study applied the CoKPCA tool in a novel way to a complex dataset and created well-connected 3-D realizations. This was rather unexpected, but very promising. In a typical scenario, a single porosity dataset from seismic inversions is available. This tool allows any number of stochastic realizations of this dataset to be created while maintaining its structure.
This study was also able to generate data both for porosity and for permeability. The a priori knowledge that the dataset combined two (or more) rock types led to the generation of categorical images of rock type, which were later used by an algebraic/stochastic simulation to generate permeability values.
Further and continued work that would enhance the ability to create realistic and useful stochastic realizations includes the items listed below:
• The challenges of 3-D/large reservoir simulations frequently require upscaled datasets. Since PCA-based tools suffer from a similar curse of dimensionality, PCA-based tools that work well with upscaling should be developed. Thorough testing of PCA and upscaling is needed to fully understand the challenges and opportunities.
• Many fields are not Gaussian. The development of theory to guide the combination of
transformed data with PCA to produce implementations in the non-Gaussian setting is
needed.
• OPCA as implemented is based on thresholding and works well for binary media and channeled media. Support for more rock types, and for more complex ways of determining how these cluster, aggregate, and/or disaggregate, should be added. Further work includes the theory and implementation of an extension of OPCA that accounts for connections between cells, supports aggregation, and allows control of fuzzy boundaries and of multiple rock types. There should also be a mechanism for users to provide expert-based knowledge about rock types to aid in connecting porosity and permeability values.
• Filtering tools that are part of SnapPCA are very promising theoretical and practical techniques which should be expanded, especially as concerns the connection to variogram/covariance models and the implementation of conditioning.
• In order to create good-quality, robust tools, work with more data should be undertaken; the analysis of such data may inspire and inform future development.
5. REFERENCE
Ossiander, M. E.; Peszynska, M.; Vasylkivska, V. S. Conditional Stochastic Simulations of Flow and Transport with Karhunen-Loève Expansions, Stochastic Collocation, and Sequential Gaussian Simulation. Journal of Applied Mathematics 2014, 2014, 21. DOI: 10.1155/2014/652594.
NETL Task 22 Report. Appendix I
PCA, KPCA, and CoKPCA
In this section we give technical details on the tools PCA, KPCA, and CoKPCA, explain the data needed to implement the tools, and summarize results.
SUMMARY: The data needed for PCA, KPCA, and CoKPCA is:
(1) Grid definition, and in particular Nd
(2) The mean m and covariance C, or its approximation C∗
(a) C is defined by its analytical model
(b) C∗ is defined by its snapshots in K0∗
(3) Desired number Nr of realizations
(4) Given number Nc of conditioning points, their locations B, and values KB.
The results of PCA, KPCA, and CoKPCA are the simulated values
(1) K(m), m = 1, ..., Nr.
Each K(m) is a vector of size Nd and, if Nc > 0, it agrees exactly with the values at the given locations in B.
I.A. Overview
In this section we give technical details on stochastic simulations using PCA (Principal
Component Analysis) and provide examples.
We assume that one is interested in the value of some random field K(x). (K usually denotes permeability, but the methods developed below are universal.) We assume that x denotes position and is a 2D or 3D vector. (For simplicity we assume 2D below; the structure for 3D is analogous, but the complexity is significantly higher.)
Any stochastic simulation method must assume some knowledge of probability distributions. Figure I.1 shows examples of a heterogeneous field K(x) along with the histogram of its data. This histogram does not include information about spatial variability and correlation. Significantly more information is given by a spatially varying mean and covariance matrix, i.e., so-called two-point statistics. Further refinement can be obtained using multi-point geostatistics, but this will not be discussed here.
In all (linear) PCA-based methods one assumes that the mean field m(x)
(2) m(x) := E[K](x)
and the covariance matrix
(3) CK (x, y) = cov(K(x), K(y)) = E[(K(x) − m(x))(K(y) − m(y))]
are known.
For stationary process CK (u, v) is a function of the distance (lag) |u − v| only so that
CK (u, v) = c(|u − v|), where c is some given function of |u − v|. The function c is also
known as the co-variogram. Note that each u, v has several components depending on
the spatial dimensions, and that |u − v| is the Euclidean distance between the points
u, v.
where DD′ is the Cholesky decomposition. Each Qp is a column vector, and D has dimensions Nd × Nd.
In practice, one obtains the spectral decomposition Q numerically via Cholesky decomposition or SVD. (In SVD, one seeks C = UΣV′ where both U and V are orthogonal matrices and Σ is the matrix of singular values, which are the square roots of the eigenvalues of C′C. For an SPD matrix, Σ is the same as the matrix of eigenvalues of C.)
Since all PCA methods come with similar asymptotic computational complexity, in this project we use SVD because it is the best method for accurately finding the dominant eigenvalues and eigenvectors. In addition, SVD is not sensitive to the spurious negative eigenvalues which may occur for a poor-quality experimental covariance C, and it is known to be very stable [3].
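This robustness can be illustrated with a small Python/numpy example (the covariance below is a contrived 2 × 2 stand-in for a poor-quality experimental estimate): Cholesky fails outright on a matrix with a spurious negative eigenvalue, while SVD completes and exposes the magnitudes of all modes, so the dominant legitimate modes can still be identified:

```python
import numpy as np

# A (symmetric) "covariance" with a spurious negative eigenvalue,
# as can happen with poor-quality experimental covariance estimates:
# eigenvalues are 1 +/- 1.2, i.e., 2.2 and -0.2.
C = np.array([[1.0, 1.2],
              [1.2, 1.0]])

try:
    np.linalg.cholesky(C)      # requires strict positive definiteness
    chol_ok = True
except np.linalg.LinAlgError:
    chol_ok = False

# SVD still completes; for a symmetric matrix the singular values are
# the absolute values of the eigenvalues, so the spurious mode is visible.
U, s, Vt = np.linalg.svd(C)
print(chol_ok, s)              # False, [2.2, 0.2]
```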
where Y(m) is a random realization of a vector of i.i.d. N(0, 1) random variables with dimension equal to the column dimension of D. See [1] for background.
PCA and/or SVD are useful in constructing low-rank approximations and random realizations of K(x) which include only the first N dominant eigenvalues (as if we set λN+1 = ... = λNd = 0). In other words, the expansion (7) is used to determine an approximation to C:
(10) C = Σ_{p=1}^{Nd} λp Qp Qp′ ≈ Σ_{p=1}^{N} λp Qp Qp′ = Σ_{p=1}^{N} (√λp Qp)(√λp Qp)′ = ĈN
where the number of terms N is chosen for the desired accuracy (see below). While ĈN still has the same dimension as C, its rank is at most N. The notation ĈN is chosen similarly to the notation in linear algebra for the reduced SVD and QR decompositions of a matrix. While we do not reduce the dimension, by choosing ĈN we truncate the frequency of variability, eliminating the high-frequency components of very small amplitudes given by λj, j > N.
Now we can write ĈN = D̂N D̂N′ where D̂N is rectangular. In fact, D̂N = Q̂ √ΛN, where Q̂ contains only the first N columns from Q (eigenvectors of C), and ΛN is a diagonal matrix of dimensions N × N. In other words, D̂N has dimensions Nd × N.
In this context, we recall for reference that, in SVD nomenclature, the columns of Q̂ are the non-silent eigenvectors spanning the column space of ĈN.
Now (9) can have the truncated (reduced) form
I.F.1. Tool CoKPCA. The use of hard data and application of conditioning can be
done with either PCA as in [2] or with KPCA as is done in this report.
The only difference is that, instead of D in (11), one uses D̂ to arrive at (26); that is, D is replaced with D̂.
I.G. Examples
Here we provide examples of the use of the tools PCA, KPCA, and CoKPCA. Further examples of CoKPCA follow from the use of actual data.
and it delivers realizations shown in Figures I.2 and I.3. The parameters σ, ηx, ηy are provided as input; x and y define the grid. The parameter renergy defines the way the number of terms N is calculated, the mean m is provided in randave, and nr is the number of realizations.
PCA can be used to generate isotropic as well as anisotropic fields. It can also easily be used to create fields whose principal directions are not aligned with the coordinate axes. (Not shown here.)
Example 1: First we show an isotropic example; see Figure I.2.
Example 2: Next we show an anisotropic example, varying the degree of variability as well as the correlation lengths in the x and y directions. Figure I.3 presents three examples.
Top (small variability) was generated with
Task22_PCA(10,1,.1,linspace(0,8,80),linspace(0,4,40),0.95,0,20);
Middle (moderate variability) used
Task22_PCA(10,1,1,linspace(0,8,80),linspace(0,4,40),0.95,0,20);
Bottom (large variability) used
Task22_PCA(1,20,5,linspace(0,8,80),linspace(0,4,40),0.95,0,20);
Figure I.2. Three realizations m = 1, 2, 3 using PCA with the Gaussian covariance model, threshold .99, etax = .1, etay = .1, sigma = 1, dataset size 50 × 50, as in Example 1. Generated with Task22_PCA(.1,.1,1,50,50,0.99,0,5). Columns correspond to m = 1, m = 2, m = 3.
I.G.2. Performance of PCA. Performance of PCA can be seen from these examples:
>Task22_PCA(.1,.1,1,10,10,0.8,0,5);
Entering SVD ... ... completed after t=0.00390813
Found 9 eigenvalues out of rank=100
>> Task22_PCA(.1,.1,1,50,50,0.8,0,5);
Entering SVD ... ... completed after t=13.6576
Found 8 eigenvalues out of rank=195
>> Task22_PCA(10,.5,.5,linspace(0,8,80),linspace(0,4,40),0.95,0,10);
Entering SVD ... ... completed after t=27.5473
It is clear that the time to construct the SVD and to output realizations grows very fast. Moreover, the problem very quickly becomes out of core.
Task22_CoKPCA(10,3)
Reading data ......finished.Elapsed time is 0.236841 seconds.
Entering kernel SVD with 20 snapshots ... ... time =0.000268184
Figure I.4. Examples of KPCA and CoKPCA. Left column: images
of realizations generated with KPCA based on 20 snapshots. Middle:
first snapshot used for conditioning. Right: realization obtained with
conditioning corresponding to the same vector Y (m) as in the left column.
Each row m = 1, 2, ..., 5 corresponds to a different Y(m).
In fact, the bulk of time is spent on I/O (input/output) and file processing, with
minimal overhead attributed to the SVD and creation of realizations.
References
[1] Robert J. Adler. An introduction to continuity, extrema, and related topics for general Gaussian
processes. Institute of Mathematical Statistics Lecture Notes—Monograph Series, 12. Institute of
Mathematical Statistics, Hayward, CA, 1990.
[2] M. Ossiander, M. Peszynska, and V. Vasylkivska. Conditional stochastic simulations of flow and transport with Karhunen-Loève expansions, stochastic collocation, and sequential Gaussian simulation. Journal of Applied Mathematics, 2014.
[3] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997.
Figure I.5. Examples of how to use stochastic simulations of field K(x)
to generate phi(x) and Kperm(x). Top left: K(x) generated by PCA as in
Example 1. Top middle: phi(x), generated as in Examples 5-7. Top right:
Kperm lognormal as in Example 5. Bottom left: Kperm(x) generated as
in Example 6. Bottom right: Kperm generated as in Example 7.
NETL Task 22 Report. Appendix II
Postprocessing using OPCA
and β should be in the range of K; this is not required, but choosing them otherwise may lead to non-binary results (see below).
We implement this postprocessing procedure for two applications: first, to derive an
“almost” piecewise constant (truncated) image using the function
(2) f(α, β, γ; x) := g(γ; (x − α)/(β − α))
where
(3) g(γ; x) = { 0 for x ≤ γ/2;  (x − γ/2)/(1 − γ) for γ/2 < x ≤ 1 − γ/2;  1 for x > 1 − γ/2 }.
Second, we use OPCA further to project the truncated version of f(α, β, γ; x) onto values between 0 and 1, indicating whether they fall in the category "0", "1", or "somewhere in between". This is easily represented by
(4) r(α, β, γ; x) := h(γ; (x − α)/(β − α))
where
(5) h(γ; u) := min(max((u − γ/2)/(1 − γ), 0), 1).
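The thresholding step can be sketched as a clipped affine rescaling of K onto [0, 1], consistent with (4)-(5). This is a Python/numpy sketch; the report's own implementation is the MATLAB script Task22_OPCA, and the function name here is hypothetical:

```python
import numpy as np

def opca_threshold(alpha, beta, gamma, x):
    """Rescale K from [alpha, beta] to [0, 1], then push values toward
    the two categories: values <= gamma/2 map to 0 ("rock type 0"),
    values >= 1 - gamma/2 map to 1 ("rock type 1"), the rest stays
    strictly in between."""
    u = (x - alpha) / (beta - alpha)
    return np.clip((u - gamma / 2) / (1 - gamma), 0.0, 1.0)

# Applied to a vector of values distributed uniformly in [-5, 5];
# the parameter values (-2, 0.5, 0.2) are illustrative.
K = np.linspace(-5.0, 5.0, 11)
rock = opca_threshold(-2.0, 0.5, 0.2, K)
print(rock)
```

Values well below α and well above β are mapped exactly to the two rock-type categories, while a narrow transition band remains fuzzy.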
Figure II.2. Truncating and postprocessing applied to data from PCA.
Each figure presents an original value, its truncation using (2), and its
“rock type” using (4). The range of K was [−2.58, 0.702]. The parameters
(α, β, γ) are indicated in the title
These functions f(x) and r(x) are implemented in the MATLAB script Task22_OPCA. Results are shown in Figure II.1. The tool is applied to a vector of values distributed uniformly in [−5, 5].
II.A.2. Examples of OPCA using Task22_OPCA and data from PCA. Next, as an example, we present results of postprocessing on a dataset chosen from realizations produced by PCA as described in Appendix I. We use the values of K(x) aggregated in a vector and create their truncations and rock-type versions. The results are shown in Figure II.2.
Now we show the application of OPCA to the spatial field K(x) using parameters as
in Figure II.2. Each image is in a separate figure.
II.A.3. Summary of the Task22_OPCA tool and outlook. The examples presented above show several deficiencies apparent in the results of OPCA. First, OPCA may produce fuzzy boundaries. Second, if applied to a porosity dataset, it will not maintain the total porosity. Last, it does not naturally allow aggregation of regions clustered together. Thus more work needs to be done. We outline some theoretical advances which exploit OPCA in a way different from [1].
Figure II.4. Data K(x), its truncation k(x) and rock type r(x). Param-
eters used (−2, 0.5, .2) are indicated in titles
Figure II.6. Data K(x), its truncation k(x), and rock type r(x). Parameters used (−2, 0, 0.95) are indicated in titles.
and
2g(x) = |x − m|² − (x − µ)′Σ⁻¹(x − µ) + constant.
(Notice that this generalizes Example 1.) Then
∂g/∂xi = xi − mi − (Σ⁻¹(x − µ))i
and
∂²g/(∂xi ∂xj) = δi,j − (Σ⁻¹)i,j.
The natural requirement emerging here to guarantee the existence of a minimum of g is: (I − Σ⁻¹) positive definite. (Note: this is manifest as 1 − γ > 0 in Example 1.) Since Σ, and thus Σ⁻¹, must be positive definite, with some matrix algebra this can be seen to be equivalent to requiring the following:
(6) The eigenvalues of Σ must all be greater than 1.
If (6) holds, then g is minimized for x satisfying (I − Σ⁻¹)x = m − Σ⁻¹µ, corresponding to
x = (I − Σ⁻¹)⁻¹(m − Σ⁻¹µ) = (I − Σ⁻¹)⁻¹m − (Σ − I)⁻¹µ.
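The minimizer can be checked numerically. In this Python/numpy sketch, the (hypothetical) Σ has eigenvalues (5 ± √2)/2, both greater than 1, so condition (6) holds and the stationary point is indeed a minimum:

```python
import numpy as np

rng = np.random.default_rng(3)

# A Sigma with all eigenvalues > 1, so I - Sigma^{-1} is positive
# definite and g has a minimum (condition (6)); values are illustrative.
Sigma = np.array([[3.0, 0.5],
                  [0.5, 2.0]])
Si = np.linalg.inv(Sigma)
m = np.array([1.0, -1.0])
mu = np.array([0.5, 0.5])

def g(x):
    """2g(x) = |x - m|^2 - (x - mu)' Sigma^{-1} (x - mu) (constant dropped)."""
    return 0.5 * ((x - m) @ (x - m) - (x - mu) @ Si @ (x - mu))

I = np.eye(2)
x_star = np.linalg.solve(I - Si, m - Si @ mu)   # stationary point of g

# Sanity check: random perturbations never take g below g(x_star).
perturbed = [g(x_star + 0.1 * rng.standard_normal(2)) for _ in range(100)]
print(g(x_star) <= min(perturbed))
```

The equivalent second form (I − Σ⁻¹)⁻¹m − (Σ − I)⁻¹µ gives the same point, which can be verified numerically as well.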
Example 4: (Incorporating nearest-neighbor interaction). In some cases it is natural to require that X be ‘smooth’ locally, which roughly (pun intended) corresponds to trying to minimize the sum of the squares of the differences of adjacent components. This can be accomplished within the framework of Example 3 as follows. For simplicity, take µ = 0 and let
(Σ⁻¹)i,j = { γ if i = j;  ργ if |i − j| = 1;  0 otherwise }.
(Note: |ρ| < 1.) Then, using Σ_i xi xi+1 = Σ_i (xi² − ½(xi − xi+1)²),
2g(x) = |x − m|² − γ(|x|² + 2ρ Σ_i xi xi+1) + constant
= |x − m|² − γ(1 + 2ρ)|x|² + γρ Σ_i (xi − xi+1)² + constant.
For ρ > 0, the minimization of g will control the size of Σ_i (xi − xi+1)² and encourage local agreement of the xi’s. If ρ < 0, then the minimization will encourage local disagreement, which should give a more irregular process.
Here both Σ⁻¹ and I − Σ⁻¹ have a relatively simple form. Recalling µ = 0, we again have
x = (I − Σ⁻¹)⁻¹m.
(The values of γ and ρ must be chosen to guarantee that I − Σ⁻¹ is positive definite.)
References
[1] H. X. Vo and L. J. Durlofsky. A New Differentiable Parameterization Based on Principal Component
Analysis for the Low-Dimensional Representation of Complex Geological Models. Mathematical
Geosciences, 46:775–813, 2014.
Risk Reduction of CO2 Storage with Stochastic Simulations
NETL Task 22 Report. Appendix III
Filtered Processes and Binary Fields
In this section we give some theory behind the simulation of stationary random fields
using a pair of filtering, or convolution, routines in Matlab developed for the project.
III.A. Overview
One method of simulating stationary random fields is through convolution with a
filter, or pattern, matrix. This technique can be used to simulate random fields with a
wide range of covariograms, isotropic and otherwise, and is particularly straightforward
to implement in Matlab. This method can be used to produce snapshots of simulated
fields. Characteristics of Gaussian random fields produced using this method are fairly
well understood, but random fields with a variety of other marginal distributions can
be simulated as well. In particular, random binary fields can be easily produced using
truncation.
The first part of this section focuses on simulation of Gaussian processes followed by
truncation to binary functions via the sign function. First a sketch of the theoretical
development is given, followed by some examples of simulated processes in 2 dimensions
using Matlab code. (The code is included in the appendix.) Extensions to other distributions are briefly outlined. The second part of the section sketches the theory behind
using this simulation technique in tandem with conditioning.
The covariance then depends on (k, l) and (k′, l′) only through the absolute differences (|k − k′|, |l − l′|) and is given by the matrix of values
(4) σk,l = Cov(X0,0, Xk,l),
with the stationary variance given in particular by
(5) σ0,0 = Cov(X0,0, X0,0) = Σ_{i,j} h²i,j.
That is, the covariance function of the filtered process is itself given by a convolution
based on the filter matrix h. The dimensions of the filter matrix mandate the correlation
length, as σk,l is necessarily 0 whenever k > m or l > n.
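The identity in (5), and the fact that the covariogram of the filtered process is the autocorrelation of h, can be checked numerically. A Python sketch (the filter values are chosen for illustration only):

```python
import numpy as np
from scipy.signal import convolve2d

h = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 1.0]])

# The covariogram of the filtered process is the autocorrelation of h:
# sigma_{k,l} = sum_{i,j} h_{i,j} h_{i+k, j+l}.
# Convolving h with its reversal computes exactly this lagged-product sum.
sigma = convolve2d(h, h[::-1, ::-1], mode="full")

m, n = h.shape
sigma00 = sigma[m - 1, n - 1]               # lag (0, 0)
assert np.isclose(sigma00, (h ** 2).sum())  # stationary variance = sum h_ij^2
```

The full autocorrelation array has dimensions (2m − 1) × (2n − 1); all larger lags vanish, which is the correlation-length bound noted above.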
Example 1. The figure shows the Gaussian simulation together with its truncations at
levels −1, 0, and 1, respectively. Notice that there is local smoothing of the process,
producing localized clumping in the truncations, but no overall trends or structure in
the fields.
Example 2. The filter matrix used here gives horizontal layering. The process has a
longer correlation length in the horizontal direction than in the vertical direction, as can
be seen in the surface plot of the covariogram. The figure is produced using input 2 in
Task22_Binaryfield.m. Simulations with vertical and diagonal layering, respectively,
are produced with input values 3 and 4.
Example 3. The filter matrix used here gives pronounced and coherent ridges in the
Gaussian simulation, corresponding to a clear channeling effect in the binary fields
resulting from truncation. The process has a relatively long correlation length, as seen
in the surface plot of the covariogram. The figure is produced using input 6 in
Task22_Binaryfield.m. Simulations with less pronounced channeling are produced with
input value 5.
Example 4. This example uses the Matlab function Task22_Gaussianfield.m with a
user-selected matrix. Here the rand function in Matlab is used to generate the 2 by 6
matrix h with entries

0.1529 0.3131 0.7266 0.4520 0.8187 0.6281
0.1157 0.3351 0.7509 0.7400 0.9609 0.0484

Typing Xh = Task22_Gaussianfield(h); into the Matlab command window produces
the simulation Xh of a Gaussian random field with filter function (matrix) h, standardized
to have variance 1, together with the plots given in Figure III.4. The correlation
coefficients for this process are given in the following matrix.

1.0000 0.7575 0.5448 0.3003 0.1101 0.0249
0.4459 0.3180 0.2613 0.1101 0.0397 0.0018
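These correlation coefficients can be recovered directly from h as normalized lagged products of the filter entries, ρ_{k,l} = Σ_{i,j} h_{i,j} h_{i+k,j+l} / Σ_{i,j} h_{i,j}². A Python sketch (the helper function name is ours, not the report's):

```python
import numpy as np

h = np.array([[0.1529, 0.3131, 0.7266, 0.4520, 0.8187, 0.6281],
              [0.1157, 0.3351, 0.7509, 0.7400, 0.9609, 0.0484]])

def correlation(h, k, l):
    """rho_{k,l} = sum_{i,j} h_{i,j} h_{i+k, j+l} / sum_{i,j} h_{i,j}^2."""
    m, n = h.shape
    num = (h[: m - k, : n - l] * h[k:, l:]).sum()
    return num / (h ** 2).sum()

print(round(correlation(h, 0, 1), 4))  # 0.7575, matching the first row above
print(round(correlation(h, 0, 2), 4))  # 0.5448
print(round(correlation(h, 1, 0), 4))  # 0.4459, first entry of the second row
```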
Figure III.2. Example 2. Horizontal layering produced using the hlayer filter matrix;
input 2 in the function Task22_Binaryfield.m.
Figure III.4. Example 4. Using the Task22_Gaussianfield.m function.
III.D. Discussion
The examples above barely scratch the surface in illustrating the phenomena possible
in stationary random processes produced using filtering techniques. Illustrations for
the three examples described above were all produced using the accompanying Matlab
function Task22_Binaryfield.m. This function accepts as input the integers 1 through
7. Inputs 1 through 6 correspond to the filter matrices described in Examples 1 through
3. Input 7 gives a complex geometric patterning in truncations. The Matlab function
Task22_Gaussianfield.m allows users to input any desired filter matrix in simulating
a 50 by 50 stationary mean-0 Gaussian random field that has been standardized to
have variance 1. This function also produces a figure analogous to that given by the
Task22_Binaryfield.m function. Both functions can be internally modified to produce
simulations based on non-Gaussian random variables.
where

(8) Z̃^(m) = D_B′ Σ_B^{−1} (a − µ_B) + (I − D_B′ Σ_B^{−1} D_B) Z^(m).
In other words, start with Z^(m), an array of i.i.d. standard Gaussian random variables.
Linearly transform it using the projection given in (8) to get Z̃^(m), and then filter
Z̃^(m). Note: even in the non-Gaussian setting, this linear projection method gives
simulations satisfying (9); however, the joint distributions will not be Gaussian.
In the context of filtering, the entries of the matrix D_B are covariances of X_B with
Z, and thus either have value 0 or equal h_{i,j} for some i, j. This conditioning method
becomes somewhat simpler to implement if the locations B of the observations are
spatially sparse. In particular, if the filter dimension is finite and the spacing between
observations exceeds the filter dimension, then the observations themselves are
uncorrelated, giving

Σ_B = σ_{0,0} I_{|B|}.

The matrix D_B then also has a relatively simple form, with rows corresponding to
embedded values of the filter matrix h. In one dimension, the conditioning is fairly
straightforward to implement in Matlab; an example is illustrated in Example 5. Higher
dimensions are somewhat more complicated to simulate and illustrate, and this is an
area where further development is needed.
Example 5. This example in one dimension illustrates conditioning filtered processes.
Here the filter matrix is the vector (1,2,1), producing a locally smoothed Gaussian pro-
cess. The 3 pairs of locations and measurements conditioned on are given by: (10,
0.1712), (20, -1.2060), (30, -0.2791). Two additional conditioned simulations are run to
demonstrate the concept.
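Under the assumption above — observation spacing exceeding the filter length, so that Σ_B is diagonal — the construction of Example 5 can be sketched in Python with the filter (1, 2, 1) and the three measurements. The noise length and variable names are ours; the conditioned field reproduces the measurements exactly.

```python
import numpy as np

# Filter (1, 2, 1) and the (location, value) pairs from Example 5.
h = np.array([1.0, 2.0, 1.0])
locs = np.array([10, 20, 30])
vals = np.array([0.1712, -1.2060, -0.2791])

n = 50                                   # length of the latent noise array
rng = np.random.default_rng(1)
z = rng.standard_normal(n)

# D_B: row b holds the filter coefficients forming X at location b,
# so that X_B = D_B z (here X_k = sum_i h_i z_{k+i}).
D = np.zeros((len(locs), n))
for b, k in enumerate(locs):
    D[b, k:k + len(h)] = h

# Observations are spaced further apart than the filter length, so
# Sigma_B = D D' is diagonal with entries sigma_{0,0} = sum_i h_i^2 = 6.
Sigma_B = D @ D.T

# Linear projection (8) with mu = 0:
#   z_tilde = D' Sigma_B^{-1} a + (I - D' Sigma_B^{-1} D) z
z_tilde = (D.T @ np.linalg.solve(Sigma_B, vals)
           + (np.eye(n) - D.T @ np.linalg.solve(Sigma_B, D)) @ z)

# Filter the projected noise; the field honors the measurements exactly.
x = np.array([h @ z_tilde[k:k + len(h)] for k in range(n - len(h) + 1)])
```

A short calculation shows why: D_B z̃ = D_B D_B′ Σ_B^{−1} a + (D_B − Σ_B Σ_B^{−1} D_B) z = a, so the filtered field matches the data at the conditioning locations regardless of the noise draw.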
Figure III.5. Example 5. Conditioning a filtered process in one dimension.
Figure 4A: Examples of use of SnapPCA to create randomized cartoons. Left column: original cartoon
(or cartoon + noise). Middle: randomized image using PCA (KPCA realization). Right: postprocessed
image using OPCA (O-KPCA realization).
APPENDIX V: CASE STUDY: DATA FROM SEISMIC INVERSIONS AND ITS USE
WITH PCA-BASED TOOLS
As mentioned in this report, the combination of tools CoKPCA was used on a porosity dataset from
well log inversion provided by William Harbert. This dataset is referred to below as HDATA.
The dataset was exported from SEG format to ASCII and Excel and arranged as 51 × 51 × 1,500 =
N = 3,901,500 data points. The dimensions were Xline 1942 to 1092 and Inline 119 to 169; the
third dimension was travel time.
The data exhibited some channeled characteristics and the goal was to create stochastic
simulations which reproduced the features found in HDATA.
The dataset was analyzed and several spurious (negative) porosity values were found. A
histogram (Figure 5A) revealed that the data were not Gaussian but rather Laplace (double
exponential). The histogram and some analysis give a mean of 0.0864, a standard deviation of
0.0143, and a median of 0.0864. This study also identified several interesting features in the
middle of the set, between layers 451 and 551, and in some other parts as well; there appeared
to be evidence of connectivity/channeling.
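As a check on these summary statistics, one can sample a Laplace distribution with the reported mean and standard deviation and compare its tail behavior with a Gaussian. The Python sketch below (not part of the study) uses scale b = std/√2, and the fact that a Laplace distribution has excess kurtosis 3 while a Gaussian has 0:

```python
import numpy as np
from scipy.stats import kurtosis

# Laplace (double exponential) with the reported moments:
# mean = median = 0.0864, std = 0.0143, hence scale b = std / sqrt(2).
rng = np.random.default_rng(0)
mu, sd = 0.0864, 0.0143
samples = rng.laplace(loc=mu, scale=sd / np.sqrt(2), size=200_000)

print(round(float(samples.mean()), 4))      # close to 0.0864
print(round(float(np.median(samples)), 4))  # close to 0.0864 as well
print(round(float(kurtosis(samples)), 1))   # near 3 (a Gaussian would give ~0)
```

The markedly positive excess kurtosis is the heavy-tail signature that distinguishes the Laplace fit from a Gaussian one in a histogram such as Figure 5A.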
Because the dataset was so large, bona fide 3-D stochastic simulations were computationally
infeasible.
Instead, to generate stochastic realizations, stationarity was assumed across layers/slices, and
CoKPCA was applied using the collection of layers 451–551 as snapshots. This allowed realizations
of porosity phi(x) to be generated. To preserve connectivity across layers, this study selected a
few (9 = 3 × 3) conditioning points. (See Appendix I for details.)
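The snapshot-based generation step can be sketched as follows. This is a plain PCA parameterization in Python (a simplified stand-in for the report's CoKPCA tool, which additionally involves kernels and conditioning); the snapshot matrix here is random placeholder data, not HDATA.

```python
import numpy as np

def pca_realizations(snapshots, n_real=5, energy=0.95, rng=None):
    """Draw new realizations from a snapshot matrix (one flattened layer
    per column) by sampling PCA coefficients xi ~ N(0, I):
        phi = phi_mean + sum_l sqrt(lambda_l) * xi_l * u_l.
    """
    rng = np.random.default_rng(rng)
    mean = snapshots.mean(axis=1, keepdims=True)
    # Columns of U are principal directions; s**2 are the eigenvalues
    # lambda_l of the sample covariance of the snapshots.
    X = (snapshots - mean) / np.sqrt(snapshots.shape[1] - 1)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Keep enough components to capture the requested energy fraction.
    l = int(np.searchsorted(np.cumsum(s**2) / (s**2).sum(), energy)) + 1
    xi = rng.standard_normal((l, n_real))
    return mean + U[:, :l] @ (s[:l, None] * xi)

# Stand-in snapshot data: 101 layers (451-551), each 51 x 51, flattened.
snaps = np.random.default_rng(0).standard_normal((51 * 51, 101))
real = pca_realizations(snaps, n_real=3, rng=1)
```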
Next, this study generated permeability Kperm(x) using different rock types, since the dataset
had characteristics of shale mixed with reef sandstone. To do so, the OPCA tool was applied with
both a small and a large parameter gamma (see Appendix II), which allowed the rock type r(x) to
be determined for each point x.
Finally, the Carman–Kozeny model was applied using phi(x) and r(x).
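The Carman–Kozeny step can be sketched as follows. The rock-type-dependent prefactors below are illustrative placeholders, not values from the study:

```python
import numpy as np

def carman_kozeny(phi, r, c=(1.0e-15, 1.0e-12)):
    """Carman-Kozeny permeability K = c_r * phi^3 / (1 - phi)^2, with the
    prefactor c_r selected by rock type r (0 = shale-like, 1 = sandstone-
    like). The prefactor values here are illustrative placeholders."""
    phi = np.asarray(phi, dtype=float)
    c_r = np.where(np.asarray(r) == 0, c[0], c[1])
    return c_r * phi**3 / (1.0 - phi)**2

# Porosity 0.0864 (the HDATA mean) in shale, 0.20 in sandstone:
K = carman_kozeny([0.0864, 0.20], [0, 1])
```

Because the prefactor depends on r(x), the same porosity field yields very different permeabilities in the two rock types, which is the point of determining r(x) first.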
Figure 5E: Top: original porosity data from layers 54, 55, 56. Bottom: Stochastic simulations of these layers
which maintain connectivity.
Sean Plasynski
Executive Director
Technology Development & Integration Center
National Energy Technology Laboratory
U.S. Department of Energy

David Alman
Executive Director, Acting
Research & Innovation Center
National Energy Technology Laboratory
U.S. Department of Energy

John Wimer
Associate Director
Strategic Planning
Science & Technology Strategic Plans & Programs
National Energy Technology Laboratory
U.S. Department of Energy

Brian Wall
Associate Vice President for Research
Interim Director, Oregon Sea Grant
Oregon State University

Mark Williams
Research and Engineering Services Program Manager
AECOM

Traci Rodosta
Strategic Planning
Science & Technology Strategic Plans & Programs
National Energy Technology Laboratory
U.S. Department of Energy

Darin Damiani
Program Manager, Carbon Storage
Office of Fossil Energy
U.S. Department of Energy