
Best Practices of Assisted History Matching Using Design of Experiments

Boxiao Li, Chevron Energy Technology Company; Eric W. Bhark, Chevron Asia Pacific E&P Company; and Stephen J. Gross (ret.), Travis C. Billiter, and Kaveh Dehghani, Chevron Energy Technology Company

Copyright © 2019 Society of Petroleum Engineers

This paper (SPE 191699) was accepted for presentation at the SPE Annual Technical Conference and Exhibition, Dallas, 24–26 September 2018, and revised for publication. Original manuscript received for review 1 November 2018. Revised manuscript received for review 15 January 2019. Paper peer approved 18 January 2019.

Summary
Assisted history matching (AHM) using design of experiments (DOE) is one of the most commonly applied history-matching techni-
ques in the oil and gas industry. When applied properly, this stochastic method finds a representative ensemble of history-matched res-
ervoir models for probabilistic uncertainty analysis of production forecasts. Although DOE-based AHM is straightforward in concept, it
can be misused in practice because the work flow involves many statistical and modeling principles that should be followed rigorously.
In this paper, the entire DOE-based AHM work flow is demonstrated in a coherent and comprehensive case study that is divided
into seven key stages: problem framing, sensitivity analysis, proxy building, Monte Carlo simulation, history-match filtering, produc-
tion forecasting, and representative model selection. The best practices of each stage are summarized to help reservoir-management
engineers understand and apply this powerful work flow for reliable history matching and probabilistic production forecasting.
One major difficulty in any history-matching method is to define the history-match tolerance, which reflects the engineer's comfort level of calling a reservoir model "history matched" even though the difference between simulated and observed production data is not zero. Such a tolerance is a compromise necessitated by the intrinsic and unavoidable imperfection of reservoir-model construction, data measurement, and proxy creation. A practical procedure is provided to help engineers define the history-match tolerance considering the model, data-measurement, and proxy errors.

Introduction
DOE and Its Applications in Subsurface Modeling. DOE helps people understand a complex process (or system) by conducting as
few experiments as possible while satisfying a user-defined accuracy requirement. In each experiment, the input variables are changed
methodically and the corresponding changes in the responses are observed. The specific design that regulates how the input variables
are changed in each experiment is called the experimental design. DOE has a wide range of applications in natural and social sciences
and engineering, such as physical (or chemical and biological) experiments, clinical trials, manufacturing tests, marketing pilots, and
policy making, where the cost of conducting even one experiment can be high.
In general, DOE achieves the following goals:
• Perform sensitivity analysis to quickly evaluate the effect of input variables on the response variables, and identify the most influ-
ential input variables (so-called heavy hitters or big hitters).
• Understand the relationship between the input and response variables quantitatively within a user-desired accuracy. A proxy (also
called response surface or surrogate model), which is much less expensive to evaluate than performing actual experiments, can be
created to approximate the relationship.
• Evaluate the probability distributions of the response variables using the uncertainties of the input variables by performing Monte
Carlo simulations on the proxies. This allows the estimation of the response-variable percentiles (e.g., the 10th, 50th, and 90th, or
P10, P50, and P90, respectively) that are needed for planning and decision making.
In subsurface modeling, DOE has proved its value in both static Earth modeling and dynamic reservoir simulation. It has been
widely applied in the petroleum industry for uncertainty analysis, history matching, and probabilistic forecasting of reservoir perform-
ance, and it underpins the decision-making processes in asset development and reservoir management.

History Matching. The modeling of subsurface fluid flow and the forecasting of reservoir production are usually performed using res-
ervoir simulators running dynamic models. Typical input variables include the static Earth model or the individual properties that con-
stitute the Earth model, the properties of the in-situ fluids, rock properties such as compressibility, rock/fluid properties such as relative
permeability and capillary pressure curves, the initial and boundary conditions of the reservoir (including aquifer models and the loca-
tion and trajectories of the wells), and the schedule of production and injection wells and their control strategies. Common response var-
iables include cumulative production, water cut (WCT), and the oil-production rate (OPR) and gas-production rate (GPR). Except for
well conditions, which are controllable and are often known to engineers, almost all input variables have a high degree of uncertainty
because of the limited spatial and temporal data collected.
For an oil/gas field that has been on production for some time (i.e., a brownfield), the production history can be used to reduce the uncer-
tainty in reservoir-model input variables. The term “history matching” refers to the practice that uses the historical production data of a res-
ervoir to deduce the reservoir properties that are not practical to measure directly. It works by identifying the reservoir properties that
enable the simulated production performance to match the observed history. Once history matched, the reservoir models will be used to pre-
dict production performance. History matching is an “ill-posed” problem: The relationship between input and response variables is not one
to one, and is highly nonlinear. In other words, different combinations of reservoir-property values might lead to the same production his-
tory. Although it is relatively easy to obtain one history-matched solution, finding all the possible solutions is practically infeasible.
History-matching methods are categorized into two types: deterministic and stochastic. Deterministic methods obtain one or some-
times a few history-matched models, often through manual trial and error (Williams et al. 1998) or by means of a deterministic optimi-
zation algorithm using adjoint (Chen et al. 1974; Li et al. 2003) or streamline-based sensitivities of responses to inputs (Vasco et al.
1999). In general, only a small fraction of all plausible history-matched models will be identified, and the likelihood of each model in a

production forecast cannot be determined. In other words, deterministic history matching cannot identify whether a model provides a
P10, P50, or P90 prediction because such information needs to be derived from the complete set (or a representative ensemble) of
history-matched models. This is a major disadvantage because knowledge of a production-forecast likelihood is crucial to the decision-
making process. Even for evolutionary-algorithm-based AHM (Schulze-Riegert et al. 2002; Cheng et al. 2008), which can produce a
suite of history-matched models, this suite cannot be guaranteed to represent the probability distribution of all possible history-matched
solutions. In contrast, stochastic history matching identifies an ensemble of models that covers the uncertainty space effectively and
attempts to represent the distribution of all history-matched solutions. As a result, the likelihood of production forecasts from history-
matched reservoir models can be estimated and applied in probabilistic decision making.

DOE-Based AHM. Because DOE helps people understand the relationship between input and response variables within a user-desired accuracy, inexpensive-to-evaluate proxies of the actual simulation models can be created to approximate the relationship. If
the proxies are reasonably accurate, one can run Monte Carlo simulations using these proxies to obtain a large and representative suite
of history-matched solutions. Therefore, DOE-based history matching is considered a stochastic method and is one of the most com-
monly applied AHM techniques in the oil and gas industry (Castellini et al. 2004; King et al. 2005; Schaaf et al. 2009; Van Doren et al.
2012; Bhark and Dehghani 2014).
Although DOE-based AHM is conceptually straightforward to apply, the work flow involves many statistical and modeling princi-
ples that should be followed rigorously to obtain reliable history-matched solutions. Although abundant DOE publications are available
in the literature, we observed that many engineers still apply DOE incorrectly to some degree. The errors are typically caused by incom-
plete understanding, incorrect application, or impractical expectations of the underlying principles.
A common misperception is that a DOE work flow will be successful if the experimental design and the proxy model are chosen
wisely. Although these two elements are indeed important, a successful DOE application requires every piece of the work flow to be
executed properly. Mistakes do occur before experimental designs are selected and after proxies are constructed. In addition, currently
a DOE work flow is often performed by computer software. Although this improves efficiency, the danger is that engineers might rely
excessively on computers to learn the reservoir for them. A DOE work flow is never a replacement for expert knowledge and collabora-
tion, and all the results should be evaluated with critical geological and engineering judgment. The “press a button, wait for finish, trust
the outcome” attitude might introduce significant error to the analysis.
One major difficulty in any history-matching method is to define the history-match tolerance, which reflects the engineer’s comfort
level of calling a reservoir model “history matched” even though the difference between simulated and observed production data is not
zero. Such a tolerance is needed because of four intrinsic and inevitable limitations of history matching:
1. It is beyond one’s capability to enumerate all possible uncertainties and simulate all physical processes in a reservoir.
2. Numerical errors always exist, no matter how accurate the simulator might be.
3. Data measurement almost always contains error because of noise or bias in gauges and flowmeters.
4. Proxies are only approximations of the actual simulator.
The first two limitations are referred to as “model error” in this paper, the third as “data-measurement error,” and the fourth as
“proxy error.” Model and data-measurement errors are pertinent to any history-matching method, and proxy error is pertinent to DOE-
based AHM. Unfortunately, determining the history-match tolerance is unavoidably subjective. Without a clear work flow, subjectivity
can easily turn into arbitrariness.
This paper aims to help reservoir-management engineers understand the DOE-based AHM work flow and apply it correctly for reliable history matching and probabilistic production forecasting. The paper extends our earlier work (Bhark and Dehghani 2014) in
two ways, which define the structure of this paper. First, the entire DOE-based AHM work flow is demonstrated in a case study that is
derived from a hypothetical reservoir. The case study offers a coherent and comprehensive illustration of the best practices of the AHM
work flow, which is divided into seven stages: problem framing, sensitivity analysis, proxy building, Monte Carlo simulation, history-
match filtering, production forecasting, and representative model selection. Second, a practical procedure is provided to help engineers
determine the history-match tolerance in consideration of the model, data-measurement, and proxy errors.
Although this paper focuses entirely on DOE-based AHM, many of the best practices (particularly in problem framing, history-match filtering, production forecasting, and representative model selection) are applicable to other stochastic history-matching methods such
as the ensemble Kalman filter (Aanonsen et al. 2009; Oliver and Chen 2011), ensemble smoother (van Leeuwen and Evensen 1996;
Skjervheim and Evensen 2011), and ensemble smoother with multiple data assimilation (Emerick and Reynolds 2013).

Case Study
The chosen case study demonstrates the entire work flow of DOE-based AHM using a hypothetical reservoir. This reservoir was an example
in our earlier work (Bhark and Dehghani 2014) but has been modified to enable a coherent and comprehensive illustration of the
best practices.

Problem Framing. The hypothetical Mermaid Field (Fig. 1) is an oil field in the deepwater Gulf of Mexico. Production began in
January 2000. As of January 2010, the field was under secondary recovery and staged field developments. The objective of the study
was to perform a probabilistic DOE-based history match to better understand the reservoir properties and evaluate the uncertainty of the
production forecast. The findings would help to determine a value-adding development strategy in the future.
Before DOE-based history matching, a manual history matching was performed, reflecting our best technical estimate of reservoir-
property uncertainties. We found that conducting a manual history match in advance is essential to the success of DOE-based history
matching. We manually adjusted the reservoir properties in a structured approach, performed simulation runs, and attempted to match
the simulated production curves with the historical curves as closely as possible. The phrase “structured approach” means that manual
history matching begins at a global scale, then progresses to flow units or layer groups, then to individual layers, and finally to individ-
ual wells. This hierarchy was applied first for a pressure match and later for a saturation match. Details of the structured approach can be
found in the seminal paper by Williams et al. (1998).
The key purpose of manual history matching is to form an understanding of the reservoir model. This understanding benefits DOE-
based history matching because it will not only facilitate the characterization of input variables, but also help to define the history-
match-error tolerance. Both aspects require a certain degree of subjectivity. The danger of DOE-based history matching, or of any
computer-automated/assisted work flow, is that we might rely excessively on an algorithm to learn the reservoir for us. The DOE work
flow is never a replacement for human expertise. Because subjectivity is unavoidable in history matching, we need to develop a good sense of reservoir behavior, and the best way to achieve this is through structured manual history matching. Nevertheless, we observed that overdoing manual history matching might introduce an anchoring bias into the DOE-based work flow; therefore, we sought a balance. The objective of manual history matching is not to find one exact solution, but to explore different solution possibilities in preparation for an extensive search using the DOE-based work flow.

[Figure 1: field map showing producers, water injectors, open/closed/shut-in nodes, and the material-balance regions MBOCREG 1, MBOCREG 2, MBOCREG 3, MBOSEH, MBAQ5PE, and MBAQ5SW]

Fig. 1—Material-balance regions (indicated by different colors) of the hypothetical Mermaid Reservoir (Bhark and Dehghani 2014).

The response variables of the study are history-match errors (denoted by eHM ) of the metrics shown in Table 1. The metric name is
composed of two parts. The first part is the well name if the data are taken from wells, or FIELD if the data are at field scale. The
second part is the property name. SWP stands for shut-in-well pressure, WPR for water-production rate, and OPC/WPC/GPC for cumu-
lative oil/water/gas production, respectively. The methodology of DOE-based AHM is to use DOE to find the relationship between the
response variables (eHM ) and the input variables (uncertain reservoir properties), and then to solve for the reservoir properties associated
with the smallest eHM values. As listed in Table 1, there are 22 eHM objective functions, each corresponding to one history-match-error
metric. Note that having too many response variables increases the difficulty of AHM. This is a general challenge of all AHM tech-
niques. Engineering judgment should be applied, and only the response variables that are most relevant to the frame and scope of the
AHM study should be included.

Well-Level Data: SWP        Well-Level Data: WPR        Field-Level Data
 1  MER03ST2_SWP            12  MER03ST2_WPR            20  FIELD_OPC
 2  MER03_SWP               13  MER03_WPR               21  FIELD_WPC
 3  MER06_SWP               14  MER06_WPR               22  FIELD_GPC
 4  MER07ST2_SWP            15  MER07ST2_WPR
 5  MER07_SWP               16  MER07_WPR
 6  MER11_SWP               17  MER11_WPR
 7  MER12_SWP               18  MER13ST_WPR
 8  MER12OH_SWP             19  MER21ST2_WPR
 9  MER13ST_SWP
10  MER19_SWP
11  MER21ST2_SWP

Table 1—History-match-error metrics of the Mermaid Reservoir (MER) DOE-based history-matching study.

The formula used to compute eHM for each of the metrics in Table 1 is

$$e_{\mathrm{HM}} = \sqrt{\left.\sum_{i=1}^{n} w_i \left(d_i^{\mathrm{obs}} - d_i^{\mathrm{sim}}\right)^{2} \right/ \sum_{i=1}^{n} w_i}\,, \qquad (1)$$

where d_i^obs stands for the observed data, d_i^sim stands for the simulated data, and i indexes the representative data points along the time axis (i = 1, 2, …, n, where n is the total number of representative data points). The weight w_i of each representative data point is, by default, equal to unity. However, if a certain period of the historical data (such as more-recent data) requires a stricter history-match quality than other periods, the w_i of that period can be assigned larger values. Further, for some data types, high-frequency recording by real-time sensors produced a large number of data points that were also noisy. In this case, it was necessary for us to select and use representative data points along the time axis. However, some data fluctuations within a trend represent actual reservoir behavior, so caution was taken when removing noise. In our case, representative data points were selected through visual inspection, and the result was confirmed with the entire asset team. Besides removing noise, we corrected any known bias in the historical data measurement.
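For illustration, Eq. 1 can be computed as in the following minimal Python sketch (the function name and example numbers are ours and purely hypothetical):

```python
import numpy as np

def history_match_error(d_obs, d_sim, w=None):
    """Weighted root-mean-square history-match error of Eq. 1.

    d_obs, d_sim : observed and simulated values at the n representative
                   data points along the time axis.
    w            : optional weights; defaults to unity for every point.
    """
    d_obs = np.asarray(d_obs, dtype=float)
    d_sim = np.asarray(d_sim, dtype=float)
    w = np.ones_like(d_obs) if w is None else np.asarray(w, dtype=float)
    return float(np.sqrt(np.sum(w * (d_obs - d_sim) ** 2) / np.sum(w)))

# Example: weight the two most recent data points twice as heavily.
e_hm = history_match_error([3050, 3020, 2980], [3065, 3010, 2990], w=[1, 2, 2])
```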
An alternative to this approach is to combine all 22 history-match-error metrics into one objective function using a weighted sum. We did not follow this alternative because of two major drawbacks. First, determining the weights is subjective and does not fully consider the tradeoffs among the different objective functions. Second, the combined error metric loses its physical meaning, making it less straightforward to interpret using engineering judgment.


The water-injection rates (WIRs) and OPRs were used as the historical well controls for the injectors and producers, respectively.
The GPR of each producer was not included as a history-match-error metric because the reservoir remained above the bubblepoint
throughout the production period. The three field-scale error metrics, FIELD_OPC/WPC/GPC, were used as confirmation after complet-
ing the well-level history matching.
We (an integrated subsurface team with backgrounds in geology, geophysics, petrophysics, drilling and completions, reservoir, pro-
duction, and facility engineering) determined the list of reservoir-model input variables for the DOE-based history match, along with
their uncertainty ranges and distributions, as shown in Table 2. Note that some variables are simplified or desensitized compared with
the actual study. The available measurement and analog data, and the experience from manual history matching, were considered. In
the list, the regional property multipliers each correspond to one of the six material-balance regions that in total comprise the complete
reservoir volume (see Fig. 1). There are two downdip aquifer regions named MBAQ5PE and MBAQ5SW. The remaining four regions
are in the oil column.

No.  Parameter                             Symbol      Low      Midrange  High     Type         Distribution
 1   Oil/water contact                     OWC         OWC 1    OWC 2     OWC 3    Continuous   Uniform
 2   Rock compressibility                  RC          RC 1     RC 2      RC 3     Categorical  0.1/0.6/0.3
 3   Water/oil relative permeability       WEXP        WEXP 1   WEXP 2    WEXP 3   Continuous   Uniform
     Corey parameter
 4   Aquifer-permeability multiplier       KX_AQ5PE    0.25     1.0       3.0      Continuous   Triangular
 5                                         KX_AQ5SW    0.25     1.0       3.0      Continuous   Triangular
 6   Oil-column-permeability multiplier    KX_OCSEH    0.25     1.0       3.0      Continuous   Triangular
 7                                         KX_OCREG1   1.0      2.0       3.0      Continuous   Triangular
 8                                         KX_OCREG2   0.5      2.0       3.0      Continuous   Triangular
 9                                         KX_OCREG3   0.5      2.0       3.0      Continuous   Triangular
10   Aquifer-volume multiplier             PV_AQ5PE    0.5      5.0       6.0      Continuous   Triangular
11                                         PV_AQ5SW    0.8      1.0       1.5      Continuous   Triangular
12   Oil-column-pore-volume multiplier     PV_OCSEH    0.8      1.0       1.5      Continuous   Triangular
13                                         PV_OCREG1   0.8      1.1       1.5      Continuous   Triangular
14                                         PV_OCREG2   0.8      0.9       1.5      Continuous   Triangular
15                                         PV_OCREG3   0.8      1.1       1.5      Continuous   Triangular
16   Dummy variable                        DUMMY       -1       0         1        Continuous   Uniform

Table 2—Mermaid Reservoir model-input variables and their uncertainty ranges and distributions. The low case is the minimum for all distributions; the midrange case is the mean for uniform distributions and the mode for triangular distributions; and the high case is the maximum for all distributions.

The water/oil relative permeability curves of the reservoir are

$$k_{rw} = k_{rwro}\left(\frac{S_w - S_{wir}}{1 - S_{wir} - S_{orw}}\right)^{\mathrm{WEXP}}, \qquad k_{row} = k_{rocw}\left(\frac{1 - S_w - S_{orw}}{1 - S_{wir} - S_{orw}}\right)^{\mathrm{OEXP}}. \qquad (2)$$
The rock is water-wet, and the wettability is adjusted by the Corey parameters WEXP and OEXP. We assumed that these two parameters are strongly dependent and follow the relationship OEXP = 5 − WEXP (the model was simplified to demonstrate the treatment of strongly dependent variables). This relationship was hardwired in the reservoir-simulation input files, and OEXP was removed from
the list of input variables. All other input variables in the list are independent; otherwise, a variable correlation matrix would have
been determined.
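The following minimal sketch illustrates Eq. 2 together with the hardwired OEXP = 5 − WEXP dependency; the endpoint values (Swir, Sorw, and the endpoint relative permeabilities) are placeholders, not the values of the study:

```python
import numpy as np

def water_oil_relperm(sw, wexp, swir=0.2, sorw=0.25, krwro=0.4, krocw=1.0):
    """Corey-type water/oil relative permeability of Eq. 2.

    The endpoint values (swir, sorw, krwro, krocw) are placeholders and
    are not the values used in the actual study.
    """
    oexp = 5.0 - wexp  # hardwired dependency: OEXP = 5 - WEXP
    swn = np.clip((sw - swir) / (1.0 - swir - sorw), 0.0, 1.0)        # normalized Sw
    son = np.clip((1.0 - sw - sorw) / (1.0 - swir - sorw), 0.0, 1.0)  # normalized So
    return krwro * swn ** wexp, krocw * son ** oexp  # (krw, krow)
```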
When defining the uncertainty range (i.e., the low, midrange, and high levels) of an input variable, the low and high levels were determined before the midrange level. Otherwise, the low and high levels tend to be anchored around the midrange level, and the uncertainty might be underestimated (the so-called anchoring bias). We applied a transformation when an input-variable distribution was severely skewed.
Note that a dummy variable is included in the input variable list, as shown in the last row of Table 2. This dummy variable was
designed to have no influence on the simulated responses. Its sole purpose was to test if the DOE work flow was executed correctly. If a
false signal showed this variable to be influential, it indicated that the work flow was executed incorrectly.
We did not intend to have too many or too few input variables in this study. Too many input variables inflate the dimension of the
problem, making it impractical to solve efficiently. In addition, too many input variables might stress the experimental design’s
capacity, leading to suboptimal results. Most DOE studies have at maximum 20 to 30 input variables; otherwise, the computational
burden for Earth-modeling and reservoir-simulation software might become too severe. On the other hand, too few input variables
might lead to superficial analysis. We also considered the constraints in time, computational resources, and software capability when
determining the input variables.
The limitation on the number of input variables is often regarded as the greatest disadvantage of DOE-based AHM compared with
gradient-based optimization using adjoint sensitivities (Chen et al. 1974; Li et al. 2003) and ensemble-based AHM methods, such as the
ensemble Kalman filter, ensemble smoother, or ensemble smoother with multiple data assimilation (van Leeuwen and Evensen 1996;
Aanonsen et al. 2009; Oliver and Chen 2011; Skjervheim and Evensen 2011; Emerick and Reynolds 2013), which do not have a theo-
retical upper limit on the number of input variables. However, often DOE-based AHM is still applied in practice for two reasons. First,
ensemble-based and gradient-based methods have their own limitations and assumptions, such as a Gaussian assumption, difficulty in
handling categorical variables (for ensemble-based methods), and convergence to local optima only (for gradient-based methods). Second, the constraint on the number of input variables forces engineers to make scrupulous selections through geological and engineering analysis of the reservoir, an effort that cannot and should not be replaced by computers. Besides, the number of input variables
can often be reduced successfully to the level suitable for DOE-based AHM using parameterization techniques. Parameterization cap-
tures the most salient features of the data in a lower-dimensional space, removing variable autocorrelation or redundancy by implicit or
explicit grouping. This allows fine-scale reservoir heterogeneities to be considered in DOE-based AHM. Common parameterization
techniques include geostatistical approaches (e.g., controlling heterogeneity by the variogram model, correlation length, training image,
and geobody size and orientation), regional porosity/permeability multipliers, pilot points (LaVenue et al. 1995; Doherty 2003),
principal-component analysis, and other domain-transformation methods (Gavalas et al. 1976; Jafarpour and McLaughlin 2009). If
reservoir-property adjustments at finer scales are needed after DOE-based AHM, a second AHM technique, such as streamline-based
(Datta-Gupta and King 2007; Bhark et al. 2012; Watanabe et al. 2017) or adjoint-based (Van Doren et al. 2012; Joosten et al. 2014),
can be used to refine the quality of a history match.

Sensitivity Analysis. The objective of sensitivity analysis is to evaluate the effect of input variables on the response variables (eHM ),
and to identify the most influential input variables (so-called heavy hitters or big hitters). The experimental designs used for sensitivity
analysis are called screening designs. They apply a reasonably limited number of modeling runs to infer the influence of each input vari-
able and input-variable interaction (so-called main effects and interaction effects) on the response variables.
We used the one-variable-at-a-time design to create tornado plots (Fig. 2). Note that the tornado plot has many limitations. It has a
poor coverage of the uncertainty space, its analysis is anchored in midrange values, and it cannot test the interaction between input vari-
ables. We only applied it as a first-blush sensitivity analysis to gain a quick understanding of the behavior of each input variable. In
addition, we used it to check if all low, midrange, and high values of the input variables are passed successfully from the DOE software
to the reservoir simulator. If an input variable is missing a bar in the tornado plot, it means either that this variable is not influential or
that some problem exists in the software communication. We also used the tornado plot to confirm that no false signal was detected
from the dummy variable (i.e., that the experimental design was executed correctly).
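A one-variable-at-a-time design of the kind behind a tornado plot can be sketched as follows (a simplified illustration; the simulate function stands in for the reservoir-simulation work flow, and all names are ours):

```python
import numpy as np

def tornado_bars(simulate, low, mid, high):
    """One-variable-at-a-time design underlying a tornado plot.

    simulate : function returning one e_HM value for an input vector
               (the reservoir simulator in the actual work flow).
    low, mid, high : per-variable levels from the uncertainty table.
    """
    base = simulate(np.asarray(mid, dtype=float))
    bars = []
    for i in range(len(mid)):
        lo = np.asarray(mid, dtype=float)
        hi = np.asarray(mid, dtype=float)
        lo[i], hi[i] = low[i], high[i]  # perturb one variable at a time
        bars.append((i, simulate(lo) - base, simulate(hi) - base))
    # Sort by total bar length so the most influential variable is on top.
    bars.sort(key=lambda b: abs(b[1]) + abs(b[2]), reverse=True)
    return base, bars
```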

[Figure 2: tornado plots for eHM of MER12_SWP (baseline value 20.46) and MER13ST_SWP (baseline value 203), with low/high bars for each input variable]

Fig. 2—Tornado plots of two response variables (eHM of MER12_SWP and MER13ST_SWP) of the Mermaid Reservoir. Tornado plots of all the response variables were reviewed.

Subsequently, a folded Plackett-Burman design (40 runs in total) was performed to evaluate the main effects of the input variables
(Schmidt and Launsby 1989; Montgomery 2008). This generates more-accurate results than a tornado plot.
All 40 Plackett-Burman simulations finished successfully. The simulated production curves were plotted and compared with the his-
torical data (Fig. 3). There are no apparent outliers, and the distribution of the curves does not show any anomalies (e.g., too wide, too
narrow, skewed, or multimodal). The historical production curves are contained within the spread of the simulated curves, and the
shapes of the simulated and historical curves are similar. If either of the two preceding conditions is not satisfied, obtaining a history-matched solution is unlikely.

[Figure 3: SWP and WPR vs. time for Well MER03, simulated curves overlain on historical data]

Fig. 3—Examine simulation results and compare with historical data. The gray curves are all the simulation results. The green curve is the result of one of the simulations. Purple squares represent the historical data. All the available historical data were reviewed. Here, only SWP and WPR for Well MER03 are shown.

Commonly used and robust sensitivity-analysis methods include the t-test and sampling-based approaches. The t-test results, typically compiled in a Pareto chart, convey the statistical significance of each term in a regression equation such as a linear or linear-interaction (bilinear) equation. The test is widely applied and well-recognized in the industry. A significance level p should be specified, which represents the probability of mistakenly labeling an insignificant input variable as a heavy hitter. Typically, p is set to 0.05.
Strictly speaking, an input variable not identified as a heavy hitter does not suggest that it is immaterial. It only means that no statement
can be made with respect to the statistical significance of this variable within the uncertainty range that is examined. Many regression-
analysis texts cover the t-test (Chatterjee and Hadi 2015), and therefore the theory is omitted here. In contrast to the t-test, sampling-
based approaches directly work on the results of the modeling runs without running regression. The rank-correlation chart and common
mean F-test are two popular methods. Helton et al. (2006) provides a review of sampling-based sensitivity-analysis methods. Yeten
et al. (2005), Fenwick et al. (2014), and Sarma et al. (2015) also discuss applications in the petroleum industry.
In this study, our selected folded Plackett-Burman design can evaluate only the main effects of the input variables. Therefore, it is
appropriate only to apply linear regression (i.e., one term per input in the regression equation) for sensitivity analysis using the t-test,
$$e_{\mathrm{HM}} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon, \qquad (3)$$

where k is the number of input variables (k = 16), xi is each input variable (i = 1, 2, …, k), eHM is the response variable, the β terms are the regression coefficients, and ε is the regression error term (also called the residual). βi xi represents the main-effect term of each input variable i.
In total, 22 linear proxies were built for the 22 eHM metrics. If additional time and resources are available for sensitivity analysis, an
improvement is to use an experimental design that analyzes both the main effects and two-way interactions, paired with a linear-
interaction (bilinear) regression equation for the t-test,
$$e_{\mathrm{HM}} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \varepsilon, \qquad (4)$$

where βij xi xj represents the two-way-interaction terms. Suitable experimental designs include the fractional factorial design (resolution V, 256 runs in total) and the D-optimal design using Eq. 4 as the generating equation. Eq. 4 requires at least Nmin = k(k − 1)/2 + k + 1 = 137 runs for a curve fit, and the number of D-optimal design runs should be at least 1.5Nmin ≈ 206. Because several useful DOE textbooks are available (Schmidt and Launsby 1989; Montgomery 2008), it is unnecessary to describe the theory of each experimental design here.
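As an illustration of the t-test on a main-effects regression (Eq. 3), the following sketch uses statsmodels with synthetic data; the random two-level matrix merely stands in for the actual folded Plackett-Burman design:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
k, n_runs = 16, 40  # 16 inputs; folded Plackett-Burman with 40 runs
# Random coded low/high levels, standing in for the real design matrix.
X = rng.choice([-1.0, 1.0], size=(n_runs, k))
# Synthetic response standing in for one e_HM metric.
e_hm = 3.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0.0, 0.5, n_runs)

# Fit the main-effects-only model of Eq. 3 and test each coefficient.
fit = sm.OLS(e_hm, sm.add_constant(X)).fit()
heavy_hitters = np.flatnonzero(fit.pvalues[1:] < 0.05)  # skip the intercept
```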
The t-test results, summarized in Pareto charts, were plotted and reviewed for all 22 response variables (Fig. 4 shows two of these
Pareto charts). Heavy hitters were determined using a significance level p of 0.05. The dummy input in the t-test, DUMMY, was used
to detect the noise of the test. DUMMY was observed to have a small influence and was not identified as a heavy hitter for any response
variable. Note that although it is normal for the t-test to show a small dependency of each response on a dummy variable (i.e., noise), the tornado plot should show zero influence. Juxtaposing all 22 Pareto charts helped us compare the heavy hitters for each response variable. Different sensitivity-analysis methods (e.g., tornado plot, t-test, and sampling-based methods) were compared for each response variable. Major inconsistencies, if discovered, were discussed and understood. We examined the sensitivity-analysis results closely and discussed them extensively with subject-matter experts (SMEs). Finally, we decided to remove five insignificant input variables (KX_AQ5SW, PV_AQ5SW, KX_OCSEH, PV_OCSEH, and RC) from further study because they were not shown to be statistically significant for any of the 22 response variables. The dummy variable was also removed. Variable removal was achieved by fixing each removed variable at its midrange level. We emphasize that removing nonheavy hitters requires engineering judgment. In this case study, we were able to make physical sense of why these parameters have little effect on the responses.

[Figure 4: Pareto charts of t-statistics for eHM of MER06_SWP and MER06_WPR, with the significance limit p = 0.05 and positive/negative t-statistics distinguished]

Fig. 4—Results of a t-test for two of the 22 response variables (eHM of MER06_SWP and MER06_WPR) with significance level p = 0.05. The t-test results were examined for all 22 response variables.

Proxy Building. The objective of proxy building is to apply DOE to study the relationship between input variables (reservoir-model
parameters) and response variables (eHM ) more thoroughly than through sensitivity analysis. Accordingly, a more comprehensive exper-
imental design is used to sample the uncertainty space, and proxy models are built to approximate the relationship between input and
response variables. The proxy models will be applied later in the Monte Carlo simulation. The experimental designs used for proxy
building are called modeling designs. Common choices include the Latin hypercube design (Burkardt 2005) (or space-filling design),
the Latin hypercube design combined with the D-optimal design, the D-optimal design (using a linear, bilinear, or full-quadratic for-
mula), the Box-Behnken design, and the central-composite design. Schmidt and Launsby (1989) and Montgomery (2008) provide
theoretical details.
In this study, a Latin hypercube design of 500 runs was used as the experimental design for proxy building. We randomly selected
70% of the runs and applied them for proxy training, and the remaining 30% were applied for blind testing. Alternatively, we could
have used 10-fold cross validation for blind testing (James et al. 2013). Scatter plots (Fig. 5) were reviewed to confirm that the Latin
hypercube design sampled the uncertainty space evenly without apparent patterns and clusters. The scatter plots also confirmed that the
blind-test runs were selected randomly.
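A sketch of the design-and-split step, assuming SciPy's Latin hypercube sampler (the dimension, seed, and indices here merely mirror the 70/30 description and are not study values):

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube design over the 10 retained inputs, on the unit cube;
# each column is later rescaled to its a priori range.
design = qmc.LatinHypercube(d=10, seed=42).random(n=500)

# Random 70/30 split into proxy-training and blind-test runs.
idx = np.random.default_rng(42).permutation(500)
train_runs, blind_test_runs = design[idx[:350]], design[idx[350:]]
```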
The simulated production curves were plotted and compared with the historical data. This is the same practice as shown in Fig. 3,
and thus the plots are omitted here. A small fraction of producers in a small fraction of simulation runs failed to honor the rate controls because the minimum bottomhole-pressure (BHP) limit was hit (the reservoir simulator requires that a minimum BHP limit be specified when setting up rate controls for producers). These simulation runs were examined and rerun, and the problems were corrected. No
apparent outliers were found in the production curves, and the distribution of the curves does not show any anomaly (e.g., too wide, too
narrow, skewed, or multimodal). The historical production curves are contained within the spread of the simulated curves, and the
shapes of the simulated and historical curves are similar.

[Figure 5: pairwise scatter plots of input variables, e.g., KX_OCREG2 vs. KX_OCREG3 and PV_OCREG2 vs. PV_OCREG3]

Fig. 5—Scatter plots of different input variables. Gray points are training runs, and blue points are blind-test runs. The scatter plots confirm that the Latin hypercube design samples the uncertainty space evenly and that the blind-test runs were selected randomly.

Parametric and nonparametric proxies are two common proxy types. Parametric proxies include linear, linear-interaction (bilinear),
quadratic (contains quadratic and linear terms), full-quadratic (contains quadratic, linear, and cross terms), and best-subset regression.
Nonparametric proxies include Kriging (Deutsch and Journel 1992; Yeten et al. 2005), thin-plate spline (Li and Friedmann 2005),
neural network, random forest, support-vector regression, and gradient boosting. Hastie et al. (2009) provides theoretical details.
In this study, four types of proxies were chosen for each eHM metric: linear-interaction (bilinear), full-quadratic, thin-plate spline,
and neural-network proxies. The blind-test results of all four proxies were compared for each eHM metric. The root-mean-squared errors
(RMSEs) and the correlation coefficients (R) between the proxy-estimated response values and the actual response values were com-
puted. The scatter plots with a 45° line and the residual plots were also reviewed (Figs. 6a and 6b). The spline proxy outperformed all
other proxies for each eHM metric in that the correlation coefficient (R) was much higher and the RMSE was much lower for the blind-
test runs, although full-quadratic and neural-network proxies came close to spline accuracy for some eHM metrics. R and RMSE repre-
sent the proxy accuracy (or error) when the proxy is used in prediction. Therefore, they are important criteria in determining the proxy
quality. The reason that the spline proxy works best is probably that the true response surface is severely sinuous, as shown in Fig. 8c of
Bhark and Dehghani (2014). Because the sinuosity represents real reservoir behavior rather than noise, connecting all training data
points with a thin-plate spline proxy might be a good strategy. Although the residual plots of the spline proxies were not ideal for some
eHM metrics (ideally, the training and blind-test points should be distributed above and below the zero-residual line in a totally random
manner), no strong nonrandom patterns such as a "smiley face" were found; had such a pattern appeared, the proxy would have been judged poor and not used. Fig. 6c illustrates an example of the "smiley face" pattern. Finally, note that in this case study the spline proxy happened to be the most accurate for all eHM metrics. However, in general, the most accurate proxy type should be selected individually per response variable. For example, if the spline proxy is best for Metric 1 but the linear-interaction proxy is best for Metric 2, then the spline proxy should be used for Metric 1 and the linear-interaction proxy for Metric 2.
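For illustration, a thin-plate-spline proxy and its blind-test R and RMSE can be computed as follows (synthetic stand-in data; SciPy's RBFInterpolator is one possible implementation of such a spline, not necessarily the one used in the study):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)
X_train, X_test = rng.random((350, 10)), rng.random((150, 10))

def truth(X):
    # Stand-in for the simulator-computed e_HM response.
    return np.sin(6.0 * X[:, 0]) + X[:, 1] ** 2

y_train, y_test = truth(X_train), truth(X_test)

# Thin-plate-spline proxy trained on the training runs.
proxy = RBFInterpolator(X_train, y_train, kernel="thin_plate_spline")
y_hat = proxy(X_test)  # blind-test estimates

rmse = float(np.sqrt(np.mean((y_hat - y_test) ** 2)))
r = float(np.corrcoef(y_hat, y_test)[0, 1])  # blind-test R
```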

[Figure 6: (a) scatter plot and (b) residual plot for the spline proxy of eHM of MER06_SWP; (c) an undesired "smiley face" residual pattern]

Fig. 6—Plots used in proxy blind testing: (a) scatter plot with a 45° line comparing proxy-estimated response values with actual simulator-computed response values; (b) residual plot. Such plots were examined for all response variables. Gray points are training runs, and blue points are blind-test runs. (c) An undesired "smiley face" residual-plot pattern.

Some proxies contain hyperparameters. For example, the hyperparameters of a neural network include the number of hidden layers
and the number of hidden-layer nodes. Adjusting them can improve proxy accuracy but can also lead to overfitting. In our case, we
used fixed hyperparameters for the neural network (one hidden layer with four nodes) and did not adjust them. Had we adjusted them,
we would have randomly selected 60% of the simulation runs to train the neural network, 20% to find the best hyperparameter values,
and the remaining 20% to blind test the proxy accuracy. Alternatively, we could have randomly selected 70% of the simulation runs to
train the neural network, found the best hyperparameter values through 10-fold cross validation, and then used the remaining 30% for
blind testing. All hyperparameters should be documented clearly.
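A sketch of the 60/20/20 alternative described above, using scikit-learn with synthetic stand-in data (the hyperparameter grid and all numbers are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X, y = rng.random((1000, 10)), rng.random(1000)  # stand-in simulation runs

# 60/20/20 split: training / hyperparameter validation / blind test.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_bt, y_val, y_bt = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

def val_mse(n_nodes):
    # Train on the training set only; score on the validation set only.
    net = MLPRegressor(hidden_layer_sizes=(n_nodes,), max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)
    return np.mean((net.predict(X_val) - y_val) ** 2)

best_nodes = min((2, 4, 8, 16), key=val_mse)  # the blind-test set stays untouched
```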


Although the spline proxy outperformed the others, its accuracy was still poor for some eHM metrics. The lowest R was less than 0.7.
In fact, it is well-known that the relationship between input variables (reservoir-model parameters) and response variables (eHM metrics)
for history-matching problems can be highly nonlinear and nonmonotonic. Therefore, it is within expectation that certain eHM metrics
cannot be well-characterized by proxies. We attempted a few other proxy types without success and decided to add 500 runs to the original 500-run Latin hypercube design, for 1,000 runs in total. The simulation runs were completed, and the proxies were rebuilt. Again, the spline proxy outperformed all other proxy types for each eHM metric. The R and RMSE of the spline proxies are summarized in Table 3. The proxies for which 0.8 ≤ R < 0.85 are marked by an asterisk (*), and those for which R < 0.8 are marked by a double asterisk (**). As observed from Table 3, the accuracy of some proxies remains unsatisfactory.

             MER     MER    MER     MER     MER    MER    MER    MER      MER    MER      MER
             03ST2   03     06      07ST2   07     11     13ST   21ST2    12     12OH     19
SWP   R      0.81*   0.87   0.81*   0.91    0.88   0.94   0.93   0.94     0.93   0.72**   0.84*
      RMSE   73      20     69      98      25     25     77     88       7      113      14
WPR   R      0.82*   0.83*  0.96    0.95    0.94   0.95   0.94   0.94     –      –        –
      RMSE   331     90     156     144     117    194    284    574      –      –        –

*Proxies for which 0.8 ≤ R < 0.85.
**Proxies for which R < 0.8.

Table 3—R and RMSE of the best proxy for each eHM metric.

We had two choices. The first was to discard poor proxies and not use them in history-match filtering. The second was to improve
proxy accuracy to the maximum extent possible (even if it remains unsatisfactory), and to retain them for Monte Carlo simulation.
Monte Carlo simulation results are later used for history-match filtering, which is in fact designed to consider proxy errors. We chose
the second option.

Monte Carlo Simulation. In total, 20,000 Monte Carlo samples were drawn randomly from the a priori distributions (Table 2) and evaluated with the spline proxies for all eHM metrics. If the input variables are correlated with each other (assuming linear dependency), a correlation matrix (determined during problem framing) should be honored in Monte Carlo sampling. The sampling results were summarized as shown in Table 4. We compared the input-variable distributions of the Monte Carlo samples with the a priori distributions; for each input variable, the two distributions were alike. To test whether the response-variable distributions had stabilized, we repeated the Monte Carlo simulation using 30,000 samples and compared the 30,000-sample and 20,000-sample distributions. The response-variable distributions were stable at 20,000 Monte Carlo samples. These checks confirmed that the number of Monte Carlo samples was adequate; otherwise, more samples should have been added.

Sample    Input Variables                       Response Variables Evaluated by Proxies
No.       KX_AQ5PE   ...   KX_OCREG3            eHM: MER03ST2_SWP (Spline)   ...   eHM: MER21ST2_WPR (Spline)
1         0.697      ...   0.872                77.421                       ...   1,118.43
2         1.869      ...   1.623                126.125                      ...   2,434.17
...       ...        ...   ...                  ...                          ...   ...
20,000    1.013      ...   2.267                164.084                      ...   1,480.62

Table 4—Monte Carlo simulation results. The name of the proxy used to evaluate each response variable is in parentheses.
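The sampling step can be sketched as follows (only two inputs are shown; the WEXP range and the stand-in proxy are placeholders, not study values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# A priori sampling per Table 2 (the WEXP levels are not disclosed in the
# paper, so a placeholder 1.0-3.0 range is used here).
kx_aq5pe = rng.triangular(0.25, 1.0, 3.0, n)  # triangular(low, mode, high)
wexp = rng.uniform(1.0, 3.0, n)               # uniform(low, high), placeholder
samples = np.column_stack([kx_aq5pe, wexp])   # ...plus the other retained inputs

def proxy(X):
    # Stand-in for one trained spline proxy of an e_HM metric.
    return 100.0 + 40.0 * np.sin(X[:, 0]) * X[:, 1]

e_hm_metric = proxy(samples)  # repeated for every e_HM metric
```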

History-Match Filtering. After obtaining the 20,000 Monte Carlo samples, we applied filtering and identified a group of samples for
which the history-match error for all eHM metrics was considered “close enough” to zero. The steps through which the filtering criteria
were determined are described in the following.
First, the production curves from all simulation runs were plotted and overlaid with the historical production data. For example, the
simulated SWP curves for Well MER03 in all simulation runs are shown in Fig. 7. The gray lines are the simulated curves, and the
purple squares represent the historical data. The historical data were denoised by selecting representative data points along the time
axis. The data were also believed to contain no known bias. After plotting all simulated SWP curves for Well MER03, each with a dif-
ferent eHM of MER03_SWP, we gradually removed SWP curves from the plot using a decreasing filtering threshold. For example, in
the plot shown in Fig. 7a, only simulated curves for which the eHM of MER03_SWP is ≤ 140 are retained. In the plots in Figs. 7b and 7c, the threshold decreases from 140 to 70 and then to 40. We reviewed and discussed the eHM filtering thresholds for the different plots in Fig. 7, reflected on the experience obtained during the manual history matching conducted before this DOE study, and eventually decided to use 70 as the eHM threshold. The detailed logic is explained in Appendix A. This threshold (denoted by T′) reflected our subjective tolerance of the history-match error subject to an imperfect reservoir model and potential unknown data-measurement error. Namely, all the simulation runs that have eHM ≤ 70 for the SWP of Well MER03 can be confidently considered history matched for the metric MER03_SWP. These steps were repeated for all eHM metrics. The corresponding thresholds T′ are recorded in Table 5.
Because the full range of eHM is evaluated by performing Monte Carlo simulation using proxies rather than actual reservoir simulations, the eHM value is subject to proxy error (denoted by ξ). The standard deviations of the proxy errors (denoted by σξ) for all eHM metrics are summarized in Table 6. Assuming ξ follows a normal distribution with zero mean, the 95% confidence interval of the proxy error is approximately ±2σξ. For the eHM metrics for which the proxy errors are homoscedastic (i.e., the error variances of different proxy-estimated eHM values are similar), the proxy RMSE was used to represent σξ. However, for the eHM metrics for which the proxy errors are heteroscedastic (i.e., the error variances differ), the standard deviation of the local proxy error at eHM = T′ was used. This is illustrated in Fig. 8, which is a scatter plot that compares the actual response variable (eHM) with the proxy-estimated response variable for only the blind-test runs (indicated by the dots). The proxy error in Fig. 8 is heteroscedastic: The dots cluster at small eHM and spread at large eHM. Therefore, 2 × RMSE (indicated by the dashed lines) cannot represent the 95% confidence interval when eHM is small. For example, suppose that for this history-match metric T′ is 100. The value of 2σξ should then be two times the standard deviation of the local proxy error at eHM = 100. The "local" region is indicated by the red box in Fig. 8. Although undesired, mild heteroscedasticity such as that in Fig. 8 is sometimes inevitable.
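A sketch of estimating the local proxy-error standard deviation near eHM = T′ (the window half-width is a user choice, and all names are ours):

```python
import numpy as np

def local_sigma(e_actual, e_proxy, t_prime, half_width):
    """Standard deviation of blind-test proxy residuals local to e_HM = T'.

    Used when the proxy error is heteroscedastic; half_width sets the
    size of the "local" window (the red box in Fig. 8).
    """
    e_actual = np.asarray(e_actual, dtype=float)
    e_proxy = np.asarray(e_proxy, dtype=float)
    window = np.abs(e_actual - t_prime) <= half_width
    return float(np.std(e_proxy[window] - e_actual[window]))

# Hypothetical usage: t_star = t_prime + 2.0 * local_sigma(e_act, e_est,
#                                                          t_prime=100.0,
#                                                          half_width=25.0)
```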

[Figure 7: simulated SWP vs. time for Well MER03 under three retention thresholds: (a) eHM < 140, (b) eHM < 70, (c) eHM < 40]

Fig. 7—Finding the threshold of the history-match error (eHM) for MER03_SWP, subject to an imperfect reservoir model and potential unknown data-measurement error. This threshold is denoted by T′. The lines are simulated production curves, and the purple squares are historical data. The red rectangles show different T′ values for eHM of MER03_SWP: 140, 70, and 40.

       MER     MER   MER   MER     MER   MER   MER    MER     MER   MER    MER
       03ST2   03    06    07ST2   07    11    13ST   21ST2   12    12OH   19
SWP    100     70    120   120     130   90    160    130     50    90     50
WPR    300     180   400   450     375   600   100    400     –     –      –

Table 5—Value of the threshold T′ for each eHM metric, which reflects the subjective tolerance of the history-match error subject to an imperfect reservoir model and potential unknown data-measurement error.

       MER     MER   MER    MER     MER   MER   MER    MER     MER   MER    MER
       03ST2   03    06     07ST2   07    11    13ST   21ST2   12    12OH   19
SWP    25      10    27.5   20      7.5   10    77     88      2.5   17.5   10
WPR    200     48    156    144     117   150   175    574     –     –      –

Table 6—Standard deviation of the proxy error (σξ) for each eHM metric.

[Figure 8: scatter plot of proxy-estimated eHM vs. actual eHM (both 0 to 300) for blind-test runs, with ±2 × RMSE bands and a red box marking the local region around eHM = T′ = 100]

Fig. 8—Scatter plot that compares the actual response variable (eHM) with the proxy-estimated response variable. When the proxy error is heteroscedastic, use the standard deviation of the local proxy error at eHM = T′ as the σξ for calculating the history-match filtering threshold. In this example, T′ = 100, so use the local-proxy-error standard deviation in the red-box region as σξ.

The final history-match filtering threshold (denoted by T*) for each eHM metric is summarized in Table 7. T* = T′ + 2σξ, where T′ reflects our subjective tolerance of the history-match error subject to an imperfect reservoir model and potential unknown data-measurement error, and 2σξ approximates the 95% confidence interval of the proxy error. Our logic in history-match filtering and the definitions of the model, data-measurement, and proxy errors can be found in Appendix A.
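The filtering step then reduces to a vectorized comparison, sketched below with stand-in numbers in place of the Tables 5 and 6 values:

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples, n_metrics = 20_000, 19
e_hm = rng.gamma(4.0, 40.0, size=(n_samples, n_metrics))  # stand-in proxy errors
t_prime = rng.uniform(50.0, 600.0, n_metrics)   # Table 5 values in the study
sigma_xi = rng.uniform(5.0, 200.0, n_metrics)   # Table 6 values in the study

t_star = t_prime + 2.0 * sigma_xi       # Table 7: T* = T' + 2*sigma_xi
keep = np.all(e_hm <= t_star, axis=1)   # a candidate must pass every metric
print(int(keep.sum()), "history-matched candidates")
```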
The T* thresholds in Table 7 were used to filter the Monte Carlo samples. We obtained approximately 250 samples as history-
matched candidates. The posterior distributions were compared with the a priori distributions (defined in Table 2) for all input variables.
Fig. 9 presents three examples of comparison of a priori (shown in gray) and posterior (shown in blue) distributions (Bhark and
Dehghani 2014). In Fig. 9a, the right tail of the posterior distribution appears to be cut off. This typically suggests that the a priori distri-
bution might be biased toward lower values. In Fig. 9b, the posterior and a priori distributions are nearly identical. There are three pos-
sible reasons: the uncertainty range of the a priori distribution is too narrow; this input variable is not influential to the history-match error; or the assessment of the a priori distribution is very accurate. In Fig. 9c, the posterior distribution is narrower than the a priori
distribution, indicating a reduction in the uncertainty and that history matching is effective. However, if the posterior distributions are
too narrow (i.e., most of the history-matched models are similar and input uncertainty collapses), it is likely that the uncertainty space
was not explored fully.

       MER     MER   MER   MER     MER   MER   MER    MER     MER   MER    MER
       03ST2   03    06    07ST2   07    11    13ST   21ST2   12    12OH   19
SWP    150     90    175   160     145   110   314    306     55    125    70
WPR    700     275   712   738     609   900   450    1,548   –     –      –

Table 7—History-match filtering threshold (T*) for each eHM metric. T* = T′ + 2σξ.
[Figure 9: frequency histograms for three input variables comparing a priori and posterior distributions]

Fig. 9—Comparing the posterior distribution (blue) with the a priori distribution (gray) (Bhark and Dehghani 2014): (a) the tail of the posterior is cut off; (b) the posterior and a priori distributions are similar; (c) uncertainty is reduced in the posterior. Note that the history-matched models should still be diversified enough after uncertainty reduction.

The uncertainty was confirmed to be reduced after history matching. Actual simulations were then performed on the history-matched candidates, which is necessary because the candidates were identified with proxies. The simulated production curves were plotted, and the simulation runs that failed to match the historical data were removed. Here, we used T′ (rather than T*) as the threshold for determining the final history-matched models because proxy error no longer exists once actual simulations are run. The models were reviewed by SMEs in different disciplines to confirm that each is geologically and physically plausible. The final history-matched ensemble contained 110 models.
The posterior distributions of the final ensemble were compared with the a priori distributions again to confirm that uncertainty was
reduced after history matching and that the uncertainty was not overly reduced.
During the entire history-match filtering process, we performed several iterations of fine tuning until the results were satisfactory.
For example, we revisited the static Earth model and the dynamic reservoir-simulation model. When new information, data, concepts,
and ideas were obtained or formed, the model was reconstructed accordingly. Iterative model reconstruction with new knowledge is
sometimes called “pro-cycling” (Larue and Hovadik 2012) or “model maturation” (Van Doren et al. 2012; Joosten et al. 2014). When
necessary, we discussed with the integrated subsurface team and adjusted the characterization of input variables and their uncertainties;
re-examined the historical data, well status, and completions with the oilfield operators; repeated proxy building using more simulation
runs; revisited the filtering thresholds; reperformed history-match filtering; and re-examined the posterior distributions.

Production Forecast. The history-matched models were used to make production forecasts. We selected OPC, WPC, and net present value (NPV) at 1,600 days after the end of the production history as the forecast response variables.
In the historical period, the WIRs and OPRs were used as the well controls for the injectors and producers, respectively. The well
controls imposed during the forecast period depend on the future operating strategy, which was unknown at the time of the study. There-
fore, for production forecasts, the objective was to explore the uncertainty of the response variables assuming that the wells continue pro-
ducing in a “business-as-usual” manner. This requires the well rate and pressure to transition smoothly from the historical period to the
forecast period, honoring the trend at the end of the historical period. We used the simulated pressure drawdown of each well at the
end of the history period as that well’s control in the forecast period (Bhark and Dehghani 2014). We also imposed any mini-
mum/maximum pressure or rate constraints associated with facility limits. Alternatively, we could have calibrated the productivity index of
each well at the end of the production history and then used the measured tubinghead pressure or BHP at the end of the history as the
well control in forecast simulation.
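This control assignment can be scripted. The sketch below, with hypothetical well names, column names, and pressure limits, computes each well's end-of-history drawdown and a clipped BHP bound for the forecast control; it is illustrative only, not the simulator-specific implementation used in the study.

```python
import pandas as pd

# Hypothetical end-of-history results: one row per well, with the simulated
# near-well reservoir pressure and flowing BHP at the last historical timestep.
eoh = pd.DataFrame({
    "well":  ["P1", "P2", "I1"],
    "type":  ["producer", "producer", "injector"],
    "p_res": [3200.0, 3100.0, 3300.0],    # psia
    "bhp":   [2700.0, 2500.0, 3800.0],    # psia
})

BHP_MIN, BHP_MAX = 1500.0, 4500.0         # assumed facility/integrity limits

# Forecast control: hold the end-of-history drawdown (or buildup, for
# injectors) so each well transitions smoothly into the forecast period.
eoh["drawdown"] = (eoh["p_res"] - eoh["bhp"]).abs()
eoh["bhp_control"] = eoh["bhp"].clip(BHP_MIN, BHP_MAX)
print(eoh[["well", "type", "drawdown", "bhp_control"]])
```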
All simulation runs were completed successfully. The simulated production curves were plotted and reviewed (Fig. 10). The transi-
tion from history to forecast was smooth. The curve distributions were examined, and no obvious outliers were detected. The spread of
the production forecast showed variations between different history-matched models. Note that OPRs were used as the historical well
controls for the producers during history matching; therefore, all simulation runs should exactly match OPR in the historical period, as
shown in the right panel of Fig. 10.

Representative Model Selection. The 110 forecast simulation runs were reviewed to select models that are representative of the P10,
P50, and P90 outcomes for all response variables (i.e., OPC, WPC, and NPV at Day 1,600 from the end of the production history).
First, the target P10, P50, and P90 percentiles were read from the cumulative distribution function of each response variable. Then, a
Minimax algorithm (Sarma et al. 2013) was used to examine the 110 models and identify those that were closest to the target percentiles
for all response variables. We placed more weight on matching the P10, P50, and P90 percentiles of NPV than on matching OPC and
WPC because NPV is the key decision driver.
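A minimal sketch of this selection logic follows. It is not the exact algorithm of Sarma et al. (2013); it simply illustrates a weighted minimax search over the ensemble, with hypothetical response values and an assumed double weight on NPV.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical forecast responses for the 110 history-matched models:
# columns are NPV, OPC, and WPC, normalized to comparable scales.
responses = rng.normal(size=(110, 3))
weights = np.array([2.0, 1.0, 1.0])       # extra weight on NPV, the decision driver

targets = np.percentile(responses, [10, 50, 90], axis=0)   # rows: P10, P50, P90

picks = {}
for label, target in zip(["P10", "P50", "P90"], targets):
    # Weighted worst-case (minimax) distance of each model to the target percentiles.
    dist = np.max(weights * np.abs(responses - target), axis=1)
    picks[label] = int(np.argmin(dist))
print(picks)    # indices of the representative P10/P50/P90 models
```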
Figs. 11 and 12 illustrate a set of P10/P50/P90 models selected by the algorithm. The three models are referred to as the red, green,
and blue models. They are close to the target P10, P50, and P90 percentiles (displayed by the thin blue dashed lines in Fig. 11) for
NPV, OPC, and WPC, particularly for NPV because more focus was placed on it during the search. Finding three models that match
exactly the target percentiles for all three response variables is difficult because the exact P10, P50, and P90 models of NPV might
be far from the P10, P50, and P90 values of OPC and WPC. The red, green, and blue models are the best possible models for a match
of the target percentiles for all three response variables. In addition, the input variable values of the red, green, and blue models are as
diversified as possible, as shown in Fig. 12. Should an input variable be strongly correlated with a response variable, the correlation
would be honored in model selection, especially for heavy hitters. We discussed the model selection results with SMEs and obtained
their support. In addition, because model selection is often nonunique, we selected another P10/P50/P90 model set as an alternative.

(Fig. 10 graphic: SWP and OPR vs. time for Well MER21ST2, with the history and forecast periods marked in each panel.)

Fig. 10—Review of forecast simulation results. The transition from history to forecast is smooth. The spread of production curves
in the forecast period indicates reservoir-property variations between different models. The production curves of all wells and the
field-scale production curve were examined.

(Fig. 11 graphic: cumulative distribution curves, from 0 to 1, for NPV, OPC, and WPC, with the P10, P50, and P90 target percentiles marked.)

Fig. 11—Cumulative distribution functions of the response variables. The red, green, and blue symbols are the representative
models for the P10, P50, and P90 percentile targets.

(Fig. 12 graphic: cumulative distribution curves, from 0 to 1, for the 10 input variables WEXP, PV_AQ5PE, PV_OCREG1, PV_OCREG2, PV_OCREG3, KX_AQ5PE, KX_OCREG1, KX_OCREG2, KX_OCREG3, and OWC.)

Fig. 12—Cumulative distribution functions of the input variables and the individual input values for the red, green, and blue
representative models.

Conclusions
AHM using DOE is one of the most accepted methods for history matching and probabilistic production forecasting. It is conceptually
straightforward but can be misused if the underlying statistical and modeling principles are not honored. In this paper, the best practices
of all seven stages of the DOE-based AHM work flow are summarized thoroughly. These seven stages are problem framing, sensitivity
analysis, proxy building, Monte Carlo simulation, history-match filtering, production forecasting, and representative model selection.
The best practices ensure a reliable history match and production forecast with the following four steps:
1. The response variables of AHM (history-match-error metrics) and the input variables (reservoir-model parameters) and their uncer-
tainties are characterized properly.
2. The uncertainty space and the relationship between the input and response variables are explored effectively and efficiently.
3. History-match filtering criteria are defined reasonably, and a representative set of history-matched solutions is obtained.
4. Probabilistic production forecasts and the selection of discrete models (e.g., P10, P50, P90 models) are performed appropriately.
A practical procedure to determine the history-match filtering criteria is devised. It assists engineers in defining the history-match-
error tolerance subject to the imperfect reservoir model, data-measurement error, and proxy error. Although the subjective nature of his-
tory matching cannot be fully eliminated, the procedure alleviates the arbitrariness in history-match filtering and fosters better under-
standing of the reservoir.
The DOE work flow is never a replacement for expert knowledge and collaboration. The framing of a DOE study requires SMEs with
interdisciplinary backgrounds; the connection between static Earth modeling and dynamic production simulation requires strong collabo-
ration between Earth scientists and reservoir engineers; and the unavoidable subjectivity in history matching requires SME experience
gained from structured manual history matching. All results of a DOE study should be evaluated with critical geological and engineering
judgment. Engineers should never rely excessively on (semi)automated history-matching algorithms to learn the reservoir for them.
The dependence on proxies and the limit on the number of input variables are the two main disadvantages of DOE-based AHM. To
alleviate the negative consequences of proxy dependency, proxies must be blind tested before use, and the proxy error must be recorded.
The blind-test runs should be sufficient in number, randomly selected, and distributed across the entire uncertainty space. In addition,
proxy errors should be considered in defining history-match filtering criteria. Finally, actual simulations should be performed to verify
the history-matched models that are selected from proxy-based Monte Carlo sampling.
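A blind test of this kind is straightforward to script. The sketch below assumes a fitted proxy object with a scikit-learn-style predict method and withheld blind-test arrays; all names are placeholders.

```python
import numpy as np

def blind_test(proxy, x_blind, y_blind):
    """Record proxy error (xi) on blind-test runs withheld from proxy training.
    The runs should be randomly selected across the entire uncertainty space."""
    xi = proxy.predict(x_blind) - y_blind      # proxy-error samples
    return xi.mean(), xi.std(ddof=1)           # mean (~0 expected) and sigma_xi

# Usage sketch (hypothetical objects):
# mu_xi, sigma_xi = blind_test(fitted_proxy, x_blind, y_blind)
# sigma_xi then feeds the history-match filtering thresholds of Appendix A.
```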
The constraint on the number of input variables forces engineers to understand reservoir-flow behavior and drive mechanisms to
make a scrupulous selection of the inputs. This effort cannot and must not be replaced by algorithms. In practice, the number of input
variables can often be reduced successfully to the level suitable for DOE-based AHM using parameterization techniques. If reservoir-
property adjustments at finer scales are required after DOE-based AHM, a second AHM method, such as a streamline-based (Datta-Gupta
and King 2007; Bhark et al. 2012; Watanabe et al. 2017) or adjoint-sensitivity-based (Van Doren et al. 2012; Joosten et al. 2014) method,
can be applied to efficiently fine-tune the history-matching results.

Nomenclature
dobs = observed data
dsim = simulated data
eHM = history-match error, a function of the difference between the observed data dobs and the simulated data dsim
eHM,0 = history-match error where the observed data are the original data, unadjusted by the potential unknown data-measurement error ε
eHM,ε = history-match error where the observed data are adjusted by the potential unknown data-measurement error ε
R = correlation coefficient
T = tolerance of history-match error subject to an imperfect reservoir model
T′ = tolerance of history-match error subject to an imperfect reservoir model and potential unknown data-measurement error
T* = tolerance of history-match error subject to an imperfect reservoir model, potential unknown data-measurement error, and proxy error
wi = weight of each representative data point i along the time axis
β = coefficient of regression equation
δ = confidence interval caused by proxy error ξ and potential unknown data-measurement error ε
ε = potential unknown data-measurement error that has not been accounted for when the observed data are denoised and corrected for known bias
ε′ = effect of ε on history-match error, ε′ = eHM,ε − eHM,0
ϵ = error term of regression equation
ξ = proxy error
σξ = standard deviation of proxy error

Acknowledgments
The authors thank Adwait Chawathe, Alexandre Castellini, Andrew Harding, Burak Yeten, David Larue, Hao Cheng, Kurt Kaczmarick,
Ning Liu, Shusei Tanaka, Stephen Johnson, Will da Sie, Xian-Huan Wen, and Yanfen Zhang for the inspiring discussions. The authors
also thank Chevron for permission to publish this paper.

References
Aanonsen, S. I., Nævdal, G., Oliver, D. S. et al. 2009. The Ensemble Kalman Filter in Reservoir Engineering—A Review. SPE J. 14 (3): 393–412. SPE-
117274-PA. https://doi.org/10.2118/117274-PA.
Bhark, E., Rey, A., Datta-Gupta, A. et al. 2012. A Multiscale Workflow for History Matching in Structured and Unstructured Grid Geometries. SPE J.
17 (3): 828–848. SPE-141764-PA. https://doi.org/10.2118/141764-PA.
Bhark, E. W. and Dehghani, K. 2014. Assisted History Matching Benchmarking: Design of Experiments-Based Techniques. Presented at the SPE
Annual Technical Conference and Exhibition, Amsterdam, 27–29 October. SPE-170690-MS. https://doi.org/10.2118/170690-MS.
Burkardt, J. 2005. IHS: Improved Distributed Hypercube Sampling, http://people.sc.fsu.edu/~jburkardt/cpp_src/ihs/ihs.html (accessed 1 May 2019).
Castellini, A., Landa, J. L., and Kikani, J. 2004. Practical Methods for Uncertainty Assessment of Flow Predictions for Reservoirs With Significant History—
A Field Case Study. Presented at ECMOR IX–9th European Conference on the Mathematics of Oil Recovery, Cannes, France, 30 August–2 September.
Chatterjee, S. and Hadi, A. S. 2015. Regression Analysis by Example. Hoboken, New Jersey: John Wiley & Sons.
Chen, W. H., Gavalas, G. R., Seinfeld, J. H. et al. 1974. A New Algorithm for Automatic History Matching. SPE J. 14 (6): 593–608. SPE-4545-PA.
https://doi.org/10.2118/4545-PA.
Cheng, H., Dehghani, K., and Billiter, T. C. 2008. A Structured Approach for Probabilistic-Assisted History Matching Using Evolutionary Algorithms:
Tengiz Field Applications. Presented at the SPE Annual Technical Conference and Exhibition, Denver, 21–24 September. SPE-116212-MS. https://
doi.org/10.2118/116212-MS.
Datta-Gupta, A. and King, M. J. 2007. Streamline Simulation: Theory and Practice, Vol. 11. Richardson, Texas: Textbook Series, Society of
Petroleum Engineers.
Deutsch, C. V. and Journel, A. G. 1992. GSLIB: Geostatistical Software Library and User’s Guide. New York City: Oxford University Press.
Doherty, J. 2003. Ground Water Model Calibration Using Pilot Points and Regularization. Groundwater 41 (2): 170–177. https://doi.org/10.1111/j.1745-
6584.2003.tb02580.x.
Emerick, A. A. and Reynolds, A. C. 2013. Ensemble Smoother With Multiple Data Assimilation. Comput Geosci 55 (June): 3–15. https://doi.org/
10.1016/j.cageo.2012.03.011.
Fenwick, D., Scheidt, C., and Caers, J. 2014. Quantifying Asymmetric Parameter Interactions in Sensitivity Analysis: Application to Reservoir Model-
ing. Math Geosci 46 (4): 493–511. https://doi.org/10.1007/s11004-014-9530-5.
Gavalas, G. R., Shah, P. C., and Seinfeld, J. H. 1976. Reservoir History Matching by Bayesian Estimation. SPE J. 16 (6): 337–350. SPE-5740-PA.
https://doi.org/10.2118/5740-PA.
Hastie, T., Tibshirani, R., and Friedman, J. 2009. The Elements of Statistical Learning, second edition. New York City: Springer Series in Statistics, Springer.
He, J., Reynolds, A. C., Tanaka, S. et al. 2018. Calibrating Global Uncertainties to Local Data: Is the Learning Being Over-Generalized? Presented at the
SPE Annual Technical Conference and Exhibition, Dallas, 24–26 September. SPE-191480-MS. https://doi.org/10.2118/191480-MS.
Helton, J. C., Johnson, J. D., Sallaberry, C. J. et al. 2006. Survey of Sampling-Based Methods for Uncertainty and Sensitivity Analysis. Reliab Eng Syst
Safe 91 (10–11): 1175–1209. https://doi.org/10.1016/j.ress.2005.11.017.
Jafarpour, B. and McLaughlin, D. B. 2009. Reservoir Characterization With the Discrete Cosine Transform. SPE J. 14 (1): 182–201. SPE-106453-PA.
https://doi.org/10.2118/106453-PA.
James, G., Witten, D., Hastie, T. et al. 2013. An Introduction to Statistical Learning With Applications in R. New York City: Springer.
Joosten, G. J. P., Altintas, A., Van Essen, G. et al. 2014. Reservoir Model Maturation and Assisted History Matching Based on Production and 4D Seis-
mic Data. Presented at the SPE Annual Technical Conference and Exhibition, Amsterdam, 27–29 October. SPE-170604-MS. https://doi.org/10.2118/
170604-MS.
King, G. R., Lee, S., Alexandre, P. et al. 2005. Probabilistic Forecasting for Mature Fields With Significant Production History: A Nemba Field Case
Study. Presented at the SPE Annual Technical Conference and Exhibition, Dallas, 9–12 October. SPE-95869-MS. https://doi.org/10.2118/95869-MS.
Larue, D. K. and Hovadik, J. 2012. Rapid Earth Modelling for Appraisal and Development Studies of Deep-Water Clastic Reservoirs and the Concept of
“Procycling”. Pet Geosci 18 (2): 201–218. https://doi.org/10.1144/1354-079311-033.
LaVenue, A. M., RamaRao, B. S., De Marsily, G. et al. 1995. Pilot Point Methodology for Automated Calibration of an Ensemble of Conditionally Simu-
lated Transmissivity Fields: 2. Application. Water Resour Res 31 (3): 495–516. https://doi.org/10.1029/94WR02259.
Li, B. and Friedmann, F. 2005. Novel Multiple Resolutions Design of Experiment/Response Surface Methodology for Uncertainty Analysis of Reservoir
Simulation Forecasts. Presented at the SPE Reservoir Simulation Symposium, Houston, 31 January–2 February. SPE-92853-MS. https://doi.org/
10.2118/92853-MS.
Li, R., Reynolds, A. C., and Oliver, D. S. 2003. History Matching of Three-Phase Flow Production Data. SPE J. 8 (4): 328–340. SPE-87336-PA. https://
doi.org/10.2118/87336-PA.
Montgomery, D. C. 2008. Design and Analysis of Experiments. Hoboken, New Jersey: John Wiley & Sons.
Oliver, D. S. and Chen, Y. 2011. Recent Progress on Reservoir History Matching: A Review. Computat Geosci 15 (1): 185–221. https://doi.org/10.1007/
s10596-010-9194-2.
Sarma, P., Chen, W. H., and Xie, J. 2013. Selecting Representative Models From a Large Set of Models. Presented at the SPE Reservoir Simulation Sym-
posium, The Woodlands, Texas, 18–20 February. SPE-163671-MS. https://doi.org/10.2118/163671-MS.
Sarma, P., Yang, C., Xie, J. et al. 2015. Identification of “Big Hitters” With Global Sensitivity Analysis for Improved Decision Making Under Uncer-
tainty. Presented at the SPE Reservoir Simulation Symposium, Houston, 23–25 February. SPE-173254-MS. https://doi.org/10.2118/173254-MS.
Schaaf, T., Coureaud, B., Labat, N. et al. 2009. Using Experimental Designs, Assisted History-Matching Tools, and Bayesian Framework To Get Proba-
bilistic Gas-Storage Pressure Forecasts. SPE Res Eval & Eng 12 (5): 724–736. SPE-113498-PA. https://doi.org/10.2118/113498-PA.
Schmidt, S. R. and Launsby, R. G. 1989. Understanding Industrial Designed Experiments. Colorado Springs, Colorado: Air Academy Press.
Schulze-Riegert, R. W., Axmann, J. K., Haase, O. et al. 2002. Evolutionary Algorithms Applied to History Matching of Complex Reservoirs. SPE Res
Eval & Eng 5 (2): 163–173. SPE-77301-PA. https://doi.org/10.2118/77301-PA.
Skjervheim, J. and Evensen, G. 2011. An Ensemble Smoother for Assisted History Matching. Presented at the SPE Reservoir Simulation Symposium,
The Woodlands, Texas, 21–23 February. SPE-141929-MS. https://doi.org/10.2118/141929-MS.
Van Doren, J., Van Essen, G., Wilson, O. B. et al. 2012. A Comprehensive Workflow for Assisted History Matching Applied to a Complex Mature Res-
ervoir. Presented at the SPE Europec/EAGE Annual Conference, Copenhagen, Denmark, 4–7 June. SPE-154383-MS. https://doi.org/10.2118/
154383-MS.
van Leeuwen, P. J. and Evensen, G. 1996. Data Assimilation and Inverse Methods in Terms of a Probabilistic Formulation. Mon Weather Rev 124 (12):
2898–2913. https://doi.org/10.1175/1520-0493(1996)124%3C2898:DAAIMI%3E2.0.CO;2.
Vasco, D. W., Yoon, S., and Datta-Gupta, A. 1999. Integrating Dynamic Data Into High-Resolution Reservoir Models Using Streamline-Based Ana-
lytic Sensitivity Coefficients. SPE J. 4 (4): 389–399. SPE-59253-PA. https://doi.org/10.2118/59253-PA.
Watanabe, S., Han, J., Hetz, G. et al. 2017. Streamline-Based Time-Lapse-Seismic-Data Integration Incorporating Pressure and Saturation Effects. SPE J.
22 (4): 1261–1279. SPE-166395-PA. https://doi.org/10.2118/166395-PA.
Williams, M. A., Keating, J. F., and Barghouty, M. F. 1998. The Stratigraphic Method: A Structured Approach to History Matching Complex Simulation
Models. SPE Res Eval & Eng 1 (2): 169–176. SPE-38014-PA. https://doi.org/10.2118/38014-PA.
Yeten, B., Castellini, A., Guyaguler, B. et al. 2005. A Comparison Study on Experimental Design and Response Surface Methodologies. Presented at the
SPE Reservoir Simulation Symposium, Houston, 31 January–2 February. SPE-93347-MS. https://doi.org/10.2118/93347-MS.

Appendix A—Define History-Match Filtering Criteria


In DOE-based AHM, before history-match filtering, proxies are constructed to approximate the relationship between the response varia-
bles (eHM ) and input variables (reservoir properties to be solved in history matching). Monte Carlo simulation is subsequently per-
formed using these proxies, fully exploring the possible values of eHM . The objective of history-match filtering is to identify the Monte
Carlo samples that match the historical data and to remove those that do not.

Define Tolerance of History-Match Error Subject to Imperfect Reservoir Model. To understand how to determine the history-
match filtering criteria, consider an ideal scenario where a perfect history match can be performed. The following characteristics would apply:
1. Perfect a priori assessment of input-variable uncertainty: All reservoir uncertainties are identified during problem framing, and
their ranges and a priori distributions are assessed flawlessly.
2. Perfect physics simulator: All physics in the reservoir can be simulated with zero numerical error.
3. Perfect data measurement: All gauges and meters are accurate, and data-measurement error is zero.
4. Infinite computational power: Simulations can be finished quickly and neither DOE nor proxies are needed.
If the history-match errors (eHM ) of all the simulations are sorted in descending order, the outcome would look like Fig. A-1. eHM is
nonnegative, and when it equals zero, the simulated values match the observation data exactly. Note that multiple history-match-error
metrics might exist, such as the BHP, WCT, and GPR of each well. For each metric, there might be a suite of history-matched simula-
tion models, indicated by the red arrows in Fig. A-1a, and represented by the circles in Fig. A-1b. The final ensemble of history-
matched models is the intersection of each individual suite, shown by the intersection of circles in Fig. A-1. The actual reservoir is one
of the history-matched models in this final ensemble. Because history matching is an ill-posed problem, different values of reservoir
properties might lead to the same production history.

(Fig. A-1 graphic: panel a plots eHM for BHP, WCT, and GPR against simulation number; panel b shows the suites eHM,WCT = 0, eHM,BHP = 0, and eHM,GPR = 0 as intersecting circles.)

Fig. A-1—Ideal scenario of history matching. (a) Each history-match-error metric (eHM ) has a suite of history-matched models, as
illustrated by the red arrows. (b) The final ensemble of history-matched models is the intersection of each individual suite, illus-
trated by the yellow-shaded area.

Fig. A-1 represents the ideal scenario of history matching, which does not occur in practice. Consider now a subideal scenario that is
closer to reality, with the following characteristics:
1. Imperfect a priori assessment of input-variable uncertainty: Not all reservoir uncertainties can be identified, and their ranges and
distributions cannot be assessed flawlessly.
2. Imperfect physics simulator: The simulator cannot model all physics in the reservoir, the simulation results are subject to numeri-
cal error (or numerical dispersion), and upscaling might introduce additional error in simulation results.
3. Perfect data measurement: All gauges and meters are accurate, and data-measurement error is zero.
4. Infinite computational power: Simulations can be performed quickly, and neither DOE nor proxies are needed.
The imperfectness in the a priori uncertainty assessment and in the simulated physics represents the so-called “model error.” This reflects
the intrinsic and unavoidable limitation of the reservoir model to be history matched: It is beyond one’s capability to enumerate all pos-
sible uncertainties and identify all physical processes in a reservoir, and even if it were possible, it would be impractical to consider
them all in the study. Moreover, numerical errors always exist in simulations, and upscaling might be an additional error source. In the
presence of model error, even if a suite of models with eHM = 0 can still be identified for each eHM metric, the suites might not intersect, as
illustrated in Figs. A-2a and A-2b. Indeed, experience suggests that it is easier to history match one metric but much harder to match
all metrics simultaneously. Even if the suites do intersect, there is no guarantee that the actual reservoir model is contained within the inter-
section. Therefore, a threshold eHM value, denoted by T, should be defined for each individual eHM metric, as shown in Fig. A-2a. A
model for which eHM ≤ T should be accepted for this eHM metric. If a model is accepted for all metrics, it becomes a member of
the final ensemble of history-matched models, as depicted in Fig. A-2c. Essentially, the filtering criterion is relaxed from eHM = 0 to
eHM ≤ T as a compromise to the inherent and inevitable limitation of the reservoir model.
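In code, the relaxed multimetric filter is simply an intersection of per-metric acceptance masks. A sketch with hypothetical error values and tolerances:

```python
import numpy as np

rng = np.random.default_rng(6)
n_runs = 1000
# Hypothetical e_HM values per metric for each simulation run.
e_hm = {"BHP": rng.gamma(2.0, 40.0, n_runs),
        "WCT": rng.gamma(2.0, 30.0, n_runs),
        "GPR": rng.gamma(2.0, 50.0, n_runs)}
T = {"BHP": 90.0, "WCT": 70.0, "GPR": 110.0}   # per-metric tolerances (illustrative)

# A run joins the history-matched ensemble only if it is accepted for every metric.
accepted = np.ones(n_runs, dtype=bool)
for metric, errors in e_hm.items():
    accepted &= errors <= T[metric]
print("history-matched runs:", int(accepted.sum()))
```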

(Fig. A-2 graphic: panel a plots eHM for BHP, WCT, and GPR against simulation number, with thresholds TBHP, TWCT, and TGPR separating accepted from rejected runs; panel b shows the suites eHM = 0 failing to intersect; panel c shows the relaxed suites eHM,BHP < TBHP, eHM,WCT < TWCT, and eHM,GPR < TGPR intersecting.)

Fig. A-2—History matching in the presence of model error only. (a) A threshold T should be defined for each eHM metric to relax the
filtering criterion from eHM = 0 to eHM ≤ T. (b) If eHM = 0 is the criterion, it is possible that no history-matched solution can be found.
(c) With the relaxed criterion, the final ensemble of history-matched models is the yellow-shaded area.

It is a common misunderstanding that T reflects the engineer’s estimate of the model error. Model error cannot be estimated because
this would require the “perfect history-match scenario” (described previously) as a reference. Instead, one can only evaluate his/her tol-
erance of eHM subject to the imperfect reservoir model. The tolerance T reflects the engineer’s comfort level of calling a model “history
matched” even though its eHM is not zero. Unfortunately, the subjectivity in determining the threshold T is unavoidable.

The following steps prescribe a practical way to estimate the threshold T for each eHM metric, using Fig. 7 as an example. Note that
in the case study, it is T′, not T, that is determined from Fig. 7. However, for simplicity, assume T is the target here. The difference
between T and T′ is discussed later.
• Step 1: Pick an eHM metric, such as eHM of SWP of a well.
• Step 2: Start from a large T value.
• Step 3: Plot all the simulated production curves for which eHM ≤ T, and overlay the historical curve. The historical curve should
be denoised: only the representative data points are kept and plotted. The eHM computation should use the representative data points of
the historical curve. In Fig. 7a, the initial T is 140. All simulated SWP curves for which eHM ≤ 140 are plotted (the gray lines),
and the historical SWP is overlaid (the purple squares).
• Step 4: Determine whether all the simulated production curves are reasonably close to the historical curve. The sense of “reasonably close”
comes from manual history matching, which should be conducted before DOE-based history matching. We discuss this below.
• Step 5: If the simulated curves are not reasonably close to the historical curve, tighten the T value until the match is satisfactory, as
illustrated in Figs. 7b and 7c. This T value represents the engineer’s comfort level of calling a simulated curve “history matched”
even though its eHM is not zero.
• Step 6: Repeat Steps 1 through 5 for all other eHM metrics. (A minimal scripted version of Steps 3 through 5 is sketched after this list.)
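The sketch below runs Steps 3 through 5 on synthetic data: it computes a weighted eHM at the representative data points, filters the simulated curves by eHM ≤ T, and overlays the denoised history so that T can be tightened visually, as in Fig. 7. All magnitudes are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
t = np.linspace(0.0, 3000.0, 30)                 # representative time points
hist = 2000.0 + 500.0 * np.exp(-t / 1500.0)      # denoised historical SWP (synthetic)
curves = hist + rng.normal(0.0, 80.0, (200, t.size)).cumsum(axis=1) / 5.0
w = np.ones(t.size) / t.size                     # weights w_i of the data points
e_hm = np.sqrt(((curves - hist) ** 2 * w).sum(axis=1))

for T in (140.0, 100.0, 60.0):                   # Step 5: tighten T iteratively
    keep = e_hm <= T                             # Step 3: runs passing the filter
    plt.figure()
    plt.plot(t, curves[keep].T, color="gray", alpha=0.3)
    plt.plot(t, hist, "s", color="purple", label="historical (denoised)")
    plt.title(f"T = {T:g}: {int(keep.sum())} runs pass")   # Step 4: judge closeness
    plt.legend()
plt.show()
```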
The sense of “reasonably close” is gained from manual history matching, where one manually adjusts the reservoir properties in a struc-
tured manner (Williams et al. 1998), performs simulation runs, and tries to match the simulated production curves with the historical curves
as closely as possible. During this process, one will experience the difficulty in matching all metrics simultaneously. For example, when the
reservoir properties are tuned to better match the WCT of a well, the GPR match might worsen. After many attempts, one develops a sense
of which reservoir input variables are needed for a better history match. In addition, one forms a reasonable expectation of how close the
simulated and historical curves can ultimately be. This expectation is quantified by the T value in Steps 1 through 6 discussed previously.
When comparing the simulated curves with the historical data, as depicted by the gray lines and purple squares in Fig. 7, pay atten-
tion to their relative position. eHM is zero only when the gray lines and the purple squares match exactly. Fig. 7 is used to relate the size
of the cloud of gray lines to an eHM value. When one feels that the gray cloud surrounding the purple squares is small enough, the corre-
sponding eHM value is the value of the threshold T.
In reality, the recording of historical data almost always contains data-measurement error. However, the historical curves should be
denoised, and any known data bias in the historical data measurement should be corrected. The denoised and corrected historical data
should be used throughout the entire history-matching study. Therefore, when defining the T value as in Fig. 7, treat the historical
curves as free of measurement error. Any potential remaining (unknown) measurement error that has not yet been accounted for during
denoising and bias correction is considered in the next subsection.
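One simple way to produce such a denoised, representative history is sketched below; the moving-average window and sampling stride are arbitrary choices, and any defensible denoising scheme could be substituted.

```python
import numpy as np

def representative_points(t, y, window=9, stride=10):
    """Denoise a historical curve with a moving average, then keep every
    'stride'-th point as the representative data used in the e_HM computation."""
    kernel = np.ones(window) / window
    y_smooth = np.convolve(y, kernel, mode="same")   # simple smoother; edges are crude
    idx = np.arange(0, len(t), stride)
    return np.asarray(t)[idx], y_smooth[idx]

# Usage sketch: t_rep, swp_rep = representative_points(t_raw, swp_raw)
```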
Finally, it is worth noting that He et al. (2018) also concluded that imperfect uncertainty assessment and physics simulation should
be considered in history matching. Similar to the T threshold, a correction factor s is derived in He et al. (2018). The concept of the
T threshold in our work (i.e., tolerance of eHM subject to an imperfect reservoir model) is called “modeling error” in their paper (not to
be confused with “model error” in our work).

Consider Data-Measurement and Proxy Errors and Perform Filtering. In DOE-based history matching, the full range of eHM is
evaluated by performing Monte Carlo simulation using proxies rather than actual reservoir simulations. Therefore, the eHM value is
subject to proxy error. In addition, although the historical curves have been denoised and corrected for known data bias, they might still
not reflect reality because of the remaining noise in measured data that has not been removed; error introduced by the engineer during
denoising; unknown data bias; unknown physical impairments and changes to the wells and facilities; and production allocation errors
between wells. The presence of these data-measurement and proxy errors, along with the model error described previously, represents
the challenges of history matching in practical application:
1. Imperfect a priori assessment of input-variable uncertainty.
2. Imperfect physics simulator.
3. Imperfect data measurement: Gauges and meters might have data-measurement error because of noise or bias; some of the error
can be corrected, but some cannot.
4. Limited computational power: DOE and proxies are needed as a result, and proxies have errors.
If the eHM values of all proxy-based Monte Carlo samples are sorted in descending order, the outcome would look like Fig. A-3. The
solid blue curve indicates the proxy response of eHM from Monte Carlo simulation, and the dashed red curves represent the confidence
interval (δ) caused by proxy error and potential unknown data-measurement error. The red- and green-shaded regions are where the
entire range of the confidence interval is above or below the T value, respectively. Therefore, the Monte Carlo samples should be
rejected in the red region and accepted with confidence in the green region. However, the decision to reject or accept is not so
straightforward in the middle, unshaded region. For example, at the blue dot, the proxy says that eHM > T, but the true eHM might be < T
because the lower bound of the confidence interval is below T.
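The three regions of Fig. A-3 translate directly into code. A sketch with illustrative values of T and δ:

```python
import numpy as np

rng = np.random.default_rng(3)
e_hm = np.sort(rng.gamma(2.0, 50.0, 10000))[::-1]   # proxy e_HM, sorted descending
T, delta = 100.0, 25.0                              # tolerance and confidence interval

reject = e_hm - delta > T          # red region: the whole interval is above T
accept = e_hm + delta < T          # green region: the whole interval is below T
uncertain = ~(reject | accept)     # middle region: the decision is ambiguous
print(reject.sum(), "rejected,", accept.sum(), "accepted,", uncertain.sum(), "uncertain")
```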

(Fig. A-3 graphic: the proxy response of eHM for WCT from Monte Carlo sampling, sorted in descending order, with the confidence interval δWCT around it; the regions where the entire interval lies above or below TWCT are marked reject or accept with confidence, respectively.)

Fig. A-3—Data-measurement and proxy errors complicate history-match filtering. In the red- or green-shaded regions, the Monte
Carlo samples can be rejected or accepted with confidence, respectively. In the middle unshaded region, however, rejection or
acceptance is not a straightforward decision.

The solution is to also accept the middle region. This amounts to raising the T value to T* by the confidence interval (δ), which is a
function of the potential unknown data-measurement error (ε) and the proxy error (ξ): T* = T + δ. A Monte Carlo sample for which
eHM ≤ T* should be accepted for this eHM metric, as illustrated in Fig. A-4. Every eHM metric has its own T*, T, ε, ξ, and δ. If a
Monte Carlo sample is accepted for all eHM metrics, it becomes a history-matched-model candidate. Because all candidates are eval-
uated by proxies, they should be verified by simulation before being accepted as the final ensemble of history-matched models. This is
depicted in the right panel of Fig. A-4. Although raising T to T* will inevitably include some Monte Carlo samples that should in fact
be rejected, these samples are unlikely to pass the final verification and therefore should not jeopardize the quality of the history match.
The potential unknown data-measurement error (ε) is the error of the gauges and meters and cannot be used directly to adjust the thresh-
old T. Instead, the effect of ε on the history-match error, denoted ε′, should be used. Denote eHM,ε as the history-match error
where the observed data equal dobs,i + εi (i = 1, 2, …, n, where n is the number of representative data points along the time axis). If eHM,0
denotes the original history-match error where the observed data are dobs,i (i = 1, 2, …, n), then ε′ is defined as ε′ = eHM,ε − eHM,0. Each
εi is a random variable, and the εi can be correlated, depending on whether the error is pure noise or systematic bias. Therefore, ε′ is also a
random variable. Note that ε′ represents the data-measurement error that has not yet been accounted for during denoising and bias cor-
rection. It is unknown, and thus engineers should estimate its value using their understanding of and confidence in the data quality.
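Because ε′ is defined by recomputing the error metric on perturbed observations, its distribution can be sampled directly once a model of the remaining measurement error is assumed. The sketch below uses hypothetical noise and bias magnitudes and a simple weighted root-mean-square eHM:

```python
import numpy as np

def ehm(d_obs, d_sim, w):
    """Weighted root-mean-square history-match error over the representative points."""
    return np.sqrt((w * (d_sim - d_obs) ** 2).sum())

rng = np.random.default_rng(4)
n = 30                                        # number of representative data points
d_obs = rng.uniform(1800.0, 2600.0, n)        # denoised, bias-corrected observations
d_sim = d_obs + rng.normal(0.0, 40.0, n)      # one simulated response (illustrative)
w = np.ones(n) / n                            # weights w_i

# Sample eps' = e_HM,eps - e_HM,0 under an assumed model of the remaining
# (unknown) measurement error: a systematic bias plus white noise.
samples = []
for _ in range(5000):
    bias = rng.normal(0.0, 20.0)              # correlated (systematic) component
    noise = rng.normal(0.0, 10.0, n)          # pure-noise component
    samples.append(ehm(d_obs + bias + noise, d_sim, w) - ehm(d_obs, d_sim, w))
eps_prime = np.array(samples)
print("95% confidence bound on |eps'|:", np.percentile(np.abs(eps_prime), 95))
```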

(Fig. A-4 graphic: the proxy response of eHM for WCT with confidence interval δWCT = f(εWCT, ξWCT); samples with eHM,WCT < T*WCT are accepted, the test is repeated for all metrics (eHM,BHP < T*BHP, eHM,GPR < T*GPR), and the surviving candidates are verified by simulation to yield the history-matched ensemble.)

Fig. A-4—Practically realistic history-matching scenario. Raise the T value to T* by the confidence interval (δ), which is a function
of the potential unknown data-measurement error (ε) and proxy error (ξ). Accept the Monte Carlo samples for which eHM ≤ T*.
Repeat for all eHM metrics. The Monte Carlo samples that satisfy the filtering criteria of all eHM metrics become candidates. If they
are verified by simulation, then they become the final ensemble of history-matched models.

Proxy error (ξ) is measured during proxy blind testing. It is often assumed that ξ follows a normal distribution with a mean of zero
and a standard deviation of σξ. Assuming ε′ and ξ are independent random variables, the δ that adjusts T to T* is evaluated as the 95%
confidence interval of ε′ + ξ.
When the history-matched candidates are verified by running simulations, proxy error no longer exists, and therefore, the threshold
for determining the final history-matched models should be T′, which is defined as T plus the 95% confidence interval of ε′.
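Putting the pieces together, the threshold adjustment can be sketched as follows; σξ would come from blind testing and the ε′ samples from an assumed error model as above, and all magnitudes here are illustrative. The last comment notes the simplified variant discussed in the Final Remarks below.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100.0                                    # tolerance from the Fig. 7 procedure
sigma_xi = 15.0                              # proxy-error std. dev. from blind testing
xi = rng.normal(0.0, sigma_xi, 5000)         # proxy-error samples (zero-mean normal)
eps_prime = rng.normal(0.0, 8.0, 5000)       # eps' samples (assumed error model)

delta = np.percentile(np.abs(eps_prime + xi), 95)   # 95% confidence interval
T_star = T + delta                           # threshold for proxy-based filtering
T_prime = T + np.percentile(np.abs(eps_prime), 95)  # threshold for simulation verification
print(f"delta = {delta:.1f}, T* = {T_star:.1f}, T' = {T_prime:.1f}")
# Simplified variant (Final Remarks): if T' is assessed directly, then for
# zero-mean normal proxy error, T* = T' + 2 * sigma_xi.
```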

Final Remarks. The history-match filtering procedure described here cannot completely remove subjectivity and approximation,
which are inherent in history matching. The filtering procedure does not provide rules that eliminate subjectivity; rather, its objective is to alle-
viate the arbitrariness in making subjective decisions. In addition, note that the history-match filtering process can be iterative, as
described in the History-Match Filtering subsection of the Case Study section.
Engineers should use their discretion when deciding the filtering criteria and strategy. For example, sometimes one cannot determine
the threshold T and the potential unknown data-measurement error (ε′) separately. Because ε′ represents the error that has not been
accounted for during data denoising and bias correction, it is estimated with some degree of assumption and subjectivity, similar to T.
In this case, rather than attempting to differentiate between T and ε′, one can determine the threshold T′ directly for all history-match
metrics following the logic of Fig. 7. T′ represents one’s tolerance of the history-match error subject to the imperfect reservoir model
and the potential unknown data-measurement error. The history-match filtering threshold T* is then evaluated as T′ plus the 95% confidence
interval of ξ. If ξ follows a normal distribution with zero mean, T* = T′ + 2σξ. When the history-matched candidates are verified by
running simulations, the threshold for determining the final history-matched models should be T′, not T*.

Boxiao Li is a reservoir engineer at Chevron Energy Technology Company. His research interests include reservoir simulation,
uncertainty analysis, history matching, optimization, and artificial intelligence for conventional and unconventional hydro-
carbon reservoirs and geological carbon dioxide sequestration. Li holds a bachelor’s degree in environmental science and
engineering from Shanghai Jiao Tong University, Shanghai, and master’s and PhD degrees in energy-resources engineering from
Stanford University.
Eric W. Bhark is a reservoir engineer at Chevron Asia Pacific E&P Company, with research interests in numerical and analytical
reservoir-flow-model calibration and asset-development optimization. He currently resides in Indonesia, where he supports local-
asset management of light-oil-waterflood fields, deepwater-gas/condensate fields, and heavy-oil steamfloods. Before joining
Chevron, Bhark worked as a hydrogeologist for Intera, specializing in groundwater flow and radionuclide-transport modeling.
He holds a PhD degree in petroleum engineering from Texas A&M University, a master’s degree in hydrology from New Mexico
Institute of Mining and Technology, and a bachelor’s degree in geology and geophysics from Boston College.
Stephen J. Gross is a retired senior petroleum-engineering consultant, formerly with Chevron Energy Technology Company. He
joined Chevron in 1983. Within the 35 years of his career, Gross provided reservoir-engineering support for US and international
assets and projects, including in Wyoming, California, Oklahoma, Kansas, Indonesia, Angola, Papua New Guinea, Kuwait, Saudi
Arabia, and Kazakhstan. He holds bachelor’s and master’s degrees in petroleum engineering from Texas A&M University.

Travis C. Billiter has worked for 23 years for Chevron’s Upstream business in the Danish and UK sectors of the North Sea, Kazakhstan,
and deepwater Gulf of Mexico. He has worked in Chevron’s Technology Center and in their worldwide business units. Billiter’s
interests lie in the areas of asset development and optimization, advanced DOE, and coaching/mentoring. He holds a bachelor’s
degree in petroleum engineering from Marietta College and a PhD degree in chemical engineering from Texas A&M University.
Kaveh Dehghani is a Chevron Fellow and the reservoir simulation and consulting team leader at Chevron Energy Technology
Company. Previously, he worked as a reservoir-engineering adviser with TengizChevroil in Kazakhstan, as petroleum engineering
manager with Chevron Middle East/North Africa, and as a staff research scientist at Chevron Petroleum Technology Company.
Before joining Chevron, Dehghani was an associate professor with the Department of Petroleum Engineering at the University of
Alaska. His interests include probabilistic forecasting and modeling and laboratory and field-application studies of oil-recovery
processes in fractured and heterogeneous reservoirs. Dehghani holds a bachelor’s degree from Abadan Institute of Technology,
Iran, and master’s and PhD degrees from the University of Southern California, all in petroleum engineering.
