
# 3.4.2 Instability.

The estimated variogram represents the arithmetic average of the
squared differences of variable pair values at a particular lag distance. Because it
uses the square of the difference, any large difference between a given pair is
magnified. If pairs exhibit a large difference, the squared difference may have a
significant impact on the arithmetically averaged variogram value. This effect
may change the variogram value disproportionately at a particular lag distance,
resulting in instability of the estimated variogram. This instability may prevent
capturing the underlying variogram structure that may be present and also
causes fluctuations in the estimated variogram as lag distance increases. The
instability must be minimized to model the variogram.
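
The sensitivity described above can be illustrated with a minimal sketch that is not part of the original text. It assumes the conventional estimator, in which the variogram value at a lag is one-half the arithmetic average of the squared differences (dropping the one-half rescales every value equally). The function name `variogram_value` and all numbers are hypothetical. In this made-up example a single extreme pair supplies most of the sum of squares at the lag.

```python
import numpy as np

def variogram_value(diffs):
    """One-half the arithmetic mean of squared pair differences at one lag."""
    diffs = np.asarray(diffs, dtype=float)
    return 0.5 * np.mean(diffs ** 2)

# Hypothetical differences for nine well-behaved pairs at one lag.
typical = np.array([0.5, -1.0, 0.8, -0.6, 1.2, -0.9, 0.7, -1.1, 0.4])
# The same pairs plus a single pair with a large difference.
with_extreme = np.append(typical, 10.0)

gamma_typical = variogram_value(typical)
gamma_extreme = variogram_value(with_extreme)

# Fraction of the sum of squares contributed by the single extreme pair.
share = 10.0 ** 2 / np.sum(with_extreme ** 2)
print(gamma_typical, gamma_extreme, share)
```
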
The two methods commonly used to minimize fluctuations are to increase the
possible number of pairs for a given lag distance or to remove certain pairs for a
given lag distance. The previous section discussed the first possibility: increasing
the possible number of pairs for a given lag distance by use of appropriate
tolerance values with respect to the distance and the direction. That discussion
showed that increasing the number of pairs for a given lag distance does
improve stability of the variogram.
An alternative for improving the stability of the estimated variogram is to
examine the possible pairs used for estimation of the variogram for a given lag
distance. The difference between the two point values in a pair is what affects
the variogram. If the difference is very large, the squared difference can have a
significant impact on the estimated variogram. If we can eliminate certain
"extreme" pairs that have a significant impact on the variogram computations,
we may be able to obtain a better estimate of the variogram that is less affected
by these extreme pairs.
Scatter plots are one way to examine these extreme pairs.⁶ Plotting one data
point of a pair vs. the other data point from the same pair may reveal the
differences between the two data points. If the match between the two points is
exact, the point falls on a 45° line. On the basis of the scatter plots, certain pairs
can be removed, and the variogram can be recomputed for a given lag distance.
Alternatively, a certain percentage of the pairs showing the maximum deviation
can be removed to create more uniformity in the analysis and to eliminate subjectivity
in deciding which pairs should be removed. For example, for every lag
distance, the 10% of pairs showing the largest deviations can
be removed. Under these circumstances, the variogram represents a truncated
mean of the differences squared for a particular lag distance. Such truncated
means are often used in statistics to reduce the adverse effects of erratic values
(e.g., in figure skating in the Olympic Games, the two extreme scores are
removed from the final tally). This procedure also has the advantage of being
objective.
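
The truncated mean described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `trimmed_variogram_value`, the choice to round the number of removed pairs down, and the factor of one-half from the conventional estimator are all assumptions, and the sample differences are made up. Only the pairs with the largest differences are dropped, so the trimming is one-sided.

```python
import numpy as np

def trimmed_variogram_value(diffs, trim_fraction=0.10):
    """One-half the mean of squared differences after dropping the largest ones.

    trim_fraction is the share of pairs, ranked by absolute difference,
    removed from the top of the ranking (0.10 drops the top 10%).
    """
    if not 0.0 <= trim_fraction < 1.0:
        raise ValueError("trim_fraction must be in [0, 1)")
    sq = np.sort(np.asarray(diffs, dtype=float) ** 2)  # ascending order
    n_drop = int(np.floor(trim_fraction * sq.size))    # pairs to remove (assumed rounding)
    kept = sq[: sq.size - n_drop]
    return 0.5 * kept.mean()

# Hypothetical pair differences at one lag, including one extreme pair.
diffs = np.array([0.5, -1.0, 0.8, -0.6, 1.2, -0.9, 0.7, -1.1, 0.4, 10.0])
full = trimmed_variogram_value(diffs, 0.0)
trimmed = trimmed_variogram_value(diffs, 0.10)
print(full, trimmed)
```
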
Field Example 3.3. Analyze the Flow Unit 3 porosity data and re-estimate the variogram by removing the effect of extreme pairs.
Solution. For this exercise, we assume that the average lag interval is still maintained at 1,400 ft, with a tolerance of 700
ft. Fig. 3.10 shows the scatter plots for two different lag intervals: the first and the sixth. For the first lag interval, we
have 42 pairs (see Table 3.2). However, we show only 21 pairs here. The other 21 pairs are symmetrical with the first 21
pairs and do not change the variogram estimation [i.e., x(u1) vs. x(u2) is symmetrical to x(u2) vs. x(u1), and use of both
pairs does not add any information because the difference squared would be the same for both pairs]. In
the scatter plot in Fig. 3.10a, we can remove two pairs that can be considered as extreme. The points surrounded by
open boxes show these pairs. The choice of what data pairs to consider as extreme is arbitrary. In Fig. 3.10b, which
shows 216 pairs, the decision becomes more difficult. The figure shows five pairs that are considered extreme.
However, with the large number of pairs in the figure, it is hard to decide what pairs to remove as extreme. It is easy to
come up with 8 or 15 pairs that could be considered extreme. This process can become extremely difficult and
cumbersome. As a result of the extremely subjective nature of removing individual pairs on the basis of the scatter plot
(unless it is so obvious that anyone would remove it), we have not attempted to estimate the variogram after subjective
removal of certain pairs for each lag interval.
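
The remark in this example that the mirrored pairs add no information can be checked with a short sketch. The values are hypothetical and, for brevity, every pair is used regardless of separation distance. Counting each pair in both orders doubles the number of pairs but leaves the average squared difference unchanged.

```python
import numpy as np
from itertools import combinations, permutations

values = np.array([12.1, 9.8, 11.4, 10.3])  # hypothetical porosity values, %
idx = range(len(values))

# Each unordered pair once (head/tail order ignored).
unordered = [(values[i] - values[j]) ** 2 for i, j in combinations(idx, 2)]
# Every ordered pair, so each pair appears twice with head and tail swapped.
ordered = [(values[i] - values[j]) ** 2 for i, j in permutations(idx, 2)]

print(len(unordered), len(ordered))
print(np.mean(unordered), np.mean(ordered))
```
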
Instead, we adopted the approach of removing a certain percentage of pairs from the total number of pairs. Fig. 3.11
shows an estimated variogram plot after removal of 5 and 10% of the extreme pairs. Note that extreme refers to the
pairs showing the largest differences; pairs that show the smallest differences are not considered to be extreme. This is
because the pairs that show a large difference affect the average much more than the pairs that show a small
difference.
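
The percentage-removal procedure can be sketched for a whole isotropic variogram as shown here. The 1,400-ft lag interval and 700-ft tolerance come from the example; everything else is an assumption made for illustration: the function name `experimental_variogram`, the number of lags, the rounding-down rule for the number of removed pairs, the factor of one-half, and the synthetic data, which are not the Flow Unit 3 porosity values. Because the mean of the smallest squared differences can never exceed the mean of all of them, trimming can only lower or preserve the value at each lag, which is consistent with the lower sill reported for Fig. 3.11.

```python
import numpy as np

def experimental_variogram(coords, values, lag=1400.0, tol=700.0,
                           n_lags=6, trim_fraction=0.0):
    """Isotropic variogram estimate per lag, optionally trimming extreme pairs.

    Returns (lag_centers, gamma, n_pairs_used). Each unordered pair is
    counted once; a pair whose separation lies within `tol` of a lag
    center k * lag is assigned to that lag.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # unordered pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    centers = lag * np.arange(1, n_lags + 1)
    gamma = np.full(n_lags, np.nan)                     # NaN marks an empty lag
    n_used = np.zeros(n_lags, dtype=int)
    for k, center in enumerate(centers):
        in_bin = np.abs(dist - center) <= tol
        sq = np.sort(sq_diff[in_bin])                   # ascending order
        n_keep = sq.size - int(np.floor(trim_fraction * sq.size))
        if n_keep > 0:
            gamma[k] = 0.5 * sq[:n_keep].mean()         # drop the largest pairs
            n_used[k] = n_keep
    return centers, gamma, n_used

# Synthetic stand-in data (NOT the Flow Unit 3 porosity values).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10_000.0, size=(40, 2))       # well locations, ft
values = rng.normal(12.0, 2.0, size=40)                 # porosity-like values

for frac in (0.0, 0.05, 0.10):
    centers, gamma, n_used = experimental_variogram(
        coords, values, trim_fraction=frac)
    print(frac, np.round(gamma, 2), n_used)
```
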
In Fig. 3.11, the overall sill (maximum variogram value) decreases as the extreme pairs are removed. This is to be
expected because the average of the squared differences is smaller after removal of pairs showing the largest
differences. The effect is more pronounced after removal of 10% of the extreme pairs than after removal of 5% of the
extreme pairs. Unfortunately, the overall structure of the variograms (including the fluctuations) is largely unaffected by
this removal. Because the goal is to capture the spatial structure and not necessarily the exact sill value, removal of
extreme pairs has not added any new information to our understanding of spatial relationships in this case.
Note that, although we did not achieve the desired smoothness in this case by removing a certain number of pairs,
this technique may be applicable to other data sets. It is important to remember that the overall objective is to
capture the most interpretable structure. Therefore, it is important to try different modifications and techniques to capture
that structure. The modification that gives the most interpretable structure should be used for further analysis.
# 3.4.3 Influence of Outliers.

Outliers are hard to define. In a conventional sense, outlier data are data points that fall
outside the norm. For a normal distribution, a data point falling outside the mean plus or minus three standard
deviations can be considered an outlier. However, for distributions that cannot be described by parametric distribution
functions, it is hard to define precisely what constitutes outlier data. Specifically, if data exhibit several-orders-of-magnitude
variations, it is difficult to define the value beyond which data can be considered as outliers. For example,
permeability data at well locations typically exhibit several-orders-of-magnitude variations. This becomes evident when
the coefficient of variation (the ratio of standard deviation to mean) is >2. For permeability data, a typical value of
coefficient of variation is in the range of two to five. Under such a large variation, it is very difficult (and subjective) to
define anomalous data that can be considered as outlier data.
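
The two quantities mentioned above, the coefficient of variation and the three-standard-deviation rule, can be computed as in the following sketch. The function names, the use of the sample standard deviation (`ddof=1`), and the permeability-like values are assumptions made for illustration. In this small made-up sample the coefficient of variation exceeds 2, yet the three-standard-deviation rule flags no value even though the largest value is more than four orders of magnitude above the smallest, which illustrates why the rule is of limited use for such data.

```python
import numpy as np

def coefficient_of_variation(x):
    """Ratio of the sample standard deviation to the mean."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

def three_sigma_outliers(x):
    """Boolean mask of values outside mean +/- 3 standard deviations."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) > 3.0 * x.std(ddof=1)

# Hypothetical permeability-like values (md) spanning several orders of magnitude.
perm = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0, 2000.0])
print(coefficient_of_variation(perm))
print(three_sigma_outliers(perm))
```
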
Outlier data can significantly affect the variogram estimation. As previously explained, use of an extreme value in variogram
estimation can amplify the effect because the squared difference between a data pair is used. If the difference
between a given pair is several orders of magnitude, the squared difference is large enough to influence the estimated
variogram at a particular lag distance. Remember that the variogram is an arithmetic average of squared differences;
therefore, one large squared difference can significantly alter the variogram value. This may create instability in the
variogram estimation and also may prevent us from clearly identifying the spatial structure for a particular variable.

Fig. 3.9 Rose diagram of lag distances in different directions.

Fig. 3.10 Scatter plot at (a) the first lag interval and (b) the sixth lag interval.

The simplest way to deal with the outlier information that causes this instability is to remove the data point from the
estimation process. If sufficient physical reason exists for removal, we can simply remove the data point or points and
re-estimate the variogram. In the absence of a satisfactory reason, it is hard to justify removal of a particular data point
or points for mere mathematical convenience. If a particular data point is eliminated, valuable information could be lost
that might be hard to find otherwise. Specifically, when the sample data set shows several-orders-of-magnitude variations,
it is hard to eliminate only certain data points.
A better way to deal with these variations in the sample data is to use some type of nonlinear transformation to
minimize the variation. In this section, we discuss many of the commonly used transforms to minimize the effect of
outliers or extreme values. Note, however, that use of nonlinear transforms may create additional difficulties during the
estimation process. Chap. 4 discusses these difficulties.
Log Transform. The most commonly used transform is the logarithm of the sample value. By taking either
natural (base e) or base 10 logs, the order-of-magnitude variations are translated into variations in the integer part of the
log of the variable. This should minimize the effect of extreme and order-of-magnitude variations within the data points.
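
A minimal sketch of the effect of the log transform follows, using made-up values that span several orders of magnitude. The helper `cv` and the sample are assumptions. The sketch compares the largest squared difference between any two values, and the coefficient of variation, before and after a base 10 log transform. The transform requires strictly positive values.

```python
import numpy as np

# Hypothetical sample values spanning several orders of magnitude.
x = np.array([7.0, 35.0, 120.0, 900.0, 4500.0, 36000.0])
log_x = np.log10(x)   # base 10 log; np.log would give the natural log

# Largest squared difference between any two values, before and after.
raw_max_sq = (x.max() - x.min()) ** 2
log_max_sq = (log_x.max() - log_x.min()) ** 2

def cv(v):
    """Coefficient of variation: sample standard deviation over mean."""
    return v.std(ddof=1) / v.mean()

print(raw_max_sq, log_max_sq)
print(cv(x), cv(log_x))
```
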

Field Example 3.4. This field example examines the effect of a log transform on the variogram estimation for two
variables. The porosity for Flow Unit 3 is not used in this example because the porosity data do not show significant
enough variations to necessitate use of a nonlinear transformation. Instead, we use initial-potential (IP) data, collected
from several wells within the field, as the variable. We also use net kh as a second variable. The net-kh value in each well is
determined by adding core permeabilities (collected at 1-ft intervals) over the entire pay-zone interval. This value
reflects the contribution from all the flow units at a particular well location.
Fig. 3.12 shows the spatial locations for IP data and the associated IP values at each location. Fig. 3.13 shows net-kh
values at the same locations. We have a total of 48 values of both IP and net-kh data. Fig. 3.14 shows the
histograms for the IP and net-kh data. Both variables show a significant number of values in the lower range and a long
tail. The variation in the values is over several orders of magnitude. The IP data range from 10 B/D to as high as
2,800 B/D, and the net-kh data range from 7.0 to 36,347.0 md-ft. The coefficient of variation for the IP data is 0.93, and
the coefficient of variation for the net-kh data is 1.33. If we had considered individual foot-by-foot permeability data
instead of net-kh data, we would have observed a much higher coefficient of variation. However, averaging over the
entire pay-zone interval reduces the coefficient of variation substantially.
This field example illustrates the application of a log transform for both IP and net-kh data, with attention to only isotropic
variograms. Similar results can be obtained for anisotropic variograms as well. An average lag interval of 1,400 ft
with a distance tolerance of 700 ft is assumed. These values are the same as those used in Field Examples 3.2 and 3.3.
Because the data are collected from the same field with approximately the same density, we can assume that the lag
interval and the lag tolerance do not change significantly.
Fig. 3.15 compares the conventional variogram and the variogram of the log-transformed data. The conventional
variogram, especially at large lag distances, shows significant variations. In contrast, the variations exhibited by the
transformed data are small. The only exception is at the largest lag distance. The changes in the variogram of the
transformed variable are much more gradual than in the conventional variogram. Clearly, the log transform has
minimized the fluctuations in the estimated variogram values.
This effect is even more pronounced for the net-kh data, which exhibit a higher coefficient of variation than the IP
data. As Fig. 3.16 shows, the conventional variogram hardly shows any discernible spatial structure; fluctuations dominate
the variations, and it is hard to capture any gradual trend in the data. In contrast, the log-transformed variable
shows a nicely developing spatial structure. Starting with a very small value, the estimated variogram increases and
reaches a sill value at approximately 7,000 ft. Beyond that, the variogram is fairly constant. As before, the only
exception to this gradual trend is the estimated value at a lag distance of 14,000 ft. Otherwise, the log transform clearly
has helped to identify the variogram structure for the net-kh data.
Overall, for both the IP and net-kh data, the log transform has resulted in better identification of the spatial structure.
It is safe to state that the higher the variations are in the original data set, the greater the impact of the log transform on
the estimated variogram. As stated earlier, if the goal is to capture the spatial structure that is exhibited by the sample
data, the log transform may be a useful tool, especially for the data showing order-of-magnitude variations.