
Application of the bootstrap method in the site characterization program

Z. Sebestyén

Institute of Mathematics and Informatics, Szent István University


Páter K. u. 1, H-2103 Gödöllő, Hungary
Email: sebi@mszi.gau.hu

Abstract. In this paper we demonstrate the use and efficiency of the bootstrap method on the
example of the Boda Claystone Formation (BCF). The tools of classical statistics are often not
applicable because they depend strongly on conditions that are not fulfilled. Explicit
mathematical formulas for standard errors and confidence intervals of a
parameter either require a specific (generally normal) distribution, or do not exist
at all. Hypothesis tests may also be carried out only if certain conditions are satisfied. Using
the bootstrap method one can simulate the unknown distribution of an arbitrary statistic by
its bootstrap replicates, hence any of its characteristics (standard error, confidence intervals, test
significance levels) can be obtained through direct empirical calculations. We apply the
bootstrap to the chemical composition data of rock samples from the BCF. First we
investigate the distribution of 8 chemical components in a small group of rock samples,
computing standard errors and confidence intervals for the mean, the standard
deviation and the skewness of these distributions. Then two groups of rock samples from
different sampling regions are compared by hypothesis tests.

Introduction

Traditional statistical methods for the chemical composition

The results of the classical statistical evaluation of the chemical composition
of the BCF were presented in the previous section. First we give an outline of the data
set and the applied methods, so that this paper is comprehensible in itself.
The program has provided chemical composition data for 202 rock samples, 168 of
which originated from the BCF, the others from the boundary zones. (Henceforth we will
deal only with the BCF samples.) Concentrations of 14 chemical components (SiO2, Al2O3,
TiO2, FeO, Fe2O3, FeOtot, MgO, CaO, Na2O, K2O, MnO, P2O5, CO2 and ignition loss) were
measured by the X-ray fluorescence method. Most of the samples can be classified into five
main rock types. The majority (102 samples) belongs to albitic claystones. 30 of the
samples are albitic siltstones, 13 of them are siltstones, 7 of them are albitic dolostones and
6 of them are albitites. The samples constitute four groups according to sampling regions.
There are 66 samples from the Alpha gallery (AG) driven at 1050m depth from the surface,
33 from surface deep boreholes (SD), 35 from surface shallow boreholes (SS) and 34 from
surface excavations (SE).
One of the aims of the research program was the investigation of the geological
homogeneity of the formation. Various classical statistical methods have been applied to
the chemical composition data, including univariate and multivariate analysis. The
calculations have been carried out using the SPSS package. At first the basic univariate
statistics were calculated (such as mean, standard deviation, standard error of the mean,
confidence intervals for the mean, skewness and density histogram) for the whole
formation and for each rock type. An essential point was the comparison of the regional
groups. The chemical composition of the samples from the SE and SS groups differs to some
extent from that of the AG and SD groups, which is naturally due to weathering,
so it seemed reasonable to compare only the AG and SD groups. For this purpose the
nonparametric Mann-Whitney and Kolmogorov-Smirnov tests have been applied to
compare the probability distributions of the concentrations of the main chemical
components. The parametric (t- and F-) tests could not be applied because of the lack of
normality.

The necessity of applying bootstrap

Concerning the potential role of the BCF, it is particularly important to draw reliable
statistical inferences based on the available data. Suppose we have a sample of size n from
an unknown probability distribution. We estimate a parameter of the distribution from the
sample values by a suitable estimator. The reliability of an estimation can be characterized
by its bias, standard error and confidence intervals for the parameter of interest. Generally
we have to estimate these characteristics based on the sample values, too.
The few tools classical statistics gives us mainly concern the sample mean, the
estimator of the expectation of a distribution. It is well known that the standard error of the
mean can be estimated by

$$ \mathrm{se}(\bar{x}) = \frac{\sigma^{*}}{\sqrt{n}}, \qquad (1) $$

where σ* denotes the sample standard deviation. If the distribution is normal and
α ∈ ]0,1[ is a given number, a 100(1−α)% level confidence interval for the expectation is

$$ \left[\, \bar{x} - t\,\frac{\sigma^{*}}{\sqrt{n}},\;\; \bar{x} + t\,\frac{\sigma^{*}}{\sqrt{n}} \,\right], \qquad (2) $$

where x̄ is the sample mean and t is the critical value of the Student distribution
with n−1 degrees of freedom.
Unfortunately, there are virtually no mathematical formulas for the standard error of any
other estimator, except perhaps for special types of distributions. Formula (2) is valid only if
the probability distribution is normal. Although the interval estimate (2) is robust to non-
normality if the sample size is large enough (greater than 30), considerable errors can be
made for small sample sizes. Explicit formulas for confidence intervals of other
parameters exist only in some special cases. The basic deficiency of classical statistics is
that the applicability of its tools depends strongly on conditions imposed on the probability
distribution. Most hypothesis tests also require such assumptions, mostly normality.
These deficiencies are especially serious when applying the traditional methods to
a geological problem. It is usually expensive to produce a sufficient number of rock
samples and analyze them in laboratories, so the sample size is often less than 30.
Secondly, the probability distributions that occur in geology are often non-normal. These
difficulties can be avoided by the bootstrap. It is a computer-based method that provides
reliable statistical inferences even for small sample sizes, irrespective of the distribution
type. The bootstrap method and its basic ideas were first published by Efron (1979). The
monographs of Efron and Tibshirani (1993) and Davison and Hinkley (1997) discuss many
tools of the bootstrap and their applications.

A brief review of the bootstrap method

Suppose we want to estimate a parameter of an unknown probability distribution from
a sample x = (x1, x2, …, xn). For this purpose we calculate the estimate s(x), where s is
some real-valued function of n real variables. Clearly, s(x) is a random variable whose
distribution is unknown, but it can be approximately represented by bootstrap simulations.
Draw a sample of n elements with replacement from the original sample (x1, x2, …, xn).
Thus we have a new sample x* of size n, to which we can also apply the function s,
obtaining a bootstrap replicate s(x*) of the estimate s(x). This resampling can easily be
carried out with a simple computer program, which only requires a good-quality random
number generator. Running the simulation N times, we get N replicates s(x1*), …, s(xN*).
It follows from the law of large numbers that the empirical density histogram of the
replicates gives a good approximation of the (theoretical) density function of the estimator,
if N is large enough (more than 1000) and n ≥ 10. (For fewer than about 10 sample values
neither the traditional nor the bootstrap method can give competent results.)
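As an illustration, the resampling step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in this study; the helper name bootstrap_replicates and the numerical values are ours:

```python
import numpy as np

def bootstrap_replicates(sample, statistic, n_boot=1000, seed=0):
    """Draw n_boot resamples of the same size as `sample` (with replacement)
    and evaluate `statistic` on each resample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    return np.array([statistic(rng.choice(x, size=x.size, replace=True))
                     for _ in range(n_boot)])

# Example: 1000 replicates of the mean for a small sample of n = 13 values
# (placeholder numbers, not the measured SiO2 concentrations).
sio2 = np.array([58.1, 61.2, 59.4, 57.3, 62.0, 60.5, 58.8,
                 59.9, 61.7, 57.9, 60.2, 59.1, 60.6])
reps = bootstrap_replicates(sio2, np.mean)
```

A histogram of the replicates in `reps` then approximates the density of the estimator, as in Figure 1 below.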

Estimation of standard error and confidence intervals

Since the distribution of s(x) has been reproduced by the replicates, suitable estimates
of standard error and confidence intervals can also be obtained from them (Efron 1981).
The bootstrap estimate of standard error of s(x) is the standard deviation of the replicates,
i.e.,

$$ \mathrm{se}_{\mathrm{boot}} = \left[\, \sum_{i=1}^{N} \bigl( s(x_i^*) - s(\cdot) \bigr)^2 \big/ (N-1) \,\right]^{1/2}, \qquad (3) $$

where $s(\cdot) = \sum_{i=1}^{N} s(x_i^*)/N$. For N = 200 simulations, (3) already gives a fairly
good estimate of the standard error. When s(x) is the sample mean, one can compare the
classical and the bootstrap computations. We remark that se_boot is slightly biased, so in this
case it is worthwhile to use the corrected bootstrap standard error, √(n/(n−1))·se_boot, for the
comparison with (1).

The statistic s(.), the mean of the replicates, approximately equals the parameter
estimate for unbiased estimators. For a biased estimator the difference between s(.) and the
parameter estimate gives an estimate of bias.
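Continuing the sketch above, the bootstrap standard error (3), its correction for the mean, and the bias estimate can be computed directly from the replicates. Again, this is an illustrative sketch rather than the original computation:

```python
import numpy as np  # builds on the bootstrap_replicates helper defined above

def bootstrap_se_and_bias(sample, statistic, n_boot=1000):
    """Bootstrap standard error (3), its corrected version for the mean,
    and the bias estimate s(.) - s(x), all from the replicates."""
    x = np.asarray(sample, dtype=float)
    reps = bootstrap_replicates(x, statistic, n_boot=n_boot)
    se_boot = reps.std(ddof=1)                               # divisor N - 1, as in (3)
    se_corrected = np.sqrt(x.size / (x.size - 1)) * se_boot  # comparable with formula (1)
    bias = reps.mean() - statistic(x)                        # approx. 0 for unbiased estimators
    return se_boot, se_corrected, bias
```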
The bootstrap provides different kinds of confidence intervals. The percentile interval is
the simplest of them and was first proposed by Efron (1979). Since the replicates
approximately represent the true distribution of the estimator, it is natural for a
100(1−α)% confidence interval to take the 100·α/2- and the 100·(1−α/2)-percentile values of
the replicates as the left and right endpoints, respectively. In our study we used the 95% level
of confidence and computed 1000 replicates. This means the endpoints of the percentile
interval are the 25th and the 975th values in the increasing order of the replicates.
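A minimal sketch of the percentile interval, following the convention just described (the 25th and 975th ordered replicates for a 95% interval with 1000 replicates); the function name and defaults are ours:

```python
import numpy as np  # uses the bootstrap_replicates helper defined earlier

def percentile_interval(sample, statistic, alpha=0.05, n_boot=1000):
    """Bootstrap percentile confidence interval of level 1 - alpha."""
    reps = np.sort(bootstrap_replicates(sample, statistic, n_boot=n_boot))
    lower = reps[round(alpha / 2 * n_boot) - 1]        # the 25th ordered value for n_boot = 1000
    upper = reps[round((1 - alpha / 2) * n_boot) - 1]  # the 975th ordered value
    return lower, upper
```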
The BCa interval is more complicated, but more accurate, than the percentile interval
(Efron 1987). BCa is an abbreviation of “bias-corrected and accelerated”. This method
automatically corrects the bias of the percentile interval if the estimator is biased. The term
“accelerated” refers to the greater accuracy achieved with the same number of simulations,
compared with the percentile interval. The BCa method is also based on percentile values of
the replicates, but these differ from those used in the percentile method, since they depend on
the bias-correction and acceleration constants (Efron and Tibshirani 1993, pp. 184-188).
As we shall see later, there is a considerable difference between the two kinds of intervals
when the estimator is biased or the distribution is not symmetric. Unlike standard error
estimation, the construction of confidence intervals requires at least 1000 simulations.
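We do not reproduce the bias-correction and acceleration formulas here. If SciPy (version 1.7 or later) is available, its scipy.stats.bootstrap routine provides a BCa implementation that can serve as a reference; a hedged sketch, reusing the hypothetical sample from the first code example, not the computation used in this paper:

```python
import numpy as np
from scipy.stats import bootstrap  # available in SciPy >= 1.7

sample = sio2                      # the hypothetical sample from the first sketch
res = bootstrap((sample,), np.mean, n_resamples=1000,
                confidence_level=0.95, method='BCa')
print(res.confidence_interval.low, res.confidence_interval.high)
```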
We remark that the bootstrap has scarcely been applied in geology so far. Caers et al.
(1999) estimated the standard error and the bias of the mean square error and also
constructed a percentile confidence interval for it. We will apply these tools of the bootstrap
to the chemical components of the 13 siltstone samples from the BCF.

Hypothesis testing with the bootstrap

The classical two-sample t-test and the F-test are often not applicable because the
normality assumption is not fulfilled. (Moreover, the t-test also requires the equality of
standard deviations.) In contrast, their bootstrap counterparts do not depend on the
type of the distribution and use the same method for the comparison of arbitrary parameters
(Efron and Tibshirani 1993, Chapter 16). Let x = (x1, x2, …, xn) and y = (y1, y2, …, ym) be
samples from the independent random variables X and Y, respectively, and let sn(x) and sm(y)
be estimates of the same kind of parameter, θ_X and θ_Y, of the distributions of X and Y (for
example, both of them may be the sample mean). Suppose, for instance, sn(x) > sm(y). Our goal is
to test the null hypothesis θ_X = θ_Y at the 100(1−α)% significance level against the one-sided
alternative hypothesis θ_X > θ_Y. We produce N bootstrap replicates of the statistic
sn(x) − sm(y), each of them by drawing a sample x* of size n from x and a sample y* of size
m from y, both with replacement, and then computing sn(x*) − sm(y*). The achieved
significance level of the test is defined as the proportion of non-positive replicates, i.e.,

$$ \mathrm{ASL} := \frac{\#\bigl\{\, i \in \{1,\dots,N\} : s_n(x_i^*) - s_m(y_i^*) \le 0 \,\bigr\}}{N}, $$

where #A denotes the cardinality of a finite set A. We accept the null hypothesis if ASL ≥ α;
this means there is not enough evidence for θ_X > θ_Y. The case sn(x) < sm(y) can be treated
similarly. In the next section we shall apply this method instead of the t- and F-tests for
comparing the expectations and standard deviations of the chemical components in the AG and SD
groups.
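The one-sided bootstrap test above can be sketched as follows (illustrative code; the function name and defaults are ours, not the authors'):

```python
import numpy as np

def two_sample_asl(x, y, statistic, n_boot=1000, seed=0):
    """ASL of the one-sided bootstrap test of theta_X = theta_Y against
    theta_X > theta_Y, assuming statistic(x) > statistic(y) was observed."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    diffs = np.array([
        statistic(rng.choice(x, size=x.size, replace=True))
        - statistic(rng.choice(y, size=y.size, replace=True))
        for _ in range(n_boot)])
    return np.mean(diffs <= 0)  # proportion of non-positive replicates
```

The same routine serves for comparing standard deviations by passing, e.g., np.std as the statistic; the null hypothesis is accepted if the returned ASL is at least α.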
In order to decide whether two random variables have the same distribution or not,
we usually apply a homogeneity test. We compute x̄ − ȳ, the difference of the sample
means of the independent samples x = (x1, x2, …, xn) and y = (y1, y2, …, ym). This time
we produce a bootstrap replicate of this statistic by drawing a sample x* of size n and,
independently, a sample y* of size m, both of them from the pooled sample (x1, x2, …, xn,
y1, y2, …, ym), and then calculating the difference of their means. Suppose x̄ ≥ ȳ. (The other
case can be treated similarly.) The achieved significance level of the test is the proportion
of replicates greater than or equal to x̄ − ȳ, i.e.,

$$ \mathrm{ASL} := \frac{\#\bigl\{\, i \in \{1,\dots,N\} : \bar{x}_i^* - \bar{y}_i^* \ge \bar{x} - \bar{y} \,\bigr\}}{N}. $$
Let ]0,1[. The hypothesis that X and Y have the same distribution is acceptable on 1
significance level, if ASL   . (ASL   means the occurrence of “great” differences

56
between the sample means is unlikely, having assumed the identity of the distributions.) In
the next section we will apply this test, together with the classical Mann-Whitney and
Kolmogorov-Smirnov tests, to compare the chemical compositions of the AG and SD
groups.
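A sketch of this homogeneity test, with both resamples drawn from the pooled sample (again illustrative, not the authors' code):

```python
import numpy as np

def homogeneity_asl(x, y, n_boot=1000, seed=0):
    """ASL of the bootstrap homogeneity test based on the difference of means;
    both resamples are drawn from the pooled sample."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    diffs = np.array([
        rng.choice(pooled, size=x.size, replace=True).mean()
        - rng.choice(pooled, size=y.size, replace=True).mean()
        for _ in range(n_boot)])
    # Count replicates at least as extreme as the observed difference.
    if observed >= 0:
        return np.mean(diffs >= observed)
    return np.mean(diffs <= observed)
```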

Numerical results and discussion

Standard error estimations with respect to chemical components of the siltstones

As we have mentioned, the bootstrap is a useful method when only few data are available
and a mathematical formula is either not applicable or does not exist at all. We will apply
the bootstrap to the siltstones, a small subset of the rock samples with only 13 elements.
We chose 8 chemical components for demonstration and always made N = 1000 bootstrap
simulations. Figure 1 shows the histogram of the bootstrap replicates of the mean SiO2
concentration (%). First we calculated the descriptive statistics, such as the mean, the standard
deviation (both expressed in concentration %), the skewness, the Student confidence interval
for the expectation and the standard error of the mean (Table 1). SPSS also calculates, in some
way, a standard error of the skewness, which depends only on the sample size and hence gives
the same value, 0.62, for all variables. We will show by the bootstrap that this calculation
cannot be correct.
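This claim can be checked directly by comparing the bootstrap standard error of the skewness with the constant, sample-size-only value reported by the software. A sketch; we assume the package uses the usual √(6n(n−1)/((n−2)(n+1)(n+3))) formula, which indeed gives 0.62 for n = 13:

```python
import numpy as np
from scipy.stats import skew  # reuses the bootstrap_replicates helper defined earlier

def skewness_se_check(sample, n_boot=1000):
    """Bootstrap SE of the skewness vs. the sample-size-only textbook formula."""
    n = len(sample)
    se_boot = bootstrap_replicates(sample, skew, n_boot=n_boot).std(ddof=1)
    se_formula = (6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))) ** 0.5  # ~0.62 for n = 13
    return se_boot, se_formula
```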
[Figure 1 appears here: histogram of the replicates; horizontal axis: mean of SiO2 concentration (%), vertical axis: frequency]

Figure 1. Histogram of 1000 bootstrap replicates of the mean SiO2 concentration (siltstone, 13 rock samples)

Table 2 contains the bootstrap standard error and the corrected bootstrap standard error
of the mean. We can see at once that the latter agrees closely with the classical standard
error in Table 1. Based on this fact we can rely on the bootstrap estimates of the standard
error of other statistics (standard deviation, skewness) as well, for which no mathematical
formulas are available for comparison. The mean of the replicates nearly equals the sample
mean, since the sample mean is an unbiased estimator.

         Mean       95% CI for mean       Std.       Skew-    Std. error    Std. error
         (conc. %)  Lower      Upper      deviation  ness     of mean       of skewness
                    bound      bound                          (conc. %)
SIO2     59.74      57.80      61.68      3.21        .70      .89           .62
AL2O3    13.82      12.62      15.03      1.99       1.04      .55           .62
FE2O3     4.18       3.54       4.83      1.07       -.16      .30           .62
MGO        .76        .22       1.30       .90       1.00      .25           .62
CAO       6.24       4.88       7.60      2.25       -.92      .62           .62
NA2O      5.50       5.04       5.96       .76       1.03      .21           .62
K2O       1.17        .77       1.57       .66       1.14      .18           .62
CO2       4.75       3.62       5.89      1.87      -1.18      .52           .62

Table 1. Descriptive statistics of the siltstone (13 rock samples)

The bootstrap standard error of the standard deviation can be found in the 2nd column
of Table 3. One can see that standard errors are relatively large (generally 15-20 percent of
the parameter estimates, see also the 4th column of Table 1). That means the sample
standard deviation has greater uncertainty than the sample mean.

         Mean of     Bootstrap   Corrected    Percentile interval    BCa interval
         replicates  std. error  bootstrap    2.5%       97.5%       2.5%      97.5%
                                 std. error
SIO2     59.70       .88         .92          58.10      61.60       57.30     61.00
AL2O3    13.82       .52         .54          12.90      15.00       13.00     15.20
FE2O3     4.19       .27         .29           3.65       4.72        3.62      4.72
MGO        .77       .23         .24            .35       1.29         .36      1.32
CAO       6.25       .58         .61           4.96       7.26        4.65      7.15
NA2O      5.49       .20         .21           5.12       5.91        5.18      5.99
K2O       1.17       .17         .18            .86       1.54         .89      1.63
CO2       4.78       .49         .51           3.72       5.66        3.33      5.50

Table 2. Bootstrap standard error and confidence intervals for the mean (siltstones)

The bootstrap estimate of the standard error of the skewness gives different values for
different components (0.45 for Fe2O3, 0.87 for Na2O; see Table 4). Notice that larger standard
errors correspond to larger absolute values of the skewness. The value 0.62 in Table 1 can only
be considered an average standard error. We remark that the bootstrap standard error values
are almost comparable in magnitude with the absolute values of the skewness itself. Thus the
skewness is an even more uncertain estimator than the standard deviation.

         Mean of     Bootstrap   Percentile interval    BCa interval
         replicates  std. error  2.5%       97.5%       2.5%      97.5%
SIO2     3.05        .55         1.90       4.00        2.40      4.60
AL2O3    1.84        .48          .90       2.80        1.20      3.00
FE2O3    1.01        .15          .70       1.29         .82      1.42
MGO       .85        .16          .52       1.10         .65      1.21
CAO      2.11        .48         1.12       2.96        1.46      3.19
NA2O      .70        .19          .35       1.06         .46      1.21
K2O       .62        .13          .30        .84         .45       .89
CO2      1.75        .44          .98       2.57        1.16      2.95

Table 3. Bootstrap standard error and confidence intervals for the standard deviation (siltstones)

Confidence intervals

The percentile and the BCa interval for the mean are contained in Table 2 and can be
compared with the traditional Student-type one in Table 1. There are some differences
between the classical and the bootstrap intervals, but these differences are not considerable
for variables with a symmetric distribution. The component Fe2O3, for example, has almost

         Mean of     Bootstrap   Percentile interval    BCa interval
         replicates  std. error  2.5%       97.5%       2.5%      97.5%
SIO2      .65        .52          -.33      1.75         -.26     1.92
AL2O3     .55        .88         -1.18      1.92         -.49     2.75
FE2O3    -.18        .45         -1.15       .78        -1.10      .81
MGO       .94        .61          -.13      2.32         -.064    2.47
CAO      -.75        .67         -2.09       .61        -2.56      .19
NA2O      .53        .87         -1.15      1.92         -.34     2.51
K2O      1.08        .61          -.026     2.45          .013    2.50
CO2      -.76        .73         -2.04       .69        -2.78     -.054

Table 4. Bootstrap standard error and confidence intervals for the skewness (siltstones)

the same percentile and BCa intervals, and they are only slightly narrower than the Student
interval. On the other hand, the variables whose skewness is near to or greater than 1 in
absolute value show substantial differences between the three kinds of intervals. In these
cases the bootstrap intervals, especially the BCa, are asymmetric about the parameter
estimate, and the endpoint lying in the direction of the skew is farther from the mean.
The components CaO and CO2 show negative asymmetry, and their BCa intervals are shifted
substantially to the left of the percentile and the Student-type ones. (For the mean of CO2,
for example, the BCa method gives [3.33, 5.50], as opposed to [3.72, 5.66] and [3.62, 5.89]
obtained by the percentile and the classical methods.) The component K2O has a distribution
with positive skewness, which is why its BCa interval lies to the right of the other two. In
these cases the results of the BCa method should be accepted, since it automatically takes
the asymmetry of the distribution into consideration.
We have also computed the 95% level bootstrap confidence intervals for the standard
deviation. Table 3 shows substantial differences between the two types: the endpoints of the
BCa intervals are always greater than those of the percentile ones. One reason for this
behaviour is the bias correction of the BCa method, but it explains only part of the
differences. Another reason is that the empirical distribution of the replicates does not give
a good approximation of the “true” distribution of the estimator, and the BCa method
corrects this error by the acceleration constant. These intervals are remarkably asymmetric
and considerably wide compared to the parameter estimate. In the case of Na2O and CO2
the right endpoint is nearly three times the left one. This confirms the statement that the
estimate of the standard deviation is much more uncertain than that of the mean, especially
for small sample sizes. As we shall see shortly, it follows that we do not necessarily have to
reject the hypothesis that the standard deviations of two independent variables are equal,
even if there is a substantial difference between the sample standard deviations.
The two kinds of confidence intervals for the skewness are contained in Table 4. There
are considerable differences between them (except perhaps for the nearly symmetrically
distributed component Fe2O3), just as in the case of the standard deviation. The width of the
BCa intervals shows that the estimation of the skewness is extremely uncertain: these intervals
often contain zero, even for larger skewness values, see e.g. the components CaO and Na2O,
with skewness around 1 in absolute value. This means it cannot be excluded that these
variables have a symmetric distribution. We can say that the skewness is the most uncertain
of the three statistics investigated in this paper. We also remark that the behavior of the
skewness estimate has not been studied by means of the bootstrap before.

Results of the hypothesis tests

In this section we would like to demonstrate the effectiveness of the bootstrap
hypothesis tests. Our aim is to compare the means, the standard deviations and the
distributions of the main chemical components in the AG and the SD groups. Normality
tests have shown that the majority of the chemical components do not have a normal
distribution in both groups, so the classical t- and F-tests are not applicable. We apply their
bootstrap counterparts instead. The test for the equality of the expectations only requires
N bootstrap replicates of the statistic x̄ − ȳ; the achieved significance level is the proportion
of negative or positive replicates, depending on whether x̄ − ȳ > 0 or < 0, respectively. The
ASL values can be found in Table 5. Carrying out the test at the 95% significance level, we
obtain that the expectations in the two regional groups can be regarded as equal for half of
the chosen 8 chemical components (MgO, CaO, K2O and CO2).

         Mean (concentration %)
         SD         AG         SD - AG      ASL
SIO2     50.47      47.91       2.56        .043
AL2O3    15.45      16.62      -1.17        .016
FE2O3     6.25       7.77      -1.52        .000
MGO       4.11       3.95        .16        .330
CAO       5.86       5.83        .03        .543
NA2O      2.56       4.10      -1.54        .000
K2O       3.75       3.68        .07        .404
CO2       4.90       4.87        .03        .551

Table 5. Bootstrap test for the equality of the means (comparison of the regional groups SD and AG)

We have pointed out that the standard deviation usually has a relatively wide confidence
interval. Thus a hypothesis test for the equality of the standard deviations is expected to give
a positive answer for more chemical components than the test for the equality of the
expectations. The bootstrap test can be carried out and the ASL computed similarly to the
case of the mean. This time we have to generate bootstrap replicates of the statistic
σn(x) − σm(y), the difference of the standard deviations of the samples x and y. Results of the
test are contained in Table 6. The hypothesis of the equality of the standard deviations in the
AG and SD groups is acceptable at the 95% significance level for all variables, even in cases
where there are great differences between the sample standard deviations (these values are,
for instance, 3.34 and 1.72 for the component Al2O3). This means the regional groups AG
and SD of the BCF can be considered homogeneous with respect to variability.

         Std. deviation (concentration %)
         SD         AG         SD - AG      ASL
SIO2     7.19       5.07        2.12        .231
AL2O3    3.34       1.72        1.62        .085
FE2O3    1.70       1.83        -.13        .310
MGO      1.87       1.46         .41        .300
CAO      4.45       2.66        1.79        .164
NA2O     1.22       1.17         .05        .498
K2O      1.33       1.24         .09        .365
CO2      5.76       3.35        2.41        .164

Table 6. Bootstrap test for the equality of the standard deviations (comparison of the regional groups SD and AG)

Finally, we applied three homogeneity tests for the equality of the distributions, each of
them at the 95% significance level. The Mann-Whitney test and the two-sample Kolmogorov-
Smirnov test were carried out with SPSS. The results of these non-parametric tests can
be considered fairly reliable, since sufficient data are available (both sample groups
have more than 30 elements). However, the Kolmogorov-Smirnov test tends to be sharper,
rejecting the homogeneity hypothesis for Al2O3 and CO2. The hypothesis is accepted by both
tests for 3 other components (MgO, CaO and K2O; see Table 7), and it is also accepted for
Al2O3 and CO2 by the Mann-Whitney test. Table 7 also shows that the bootstrap test accepts
the hypothesis of the equality of distributions exactly for those components for which the
equality of expectations is acceptable (see also Table 5). These results are identical to those
of the Mann-Whitney test, except for Al2O3. This shows a good accordance of the bootstrap
method with the classical ones, in spite of their quite different nature.

         Difference between     Significance levels
         the SD and AG means    Mann-Whitney   Kolmogorov-Smirnov   Bootstrap ASL
SIO2      2.56                  .000           .000                 .021
AL2O3    -1.17                  .071           .015                 .021
FE2O3    -1.52                  .000           .000                 .000
MGO        .16                  .917           .052                 .280
CAO        .03                  .165           .275                 .433
NA2O     -1.54                  .000           .000                 .000
K2O        .07                  .724           .206                 .433
CO2        .03                  .063           .023                 .424

Table 7. Results of the homogeneity tests (comparison of the regional groups SD and AG)

Conclusions

We have found that the bootstrap method remedies the deficiencies of the classical
statistical methods in several respects, providing the following advantages:
1. Reliable statistical inferences can be drawn for small sample sizes.
2. The standard error and a confidence interval of a given level for an arbitrary statistic, as
well as the distribution of the statistic itself, can be approximately determined.
3. Confidence intervals can be constructed and hypothesis tests applied irrespective of the
type of the distribution.
4. The bootstrap revealed some errors in widespread statistical software packages (see the
standard error of the skewness).
Concerning the chemical composition of the BCF, we can draw some substantial
conclusions. Firstly, the distribution of some components (especially K2O and CO2) is
markedly asymmetric, hence any calculation based on the normality assumption is incorrect.
As for the homogeneity of the formation, we found that half of the chemical components
have the same distribution in the surface deep boreholes and in the exploration gallery.
These regional groups turned out to be homogeneous with respect to the variability of the
chemical composition.

References

Caers, J., Beirlant, J. and Maes, M. A. 1999: Statistics for modeling heavy tailed
distributions in geology: Part I. Methodology. Math. Geology, vol. 31, pp. 391-410.
Davison, A. and Hinkley, D. 1997: Bootstrap Methods and their Application. Cambridge
University Press, 582 p.
Efron, B. 1979: Bootstrap methods: Another look at the jackknife. Ann. Stat., vol. 7, pp.
1-26.
Efron, B. 1981: Nonparametric estimates of standard error: The jackknife, the bootstrap and
other methods. Biometrika, vol. 68, pp. 589-599.
Efron, B. 1987: Better bootstrap confidence intervals. J. Am. Stat. Assoc., vol. 82, pp.
171-200.
Efron, B. and Tibshirani, R. 1993: An Introduction to the Bootstrap. Chapman & Hall, New
York, London, 436 p.
