
Estimation of Error Rates by Means of Simulated Bootstrap Distributions

K.-D. WERNECKE and G. KALB

The Humboldt-University of Berlin, GDR


Charité Eye Clinic

Summary
The bootstrap error estimation method is investigated in comparison with the known n-method and with a combined error estimation suggested by us, using simulated and normally distributed “populations” in 15 and 30 characters, respectively. For small sample sizes (below two to three times the number of characters per class) the estimates resulting from the bootstrap method are on average too small and can no longer be accepted. Significantly better results (with an essentially lower calculation expenditure) are obtained for the n-method and the combined estimation. The variability is essentially the same for all three methods.
This applies both in the case of rather badly separated and in the case of very well separated populations.
A bootstrap estimation modified by us also gives unsatisfactory results.
Key words: Bootstrap method; Discriminance analysis; Modified bootstrapping; Error estimation; n-method; Combined estimation; Simulation.

1. Introduction

In recent years many applications of the so-called jackknife or cross-validation methods have become known.
HARTIGAN (1971) applied for the first time a variant of jackknifing, the so-called bootstrapping, to the determination of error rates for classification problems.
The bootstrap error estimation modified by EFRON (1979) in the case of two classes offers, compared to other error estimations, the basic advantage that the method includes a determination of the variability of the respective estimation.
However, the bootstrap error rates, especially in the case of small samples, are, according to our experience, generally too low, so the present paper checks this error estimation method and compares it with other known methods.

2. Efron’s bootstrap estimation

Let $n$ observation vectors $x_{ij} = (x_{ij1}, x_{ij2}, \ldots, x_{ijp})'$ from $K$ classes $K_i$ ($i = 1(1)K$, $j = 1(1)n_i$, $n = \sum_{i=1}^{K} n_i$) be given, with each of the $K$ samples taken from a population having a probability density $f_i(x)$. Based on this training sample $\{x\} = \{x_1, x_2, \ldots, x_n\}$ a classifier is to be constructed which divides the given feature space $R = \mathbb{R}^p$ into $K$ disjoint subsets $R_1, R_2, \ldots, R_K$. Required is the distribution of the difference

$$D = D[(f_1(x), f_2(x), \ldots, f_K(x)), (R_1, R_2, \ldots, R_K)] = F(R, f) - S(R).$$

$F(R, f) = \sum_{i,j=1,\, i \neq j}^{K} P(K_i)\, r_{ij} \int_{R_j} f_i(x)\, dx$ is the actual error defined in the case of BAYES classification, $P(K_i)$ are the prior probabilities for allocation into the class $K_i$, and $r_{ij}$ are the losses caused by misclassification of an element $x \in K_i$ into a class $K_j$, $i \neq j$, with $\sum_{i=1}^{K} P(K_i) = 1$ ($\int f_i(x)\, dx$ stands for $\int \cdots \int f_i(x_1, x_2, \ldots, x_p)\, dx_p\, dx_{p-1} \cdots dx_1$). $S(R)$ is the error estimation according to the resubstitution method.
According to EFRON (1979) we generate new samples (bootstrap samples) by drawing randomly one observation $x$ from the given set $\{x\}$ and repeating this sampling $n$ times (drawing with replacement). Thus we obtain a total of $K$ bootstrap samples or a common sample $\{x^*\}$.
A bootstrap estimation for $D$ is now defined as

$$D^* = D^*[(\hat{f}_1(x), \hat{f}_2(x), \ldots, \hat{f}_K(x)), (R_1^*, R_2^*, \ldots, R_K^*)] = F(R^*, \hat{f}) - S^*(R) = \Big( \sum_{i,j=1,\, i \neq j}^{K} m_{ij} - \sum_{i,j=1,\, i \neq j}^{K} m_{ij}^* \Big) / n,$$

where $m_{ij}$ ($m_{ij}^*$) is the number of sample elements $x_{ij}$ ($x_{ij}^*$) from the class $K_i$ which have by mistake been classified into some class $K_j$ ($i \neq j$) using the classifier generated from the bootstrap sample $\{x^*\}$.
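A single realization of the bootstrap difference can be sketched in a few lines. This is a minimal illustration, not the authors' program: it assumes that each class is resampled with its own size $n_i$, and it swaps in a simple nearest-class-mean rule for the linear discriminance analysis used in the paper. The names `class_means`, `count_errors` and `bootstrap_difference` are hypothetical.

```python
import random
from math import dist

def class_means(samples):
    """Mean vector per class; samples maps a class label to a list of feature tuples."""
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in samples.items()
    }

def count_errors(samples, means):
    """Number of elements allocated to a class other than their own
    (nearest-class-mean rule, a stand-in for the discriminance analysis)."""
    return sum(
        1
        for label, rows in samples.items()
        for x in rows
        if min(means, key=lambda k: dist(x, means[k])) != label
    )

def bootstrap_difference(samples, rng):
    """One realization d* = (m - m*) / n for a class-wise bootstrap sample."""
    n = sum(len(rows) for rows in samples.values())
    # Assumption: n_i draws with replacement inside each class.
    boot = {label: rng.choices(rows, k=len(rows)) for label, rows in samples.items()}
    means = class_means(boot)           # classifier built from the bootstrap sample
    m = count_errors(samples, means)    # its errors on the original sample
    m_star = count_errors(boot, means)  # its errors on the bootstrap sample itself
    return (m - m_star) / n
```

Calling `bootstrap_difference(samples, random.Random(seed))` repeatedly with independent seeds yields the realizations whose distribution approximates that of $D^*$.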
A repeated independent simulation of bootstrap samples provides a sequence of independent realizations of bootstrap differences $d_l^*$ ($l = 1(1)N$) which can be used for approximating the distribution of $D^*$.
Accordingly, the arithmetic mean $\bar{d}^* = \frac{1}{N} \sum_{l=1}^{N} d_l^*$ and the empirical variance $s^{*2}$ of the $d_l^*$ are used as estimations for the expected value and the variance of $D^*$.
According to the definition of $D$ one obtains a bias-corrected estimation for $F(R, f)$ from $\hat{F}^*(R, f) = S(R) + \bar{d}^*$, with $\bar{d}^*$ the mean of the bootstrap differences.
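The summary of the simulated differences and the resulting correction can be written out as follows. This is only a sketch: the divisor $N - 1$ in the variance is an assumption (the source does not show it), and the function names are hypothetical.

```python
from statistics import mean, variance

def summarize_differences(d_star):
    """Estimates for the expected value and the variance of D*
    from N independent realizations d*_1, ..., d*_N."""
    d_bar = mean(d_star)
    s2 = variance(d_star)  # assumption: sample variance with divisor N - 1
    return d_bar, s2

def bias_corrected_error(resubstitution_error, d_bar):
    """Bias-corrected estimate: resubstitution error plus mean bootstrap difference."""
    return resubstitution_error + d_bar
```

For example, with illustrative (not measured) differences `[8.0, 10.0, 9.0, 9.0]` in percentage points, `summarize_differences` gives a mean of 9.0, and a resubstitution error of 14.0 would be corrected to 23.0.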
EFRON (1979) states that this corrected estimation doesn’t seem worth making because the variability of the estimation $S(R)$ with respect to the correction $\bar{d}^*$ is too high. We can confirm this but will come back to this matter later.
MCLACHLAN (1980) has investigated the efficiency of the bootstrap estimation by means of simulation. LÄUTER (1985) proved the asymptotic efficiency of the bootstrap estimator as compared to the R- and U-method.

3. Modified bootstrap estimation


In using the linear discriminance analysis as a classification rule, we assume the probability densities $f_i(x)$ to be normal, so it seems to suggest itself to utilize this assumption also for error estimation.

From the given sample elements $x_{ij}$ we generate, by simulation, new samples (bootstrap samples) in the following way:

where $m_i \in \{1, 2, \ldots, n_i\}$ was taken at random and $S_i$ describes the diagonal matrix of the standard deviations of the $i$-th class, $z_{ij}$ is a $p$-dimensional normally distributed random vector with the expected value 0 and the variance vector 1 and with the correlations given by the sample, and $c$ is a fixed vector with the components $c_l = 1/\sqrt{n}$ ($l = 1(1)p$).
$x_{ij}^*$ thus has the expected value $\bar{x}_i$ and the variance $S_i^2$. With these simulated samples the bootstrap method is carried out as described above.
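The random vector $z_{ij}$ described here (zero mean, unit variances, correlations taken from the sample) can be generated in several ways; the text does not say which one was used. The sketch below assumes a Cholesky factorization of the correlation matrix, and both function names are hypothetical.

```python
import random
from math import sqrt

def cholesky(corr):
    """Lower-triangular L with L * L^T = corr, for a positive definite correlation matrix."""
    p = len(corr)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L

def correlated_standard_normal(corr, rng):
    """One p-dimensional normal vector with mean 0, variances 1 and correlations corr."""
    L = cholesky(corr)
    u = [rng.gauss(0.0, 1.0) for _ in corr]  # independent standard normal components
    return [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(len(corr))]
```

Scaling such a vector with the class standard deviations and combining it with a randomly chosen sample element then gives one simulated observation.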

4. Valuation of error estimations by means of Monte-Carlo-simulations

Based on the linear discriminance analysis as a classification rule, i.e. under the condition of normal distribution for the probability densities $f_i(x)$, we want to check the bootstrap estimations indicated above and to compare them with the known n-method (estimation $S(n)$) and with a combined error estimation $S(K)$ suggested by us (cf. WERNECKE and KALB, 1983).
Using real data (i.e. the mean and variance vectors as well as feature correlations are given by a sample) we generate p-dimensional normally distributed 3-class random samples of certain size and regard them as “populations” from which we draw, in a random process, 3-class samples of a specified size (cf. WERNECKE, 1983).
For each sample we determine the allocation rule indicated, estimate the associated classification error according to the methods to be compared, and finally classify all objects of the “population” into the given classes. This operation is repeated according to a certain repetition rate.
The allocation of the “population” vectors with the sample classifier directly simulates the problem of determining the actual error rate $F(R, f)$; and we call the error which results as the quotient of the wrongly allocated objects by the total number of random vectors “allocation error”.
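The allocation error defined above is a plain ratio and can be sketched independently of the classifier. In this minimal example `population` and `classify` are hypothetical names: the population maps each class label to its random vectors, and `classify` is any allocation rule built from a sample.

```python
def allocation_error(population, classify):
    """Quotient of wrongly allocated population vectors by the total number of vectors."""
    total = sum(len(rows) for rows in population.values())
    wrong = sum(
        1
        for label, rows in population.items()
        for x in rows
        if classify(x) != label  # allocated to a class other than its own
    )
    return wrong / total
```

Multiplying the result by 100 gives the error in percent.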
An error estimation method is to be regarded as a good one when the difference between the sample error estimation and the allocation error is as small as possible, since the most important component of each quality criterion is the valuation of the actual error rate (VICTOR, 1976).

4.1. Repetition rate

We determined the repetition rate according to RASCH et al. (1981) using a one-sided confidence interval.
Since the estimation of the error rate is only important in comparison with the allocation error, we estimate the expected half width of this interval according to the respective difference between the allocation error and the bootstrap estimation from pilot studies.

As “populations” we generated normally distributed random samples divided into three classes from 3 × 3000 = 9000 random vectors, each in 15 and 30 features, respectively. From those we draw randomly 3-class samples of the sizes 3 × 15, 3 × 30 (p = 15) and 3 × 30 (p = 30).
Conditioned by these sample sizes and pilot estimations, the approximate repetition rates of 100 (p = 15, 3 × 15), 200 (p = 30, 3 × 30) and 300 (p = 15, 3 × 30) resulted.
The following tables show the results for the respective sample sizes (the number $N$ of the bootstrap simulations was fixed via the absolute deviation of the standard deviations $s^*$ according to $|s_N^* - s_{N-1}^*| < 0.2$, which generally resulted in $10 \le N < 20$. The values $\bar{d}^*$ and $F(R^*, \hat{f})$ were calculated using this $N$):
Below a certain sample size (which can be expected to be 2 to 3 times the number of features per class) the bias-corrected bootstrap estimation $S(R) + \bar{d}^*$ gives too
Table 1
Drawing of samples of the size 3 × 15 from the “population” 3 × 3000, p = 15; 100 repetitions each ($\bar{x}$: arithmetic mean, s: standard deviation, v: coefficient of variation, e: related error with respect to the allocation error)

                        $\bar{x}$     s        v        e
$S(R) + \bar{d}^*$        21.56     5.31    24.40    28.96
$\bar{d}^*$               13.36     2.41    18.04      -
$S(R)$                              4.13    49.17    72.58
                                    3.20    19.50    46.43
$S(n)$                    31.19     7.73    24.78     1.83
$S(K)$                    28.64     7.03    24.55     6.50
alloc. error              30.63     3.53    11.52      -


Table 2
Drawing of samples of the size 3 × 30 from the “population” 3 × 3000, p = 15; 300 repetitions each ($\bar{x}$, s, v, e according to Table 1)
The respective error rates for the “populations” 3 × 3000 and 15 characters were $S(R)$ = 20.09, $S(n)$ = 20.19 and $S(K)$ = 20.17.

                        $\bar{x}$     s        v        e
$S(R) + \bar{d}^*$        22.67     4.45    19.63    12.57
$\bar{d}^*$                8.91     1.69    18.97      -
$S(R)$                    13.76     3.36    24.42    46.93
                          17.90     3.21    17.93    30.97
$S(n)$                    26.13     4.96    18.98     0.77
$S(K)$                    24.49     4.57    18.66     5.55
alloc. error              26.93     1.84     7.10      -


Table 3
Drawing of samples of the size 3 × 30 from the “population” 3 × 3000, p = 30; 200 repetitions each ($\bar{x}$, s, v, e according to Table 1)
For 30 characters the values $S(R)$ = 8.82, $S(n)$ = 9.09 and $S(K)$ = 9.06 resulted as error rates of the “populations” 3 × 3000.

                        $\bar{x}$     s        v        e
$S(R) + \bar{d}^*$        12.25     3.43    27.95    29.60
$\bar{d}^*$                9.44     1.92    20.34      -
$S(R)$                     2.83     2.05    72.44    83.76
                          10.49     2.33    22.21    39.82
$S(n)$                    19.16     5.21    27.19     9.93
$S(K)$                    17.32     4.64    26.79     0.63
alloc. error              17.43     2.51    14.40      -
optimistic values, which are essentially lower than the allocation error and cannot be accepted.
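The rule used to fix the number $N$ of bootstrap simulations (stop as soon as the standard deviations of two successive runs differ by less than 0.2) can be sketched as a simple loop. This reading of the criterion, the starting size of two realizations and the safety limit `n_max` are assumptions; `draw_difference` is a hypothetical callable returning one bootstrap difference in percentage points.

```python
from statistics import stdev

def simulate_until_stable(draw_difference, tol=0.2, n_max=1000):
    """Draw bootstrap differences until the standard deviation changes by less than tol."""
    d = [draw_difference(), draw_difference()]  # two values are needed for a first s*
    s_prev = stdev(d)
    while len(d) < n_max:
        d.append(draw_difference())
        s = stdev(d)
        if abs(s - s_prev) < tol:  # |s*_N - s*_(N-1)| < tol: stop with this N
            break
        s_prev = s
    return d
```

The length of the returned list is the number $N$ of simulations actually used.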
The often quoted low variability of this estimator has to be relativized in comparison with the methods $S(n)$ and $S(R)$ under our conditions; and it is of no importance since different variances must be compared via the coefficient of variation, which is comparable for the three estimators mentioned. The results of the estimators $S(R)$ and $F(R^*, \hat{f})$ are not surprising and are indisputable for such sample sizes.
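The comparison via the coefficient of variation can be checked directly from the tabulated means and standard deviations. The sketch below uses the first, fifth and sixth data rows of Table 2; assigning them to the bias-corrected bootstrap estimation, $S(n)$ and $S(K)$ follows the row order of Table 1 and is an assumption.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: v = 100 * s / mean."""
    return 100.0 * sd / mean

# (mean, standard deviation) pairs read from Table 2; row labels are assumed.
rows = {
    "bootstrap, bias-corrected": (22.67, 4.45),
    "S(n)": (26.13, 4.96),
    "S(K)": (24.49, 4.57),
}
cv = {name: round(coefficient_of_variation(m, s), 2) for name, (m, s) in rows.items()}
```

The three values agree with the v column of Table 2 (19.63, 18.98 and 18.66), so the smaller standard deviation of the bootstrap row reflects its smaller mean rather than a lower relative variability.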

4.3. Conclusions

Since, especially for small samples, it is desirable to have a safe indicator, and the bootstrap estimation can only be expected to achieve results comparable to the estimations $S(n)$ and $S(K)$ for adequately large samples at an essentially higher calculation expenditure, its application in practice seems to be questionable.
Furthermore, a basic disadvantage of the bootstrap method is the fact that it can practically not be applied for the determination of conditioned error estimations (for this it would be necessary to carry out a full bootstrap estimation for each conditioned error rate, which in the case of multiclass problems would lead to unjustified expenditures).
The variant of “bootstrapping” suggested by us is actually always higher in the cases investigated here than the simple bootstrap method (about 5 %); however, in the case of smaller sample sizes it is considerably lower than the n-method and the combined estimation and can therefore not be recommended either.
Our statements apply both to rather badly separated “populations” (p = 15) and to well separated “populations” (p = 30).

Zusammenfassung
Using simulated and normally distributed “populations” in 15 and 30 features, respectively, the bootstrap estimation is examined in comparison with the known n-method and a combined estimation proposed by us.
For small sample sizes (below two to three times the number of features per class) the bootstrap estimation yields estimates that are on average too small and no longer acceptable. Far better results (at a substantially lower computational expenditure) are obtained for the n-method and the combined estimation.
The variability of all three procedures turns out to be essentially the same.
This holds both in the case of rather poorly separated and in the case of very well separated populations.
A bootstrap estimation modified by us likewise leads to unsatisfactory results.

References

BAPTIST, R., 1977: Simulation multivariater Stichproben in FORTRAN. EDV Med. u. Biol. 1, 36-38.
EFRON, B., 1979: Bootstrap Methods: Another Look at the Jackknife. Ann. Statist. 7, 1-26.
HARTIGAN, J. A., 1971: Error analysis by replaced samples. J. Roy. Statist. Soc., B 33, 98-110.
LÄUTER, H., 1985: An efficient estimator for the error rate in discriminance analysis. Math. Op. Forsch. Statist. Ser. Statist., No. 1.
MCLACHLAN, G. J., 1980: The efficiency of Efron’s “Bootstrap” approach applied to error estimation in discriminant analysis. J. Statist. Comp. Simul. 11, 273-279.
RASCH, D.; HERRENDÖRFER, G.; BOCK, J.; BUSCH, K., 1981: Verfahrensbibliothek - Versuchsplanung und -auswertung. VEB Deutscher Landwirtschaftsverlag, Berlin.
VICTOR, N., 1976: Probleme der Auswahl geeigneter Zuordnungsregeln bei unvollständiger Information, insbesondere für kategoriale Daten. Biometrics 32, 571-585.
WERNECKE, K.-D., 1983: Fehlerschätzung und Merkmalsselektion bei Bayes’schen Klassifikatoren. Math. Dissertation B, Halle/S.
WERNECKE, K.-D. and G. KALB, 1983: Further results in estimating the classification error in discriminance analysis. Biom. J. 25, 247-258.

Manuscript received: Sept. 2, 1986


Authors’ address:
Dr. K.-D. WERNECKE
Bereich Medizin (Charité)
der Humboldt-Universität zu Berlin
Augenklinik
Schumannstraße 20/21
Berlin, 1040