Uploaded by Al Strings

The bootstrap establishes a general framework for simulation-based statistical inference. In simple situations the uncertainty of an estimate may be gauged by analytical calculations leading, for example, to the construction of confidence intervals based on an assumed probability model for the available data. The bootstrap replaces complicated and often inaccurate approximations to biases, variances and other measures of uncertainty by computer simulations.

The idea of the bootstrap:

- The random sample $Y_1, \dots, Y_n$ is generated by drawing observations independently and with replacement from the underlying population (with distribution function $F$). For each interval $[a, b]$ the probability of drawing an observation in $[a, b]$ is given by $P(Y \in [a, b]) = F(b) - F(a)$.

- $n$ large: The empirical distribution of the sample values is close to the distribution of $Y$ in the underlying population. The relative frequency $F_n(b) - F_n(a)$ of observations in $[a, b]$ converges to $P(Y \in [a, b]) = F(b) - F(a)$ as $n \to \infty$.

- The idea of the bootstrap consists in mimicking the data generating process. Random sampling from the true population is replaced by random sampling from the observed data. This is justified by the insight that the empirical distribution of the observed data is similar to the true distribution ($F_n \to F$ as $n \to \infty$).

Literature: Davison, A.C. and Hinkley, D.V. (2005): Bootstrap

Methods and their Applications; Cambridge University Press

Inference@LS-Kneip 11

Setup:

- Original data: i.i.d. random sample $Y_1, \dots, Y_n$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$.

- The data $Y_1, \dots, Y_n$ is used to estimate $\theta$ $\Rightarrow$ estimator $\hat\theta \equiv \hat\theta(Y_1, \dots, Y_n)$.

- We are interested in evaluating the distribution of $\hat\theta$ (resp. $\hat\theta - \theta$) in order to provide standard errors, to construct confidence intervals, or to perform tests of hypotheses.

The bootstrap approach:

1) Bootstrap samples: Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1, \dots, Y_n$.

2) Bootstrap estimates: $\hat\theta^* \equiv \hat\theta(Y_1^*, \dots, Y_n^*)$

3) In practice: Steps 1) and 2) are repeated $m$ times (e.g. $m = 2000$) $\Rightarrow$ $m$ values $\hat\theta_1^*, \hat\theta_2^*, \dots, \hat\theta_m^*$, whose empirical distribution approximates the bootstrap distribution of $\hat\theta^*$.
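The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_estimates`, the data values and the fixed seed are hypothetical choices, not part of the lecture notes.

```python
import random
import statistics

def bootstrap_estimates(data, estimator, m=2000, seed=0):
    """Return m bootstrap replicates of estimator(data)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n = len(data)
    replicates = []
    for _ in range(m):
        # Step 1: draw n observations with replacement from the sample
        resample = rng.choices(data, k=n)
        # Step 2: recompute the estimator on the bootstrap sample
        replicates.append(estimator(resample))
    return replicates

sample = [5.2, 4.8, 5.4, 4.6, 6.1, 5.4, 5.8, 5.5]  # illustrative data (assumption)
reps = bootstrap_estimates(sample, statistics.mean)  # Step 3: m = 2000 repetitions
boot_se = statistics.stdev(reps)  # bootstrap standard error of the sample mean
print(f"estimate = {statistics.mean(sample):.3f}, bootstrap s.e. = {boot_se:.3f}")
```

Any function of the sample (median, variance, a regression coefficient) can be passed as `estimator`; only the resampling step is specific to the i.i.d. setting.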

1.1 Why does the bootstrap work?

The bootstrap is justified by asymptotic arguments. Usually the bootstrap does not provide very good approximations for extremely small sample sizes. It must, however, be emphasized that in some cases bootstrap confidence intervals can be more accurate for moderate sample sizes than confidence intervals based on standard asymptotic approximations.

- Data: i.i.d. random sample $Y_1, \dots, Y_n$; $Y_i \in \{0, 1\}$ is dichotomous, $P(Y_i = 1) = p$, $P(Y_i = 0) = 1 - p$.

- The problem is to estimate $p$.

- Let $S$ denote the number of $Y_i$ which are equal to 1. The maximum likelihood estimate of $p$ is $\hat p = S/n$.

- Recall: $n\hat p = S \sim B(n, p)$

- As $n \to \infty$ the central limit theorem implies that

$$\frac{\sqrt{n}(\hat p - p)}{\sqrt{p(1 - p)}} \to_L N(0, 1)$$

- $n$ large: the distributions of $\sqrt{n}(\hat p - p)$ and $\hat p - p$ can be approximated by $N(0, p(1 - p))$ and $N(0, p(1 - p)/n)$, respectively. For simplicity we will write $\text{distr}(\sqrt{n}(\hat p - p)) \approx N(0, p(1 - p))$ as well as $\text{distr}(\hat p - p) \approx N(0, p(1 - p)/n)$.

Bootstrap:

- Random sample $Y_1^*, \dots, Y_n^*$ generated by drawing observations independently and with replacement from $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- Let $S^*$ denote the number of $Y_i^*$ which are equal to 1.

- Bootstrap estimate of $p$: $\hat p^* = S^*/n$

The distribution of $\hat p^*$ depends on the observed sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$! A different sample will lead to a different distribution. The bootstrap now tries to approximate the true distribution of $\hat p - p$ by the conditional distribution of $\hat p^* - \hat p$ given the observed sample $\mathcal{Y}_n$. The bootstrap is called consistent if asymptotically ($n \to \infty$) the conditional distribution of $\hat p^* - \hat p$ coincides with the true distribution of $\hat p - p$ (note: a proper scaling is required!).

We obtain

$$P^*(Y_i^* = 1) = P(Y_i^* = 1 \mid \mathcal{Y}_n) = \hat p, \qquad P^*(Y_i^* = 0) = P(Y_i^* = 0 \mid \mathcal{Y}_n) = 1 - \hat p$$

and

$$E^*(\hat p^*) = E(\hat p^* \mid \mathcal{Y}_n) = \hat p, \qquad Var^*(\hat p^*) = E[(\hat p^* - \hat p)^2 \mid \mathcal{Y}_n] = \frac{\hat p(1 - \hat p)}{n}$$

The conditional distribution of $n\hat p^* = S^*$ given $\mathcal{Y}_n$ is equal to $B(n, \hat p)$. In a slight abuse of notation we will write

$$\text{distr}(S^* \mid \mathcal{Y}_n) = B(n, \hat p) \quad \text{or} \quad \text{distr}(n\hat p^* \mid \mathcal{Y}_n) = B(n, \hat p)$$

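- As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{\hat p(1 - \hat p)}} \,\middle|\, \mathcal{Y}_n\right)$ converges (stochastically) to a $N(0, 1)$-distribution. Moreover, $\hat p$ is a consistent estimator of $p$ and therefore $\hat p(1 - \hat p) \to_P p(1 - p)$ as $n \to \infty$. This implies that asymptotically $\hat p(1 - \hat p)$ may be replaced by $p(1 - p)$, and the law of $\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1 - p)}} \,\middle|\, \mathcal{Y}_n\right)$ converges stochastically to a $N(0, 1)$-distribution.

More precisely, as $n \to \infty$

$$\sup_{\delta} \left| P\left(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1 - p)}} \le \delta \,\middle|\, \mathcal{Y}_n\right) - \Phi(\delta) \right| \to_P 0,$$

where $\Phi$ denotes the distribution function of the standard normal distribution.

- We can conclude that for large $n$

$$\text{distr}(\sqrt{n}(\hat p^* - \hat p) \mid \mathcal{Y}_n) \approx \text{distr}(\sqrt{n}(\hat p - p)) \approx N(0, p(1 - p))$$

as well as

$$\text{distr}(\hat p^* - \hat p \mid \mathcal{Y}_n) \approx \text{distr}(\hat p - p) \approx N(0, p(1 - p)/n)$$

$\Rightarrow$ Bootstrap consistent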
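The moment identities for the dichotomous example can be checked numerically. The following sketch assumes illustrative values $n = 200$ and $p = 0.3$ (not from the notes) and compares the simulated bootstrap variance of $\hat p^*$ with the conditional variance $\hat p(1 - \hat p)/n$.

```python
import random
import statistics

rng = random.Random(1)
n, p_true = 200, 0.3  # assumed values, chosen only for illustration
y = [1 if rng.random() < p_true else 0 for _ in range(n)]
p_hat = sum(y) / n

# Bootstrap replicates of p_hat: resample the 0/1 data with replacement
m = 4000
p_star = [sum(rng.choices(y, k=n)) / n for _ in range(m)]

boot_mean = statistics.mean(p_star)      # should be close to p_hat
boot_var = statistics.pvariance(p_star)  # should be close to theory_var
theory_var = p_hat * (1 - p_hat) / n     # conditional variance of p_hat* given the sample
print(p_hat, boot_mean, boot_var, theory_var)
```

The agreement is only up to Monte Carlo error, which shrinks as the number of replications `m` grows.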

Example 2: Estimating a population mean

- Let $Y_1, \dots, Y_n$ denote an i.i.d. random sample with mean $\mu$ and variance $\sigma^2$. In the following $F$ will denote the corresponding distribution function.

- $\bar Y = \frac{1}{n}\sum_{i=1}^n Y_i$ is an unbiased estimator of $\mu$.

- Problem: Construct a confidence interval.

Traditional approach:

- $\bar Y \sim N(\mu, \frac{\sigma^2}{n})$

- Estimation of $\sigma^2$: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2$

- This implies: $\frac{\sqrt{n}(\bar Y - \mu)}{S} \sim t_{n-1}$, and hence

$$P\left(-t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} \le \bar Y - \mu \le t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$

This construction relies on the assumption that $\bar Y \sim N(\mu, \frac{\sigma^2}{n})$. This is necessarily true if $Y$ is normally distributed. If the underlying distribution is not normal, then this condition is approximately fulfilled if the sample size $n$ is sufficiently large (central limit theorem). In this case the constructed confidence interval must also be seen as an approximation. The bootstrap offers an alternative way to construct confidence intervals.

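The bootstrap approach:

- Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- $Y_1^*, \dots, Y_n^*$ $\Rightarrow$ estimator $\bar Y^* = \frac{1}{n}\sum_{i=1}^n Y_i^*$

- Means and variances of the conditional distributions of $Y_i^*$ and $\bar Y^*$ given $\mathcal{Y}_n$:

$$E^*(Y_i^*) = E(Y_i^* \mid \mathcal{Y}_n) = \bar Y, \qquad Var^*(Y_i^*) = E[(Y_i^* - \bar Y)^2 \mid \mathcal{Y}_n] = \tilde S^2 := \frac{1}{n}\sum_{i=1}^n (Y_i - \bar Y)^2$$

Moreover,

$$E^*(\bar Y^*) = \bar Y, \qquad Var^*(\bar Y^*) = \tilde S^2/n$$

- As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\tilde S} \,\middle|\, \mathcal{Y}_n\right)$ converges (stochastically) to a $N(0, 1)$-distribution. Moreover, $\tilde S^2$ is a consistent estimator of $\sigma^2$ and therefore $\tilde S^2 \to_P \sigma^2$ as $n \to \infty$. This implies that asymptotically $\tilde S$ may be replaced by $\sigma$, and the law of $\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\sigma} \,\middle|\, \mathcal{Y}_n\right)$ converges stochastically to a $N(0, 1)$-distribution.

More precisely, as $n \to \infty$

$$\sup_{\delta} \left| P\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{\sigma} \le \delta \,\middle|\, \mathcal{Y}_n\right) - \Phi(\delta) \right| \to_P 0,$$

where $\Phi$ denotes the distribution function of the standard normal distribution.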

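- We can conclude that for large $n$

$$\text{distr}(\sqrt{n}(\bar Y^* - \bar Y) \mid \mathcal{Y}_n) \approx \text{distr}(\sqrt{n}(\bar Y - \mu)) \approx N(0, \sigma^2)$$

as well as

$$\text{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \text{distr}(\bar Y - \mu) \approx N(0, \sigma^2/n)$$

$\Rightarrow$ Bootstrap consistent

Construction of a confidence interval of level $1 - \alpha$:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ of the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$ (the bootstrap distribution):

$$P^*(\bar Y^* \le \hat t_{\alpha/2}) \approx \frac{\alpha}{2}, \qquad P^*(\bar Y^* > \hat t_{\alpha/2}) \approx 1 - \frac{\alpha}{2},$$

$$P^*(\bar Y^* \le \hat t_{1-\alpha/2}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\bar Y^* > \hat t_{1-\alpha/2}) \approx \frac{\alpha}{2}$$

Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- In practice:

  - Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding estimates $\bar Y_1^*, \bar Y_2^*, \dots, \bar Y_m^*$.

  - Order the resulting estimates $\bar Y_{(1)}^* \le \bar Y_{(2)}^* \le \dots \le \bar Y_{(m)}^*$.

  - Set $\hat t_{\alpha/2} := \bar Y^*_{([m+1]\frac{\alpha}{2})}$ and $\hat t_{1-\alpha/2} := \bar Y^*_{([m+1][1-\frac{\alpha}{2}])}$.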

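A basic bootstrap confidence interval:

- By construction of $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ we have

$$P^*(\bar Y^* - \bar Y \le \hat t_{\alpha/2} - \bar Y) \approx \frac{\alpha}{2}, \qquad P^*(\bar Y^* - \bar Y \le \hat t_{1-\alpha/2} - \bar Y) \approx 1 - \frac{\alpha}{2}$$

- We have seen that the bootstrap is consistent, and therefore $\text{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \text{distr}(\bar Y - \mu)$ asymptotically. This implies that for large $n$

$$P(\bar Y - \mu \le \hat t_{\alpha/2} - \bar Y) \approx \frac{\alpha}{2}, \qquad P(\bar Y - \mu \le \hat t_{1-\alpha/2} - \bar Y) \approx 1 - \frac{\alpha}{2},$$

and therefore

$$P\left(\bar Y - (\hat t_{1-\alpha/2} - \bar Y) \le \mu \le \bar Y - (\hat t_{\alpha/2} - \bar Y)\right) \approx 1 - \alpha$$

$\Rightarrow$ Approximate $1 - \alpha$ confidence interval:

$$[2\bar Y - \hat t_{1-\alpha/2},\; 2\bar Y - \hat t_{\alpha/2}]$$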
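A minimal sketch of the order-statistic rule and the basic interval, assuming $m = 1999$ so that $(m+1)\alpha/2$ is an integer rank. The function name `basic_bootstrap_ci` and the data are hypothetical; the percentile interval is returned alongside for comparison.

```python
import random
import statistics

def basic_bootstrap_ci(data, estimator, alpha=0.05, m=1999, seed=0):
    """Return (basic interval, percentile interval) from m bootstrap replicates."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(m))
    lo_rank = round((m + 1) * alpha / 2)        # rank of the alpha/2 quantile
    hi_rank = round((m + 1) * (1 - alpha / 2))  # rank of the 1 - alpha/2 quantile
    t_lo, t_hi = reps[lo_rank - 1], reps[hi_rank - 1]  # ranks are 1-based
    basic = (2 * theta_hat - t_hi, 2 * theta_hat - t_lo)
    percentile = (t_lo, t_hi)
    return basic, percentile

sample = [5.2, 4.8, 5.4, 4.6, 6.1, 5.4, 5.8, 5.5]  # illustrative data (assumption)
basic, percentile = basic_bootstrap_ci(sample, statistics.mean)
print("basic:", basic, "percentile:", percentile)
```

The basic interval is the percentile interval reflected about the estimate, so the two coincide only when the bootstrap distribution is symmetric around $\bar Y$.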

- In the older bootstrap literature the so-called percentile interval

$$[\hat t_{\alpha/2},\; \hat t_{1-\alpha/2}]$$

is usually recommended as a $1 - \alpha$ confidence interval.

- The percentile interval can easily be justified if all underlying distributions are symmetric, $\text{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \text{distr}(\bar Y - \bar Y^* \mid \mathcal{Y}_n)$, $\text{distr}(\bar Y - \mu) \approx \text{distr}(\mu - \bar Y)$.

- In practice the percentile interval is usually less precise than the standard interval discussed above; there are however some bias-corrected modifications of the percentile interval which allow better approximations.

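General Setup: The nonparametric (naive) bootstrap

- Original data: i.i.d. random sample $Y_1, \dots, Y_n$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$.

- The data $Y_1, \dots, Y_n$ is used to estimate $\theta$ $\Rightarrow$ estimator $\hat\theta \equiv \hat\theta(Y_1, \dots, Y_n)$

- Bootstrap: Random samples $Y_1^*, \dots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1, \dots, Y_n$ $\Rightarrow$ Bootstrap estimates $\hat\theta^* \equiv \hat\theta(Y_1^*, \dots, Y_n^*)$

- $\text{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n)$ is used to approximate $\text{distr}(\hat\theta - \theta)$

The bootstrap can be applied to a wide range of statistical and econometrical problems. Indeed, it can be shown that under some mild regularity conditions the bootstrap is consistent, if

1) Generation of the bootstrap sample reflects appropriately the way in which the original sample has been generated (i.i.d. sampling!).

2) The distribution of the estimator $\hat\theta$ is asymptotically normal. More precisely,

  - single parameter ($\theta \in \mathbb{R}$): $\sqrt{n}(\hat\theta - \theta) \to_L N(0, v^2)$; $v$ - standard error of $\sqrt{n}(\hat\theta - \theta)$

  - multivariate parameter vector ($\theta \in \mathbb{R}^d$): $\sqrt{n}(\hat\theta - \theta) \to_L N_d(0, V)$; $V$ - covariance matrix of $\sqrt{n}(\hat\theta - \theta)$

Consistent Bootstrap: $\text{distr}(\sqrt{n}(\hat\theta^* - \hat\theta) \mid \mathcal{Y}_n) \approx \text{distr}(\sqrt{n}(\hat\theta - \theta))$ [and $\text{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \text{distr}(\hat\theta - \theta)$] if $n$ is sufficiently large.

$\Rightarrow$ Bootstrap confidence intervals, tests, etc.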

Note:

- Standard approaches to construct confidence intervals and tests are usually based on asymptotic normal approximations. For example, if $\theta \in \mathbb{R}$ and $\sqrt{n}(\hat\theta - \theta) \to_L N(0, v^2)$ one usually tries to determine an approximation $\hat v$ of $v$ from the data. An approximate $1 - \alpha$ confidence interval is then given by

$$\left[\hat\theta - z_{1-\alpha/2}\frac{\hat v}{\sqrt{n}},\; \hat\theta + z_{1-\alpha/2}\frac{\hat v}{\sqrt{n}}\right]$$

- In some cases it is very difficult to obtain approximations $\hat v$ of $v$. Statistical inference is then usually based on the bootstrap.

- In contemporary statistical analysis the bootstrap is frequently used even for standard problems, where estimates $\hat v$ of $v$ are easily constructed. The reason is that in many situations it can be shown that bootstrap confidence intervals or tests are more precise than those determined analytically based on asymptotic formulas.

The bootstrap may fail if one of the above conditions 1) or 2) is violated. Examples are

- The naive bootstrap will not work if the i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $Y_1, \dots, Y_n$ does not properly reflect the way in which $Y_1, \dots, Y_n$ are generated from the underlying population (e.g. dependent data; $Y_1, \dots, Y_n$ not i.i.d.).

- The distribution of the estimator $\hat\theta$ is not asymptotically normal (e.g. extreme value problems).

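General approach: Basic bootstrap $1 - \alpha$ confidence interval

Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$.

- We will assume that the bootstrap is consistent: $\text{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \text{distr}(\hat\theta - \theta)$ if $n$ is sufficiently large.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$ (the bootstrap distribution):

$$P^*(\hat\theta^* \le \hat t_{\alpha/2}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\theta^* > \hat t_{\alpha/2}) \approx 1 - \frac{\alpha}{2},$$

$$P^*(\hat\theta^* \le \hat t_{1-\alpha/2}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\theta^* > \hat t_{1-\alpha/2}) \approx \frac{\alpha}{2}$$

Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- Consistency of the bootstrap implies that for large $n$

$$P(\hat\theta - \theta \le \hat t_{\alpha/2} - \hat\theta) \approx \frac{\alpha}{2}, \qquad P(\hat\theta - \theta \le \hat t_{1-\alpha/2} - \hat\theta) \approx 1 - \frac{\alpha}{2},$$

and therefore

$$P\left(\hat\theta - (\hat t_{1-\alpha/2} - \hat\theta) \le \theta \le \hat\theta - (\hat t_{\alpha/2} - \hat\theta)\right) \approx 1 - \alpha$$

$\Rightarrow$ Approximate $1 - \alpha$ confidence interval:

$$[2\hat\theta - \hat t_{1-\alpha/2},\; 2\hat\theta - \hat t_{\alpha/2}]$$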

Example: Bootstrap confidence interval for a median

Given: i.i.d. sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; $Y_i$ possesses a continuous distribution with (unknown) density $f$.

We are now interested in estimating the median $\mu_{med}$ of the underlying distribution. Recall that the median is defined by

$$P(Y_i \le \mu_{med}) = P(Y_i \ge \mu_{med}) = 0.5$$

Based on the ordered sample $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$, the estimator $\hat\mu_{med}$ is given by

$$\hat\mu_{med} = \begin{cases} Y_{(\frac{n+1}{2})} & \text{if } n \text{ is an odd number} \\ (Y_{(\frac{n}{2})} + Y_{(\frac{n}{2}+1)})/2 & \text{if } n \text{ is an even number} \end{cases}$$

Asymptotically we obtain

$$\sqrt{n}(\hat\mu_{med} - \mu_{med}) \to_L N\left(0, \frac{1}{4 f(\mu_{med})^2}\right)$$

The problem is that the density $f$ is unknown. In principle it may be estimated by nonparametric kernel density estimation and a corresponding plug-in estimate $\hat f(\hat\mu_{med})$ may be used to approximate the asymptotic variance. However, the bootstrap offers a simple alternative.

Construction of a bootstrap confidence interval:

- Draw i.i.d. random samples $Y_1^*, \dots, Y_n^*$ from $\mathcal{Y}_n$ and determine the corresponding medians $\hat\mu_{med}^*$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ of the conditional distribution of $\hat\mu_{med}^*$ given $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[2\hat\mu_{med} - \hat t_{1-\alpha/2},\; 2\hat\mu_{med} - \hat t_{\alpha/2}]$$
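The median construction needs no density estimate, which a short sketch makes concrete. The simulated data (101 draws from a unit exponential distribution), the seed and $m = 1999$ are assumptions made purely for illustration.

```python
import random
import statistics

rng = random.Random(42)
y = [rng.expovariate(1.0) for _ in range(101)]  # assumed skewed sample, n odd
med_hat = statistics.median(y)

m, alpha = 1999, 0.05
# Medians of m bootstrap resamples, sorted to read off order statistics
med_star = sorted(statistics.median(rng.choices(y, k=len(y))) for _ in range(m))
t_lo = med_star[round((m + 1) * alpha / 2) - 1]        # alpha/2 quantile
t_hi = med_star[round((m + 1) * (1 - alpha / 2)) - 1]  # 1 - alpha/2 quantile

ci = (2 * med_hat - t_hi, 2 * med_hat - t_lo)  # basic bootstrap interval
print(f"median = {med_hat:.3f}, 95% interval = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because every bootstrap median is itself an observed data point (for odd $n$), the bootstrap distribution is discrete; the interval endpoints therefore move in jumps as the quantile ranks change.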

1.2 Pivot statistics and the bootstrap-t method

In many situations it is possible to get more accurate bootstrap confidence intervals by using the bootstrap-t method (one also speaks of studentized bootstrap confidence intervals). The construction relies on so-called pivot statistics.

Let $Y_1, \dots, Y_n$ be an i.i.d. random sample and assume that the distribution of $Y$ depends on an unknown parameter (or parameter vector) $\theta$.

- A statistic $T_n \equiv T(Y_1, \dots, Y_n)$ is called a pivot statistic if the distribution of $T_n$ does not depend on any unknown parameter.

- A statistic $T_n \equiv T(Y_1, \dots, Y_n)$ is called an asymptotic pivot statistic if for suitable sequences $a_n$, $b_n$ of real numbers the transformed statistic $a_n T_n + b_n$ possesses a well-defined, non-degenerate asymptotic distribution, which does not depend on the parameters of the unknown distribution of $Y$.

Example: Let $Y_1, \dots, Y_n$ be an i.i.d. random sample with mean $\mu$, variance $\sigma^2 > 0$, and $E|Y|^3 < \infty$. If $Y$ is normally distributed we obtain

$$T_n = \frac{\sqrt{n}(\bar Y - \mu)}{S} \sim t_{n-1}$$

with $S^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y)^2$, where $t_{n-1}$ denotes Student's t-distribution with $n - 1$ degrees of freedom.

$\Rightarrow$ $T_n$ is a pivot statistic

Even if $Y$ is not normally distributed, the central limit theorem implies that

$$T_n = \frac{\sqrt{n}(\bar Y - \mu)}{S} \to_L N(0, 1)$$

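In this case $T_n$ is an asymptotic pivot statistic.

Bootstrap:

- i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\mathcal{Y}_n$ $\Rightarrow$ estimators $\bar Y^* = \frac{1}{n}\sum_{i=1}^n Y_i^*$ and $S^{*2} = \frac{1}{n-1}\sum_{i=1}^n (Y_i^* - \bar Y^*)^2$

- $n$ large $\Rightarrow$ approximately

$$\text{distr}\left(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S^*} \,\middle|\, \mathcal{Y}_n\right) \approx \text{distr}\left(\frac{\sqrt{n}(\bar Y - \mu)}{S}\right) \approx N(0, 1)$$

or

$$\text{distr}\left(\frac{\bar Y^* - \bar Y}{S^*} \,\middle|\, \mathcal{Y}_n\right) \approx \text{distr}\left(\frac{\bar Y - \mu}{S}\right)$$

Therefore, the (conditional) distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ (given $\mathcal{Y}_n$) can be used to approximate the distribution of $\frac{\bar Y - \mu}{S}$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\tau_{\alpha/2}$ and $\hat\tau_{1-\alpha/2}$ of the conditional distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ given $\mathcal{Y}_n$:

$$P^*\left(\frac{\bar Y^* - \bar Y}{S^*} \le \hat\tau_{\alpha/2}\right) \approx \frac{\alpha}{2}, \qquad P^*\left(\frac{\bar Y^* - \bar Y}{S^*} > \hat\tau_{\alpha/2}\right) \approx 1 - \frac{\alpha}{2},$$

$$P^*\left(\frac{\bar Y^* - \bar Y}{S^*} \le \hat\tau_{1-\alpha/2}\right) \approx 1 - \frac{\alpha}{2}, \qquad P^*\left(\frac{\bar Y^* - \bar Y}{S^*} > \hat\tau_{1-\alpha/2}\right) \approx \frac{\alpha}{2}$$

- In practice:

  - Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding estimates $Z_1^* := \frac{\bar Y_1^* - \bar Y}{S_1^*}$, $Z_2^* := \frac{\bar Y_2^* - \bar Y}{S_2^*}$, $\dots$, $Z_m^* := \frac{\bar Y_m^* - \bar Y}{S_m^*}$.

  - Order the resulting estimates $Z_{(1)}^* \le Z_{(2)}^* \le \dots \le Z_{(m)}^*$.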

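  - Set $\hat\tau_{\alpha/2} := Z^*_{([m+1]\frac{\alpha}{2})}$ and $\hat\tau_{1-\alpha/2} := Z^*_{([m+1][1-\frac{\alpha}{2}])}$.

$\Rightarrow$ For large $n$

$$P\left(\frac{\bar Y - \mu}{S} \le \hat\tau_{\alpha/2}\right) \approx \frac{\alpha}{2}, \qquad P\left(\frac{\bar Y - \mu}{S} > \hat\tau_{\alpha/2}\right) \approx 1 - \frac{\alpha}{2},$$

$$P\left(\frac{\bar Y - \mu}{S} \le \hat\tau_{1-\alpha/2}\right) \approx 1 - \frac{\alpha}{2}, \qquad P\left(\frac{\bar Y - \mu}{S} > \hat\tau_{1-\alpha/2}\right) \approx \frac{\alpha}{2}$$

This yields the $1 - \alpha$ confidence interval

$$[\bar Y - \hat\tau_{1-\alpha/2} S,\; \bar Y - \hat\tau_{\alpha/2} S]$$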
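The bootstrap-t recipe for the mean can be sketched as follows. The function name `bootstrap_t_ci`, the data and $m = 1999$ are assumptions; redrawing a resample whose standard deviation is zero is a practical guard added here, not something the notes prescribe.

```python
import random
import statistics

def bootstrap_t_ci(data, alpha=0.05, m=1999, seed=0):
    """Bootstrap-t interval for the mean, studentized by the resample std. dev."""
    rng = random.Random(seed)
    n = len(data)
    y_bar, s = statistics.mean(data), statistics.stdev(data)
    z = []
    while len(z) < m:
        res = rng.choices(data, k=n)
        s_star = statistics.stdev(res)
        if s_star == 0:  # degenerate resample (all values equal): redraw (assumption)
            continue
        z.append((statistics.mean(res) - y_bar) / s_star)  # Z* = (Ybar* - Ybar) / S*
    z.sort()
    tau_lo = z[round((m + 1) * alpha / 2) - 1]
    tau_hi = z[round((m + 1) * (1 - alpha / 2)) - 1]
    # The upper quantile gives the lower endpoint and vice versa
    return y_bar - tau_hi * s, y_bar - tau_lo * s

sample = [5.2, 4.8, 5.4, 4.6, 6.1, 5.4, 5.8, 5.5]  # illustrative data (assumption)
lower, upper = bootstrap_t_ci(sample)
print(f"bootstrap-t interval: ({lower:.3f}, {upper:.3f})")
```

Note that the factor $\sqrt{n}$ cancels between the studentized statistic and the interval, which is why it does not appear in the code.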

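General construction of a bootstrap-t interval (unknown real valued parameter $\theta \in \mathbb{R}$):

Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$. Assume that the estimator $\hat\theta$ of $\theta$ is asymptotically normal,

$$\sqrt{n}(\hat\theta - \theta) \to_L N(0, v^2) \quad \Rightarrow \quad \frac{\sqrt{n}(\hat\theta - \theta)}{v} \to_L N(0, 1)$$

and that a consistent estimator $\hat v \equiv \hat v(Y_1, \dots, Y_n)$ of $v$ is available. One might then replace $v$ by $\hat v$ to obtain

$$\frac{\sqrt{n}(\hat\theta - \theta)}{\hat v} \to_L N(0, 1)$$

Obviously, $\frac{\sqrt{n}(\hat\theta - \theta)}{\hat v}$ and $\frac{\hat\theta - \theta}{\hat v}$ are asymptotic pivot statistics.

- Based on an i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\{Y_1, \dots, Y_n\}$, calculate bootstrap estimates $\hat\theta^*$ and $\hat v^*$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\tau_{\alpha/2}$ and $\hat\tau_{1-\alpha/2}$ of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{\hat v^*}$ given $\mathcal{Y}_n$.

- $\Rightarrow$ Bootstrap-t interval

$$[\hat\theta - \hat\tau_{1-\alpha/2}\hat v,\; \hat\theta - \hat\tau_{\alpha/2}\hat v]$$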

1.3 The Parametric bootstrap

The parametric bootstrap can be applied in situations where the distribution of $Y$ is known up to some parameter vectors $\theta, \vartheta$ (e.g. $Y$ is normal with mean $\mu$ and variance $\sigma^2$; $Y$ follows an exponential distribution with parameter $\theta$). The difference from the nonparametric bootstrap discussed above lies in the way the bootstrap re-sample $Y_1^*, \dots, Y_n^*$ is generated.

Let $\vartheta = (\vartheta_1, \dots, \vartheta_p)^T$, and let $F(y, \theta, \vartheta)$ denote the distribution function of $Y$ as a function of $\theta, \vartheta$. $F$ is assumed to be known. For simplicity, we will concentrate on constructing a confidence interval for $\theta$.

The parametric bootstrap now proceeds as follows:

- The unknown parameter vectors $\theta, \vartheta$ are estimated by the maximum likelihood method. $\Rightarrow$ Maximum likelihood estimators $\hat\theta, \hat\vartheta$

- An i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ is generated by randomly drawing observations from a $F(\cdot, \hat\theta, \hat\vartheta)$ distribution (using a random number generator) $\Rightarrow$ bootstrap estimators $\hat\theta^*, \hat\vartheta^*$.

- The conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta, \hat\vartheta)$ is used to approximate the distribution of the estimator $\hat\theta$ (after centering, $\hat\theta^* - \hat\theta$ for $\hat\theta - \theta$).

In almost all cases of practical interest confidence intervals based on the parametric bootstrap are more accurate than standard intervals based on first order asymptotic approximations. The parametric bootstrap usually also provides more accurate approximations than its nonparametric counterpart discussed above. Of course, this requires that the underlying distributional assumption is satisfied (otherwise, the parametric bootstrap will lead to incorrect results).

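Basic parametric bootstrap confidence interval:

$$[2\hat\theta - \hat t_{1-\alpha/2},\; 2\hat\theta - \hat t_{\alpha/2}],$$

where $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ are the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta, \hat\vartheta)$.

Bootstrap-t intervals:

- Assume that the standard error $v(\theta, \vartheta)$ of $\sqrt{n}(\hat\theta - \theta)$ can be determined in dependence of the parameter (vectors) $\theta, \vartheta$.

- i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ generated by randomly drawing observations from a $F(\cdot, \hat\theta, \hat\vartheta)$ distribution

- $\Rightarrow$ Parameter estimates $\hat\theta^*, \hat\vartheta^*$ as well as bootstrap approximations $v(\hat\theta^*, \hat\vartheta^*)$ of the standard error.

- $\Rightarrow$ Bootstrap-t interval

$$[\hat\theta - \hat\tau_{1-\alpha/2}\, v(\hat\theta, \hat\vartheta),\; \hat\theta - \hat\tau_{\alpha/2}\, v(\hat\theta, \hat\vartheta)],$$

where $\hat\tau_{\alpha/2}$ and $\hat\tau_{1-\alpha/2}$ now denote the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{v(\hat\theta^*, \hat\vartheta^*)}$ given $F(\cdot, \hat\theta, \hat\vartheta)$.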

accurate intervals:

Determine the

2 and 1

2 quantiles 2 and 1 2 of he

conditional distribution of v(, )

given F (, , ).

Asymptotically we obtain

P ( 2 1 2 ) 1

v(, )

v(,) 1 2


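Example: Exponential distribution

Assume that $Y$ follows an exponential distribution with parameter $\theta$. Density and distribution function are then given by

$$f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}, \qquad F(y, \theta) = 1 - e^{-y/\theta}$$

We have $E(Y_i) = \theta$ and $Var(Y_i) = \theta^2$. The maximum likelihood estimator of $\theta$ is given by $\hat\theta = \frac{1}{n}\sum_{i=1}^n Y_i$, and $Var(\hat\theta) = \frac{\theta^2}{n}$.

The parametric bootstrap can then be used to construct confidence intervals. The following procedure is straightforward, but there also exist alternative approaches.

- An i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ is generated by randomly drawing observations from an exponential distribution with parameter $\hat\theta$.

- $Y_1^*, \dots, Y_n^*$ $\Rightarrow$ Estimator $\hat\theta^*$

- Calculation of $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\tau_{\alpha/2}$ and $\hat\tau_{1-\alpha/2}$ with

$$P^*\left(\frac{\hat\theta^*}{\hat\theta} \le \hat\tau_{\alpha/2} \,\middle|\, \hat\theta\right) = \frac{\alpha}{2}, \qquad P^*\left(\frac{\hat\theta^*}{\hat\theta} \le \hat\tau_{1-\alpha/2} \,\middle|\, \hat\theta\right) = 1 - \frac{\alpha}{2},$$

where $P^*(\cdot)$ denotes probabilities calculated with respect to the exponential distribution with parameter $\hat\theta$.

- This yields

$$P\left(\hat\tau_{\alpha/2} \le \frac{\hat\theta}{\theta} \le \hat\tau_{1-\alpha/2}\right) = 1 - \alpha$$

$\Rightarrow$ Confidence interval:

$$\left[\frac{\hat\theta}{\hat\tau_{1-\alpha/2}},\; \frac{\hat\theta}{\hat\tau_{\alpha/2}}\right]$$

Since the distribution of $\hat\theta/\theta$ does not depend on $\theta$, the coverage probability of this interval is exactly equal to $1 - \alpha$.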
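A sketch of the exponential example, assuming simulated data with $n = 30$ and true mean 2 (illustrative values only). The quantiles of $\hat\theta^*/\hat\theta$ are approximated by Monte Carlo, so the interval is exact only up to simulation error.

```python
import random
import statistics

rng = random.Random(7)
n, theta_true = 30, 2.0  # assumed values for illustration
# random.expovariate takes the rate 1/theta, so the mean is theta
y = [rng.expovariate(1 / theta_true) for _ in range(n)]
theta_hat = statistics.mean(y)  # maximum likelihood estimate

m, alpha = 1999, 0.05
ratios = sorted(
    statistics.mean([rng.expovariate(1 / theta_hat) for _ in range(n)]) / theta_hat
    for _ in range(m)
)
tau_lo = ratios[round((m + 1) * alpha / 2) - 1]
tau_hi = ratios[round((m + 1) * (1 - alpha / 2)) - 1]

ci = (theta_hat / tau_hi, theta_hat / tau_lo)
print(f"theta_hat = {theta_hat:.3f}, 95% interval = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Unlike the additive basic interval, this construction divides by the quantiles, so the interval is asymmetric around $\hat\theta$ and can never contain negative values.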

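1.4 More on Bootstrap Confidence Intervals

Setup: Random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$; unknown parameter (vector) $\theta$.

We will assume that the bootstrap is consistent: $\text{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \text{distr}(\hat\theta - \theta)$ if $n$ is sufficiently large.

In the previous sections we have already defined basic bootstrap confidence intervals as well as bootstrap-t intervals.

- Basic bootstrap interval:

$$[2\hat\theta - \hat t_{1-\alpha/2},\; 2\hat\theta - \hat t_{\alpha/2}],$$

where $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ are the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

- Bootstrap-t interval:

$$[\hat\theta - \hat\tau_{1-\alpha/2}\hat v,\; \hat\theta - \hat\tau_{\alpha/2}\hat v],$$

where $\hat\tau_{\alpha/2}$ and $\hat\tau_{1-\alpha/2}$ are the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{\hat v^*}$ given $\mathcal{Y}_n$.

- Percentile interval:

$$[\hat t_{\alpha/2},\; \hat t_{1-\alpha/2}]$$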

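The so-called BCa method allows one to construct better confidence intervals. The term BCa stands for bias-corrected and accelerated.

The BCa interval of intended coverage $1 - \alpha$ is given by

$$[\hat t_{\alpha_1},\; \hat t_{\alpha_2}],$$

where $\hat t_{\alpha_1}$ and $\hat t_{\alpha_2}$ are the $\alpha_1$ and $\alpha_2$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$, and

$$\alpha_1 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{\alpha/2}}{1 - \hat a(\hat z_0 + z_{\alpha/2})}\right), \qquad \alpha_2 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{1-\alpha/2}}{1 - \hat a(\hat z_0 + z_{1-\alpha/2})}\right),$$

where $\Phi$ is the standard normal distribution function, and where $z_\alpha$ is the $\alpha$ quantile of a standard normal distribution.

Note that the BCa interval reduces to a standard percentile interval if $\hat z_0 = \hat a = 0$. However, a different choice of $\hat z_0$ and $\hat a$ leads to more accurate intervals:

- The value of the bias-correction $\hat z_0$ can be obtained from the proportion of the bootstrap replications less than the original estimate $\hat\theta$:

$$\hat z_0 = \Phi^{-1}\left(P^*[\hat\theta^* < \hat\theta]\right)$$

- A simple way to determine the acceleration $\hat a$ is based on jackknife values of the estimator $\hat\theta$: For any $i = 1, \dots, n$ calculate the estimate $\hat\theta_{-i}$ from the sample $Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_n$ with the $i$th observation deleted. Let $\bar\theta = \frac{1}{n}\sum_{i=1}^n \hat\theta_{-i}$ and determine

$$\hat a = \frac{\sum_{i=1}^n (\bar\theta - \hat\theta_{-i})^3}{6\left[\sum_{i=1}^n (\bar\theta - \hat\theta_{-i})^2\right]^{3/2}}$$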
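The BCa ingredients can be put together in a short sketch. The function name `bca_interval`, the data, $m = 1999$ and the clamping of quantile ranks to the range $1, \dots, m$ are assumptions made for this illustration; `statistics.NormalDist` supplies $\Phi$ and $\Phi^{-1}$.

```python
import random
import statistics
from statistics import NormalDist

def bca_interval(data, estimator, alpha=0.05, m=1999, seed=0):
    """BCa interval from m bootstrap replicates and n jackknife estimates."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(m))
    std_normal = NormalDist()
    # Bias correction: share of replicates below the original estimate
    share = sum(r < theta_hat for r in reps) / m
    z0 = std_normal.inv_cdf(share)  # fails if share is exactly 0 or 1
    # Acceleration from leave-one-out (jackknife) estimates
    jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den

    def adjusted_level(z):
        return std_normal.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def rank(level):
        # Clamp into 1..m; a guard added for this sketch
        return min(max(round((m + 1) * level), 1), m)

    a1 = adjusted_level(std_normal.inv_cdf(alpha / 2))
    a2 = adjusted_level(std_normal.inv_cdf(1 - alpha / 2))
    return reps[rank(a1) - 1], reps[rank(a2) - 1]

sample = [5.2, 4.8, 5.4, 4.6, 6.1, 5.4, 5.8, 5.5]  # illustrative data (assumption)
lower, upper = bca_interval(sample, statistics.mean)
print(f"BCa interval: ({lower:.3f}, {upper:.3f})")
```

Setting `z0 = 0` and `a = 0` in this code would return the ordinary percentile interval, which mirrors the remark in the text.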

The BCa interval is motivated by theoretical results which show that it is second-order accurate.

Consider generally $1 - \alpha$ confidence intervals of the form $[t_{low}, t_{up}]$ for $\theta$. Upper and lower bounds of such intervals are determined from the data, $t_{low} \equiv t_{low}(Y_1, \dots, Y_n)$, $t_{up} \equiv t_{up}(Y_1, \dots, Y_n)$, and their accuracy depends on the particular procedure applied.

- (Symmetric) confidence intervals are said to be first-order accurate if there exist some constants $d_1, d_2 < \infty$ such that for sufficiently large $n$

$$\left|P(\theta < t_{low}) - \frac{\alpha}{2}\right| \le \frac{d_1}{\sqrt{n}}, \qquad \left|P(\theta > t_{up}) - \frac{\alpha}{2}\right| \le \frac{d_2}{\sqrt{n}}.$$

- (Symmetric) confidence intervals are said to be second-order accurate if there exist some constants $d_3, d_4 < \infty$ such that for sufficiently large $n$

$$\left|P(\theta < t_{low}) - \frac{\alpha}{2}\right| \le \frac{d_3}{n}, \qquad \left|P(\theta > t_{up}) - \frac{\alpha}{2}\right| \le \frac{d_4}{n}.$$

If the distribution of $\hat\theta$ is asymptotically normal, then under some additional regularity conditions it can usually be shown that

- Standard confidence intervals based on asymptotic approximations are first-order accurate. The same holds for the basic bootstrap intervals $[2\hat\theta - \hat t_{1-\alpha/2}, 2\hat\theta - \hat t_{\alpha/2}]$ as well as for the classical percentile method.

- Bootstrap-t intervals as well as BCa intervals are second-order accurate.

The difference between first and second-order accuracy is not just a theoretical nicety. In many practically important situations second-order accurate intervals lead to much better approximations.

Another approach for constructing confidence intervals is the ABC method: ABC, standing for approximate bootstrap confidence intervals, allows one to approximate the BCa interval endpoints analytically, without using any Monte Carlo replications at all ($\Rightarrow$ reduced computational costs). The procedure works by approximating the bootstrap sampling results by Taylor expansions. It is then, however, required that $\hat\theta \equiv \hat\theta(Y_1, \dots, Y_n)$ is a smooth function of $Y_1, \dots, Y_n$. This is for example not true for the sample median.

1.5 Subsampling: Inference for a sample maximum

Data: i.i.d. random sample $\mathcal{Y}_n := \{Y_1, \dots, Y_n\}$.

We now consider the situation that $Y_i$ only takes values in a compact interval $[0, \theta]$ such that

$$P(Y_i \in [0, \theta]) = 1.$$

Furthermore, $Y_i$ possesses a density $f$ which is continuous on $[0, \theta]$ and satisfies $f(y) > 0$ for $y \in (0, \theta]$, and $f(y) = 0$ for $y \notin [0, \theta]$. The maximum $\theta$ of $Y_i$ is unknown and has to be estimated from the data.

Similar types of extreme value problems frequently arise in econometrics. An example is the analysis of production efficiencies of different firms. The above situation may arise if we consider production outputs $Y_i$ of a sample of firms with identical inputs. A firm then is efficient if its output equals the maximal possible value $\theta$. Note that in practice usually more complicated problems have to be considered, where production outputs depend on individually different values of input variables $\Rightarrow$ Frontier Analysis.

Consistent estimator $\hat\theta$ of $\theta$:

$$\hat\theta := \max_{i=1,\dots,n} Y_i$$

The distribution of $\hat\theta$ is not asymptotically normal. Indeed, it can be shown that $n(\theta - \hat\theta)$ follows asymptotically an exponential distribution with parameter $\lambda = \frac{1}{f(\theta)}$:

$$n(\theta - \hat\theta) \to_L Exp\left(\frac{1}{f(\theta)}\right)$$

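The naive bootstrap fails:

- i.i.d. re-sample $Y_1^*, \dots, Y_n^*$ from $\{Y_1, \dots, Y_n\}$ $\Rightarrow$ bootstrap estimator $\hat\theta^* := \max_{i=1,\dots,n} Y_i^*$

- Unfortunately, the bootstrap is not consistent.

The reason is as follows: $\hat\theta = Y_{(n)}$, and hence $\hat\theta^* = \hat\theta = Y_{(n)}$ whenever $Y_{(n)} \in \{Y_1^*, \dots, Y_n^*\}$. The probability that $Y_{(n)}$ is never drawn in $n$ draws with replacement is $(1 - \frac{1}{n})^n \to e^{-1}$, so that for large $n$

$$P^*(\hat\theta - \hat\theta^* = 0) = P(\hat\theta - \hat\theta^* = 0 \mid \mathcal{Y}_n) = 1 - \left(1 - \frac{1}{n}\right)^n \approx 1 - e^{-1},$$

while $P(\theta - \hat\theta = 0) = 0$!

One can conclude that even for large sample sizes $\text{distr}(\hat\theta - \hat\theta^* \mid \mathcal{Y}_n)$ will be very different from $\text{distr}(\theta - \hat\theta)$ $\Rightarrow$ Basic bootstrap confidence intervals are incorrect.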
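The point mass of the naive bootstrap at zero is easy to see in a simulation. The sketch below assumes data that are uniform on $[0, 1]$ (so $\theta = 1$) and $n = 500$; both are illustrative choices.

```python
import math
import random

rng = random.Random(3)
n = 500
y = [rng.random() for _ in range(n)]  # assumed: uniform on [0, 1], so theta = 1
theta_hat = max(y)

m = 4000
# Count bootstrap resamples whose maximum equals the sample maximum
hits = sum(max(rng.choices(y, k=n)) == theta_hat for _ in range(m))
share = hits / m

exact = 1 - (1 - 1 / n) ** n  # P*(theta_hat* = theta_hat) for continuous data
limit = 1 - math.exp(-1)      # large-n limit, about 0.632
print(f"simulated = {share:.3f}, exact = {exact:.3f}, limit = {limit:.3f}")
```

Roughly 63% of the bootstrap replicates reproduce $\hat\theta$ exactly, whereas the true estimation error $\theta - \hat\theta$ is zero with probability zero.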

A possible remedy is to use subsampling. Similar to the ordinary bootstrap, subsampling relies on i.i.d. re-sampling from $\mathcal{Y}_n$, and the only difference consists in the fact that subsampling is based on drawing a smaller number $\kappa < n$ of observations.

Subsampling bootstrap:

- Choose some $\kappa < n$

- Determine an i.i.d. re-sample $Y_1^*, \dots, Y_\kappa^*$ by drawing randomly $\kappa$ observations from $\{Y_1, \dots, Y_n\}$ $\Rightarrow$ bootstrap estimator $\hat\theta_\kappa^* := \max_{i=1,\dots,\kappa} Y_i^*$

For the above problem subsampling is consistent.

If $\kappa = n^\gamma$ for some $0 < \gamma < 1$, then the conditional distribution of $\kappa(\hat\theta - \hat\theta_\kappa^*)$ given $\mathcal{Y}_n$ converges stochastically to a $Exp\left(\frac{1}{f(\theta)}\right)$-distribution:

$$\sup_{\delta} \left| P\left(\kappa(\hat\theta - \hat\theta_\kappa^*) \le \delta \,\middle|\, \mathcal{Y}_n\right) - F\left(\delta; \frac{1}{f(\theta)}\right) \right| \to_P 0,$$

where $F(\cdot; \frac{1}{f(\theta)})$ denotes the distribution function of an exponential distribution with parameter $\lambda = \frac{1}{f(\theta)}$.

The subsampling bootstrap works under extremely general conditions, and it can often be applied in situations where the ordinary bootstrap fails. However, it usually does not make any sense to apply subsampling in regular cases, where the standard nonparametric bootstrap is consistent. Then subsampling is less efficient, and confidence intervals based on subsampling are less accurate. In practice, a major problem is the choice of $\kappa$.

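Confidence interval based on subsampling:

- Calculation of $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ with

$$P^*\left(\kappa(\hat\theta - \hat\theta_\kappa^*) \le \hat t_{\alpha/2}\right) = \frac{\alpha}{2}, \qquad P^*\left(\kappa(\hat\theta - \hat\theta_\kappa^*) \le \hat t_{1-\alpha/2}\right) = 1 - \frac{\alpha}{2},$$

where $P^*(\cdot)$ denotes probabilities calculated with respect to the conditional distribution of $\hat\theta_\kappa^*$ given $\mathcal{Y}_n$.

- This yields

$$P^*\left(\hat t_{\alpha/2} \le \kappa(\hat\theta - \hat\theta_\kappa^*) \le \hat t_{1-\alpha/2}\right) \approx 1 - \alpha,$$

$$P\left(\hat t_{\alpha/2} \le n(\theta - \hat\theta) \le \hat t_{1-\alpha/2}\right) \approx 1 - \alpha.$$

$\Rightarrow$ Confidence interval:

$$\left[\hat\theta + \frac{\hat t_{\alpha/2}}{n},\; \hat\theta + \frac{\hat t_{1-\alpha/2}}{n}\right]$$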
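A sketch of the subsampling interval for the sample maximum, assuming uniform data on $[0, 1]$, $n = 2000$ and the choice $\kappa = n^{1/2}$. The exponent $1/2$ is an arbitrary illustrative choice; as noted above, choosing $\kappa$ is the hard part in practice.

```python
import random

rng = random.Random(11)
n = 2000
y = [rng.random() for _ in range(n)]  # assumed: uniform on [0, 1], so theta = 1
theta_hat = max(y)

kappa = round(n ** 0.5)  # subsample size kappa = n^gamma with gamma = 1/2 (assumption)
m, alpha = 1999, 0.05
# Replicates of kappa * (theta_hat - theta_hat*_kappa), sorted for quantiles
stats = sorted(
    kappa * (theta_hat - max(rng.choices(y, k=kappa))) for _ in range(m)
)
t_lo = stats[round((m + 1) * alpha / 2) - 1]
t_hi = stats[round((m + 1) * (1 - alpha / 2)) - 1]

# Rescale by n, not kappa, when transferring the quantiles to theta - theta_hat
ci = (theta_hat + t_lo / n, theta_hat + t_hi / n)
print(f"theta_hat = {theta_hat:.5f}, 95% interval = ({ci[0]:.5f}, {ci[1]:.5f})")
```

Because $\hat\theta$ can never exceed $\theta$, both endpoints lie at or above $\hat\theta$; the interval extends only upward from the sample maximum.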

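1.6 Appendix

Data: i.i.d. random sample $X_1, \dots, X_n$ with ordered values $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$. The distribution of $X_i$ possesses a distribution function $F$ defined by

$$F(x) = P(X_i \le x)$$

The empirical distribution function is then defined by the proportion of observations not exceeding $x$:

$$F_n(x) = \frac{1}{n}\,\#\{i : X_i \le x\}$$

Properties:

- $0 \le F_n(x) \le 1$

- $F_n(x) = 0$ if $x < X_{(1)}$

- $F_n(x) = 1$ if $x \ge X_{(n)}$

- $F_n$ is a monotonically increasing step function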

Example:

x1    x2    x3    x4    x5    x6    x7    x8

5.20  4.80  5.40  4.60  6.10  5.40  5.80  5.50
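For these eight observations the empirical distribution function can be evaluated directly. The helper name `ecdf` is a hypothetical choice for this sketch.

```python
from bisect import bisect_right

x = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
x_sorted = sorted(x)

def ecdf(t):
    """Share of observations less than or equal to t."""
    return bisect_right(x_sorted, t) / len(x_sorted)

for t in (4.0, 4.6, 5.4, 6.1):
    print(t, ecdf(t))
```

The tied value 5.40 produces a jump of height 2/8 at that point, twice the jump at any other observation.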

[Figure: empirical distribution function $F_n$ of the eight observations; horizontal axis from 4.0 to 6.5, vertical axis from 0.0 to 1.0]

Theoretical properties of $F_n$

Theorem: For every $x \in \mathbb{R}$ we obtain

$$nF_n(x) \sim B(n, F(x)),$$

i.e. $nF_n(x)$ has a binomial distribution with parameters $n$ and $F(x)$. The probability distribution of $F_n(x)$ is thus given by

$$P\left(F_n(x) = \frac{m}{n}\right) = \binom{n}{m} F(x)^m (1 - F(x))^{n-m}, \quad m = 0, 1, \dots, n$$

Consequences:

- $E(F_n(x)) = F(x)$, i.e. $F_n(x)$ is an unbiased estimator of $F(x)$

- $Var(F_n(x)) = \frac{1}{n} F(x)(1 - F(x))$ $\Rightarrow$ the standard error of $F_n(x)$ decreases as $n$ increases. $F_n(x)$ is a consistent estimator of $F(x)$.

Theorem of Glivenko-Cantelli:

$$P\left(\lim_{n \to \infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0\right) = 1$$

1.6.2 Consistency of estimators

Intuitively, consistency means that the distribution of an estimator $\hat\theta_n$ must become more and more concentrated around the true value $\theta$ as $n \to \infty$. The mathematical formalization of consistency relies on general concepts quantifying convergence of random variables.

Convergence in probability:

Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in probability to $X$ if

$$\lim_{n \to \infty} P[|X_n - X| < \epsilon] = 1$$

for every $\epsilon > 0$. One often uses the notation $X_n \to_P X$.

Weak consistency:

An estimator $\hat\theta_n$ is called weakly consistent if $\hat\theta_n \to_P \theta$.

Convergence in mean square:

Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in mean square to $X$ if

$$\lim_{n \to \infty} E\left(|X_n - X|^2\right) = 0$$

Notation: $X_n \to_{MSE} X$

Mean square consistency:

$\hat\theta_n$ is mean square consistent if $\hat\theta_n \to_{MSE} \theta$.

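Strong Convergence (Convergence with probability 1):

Let $X_1, X_2, \dots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges with probability 1 (or almost surely) to $X$ if

$$P\left[\lim_{n \to \infty} X_n = X\right] = 1$$

Notation: $X_n \to_{a.s.} X$

Strong consistency (consistency with probability 1):

An estimator $\hat\theta_n$ is strongly consistent if $\hat\theta_n \to_{a.s.} \theta$.

- $X_n \to_{MSE} X$ implies $X_n \to_P X$

- $X_n \to_{a.s.} X$ implies $X_n \to_P X$

Example: For an i.i.d. random sample $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$ we obtain $E(\bar X) = \mu$ as well as $Var(\bar X) = \frac{\sigma^2}{n}$

$$\Rightarrow \quad MSE(\bar X) := E((\bar X - \mu)^2) = Var(\bar X) = \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty$$

$\Rightarrow$ $\bar X \to_P \mu$ as $n \to \infty$

Example: Assume that $X \sim N(\mu, 0.18^2)$ with unknown mean $\mu$ but known standard deviation $\sigma = 0.18$.

Random sample $X_1, \dots, X_n$ $\Rightarrow$ Estimator $\bar X$ of $\mu$.

Recall: $\bar X \sim N(\mu, \frac{\sigma^2}{n}) = N(\mu, \frac{0.18^2}{n})$.

$n = 144$: standard error $= 0.015$, $MSE(\bar X) = 0.000225$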

$n = 9$: $P[\mu - 0.1176 \le \bar X \le \mu + 0.1176] = 0.95$

$n = 144$: $P[\mu - 0.0294 \le \bar X \le \mu + 0.0294] = 0.95$
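The numbers in this example follow from $\sigma/\sqrt{n}$ and the 0.975 quantile of the standard normal distribution (about 1.96). A short check, with the dictionary name `results` chosen only for this sketch:

```python
import math
from statistics import NormalDist

sigma = 0.18
z = NormalDist().inv_cdf(0.975)  # 0.975 quantile of N(0, 1), about 1.96

results = {}
for n in (9, 144):
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    results[n] = {"se": se, "mse": se ** 2, "half_width": z * se}
    print(n, round(se, 4), round(se ** 2, 6), round(z * se, 4))
```

Increasing the sample size by a factor of 16 (from 9 to 144) cuts the standard error and the interval half-width by a factor of 4.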

[Figure: two panels titled n=9 and n=144; vertical axes from 0.0 to 1.5]

1.6.3 Convergence in distribution

Let $Z_1, Z_2, \dots$ be a sequence of random variables with distribution functions $F_1, F_2, \dots$, and let $Z$ be a random variable with distribution function $F$. $Z_n$ converges in distribution to $Z$ if

$$\lim_{n \to \infty} F_n(x) = F(x) \quad \text{at every continuity point } x \text{ of } F$$

Notation: $Z_n \to_L Z$

Theorem (Ljapunov): Let $X_1, X_2, \dots$ be a sequence of independent random variables with means $E(X_i) = \mu_i$ and variances $Var(X_i) = E((X_i - \mu_i)^2) = \sigma_i^2 > 0$. Furthermore assume that $E(|X_i - \mu_i|^3) = \beta_i < \infty$.

If $\frac{(\sum_{i=1}^n \beta_i)^{1/3}}{(\sum_{i=1}^n \sigma_i^2)^{1/2}} \to 0$ as $n \to \infty$ then

$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{(\sum_{i=1}^n \sigma_i^2)^{1/2}} \to_L N(0, 1)$$

N (0, 1).

Important information about the speed of convergence to a normal distribution is given by the Berry-Esseen theorem:

Theorem (Berry-Esseen): Let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables with mean $E(X_i) = \mu$ and variance $Var(X_i) = E((X_i - \mu)^2) = \sigma^2 > 0$. Then, if $G_n$ denotes the distribution function of $\frac{\sqrt{n}(\bar X - \mu)}{\sigma}$,

$$\sup_t |G_n(t) - \Phi(t)| \le \frac{33}{4} \cdot \frac{E(|X_i - \mu|^3)}{\sigma^3 n^{1/2}}$$

Order symbols $O(\cdot)$ and $o(\cdot)$ are used in order to quantify the speed (rate) of convergence of a sequence of numbers.

Let $\alpha_1, \alpha_2, \alpha_3, \dots$ and $\beta_1, \beta_2, \beta_3, \dots$ be (deterministic) sequences of numbers.

- The notation $\alpha_n = O(1)$ indicates that the sequence $\alpha_1, \alpha_2, \dots$ is bounded. More precisely, there exists an $M < \infty$ such that $|\alpha_n| \le M$ for all $n \in \mathbb{N}$.

- $\alpha_n = o(1)$ means that $\alpha_n \to 0$.

- $\alpha_n = O(\beta_n)$ means that $|\alpha_n|/|\beta_n| = O(1)$.

- $\alpha_n = o(\beta_n)$ means that $|\alpha_n|/|\beta_n| \to 0$.

Examples: $\sum_{i=1}^n i = O(n^2)$, $\sum_{i=1}^n i = o(n^3)$

Stochastic order symbols $O_P(\cdot)$ and $o_P(\cdot)$ are used to quantify the speed (rate) of convergence of a sequence of random variables. Let $Z_1, Z_2, Z_3, \dots$ be a sequence of random variables, and let $V_1, V_2, \dots$ be either a deterministic sequence of numbers or a sequence of random variables.

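- We will write $Z_n = O_P(1)$ if for every $\epsilon > 0$ there exists an $M < \infty$ and an $m \in \mathbb{N}$ such that

$$P(|Z_n| \ge M) \le \epsilon \quad \text{for all } n \ge m.$$

$Z_n$ is then called stochastically bounded.

- We will write $Z_n = o_P(1)$ if and only if $Z_n \to_P 0$.

- $Z_n = O_P(V_n)$ means that $|Z_n|/|V_n| = O_P(1)$.

- $Z_n = o_P(V_n)$ means that $|Z_n|/|V_n| \to_P 0$.

Example: $\bar X - \mu = O_P(n^{-1/2})$

Inequality of Chebychev:

$$P[|X - \mu| > k\sigma] \le \frac{1}{k^2} \quad \text{for all } k > 0$$

$$\Rightarrow \quad P[\mu - k\sigma \le X \le \mu + k\sigma] \ge 1 - \frac{1}{k^2}$$

k    lower bound for P[μ − kσ ≤ X ≤ μ + kσ]

2    1 − 1/4 = 0.75

3    1 − 1/9 ≈ 0.89

4    1 − 1/16 = 0.9375

Generalization:

$$P[|X - \mu| > k] \le \frac{E(|X - \mu|^r)}{k^r} \quad \text{for all } k > 0,\; r = 1, 2, \dots$$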

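Cauchy-Schwarz inequality:

Let $x_1, \dots, x_n$ and $y_1, \dots, y_n$ be arbitrary real numbers. Then

$$\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)$$

Integrated version:

$$\left(\int_a^b f(x)g(x)\,dx\right)^2 \le \left(\int_a^b f(x)^2\,dx\right)\left(\int_a^b g(x)^2\,dx\right)$$

Application to random variables:

$$(E(XY))^2 \le E(X^2) \cdot E(Y^2)$$

Hölder inequality:

Let $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$.

Let $x_i, y_i \ge 0$, $i = 1, \dots, n$ be arbitrary numbers. Then

$$\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n x_i^p\right)^{1/p} \left(\sum_{i=1}^n y_i^q\right)^{1/q}$$

Integrated version:

$$\int_a^b f(x)g(x)\,dx \le \left(\int_a^b f(x)^p\,dx\right)^{1/p} \left(\int_a^b g(x)^q\,dx\right)^{1/q}$$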

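2 Bootstrap and Regression Models

Problem: Analyze the influence of some explanatory (independent) variables $X_1, X_2, \dots, X_p$ on a response variable (or dependent variable) $Y$.

- Observations

$$(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$$

- Model

$$Y_i = \beta_0 + \sum_{j=1}^p \beta_j X_{ij} + \epsilon_i, \qquad [\epsilon_i \sim N(0, \sigma^2)]$$

- The regression function implied by the model,

$$m(X_{i1}, \dots, X_{ip}) = \beta_0 + \sum_{j=1}^p \beta_j X_{ij} = E(Y \mid X_1 = X_{i1}, \dots, X_p = X_{ip}),$$

is the conditional expectation of $Y$ given the explanatory variables.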

variate normal random vector.


Remark: Regression analysis is usually a conditional analysis. The goal is to estimate the regression function $m$, which is the conditional expectation of $Y$ given $X_1, \dots, X_p$. Standard inference studies the behavior of estimators conditional on the observed values.

However, different types of bootstrap may be used depending on how the data are generated.

1) Random design: $(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$ is a sample of i.i.d. random vectors, i.e. observations are independent and identically distributed.

   Example: $p + 1$ measurements from $n$ individuals randomly drawn from an underlying population.

2) $(X_{j1}, \dots, X_{jp})$, $j = 1, \dots, n$, are random vectors which are, however, not independent or not identically distributed (e.g. time series data, where the X-variables are observed in successive time periods).

3) Fixed design: Data are collected at pre-specified, non-random values $X_{jk}$ (corresponding, for example, to different experimental conditions).

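The model can be rewritten in matrix notation:

$$Y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0,\quad \mathrm{Cov}(\varepsilon) = \sigma^2 I_n, \qquad [\varepsilon \sim N_n(0, \sigma^2 I_n)]$$

with

$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix},\quad
X = \begin{pmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix},\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix},\quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Estimation by least squares:

Least squares method: Determine $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p$ by minimizing

$$Q(\hat\beta_0, \dots, \hat\beta_p) = \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \dots - \hat\beta_p X_{ip})^2$$

The minimizer is the least squares estimator $\hat\beta = [X^T X]^{-1} X^T Y$.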

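Let $E$ and $\mathrm{Cov}$ denote conditional expectation and covariances given the observed X-values.

Properties of $\hat\beta$:

1. $\hat\beta$ is an unbiased estimator of $\beta$:

$$E(\hat\beta) = \big(E(\hat\beta_0), \dots, E(\hat\beta_p)\big)^T = (\beta_0, \dots, \beta_p)^T = \beta$$

2. Covariance matrix:

$$\mathrm{Cov}(\hat\beta) = [X^T X]^{-1} X^T \mathrm{Cov}(Y) X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1} X^T X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1}$$

3. If $\varepsilon_i \sim N(0, \sigma^2)$ then $\varepsilon \sim N_n(0, \sigma^2 I_n)$, and consequently

$$\hat\beta \sim N_{p+1}\big(\beta, \sigma^2 [X^T X]^{-1}\big)$$

4. Asymptotic distribution: Assume that $\frac{1}{n}\sum_i X_{ij} X_{ik} \to c_{jk}$ as well as $\frac{1}{n}\sum_i X_{ij} \to c_{0j}$ as $n \to \infty$. Note that $c_{jk} = E(X_j X_k)$ and $c_{0j} = E(X_j)$ in the case of random design. Furthermore, let $C$ denote the $(p + 1) \times (p + 1)$ matrix with elements $c_{jk}$, $j, k = 0, \dots, p$, $c_{00} = 1$, $c_{j0} = c_{0j}$, and assume that $C$ is of full rank. Then

$$\sqrt{n}(\hat\beta - \beta) \to_L N_{p+1}\big(0, \sigma^2 C^{-1}\big)$$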

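Estimation of $\sigma^2$:

The residuals $\hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$ estimate the error term $\varepsilon_i$.

Estimator $\hat\sigma^2$ of $\sigma^2$:

$$\hat\sigma^2 = \frac{1}{n - p - 1} \sum_{i=1}^n (Y_i - \hat Y_i)^2$$

$\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.

If the true error terms $\varepsilon_i$ are normally distributed, then $(n - p - 1)\,\hat\sigma^2 / \sigma^2 \sim \chi^2_{n-p-1}$.

Let $\gamma_{ij}$, $i, j = 1, \dots, p + 1$, denote the elements of the matrix $\Gamma = [X^T X]^{-1}$. Then, for normal errors,

$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{\gamma_{jj}}} \sim t_{n-p-1}$$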

mates.

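Note: Under the normality assumption, $\frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{\gamma_{jj}}}$ is a pivot statistic (with $\gamma_{jj}$ the $j$-th diagonal element of $[X^T X]^{-1}$). In the general case (under some weak regularity conditions), this quantity is an asymptotic pivot statistic: $n\gamma_{jj}$ converges to the $j$-th diagonal element of the matrix $C^{-1}$, and therefore

$$\frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{\gamma_{jj}}} \to_L N(0, 1) \quad \text{as } n \to \infty$$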
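The quantities of this section can be computed in a few lines. The following is a minimal sketch under stated assumptions: the function name `ols_fit`, the simulated data and the true coefficients are all hypothetical and chosen for illustration only. It evaluates $\hat\beta = [X^T X]^{-1} X^T Y$, the unbiased variance estimator with divisor $n - p - 1$, and the standard errors $\hat\sigma$ times the square root of the diagonal of $[X^T X]^{-1}$.

```python
import numpy as np

def ols_fit(X, y):
    """Least squares fit with intercept.

    X: (n, p) array of regressors without the constant column; y: (n,) response.
    Returns (beta_hat, sigma2_hat, se), where se[j] is sigma_hat times the
    square root of the j-th diagonal element of [X^T X]^{-1}.
    """
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])      # design matrix with leading column of ones
    gamma = np.linalg.inv(Xd.T @ Xd)           # [X^T X]^{-1}
    beta_hat = gamma @ Xd.T @ y
    resid = y - Xd @ beta_hat
    sigma2_hat = resid @ resid / (n - p - 1)   # unbiased estimator of sigma^2
    se = np.sqrt(sigma2_hat * np.diag(gamma))
    return beta_hat, sigma2_hat, se

# Simulated data (illustrative values only): beta = (1, 2, -0.5), sigma = 0.3.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

beta_hat, sigma2_hat, se = ols_fit(X, y)
print("beta_hat  :", beta_hat)
print("sigma2_hat:", sigma2_hat)
print("std. errors:", se)
```

Forming the explicit inverse mirrors the formulas in the text; for ill-conditioned designs a solver such as `np.linalg.lstsq` is usually the numerically safer choice.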

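2.1 Bootstrapping Pairs

Bootstrapping pairs applies to data from a random design. Let $X_i = (X_{i1}, \dots, X_{ip})$. The construction of bootstrap confidence intervals then proceeds as follows:

Basic bootstrap confidence interval:

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$

- Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.

- $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ $\Rightarrow$ least squares estimators $\hat\beta_j^*$, $j = 1, \dots, p + 1$.

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2,j}$ and $\hat t_{1-\alpha/2,j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$:

$$P^*(\hat\beta_j^* \le \hat t_{\alpha/2,j}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{\alpha/2,j}) \approx 1 - \frac{\alpha}{2},$$

$$P^*(\hat\beta_j^* \le \hat t_{1-\alpha/2,j}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{1-\alpha/2,j}) \approx \frac{\alpha}{2}.$$

  Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[\,2\hat\beta_j - \hat t_{1-\alpha/2,j},\ 2\hat\beta_j - \hat t_{\alpha/2,j}\,]$$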
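The steps of the pairs bootstrap can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: the names `ols` and `pairs_bootstrap_ci`, the number of bootstrap replications `n_boot`, and the simulated heteroscedastic data are all hypothetical. The conditional quantiles are approximated by Monte Carlo, i.e. by empirical quantiles over `n_boot` resamples.

```python
import numpy as np

def ols(Xd, y):
    """Least squares coefficients; Xd already contains the column of ones."""
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

def pairs_bootstrap_ci(X, y, alpha=0.05, n_boot=2000, seed=0):
    """Basic bootstrap intervals [2*b_j - t_{1-a/2,j}, 2*b_j - t_{a/2,j}] from resampled pairs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta_hat = ols(Xd, y)
    beta_star = np.empty((n_boot, Xd.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # indices drawn with replacement
        beta_star[b] = ols(Xd[idx], y[idx])    # each (Y*, X*) pair stays together
    t_lo = np.quantile(beta_star, alpha / 2, axis=0)
    t_hi = np.quantile(beta_star, 1 - alpha / 2, axis=0)
    return beta_hat, 2 * beta_hat - t_hi, 2 * beta_hat - t_lo

# Simulated heteroscedastic data (illustrative values only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200) * (0.2 + x)

beta_hat, lower, upper = pairs_bootstrap_ci(x, y)
for j in range(2):
    print(f"beta_{j}: estimate={beta_hat[j]:.3f}, CI=[{lower[j]:.3f}, {upper[j]:.3f}]")
```

Resampling row indices rather than values keeps each response attached to its own regressors, which is what allows the method to pick up a dependence of the error variance on $X_i$.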

Remark: Under some weak regularity conditions the bootstrap

is consistent, whenever

. In other words, the basic bootstrap confidence interval provides an asymptotically (first order) accurate confidence interval, even if the errors are heteroscedastic (unequal variances)! This is not true for the standard t-intervals.

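Modification: Bootstrap-t intervals:

- Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.

- Use $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ to determine least squares estimators $\hat\beta_j^*$, $j = 1, \dots, p + 1$, as well as estimators $(\hat\sigma^*)^2$ of the error variance $\sigma^2$.

- With $\gamma_{jj}^*$ denoting the $j$-th diagonal element of the matrix $\Gamma^* = [(X^*)^T X^*]^{-1}$ compute

$$\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{\gamma_{jj}^*}}$$

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\tau_{\alpha/2,j}$ and $\hat\tau_{1-\alpha/2,j}$ of the conditional distribution of $\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{\gamma_{jj}^*}}$.

- Bootstrap-t interval, with $\hat\sigma$ and $\gamma_{jj}$ (the $j$-th diagonal element of $[X^T X]^{-1}$) computed from the original sample:

$$[\,\hat\beta_j - \hat\tau_{1-\alpha/2,j}\,\hat\sigma\sqrt{\gamma_{jj}},\ \hat\beta_j - \hat\tau_{\alpha/2,j}\,\hat\sigma\sqrt{\gamma_{jj}}\,]$$

Note that this interval will be incorrect for heteroscedastic errors.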
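A bootstrap-t interval along these lines might be computed as in the following sketch. The function names `fit` and `bootstrap_t_ci`, the default `n_boot` and the simulated homoscedastic data are assumptions for illustration; the studentized statistic uses the standard error $\hat\sigma^*\sqrt{\gamma^*_{jj}}$ recomputed on every resample.

```python
import numpy as np

def fit(Xd, y):
    """Return beta_hat and the standard errors sigma_hat * sqrt(gamma_jj)."""
    n, q = Xd.shape                             # q = p + 1 columns including the intercept
    gamma = np.linalg.inv(Xd.T @ Xd)
    beta = gamma @ Xd.T @ y
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - q)
    return beta, np.sqrt(sigma2 * np.diag(gamma))

def bootstrap_t_ci(X, y, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap-t intervals based on resampled pairs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta_hat, se_hat = fit(Xd, y)
    t_star = np.empty((n_boot, Xd.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample pairs with replacement
        beta_b, se_b = fit(Xd[idx], y[idx])
        t_star[b] = (beta_b - beta_hat) / se_b  # studentized bootstrap statistic
    tau_lo = np.quantile(t_star, alpha / 2, axis=0)
    tau_hi = np.quantile(t_star, 1 - alpha / 2, axis=0)
    return beta_hat, beta_hat - tau_hi * se_hat, beta_hat - tau_lo * se_hat

# Simulated homoscedastic data (illustrative values only).
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(scale=0.5, size=100)

beta_hat, lower, upper = bootstrap_t_ci(x, y)
for j in range(2):
    print(f"beta_{j}: estimate={beta_hat[j]:.3f}, CI=[{lower[j]:.3f}, {upper[j]:.3f}]")
```

Note that the upper quantile of the studentized statistic determines the lower end of the interval and vice versa.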

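In order to understand bootstrap behavior for random design let us analyze the simplest case with $p = 1$. Then $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. Consider the estimator

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\varepsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$

of the slope $\beta_1$.

Random design implies that $(Y_i, X_i)$, and hence $(\varepsilon_i, X_i)$, $i = 1, \dots, n$, are independent and identically distributed. Under some regularity conditions (existence of moments) we have

$$\frac{1}{n}\sum_i (X_i - \bar X)^2 \to_P E(X_i - \mu_x)^2 = \sigma_X^2,$$

$$\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\varepsilon_i \to_L N(0, v_{\varepsilon,X}^2),$$

where

$$v_{\varepsilon,X}^2 = E\big((X_i - \mu_x)^2 \varepsilon_i^2\big).$$

If $\varepsilon_i$ and $X_i$ are independent and $\sigma^2 = \mathrm{var}(\varepsilon_i)$ does not depend on $X_i$, then $v_{\varepsilon,X}^2 = \sigma^2 \sigma_X^2$. We then generally obtain for large $n$

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1 - \beta_1)\big) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\varepsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\varepsilon_i}{\sigma_X^2}\right) \approx N\Big(0, \frac{v_{\varepsilon,X}^2}{\sigma_X^4}\Big)$$

Now consider the bootstrap estimator $\hat\beta_1^*$,

$$\hat\beta_1^* = \frac{\sum_i (X_i^* - \bar X^*) Y_i^*}{\sum_i (X_i^* - \bar X^*)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i^* - \bar X^*)\hat\varepsilon_i^*}{\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2},$$

where $\hat\varepsilon_i^* = Y_i^* - \hat\beta_0 - \hat\beta_1 X_i^*$.

Recall that by definition, $(Y_i^*, X_i^*)$, and hence $(\hat\varepsilon_i^*, X_i^*)$, $i = 1, \dots, n$, are independent and identically distributed observations (conditional on $\mathcal{Y}_n$). Up to asymptotically negligible terms arising from replacing $\bar X^*$ by $\bar X$, we obtain $E\big(\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 \,|\, \mathcal{Y}_n\big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 =: \hat\sigma_X^2$, and

$$\Big|\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 - \hat\sigma_X^2\Big| \to_P 0$$

as $n \to \infty$. Moreover, $E\big(\frac{1}{\sqrt{n}}\sum_i (X_i^* - \bar X^*)\hat\varepsilon_i^* \,|\, \mathcal{Y}_n\big) = 0$ and

$$\mathrm{var}\Big(\frac{1}{\sqrt{n}}\sum_i (X_i^* - \bar X^*)\hat\varepsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2$$

This implies

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,|\, \mathcal{Y}_n\big) \approx N\Big(0, \frac{\frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2}{\hat\sigma_X^4}\Big).$$

Since $\frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2 \to_P v_{\varepsilon,X}^2$ and $\hat\sigma_X^2 \to_P \sigma_X^2$, we can conclude that asymptotically

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1 - \beta_1)\big) \approx \mathrm{distr}\big(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,|\, \mathcal{Y}_n\big)$$

$\Rightarrow$ Bootstrap consistent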

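2.2 Bootstrapping Residuals

Bootstrapping residuals does not depend on the particular design of the regression model. The only crucial assumption is that the error terms $\varepsilon_i$ are i.i.d. with constant variance $\sigma^2$.

Residuals:

$$\hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$$

Matrix notation:

$$\hat\varepsilon = (\hat\varepsilon_1, \dots, \hat\varepsilon_n)^T = (I - \underbrace{X[X^T X]^{-1} X^T}_{H})\,Y = (I - H)\,\varepsilon$$

$$\Rightarrow\ \mathrm{Cov}(\hat\varepsilon) = \sigma^2 (I - H)$$

With $h_{ii}$ denoting the diagonal elements of $H$ we obtain

$$\mathrm{var}(\hat\varepsilon_i) = \sigma^2 (1 - h_{ii}) < \sigma^2$$

Standardized residuals:

$$r_i = \frac{\hat\varepsilon_i}{\sqrt{1 - h_{ii}}} \quad\Rightarrow\quad \mathrm{var}(r_i) = \sigma^2$$

We have $\sum_i \hat\varepsilon_i = 0$. For the standardized residuals it is, however, not guaranteed that $\bar r = \frac{1}{n}\sum_i r_i$ is equal to zero. The residual bootstrap thus relies on resampling centered standardized residuals $\tilde r_i := r_i - \bar r$.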

Note: Residual plots play an important role in validating regression models.

a.) Nonlinear model:

[Figure: "Lack of model fit" (original title: Mangelnde Modellanpassung); residuals plotted against fitted y]

b.) Heteroscedasticity:

[Figure: "Heteroscedasticity" (original title: Heteroskedastizität); residuals plotted against fitted y_i]

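Bootstrapping Residuals

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ $\Rightarrow$ Estimator $\hat\beta$

- Calculate (centered) standardized residuals

$$r_i = \frac{\hat\varepsilon_i}{\sqrt{1 - h_{ii}}}, \qquad \tilde r_i = r_i - \bar r, \quad i = 1, \dots, n$$

- Generate random samples $\hat\varepsilon_1^*, \dots, \hat\varepsilon_n^*$ of residuals by drawing observations independently and with replacement from $\{\tilde r_1, \dots, \tilde r_n\}$.

- Calculate

$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \hat\varepsilon_i^*, \quad i = 1, \dots, n$$

- Bootstrap estimators $\hat\beta_j^*$ are determined by least squares estimation from the data $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2,j}$ and $\hat t_{1-\alpha/2,j}$ of the conditional distribution of $\hat\beta_j^*$:

$$P^*(\hat\beta_j^* \le \hat t_{\alpha/2,j}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{\alpha/2,j}) \approx 1 - \frac{\alpha}{2},$$

$$P^*(\hat\beta_j^* \le \hat t_{1-\alpha/2,j}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{1-\alpha/2,j}) \approx \frac{\alpha}{2}.$$

  Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[\,2\hat\beta_j - \hat t_{1-\alpha/2,j},\ 2\hat\beta_j - \hat t_{\alpha/2,j}\,]$$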
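The residual bootstrap described above can be sketched in vectorized form. The function name `residual_bootstrap_ci`, the default `n_boot` and the simulated data are hypothetical; the sketch assumes a full-rank design, so that the pseudoinverse equals $[X^T X]^{-1} X^T$ and the leverages $h_{ii}$ can be read off the diagonal of the hat matrix.

```python
import numpy as np

def residual_bootstrap_ci(X, y, alpha=0.05, n_boot=2000, seed=0):
    """Basic bootstrap intervals from resampled centered standardized residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    pinv = np.linalg.pinv(Xd)                   # equals [X^T X]^{-1} X^T for full-rank X
    beta_hat = pinv @ y
    fitted = Xd @ beta_hat
    h = np.einsum("ij,ji->i", Xd, pinv)         # leverages h_ii (diagonal of H)
    r = (y - fitted) / np.sqrt(1.0 - h)         # standardized residuals
    r_tilde = r - r.mean()                      # centered standardized residuals
    eps_star = rng.choice(r_tilde, size=(n_boot, n), replace=True)
    y_star = fitted + eps_star                  # the design matrix stays fixed
    beta_star = y_star @ pinv.T                 # one row of coefficients per resample
    t_lo = np.quantile(beta_star, alpha / 2, axis=0)
    t_hi = np.quantile(beta_star, 1 - alpha / 2, axis=0)
    return beta_hat, 2 * beta_hat - t_hi, 2 * beta_hat - t_lo

# Simulated homoscedastic data on a fixed grid (illustrative values only).
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 80)
y = 2.0 - 0.7 * x + rng.normal(scale=1.0, size=80)

beta_hat, lower, upper = residual_bootstrap_ci(x, y)
for j in range(2):
    print(f"beta_{j}: estimate={beta_hat[j]:.3f}, CI=[{lower[j]:.3f}, {upper[j]:.3f}]")
```

Because the regressors are never resampled, all bootstrap fits share the same matrix `pinv`, which is why the whole procedure reduces to a single matrix product.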

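In order to understand the residual bootstrap let us again analyze the simplest case with $p = 1$, and recall that

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\varepsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$

Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$. If the errors $\varepsilon_i$ are i.i.d. with zero mean and variance $\sigma^2$, then (under some regularity conditions) the central limit theorem implies that, conditional on the observed values $X_1, \dots, X_n$,

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1 - \beta_1)\big) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\varepsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx N\Big(0, \frac{\sigma^2}{\hat\sigma_X^2}\Big)$$

By definition,

$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\hat\varepsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2}.$$

We have

$$E(\hat\varepsilon_i^* | \mathcal{Y}_n) = 0, \qquad \mathrm{var}(\hat\varepsilon_i^* | \mathcal{Y}_n) = \frac{1}{n}\sum_i \tilde r_i^2 =: \hat\sigma^2,$$

and therefore

$$\mathrm{var}\Big(\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\hat\varepsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2\, \hat\sigma^2 = \hat\sigma_X^2\, \hat\sigma^2$$

which implies

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,|\, \mathcal{Y}_n\big) \approx N\Big(0, \frac{\hat\sigma^2}{\hat\sigma_X^2}\Big).$$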

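2.3 Wild Bootstrap

Bootstrapping residuals is not appropriate if the errors are heteroscedastic, i.e. $\mathrm{var}(\varepsilon_i) = \sigma_i^2$. In this case the wild bootstrap offers an alternative.

There are several versions of the wild bootstrap. In its simplest form this procedure works as follows: Conditional on $\mathcal{Y}_n$, a bootstrap sample $\varepsilon_1^*, \dots, \varepsilon_n^*$ of residuals is determined by generating $n$ independent random variables from the following binary distributions:

$$P\Big(\varepsilon_i^* = \hat\varepsilon_i\,\frac{1 - \sqrt{5}}{2}\Big) = \gamma,$$

$$P\Big(\varepsilon_i^* = \hat\varepsilon_i\,\frac{1 + \sqrt{5}}{2}\Big) = 1 - \gamma,$$

$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.

These distributions satisfy

$$E(\varepsilon_i^* | \mathcal{Y}_n) = E^*(\varepsilon_i^*) = 0$$

$$\mathrm{var}(\varepsilon_i^* | \mathcal{Y}_n) = \mathrm{var}^*(\varepsilon_i^*) = \hat\varepsilon_i^2$$

$$E((\varepsilon_i^*)^3 | \mathcal{Y}_n) = E^*((\varepsilon_i^*)^3) = \hat\varepsilon_i^3$$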
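The three moment conditions can be verified directly for the multiplier $V_i = \varepsilon_i^*/\hat\varepsilon_i$, which takes the value $(1-\sqrt5)/2$ with probability $\gamma$ and $(1+\sqrt5)/2$ with probability $1-\gamma$. The short check below is only an illustration; the variable and function names (`a`, `b`, `moment`) are not from the original notes.

```python
import math

sqrt5 = math.sqrt(5.0)
gamma = (5.0 + sqrt5) / 10.0     # probability of the first value
a = (1.0 - sqrt5) / 2.0          # multiplier taken with probability gamma
b = (1.0 + sqrt5) / 2.0          # multiplier taken with probability 1 - gamma

def moment(k: int) -> float:
    """k-th moment of the multiplier V with P(V = a) = gamma and P(V = b) = 1 - gamma."""
    return gamma * a**k + (1.0 - gamma) * b**k

# Up to floating point error the first three moments are 0, 1 and 1,
# so that E*(eps*) = 0, var*(eps*) = eps_hat^2 and E*((eps*)^3) = eps_hat^3.
print(moment(1), moment(2), moment(3))
```

Multiplying each residual by an independent draw of this multiplier therefore preserves the first three moments suggested by the individual residual $\hat\varepsilon_i$.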

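Implementation of the wild bootstrap:

- Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ $\Rightarrow$ Estimator $\hat\beta$

- Calculate (centered) standardized residuals

$$r_i = \frac{\hat\varepsilon_i}{\sqrt{1 - h_{ii}}}, \qquad \tilde r_i = r_i - \bar r, \quad i = 1, \dots, n$$

- Generate $n$ independent random variables $\varepsilon_i^*$ from binary distributions,

$$P\Big(\varepsilon_i^* = \hat\varepsilon_i\,\frac{1 - \sqrt{5}}{2}\Big) = \gamma,$$

$$P\Big(\varepsilon_i^* = \hat\varepsilon_i\,\frac{1 + \sqrt{5}}{2}\Big) = 1 - \gamma,$$

  $i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.

- Calculate

$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \varepsilon_i^*, \quad i = 1, \dots, n$$

- Bootstrap estimators $\hat\beta_j^*$ are determined by least squares estimation from the data $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2,j}$ and $\hat t_{1-\alpha/2,j}$ of the conditional distribution of $\hat\beta_j^*$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[\,2\hat\beta_j - \hat t_{1-\alpha/2,j},\ 2\hat\beta_j - \hat t_{\alpha/2,j}\,]$$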
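A possible implementation of the wild bootstrap is sketched below. The function name `wild_bootstrap_ci`, the default `n_boot` and the simulated heteroscedastic data are hypothetical. As an assumption of this sketch, the two-point multipliers are applied to the raw residuals $\hat\varepsilon_i$, following the binary distributions as displayed; a variant based on standardized residuals would only change the line that defines `resid`.

```python
import numpy as np

def wild_bootstrap_ci(X, y, alpha=0.05, n_boot=2000, seed=0):
    """Basic bootstrap intervals from the two-point wild bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    pinv = np.linalg.pinv(Xd)                   # equals [X^T X]^{-1} X^T for full-rank X
    beta_hat = pinv @ y
    fitted = Xd @ beta_hat
    resid = y - fitted                          # raw residuals eps_hat_i
    s5 = np.sqrt(5.0)
    gamma = (5.0 + s5) / 10.0
    a, b = (1.0 - s5) / 2.0, (1.0 + s5) / 2.0
    # Multiplier a with probability gamma, b with probability 1 - gamma.
    v = np.where(rng.random((n_boot, n)) < gamma, a, b)
    y_star = fitted + v * resid                 # eps*_i = v_i * eps_hat_i, design fixed
    beta_star = y_star @ pinv.T
    t_lo = np.quantile(beta_star, alpha / 2, axis=0)
    t_hi = np.quantile(beta_star, 1 - alpha / 2, axis=0)
    return beta_hat, 2 * beta_hat - t_hi, 2 * beta_hat - t_lo

# Simulated heteroscedastic data (illustrative values only).
rng = np.random.default_rng(11)
x = rng.uniform(0, 3, size=150)
y = 1.0 + 0.8 * x + rng.normal(size=150) * (0.1 + 0.5 * x)

beta_hat, lower, upper = wild_bootstrap_ci(x, y)
for j in range(2):
    print(f"beta_{j}: estimate={beta_hat[j]:.3f}, CI=[{lower[j]:.3f}, {upper[j]:.3f}]")
```

Each bootstrap error keeps the position $i$ of its own residual, so an error variance that changes with $X_i$ carries over to the bootstrap samples.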

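In order to understand the basic intuition let us again analyze the simplest case with $p = 1$, and recall that

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\varepsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$

Suppose that the errors $\varepsilon_i$ are independent with zero mean and variances $\sigma_i^2$. Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$ and $v_{\varepsilon,X}^2 = \frac{1}{n}\sum_i (X_i - \bar X)^2 \sigma_i^2$. Under some regularity conditions the central limit theorem implies that, conditional on the observed values $X_1, \dots, X_n$,

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1 - \beta_1)\big) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\varepsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\right) \approx N\Big(0, \frac{v_{\varepsilon,X}^2}{\hat\sigma_X^4}\Big)$$

As above,

$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\varepsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2},$$

and by construction

$$\mathrm{var}\Big(\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\varepsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2 =: w_{\varepsilon,X}^2.$$

Hence

$$\mathrm{distr}\big(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \,|\, \mathcal{Y}_n\big) \approx N\Big(0, \frac{w_{\varepsilon,X}^2}{\hat\sigma_X^4}\Big).$$

Furthermore,

$$E(w_{\varepsilon,X}^2) = \frac{1}{n}\sum_i (X_i - \bar X)^2 E(\hat\varepsilon_i^2) \approx v_{\varepsilon,X}^2$$

Under regularity conditions this implies that $|w_{\varepsilon,X}^2 - v_{\varepsilon,X}^2| \to_P 0$ as $n \to \infty$. $\Rightarrow$ Wild bootstrap consistent.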

2.4 Generalizations

The approaches discussed above (bootstrapping pairs, bootstrapping residuals, wild bootstrap) can also be useful in more complex regression setups. An appropriate method then has to be selected depending on existing knowledge about the underlying design and the structure of the residuals.

1) Nonlinear regression:

$$Y_i = g(X_i, \beta) + \varepsilon_i$$

Example (depreciation of cars):

X - age of the car (in years)

Y - depreciation = selling price / original price (new car)

[Figure: scatter plot of Y = relative depreciation (axis from 0.0 to 1.0) against X = age in years (axis from 0 to 10)]

Model: $Y_i = e^{\beta X_i} + \varepsilon_i$

An estimator $\hat\beta$ is determined by (nonlinear) least squares; residual: $\hat\varepsilon_i = Y_i - e^{\hat\beta X_i}$

Bootstrap: Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.

2) Median Regression:

Linear model: $Y_i = \beta_0 + \sum_j \beta_j X_{ij} + \varepsilon_i$

In some applications the errors possess heavy tails ($\Rightarrow$ outliers!). In such situations estimation of $\beta$ by least squares may not be appropriate, and statisticians tend to use more robust methods. A sensible procedure then is to determine estimates by minimizing

$$\sum_{i=1}^n \Big| Y_i - \beta_0 - \sum_j \beta_j X_{ij} \Big|$$

The minimizer has to be computed by numerical optimization algorithms.

Inference is then usually based on the bootstrap. Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.

3) Nonparametric regression:

Model:

$$Y_i = m(X_i) + \varepsilon_i$$

for some unknown function $m$. The function $m$ can be estimated by nonparametric smoothing procedures (kernel estimation; local linear estimation; spline estimation). Inference is often based on the bootstrap.

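2.5 Time series

The general idea of the residual bootstrap can be adapted to many different situations. For example, it can also be used in the context of time series models.

Example: AR(1)-process:

$$X_t = \rho X_{t-1} + \varepsilon_t, \quad t = 1, \dots, n$$

for i.i.d. zero mean error terms with $\mathrm{var}(\varepsilon_t) = \sigma^2$. If $|\rho| < 1$ this defines a stationary stochastic process.

Standard estimator $\hat\rho$ of $\rho$:

$$\hat\rho = \frac{\sum_{t=2}^n (X_t - \bar X)(X_{t-1} - \bar X)}{\sum_{t=1}^n (X_t - \bar X)^2}$$

Asymptotic distribution:

$$\sqrt{n}(\hat\rho - \rho) \to_L N(0, 1 - \rho^2)$$

Bootstrapping residuals

- Calculate centered residuals

$$\hat\varepsilon_t = X_t - \hat\rho X_{t-1}, \qquad \tilde\varepsilon_t = \hat\varepsilon_t - \frac{1}{n - 1}\sum_{s=2}^n \hat\varepsilon_s, \quad t = 2, \dots, n$$

- For some $k > 0$, generate a random sample $\varepsilon_{-k}^*, \varepsilon_{-k+1}^*, \dots, \varepsilon_0^*, \varepsilon_1^*, \dots, \varepsilon_n^*$ of residuals by drawing $n + k + 1$ observations independently and with replacement from $\{\tilde\varepsilon_2, \dots, \tilde\varepsilon_n\}$.

- Generate a bootstrap time series by $X_{-k}^* = \varepsilon_{-k}^*$ and

$$X_t^* = \hat\rho X_{t-1}^* + \varepsilon_t^*, \quad t = -k + 1, \dots, n$$

  The bootstrap estimator $\hat\rho^*$ is computed from $X_1^*, \dots, X_n^*$ in the same way as $\hat\rho$.

Under the standard assumptions of AR(1) models this bootstrap is consistent.

Basic bootstrap confidence intervals:

- Determine $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2}$ and $\hat t_{1-\alpha/2}$ of the conditional distribution of $\hat\rho^*$.

- Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[\,2\hat\rho - \hat t_{1-\alpha/2},\ 2\hat\rho - \hat t_{\alpha/2}\,]$$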
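The AR(1) residual bootstrap might be implemented as in the following sketch. The function names `ar1_estimate` and `ar1_residual_bootstrap_ci`, the burn-in length `burn_in` (playing the role of $k$), the default `n_boot` and the simulated series are assumptions made for illustration. The first `burn_in + 1` values of each bootstrap series are discarded so that the retained values correspond to $X_1^*, \dots, X_n^*$.

```python
import numpy as np

def ar1_estimate(x):
    """Estimator of the AR(1) coefficient based on the centered series."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)

def ar1_residual_bootstrap_ci(x, alpha=0.05, n_boot=500, burn_in=50, seed=0):
    """Basic bootstrap interval for the AR(1) coefficient via resampled residuals."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rho_hat = ar1_estimate(x)
    resid = x[1:] - rho_hat * x[:-1]            # residuals for t = 2, ..., n
    resid = resid - resid.mean()                # centered residuals
    rho_star = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n + burn_in + 1, replace=True)
        xs = np.empty(n + burn_in + 1)
        xs[0] = eps[0]                          # start value X*_{-k} = eps*_{-k}
        for t in range(1, n + burn_in + 1):
            xs[t] = rho_hat * xs[t - 1] + eps[t]
        rho_star[b] = ar1_estimate(xs[-n:])     # keep the last n values X*_1, ..., X*_n
    t_lo = np.quantile(rho_star, alpha / 2)
    t_hi = np.quantile(rho_star, 1 - alpha / 2)
    return rho_hat, 2 * rho_hat - t_hi, 2 * rho_hat - t_lo

# Simulated AR(1) series with coefficient 0.5 (illustrative values only).
rng = np.random.default_rng(7)
n, rho = 300, 0.5
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

rho_hat, lower, upper = ar1_residual_bootstrap_ci(x)
print(f"rho_hat={rho_hat:.3f}, CI=[{lower:.3f}, {upper:.3f}]")
```

The burn-in period reduces the influence of the arbitrary start value, so that the retained part of each bootstrap series is approximately stationary.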
