Attribution Non-Commercial (BY-NC)


Given a data set, say $\{x_1, x_2, \ldots, x_n\}$, and a statistic of interest, say $\theta$, the basic algorithm for the non-parametric bootstrap consists of the following:

1. Resample the data with equal probability and with replacement. That is, each resampling is performed on the entire set of $n$ data points, so that each observation has probability $1/n$ of being sampled at every draw. For example, for an original sample of size 5, one bootstrap sample might be $x^* = \{x_4, x_1, x_3, x_2, x_2\}$.

2. Calculate the statistic of interest, $\theta^* = g(x^*)$, call the $b$th estimate $\theta_b^*$, and store the value in a vector.

3. Repeat (1) and (2) a large number of times.
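The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `bootstrap_statistics`, the seed, and the five data values are assumptions made up for the example.

```python
import random
import statistics


def bootstrap_statistics(data, statistic, n_boot=1000, seed=0):
    """Return n_boot values of `statistic`, each computed on a resample of `data`."""
    rng = random.Random(seed)  # seeded only so that the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):  # step 3: repeat a large number of times
        # Step 1: draw n observations with replacement, each with probability 1/n.
        resample = rng.choices(data, k=n)
        # Step 2: compute the statistic on the resample and store it.
        replicates.append(statistic(resample))
    return replicates


sample = [2.1, 3.4, 1.9, 5.6, 4.2]  # made-up data, for illustration only
theta_star = bootstrap_statistics(sample, statistics.median)
print(len(theta_star), min(theta_star), max(theta_star))
```

Any function of a sample can be passed as `statistic`; the median is used here only because its sampling distribution is awkward to derive by hand.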

The resulting vector of bootstrap statistics then provides an estimate of

the distribution of the statistic, by way of

1. Bootstrap estimate of the expected value:

\[
\hat{\theta} = \frac{1}{B} \sum_{b=1}^{B} \theta_b^* \tag{1.1}
\]

2. Bootstrap quantiles: Let $\theta^*_{[q]}$ represent the $q$th quantile of the bootstrap statistic. That is, take the vector of statistics produced by the bootstrap procedure and rank them from smallest to largest. The ranked values then correspond to the bootstrap estimates of the quantiles of the distribution. For example, if the number of bootstrap iterations was 1000, then the 25th element of the ranked vector of bootstrap statistics is the bootstrap estimate of the 0.025 quantile of the distribution of the statistic.

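3. The bootstrap estimate of the standard error:

\[
\widehat{se}(\theta) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\theta_b^* - \bar{\theta}\right)^2} \tag{1.2}
\]

where $\bar{\theta}$ is the mean of the $B$ bootstrap statistics, the quantity written $\hat{\theta}$ in (1.1).

4. A $100(1-\alpha)\%$ confidence interval is $\left(\theta^*_{[B\alpha/2]},\ \theta^*_{[B(1-\alpha/2)]}\right)$.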

using the bootstrap estimates. The non-parametric bootstrap will be the

best approach to inference when everything we know about the distribution

comes from the sample. In the case where we know something about the

distribution before we look at the sample, parametric approaches will give

us better results.
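The four summaries listed above can all be read off the stored vector of bootstrap statistics. The Python sketch below shows one way to do this under stated assumptions: the helper name `bootstrap_summary` and the data are hypothetical, and the interval uses the ranked elements $B\alpha/2$ and $B(1-\alpha/2)$, counted from 1 as in the text.

```python
import math
import random


def bootstrap_summary(replicates, alpha=0.05):
    """Mean, standard error and percentile interval of bootstrap replicates."""
    B = len(replicates)
    ordered = sorted(replicates)
    mean = sum(ordered) / B  # equation (1.1)
    se = math.sqrt(sum((t - mean) ** 2 for t in ordered) / (B - 1))  # equation (1.2)
    # Ranked elements B*alpha/2 and B*(1 - alpha/2); subtract 1 for 0-based indexing.
    lower = ordered[max(1, round(B * alpha / 2)) - 1]
    upper = ordered[min(B, round(B * (1 - alpha / 2))) - 1]
    return mean, se, (lower, upper)


# Made-up data; the statistic here is the sample mean.
rng = random.Random(0)
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7]
replicates = [
    sum(rng.choices(sample, k=len(sample))) / len(sample) for _ in range(1000)
]
mean, se, (lower, upper) = bootstrap_summary(replicates)
print(f"mean={mean:.3f}  se={se:.3f}  95% interval=({lower:.3f}, {upper:.3f})")
```

With 1000 replicates and $\alpha = 0.05$ the interval endpoints are the 25th and 975th ranked values, matching the quantile example in the list.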

Inference with the bootstrap is a direct extension of traditional inference.

Example 1. The table below shows the results of a small experiment in which 7 mice were randomly chosen from 16 to receive a new medical treatment, while the remaining 9 were assigned to the non-treatment group. Investigators wanted to test whether the treatment prolonged life after surgery. The table shows the survival times in days.

Group        Survival times (days)                    n    Mean     Std. error
Treatment    94, 197, 16, 38, 99, 141, 23             7    86.86    25.24
Control      52, 104, 146, 10, 51, 30, 40, 27, 46     9    56.22    14.14
Difference                                                 30.63    28.93

Say we wish to test for treatment differences, and know that the median is a better measure of the center of the distribution than the mean.

Figure 1: Histogram of 1000 bootstrapped differences of treatment medians for mice receiving two different post-surgery treatments (x-axis: median difference; y-axis: frequency).

The bootstrap 95% confidence interval was (−29, 101). What is our conclusion?
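One way to approach the question is to check whether 0 lies inside the interval: if it does, the data give no evidence at the 5% level that the medians differ. The Python sketch below recomputes a percentile interval for the mouse data. The function name and seed are assumptions, and because the resamples are random the endpoints will not reproduce (−29, 101) exactly.

```python
import random
import statistics

# Survival times in days, taken from the table in Example 1.
treatment = [94, 197, 16, 38, 99, 141, 23]
control = [52, 104, 146, 10, 51, 30, 40, 27, 46]


def median_difference_interval(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for median(x) - median(y), resampling each group separately."""
    rng = random.Random(seed)
    diffs = sorted(
        statistics.median(rng.choices(x, k=len(x)))
        - statistics.median(rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    lower = diffs[max(1, round(n_boot * alpha / 2)) - 1]
    upper = diffs[min(n_boot, round(n_boot * (1 - alpha / 2))) - 1]
    return lower, upper


lower, upper = median_difference_interval(treatment, control)
print("95% percentile interval:", (lower, upper))
print("interval contains 0:", lower <= 0 <= upper)
```

The interval reported above, (−29, 101), contains 0, so on this reading the experiment does not show that the treatment changes median survival time.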

Sometimes we know the distribution of the sample, but we cannot derive the

distribution of the statistic of interest. Sometimes we can use asymptotic

approximations, but if our sample is small these may be grossly inaccurate.

Furthermore, there are cases where the non-parametric bootstrap will fail.


In the case where we know the distribution of the sample, but not of the sample statistic, the parametric bootstrap often provides a powerful approach.

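The basic algorithm for the parametric bootstrap is as follows:

1. Simulate a sample of size n from the distribution of interest, using sample estimates for the parameters.

2. Calculate the statistic of interest from the simulated sample and store the value in a vector.

3. Repeat (1) and (2) a large number of times. The resulting vector provides an estimate of the distribution of the statistic, just as for the non-parametric case.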

mean p and variance np(1 − p). Suppose we wish to conduct inference on a

population proportion using the exact distribution of the underlying sample

from which we calculate p̂. We know that the underlying distribution of each

of our sample observations is Bernoulli with unknown parameter p. How

would we conduct the parametric bootstrap?

1. First calculate the sample estimate of $p$, which is $\hat{p} = \frac{1}{n}\sum_{i} X_i$.

and calculate p̂.
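A parametric bootstrap for a proportion can be sketched as follows in Python. This is an illustration under assumptions, not the handout's own code: the function name, the seed, and the ten 0/1 observations are made up, and each bootstrap sample is simulated as $n$ Bernoulli draws with parameter $\hat{p}$.

```python
import random


def parametric_bootstrap_proportion(x, n_boot=1000, seed=0):
    """Simulate Bernoulli(p_hat) samples and recompute the proportion each time."""
    rng = random.Random(seed)
    n = len(x)
    p_hat = sum(x) / n  # sample estimate of p
    p_star = []
    for _ in range(n_boot):
        # One simulated sample of n Bernoulli(p_hat) observations.
        simulated = [1 if rng.random() < p_hat else 0 for _ in range(n)]
        p_star.append(sum(simulated) / n)
    return p_hat, p_star


x = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # made-up 0/1 observations
p_hat, p_star = parametric_bootstrap_proportion(x)
print("p_hat =", p_hat)
print("mean of bootstrap proportions =", sum(p_star) / len(p_star))
```

The vector `p_star` can then be summarised with quantiles or a standard error in the same way as a non-parametric bootstrap vector.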

and the parametric bootstrap proves to be quite useful.

Let $X_1, X_2, \cdots, X_n$ be independent and identically distributed random variables whose probability density function (pdf) is given by $f$ and whose cumulative distribution function (cdf) is given by $F$. That is, $\Pr\{X_i \le x\} = F(x) = \int_{-\infty}^{x} f(t)\,dt$ for all $x$. Let $Y_{[n]} = \max\{X_1, \cdots, X_n\}$; in words, $Y_{[n]}$ is the largest value in the sample, or the sample maximum.

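\begin{align*}
G_n(y) &= \Pr\{Y_{[n]} \le y\} \\
       &= \Pr\{X_1 \le y, X_2 \le y, \ldots, X_n \le y\} \\
       &= \Pr\{X_1 \le y\} \Pr\{X_2 \le y\} \cdots \Pr\{X_n \le y\} \quad \text{(by independence)} \\
       &= [F(y)]^n \tag{2.1}
\end{align*}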
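Result (2.1) is easy to check by simulation. The Python sketch below uses the Uniform(0, 1) distribution, for which $F(y) = y$, so the cdf of the maximum of $n$ draws should be $y^n$. The function name, the choice of distribution, and the values $n = 5$ and $y = 0.8$ are assumptions chosen only to keep the check simple.

```python
import random


def empirical_cdf_of_max(n, y, n_sim=20000, seed=0):
    """Estimate Pr{max of n Uniform(0, 1) draws <= y} by simulation."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.random() for _ in range(n)) <= y
        for _ in range(n_sim)
    )
    return hits / n_sim


n, y = 5, 0.8
simulated = empirical_cdf_of_max(n, y)
exact = y ** n  # [F(y)]^n with F(y) = y for the Uniform(0, 1) distribution
print(f"simulated={simulated:.4f}  exact={exact:.4f}")
```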

frightfully complicated. The normal distribution, for example, has no closed-form solution for the distribution of the sample maximum. We want a better way to use the information in the sample for our inference.

Why will the non-parametric bootstrap not work for the sample max?
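One way to see the difficulty: a resample drawn with replacement can never contain a value larger than the observed maximum, and it contains the observed maximum itself with probability $1 - (1 - 1/n)^n$, which is about 0.63 for large $n$. The bootstrap distribution of the maximum is therefore a lump of probability at the sample maximum with nothing above it. The Python sketch below illustrates this with simulated standard normal data; the function name, seeds, and data are assumptions.

```python
import random


def bootstrap_maxima(sample, n_boot=2000, seed=0):
    """Maximum of each of n_boot non-parametric resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    return [max(rng.choices(sample, k=n)) for _ in range(n_boot)]


data_rng = random.Random(1)
sample = [data_rng.gauss(0, 1) for _ in range(163)]  # simulated data, illustration only
top = max(sample)
maxima = bootstrap_maxima(sample)

share = sum(m == top for m in maxima) / len(maxima)
expected = 1 - (1 - 1 / len(sample)) ** len(sample)
print("largest bootstrap maximum equals sample maximum:", max(maxima) == top)
print(f"share of resamples whose maximum is the sample maximum: {share:.3f}")
print(f"theoretical share 1 - (1 - 1/n)^n: {expected:.3f}")
```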

from a reservoir on the Savannah River Site, a former nuclear processing facility. The reservoir was used as a cooling pond for nuclear effluent through the 1980s, receiving high levels of radioactive materials that now reside in the sediments in the pond. It is of interest to know the probability that, if 163 bass are taken from the reservoir each year, the maximum tissue concentration of radiocesium will exceed 30 picocuries per gram.

n      Minimum   Maximum   Mean    Std. dev.
163    4.33      34.06     13.17   4.58

Figure 2: An approximately normal data set of $^{137}$Cs body burdens. Histogram titled "Radiocesium Tissue Concentration in Bass from PAR Pond" (x-axis: picocuries per gram; y-axis: frequency).

A parametric bootstrap was performed using the normal distribution for the underlying distribution of the data. A histogram of the bootstrapped maximums is shown below.

Figure 3: Histogram of the bootstrapped maximums (x-axis: picocuries per gram; y-axis: frequency).

What is the bootstrap estimate of the probability that the maximum body burden in a sample of size 163 will exceed 30 picocuries per gram?
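The estimate is the proportion of bootstrapped maximums above 30. The Python sketch below repeats the calculation and compares it with a numerical evaluation of (2.1) under the same normal model. It takes n = 163, mean 13.17 and standard deviation 4.58 from the summary row above; this reading of the row is an assumption, as are the function name and seed. The raw data are not available here, so the result only approximates what the original analysis would give.

```python
import random
from statistics import NormalDist

# Assumed reading of the summary row: sample size, mean, standard deviation.
n, mu, sigma = 163, 13.17, 4.58
threshold = 30.0  # picocuries per gram


def bootstrap_exceedance(n, mu, sigma, threshold, n_boot=1000, seed=0):
    """Share of simulated normal samples of size n whose maximum exceeds threshold."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        simulated_max = max(rng.gauss(mu, sigma) for _ in range(n))
        exceed += simulated_max > threshold
    return exceed / n_boot


p_boot = bootstrap_exceedance(n, mu, sigma, threshold)
# Equation (2.1): Pr{max <= y} = [F(y)]^n, so Pr{max > y} = 1 - [F(y)]^n.
p_exact = 1 - NormalDist(mu, sigma).cdf(threshold) ** n
print(f"parametric bootstrap estimate: {p_boot:.3f}")
print(f"numerical value from (2.1):    {p_exact:.3f}")
```

Although the normal cdf has no closed form, it can be evaluated numerically, so (2.1) provides a check on the simulation; the two values should agree to within simulation error.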


2.2. Code for Non-parametric Bootstrap Two Sample Inference

% Survival times (days) for the two groups in Example 1
treatment = [94,197,16,38,99,141,23];
control = [52,104,146,10,51,30,40,27,46];

B = 1000;                          % number of bootstrap iterations
mediantreat = zeros(B,1);
mediancontrol = zeros(B,1);
medianDiff = zeros(B,1);
boottreat = zeros(length(treatment),1);
bootcontrol = zeros(length(control),1);

for b = 1:B
    % Resample each group separately, with replacement
    for j = 1:length(treatment)
        pick = unidrnd(length(treatment));
        boottreat(j) = treatment(pick);
    end
    for k = 1:length(control)
        pick = unidrnd(length(control));
        bootcontrol(k) = control(pick);
    end
    % Store the two medians and their difference for this iteration
    mediantreat(b) = median(boottreat);
    mediancontrol(b) = median(bootcontrol);
    medianDiff(b) = mediantreat(b) - mediancontrol(b);
end

hist(medianDiff);
title('1000 bootstrapped Differences of Treatment Medians')
xlabel('median difference')
ylabel('frequency')

% 95% percentile interval: 25th and 975th of the 1000 ranked differences
sortmedian = sort(medianDiff);
BSCI = [sortmedian(25), sortmedian(975)];

Code for Parametric Bootstrap of the Sample Maximum

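% bass: vector of the 163 radiocesium tissue concentrations
% (the data are not reproduced in this handout and must be loaded first)
hist(bass);
title('Radiocesium Tissue Concentrations in Bass from PAR Pond');
xlabel('picocuries per gram');
ylabel('frequency');

% Estimate the parameters of the assumed normal distribution
mu = mean(bass);
sigma = sqrt(var(bass));

B = 1000;
maxbass = zeros(B,1);
for b = 1:B
    % Assumed reconstruction: the line that generates basspboot is missing
    % from the source. Simulate a normal sample of the same size as the data.
    basspboot = mu + sigma*randn(length(bass),1);
    maxbass(b) = max(basspboot);
end

hist(maxbass);
title('Bootstrapped Maximum Radiocesium Tissue Concentrations in Bass from PAR Pond');
xlabel('picocuries per gram');
ylabel('frequency');

% Proportion of bootstrapped maximums of at least 30 picocuries per gram
Count30 = zeros(B,1);
for j = 1:B
    if maxbass(j) >= 30, Count30(j) = 1;
    end
end
p30 = sum(Count30)/B;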
