
bootstrap


A Nonparametric Approach to Statistical Inference
Mooney & Duval, Series on Quantitative Applications in the Social Sciences, 95, Sage University Papers

Seminar General Statistics, 25 October 2012
Wendy Post, Statistician
Department of Special Education, University of Groningen


overview

• Introduction: what is bootstrapping
• Introduction of a real dataset
• Estimation of standard errors
• Confidence intervals
• Estimation of bias
• Testing
• Literature


Quantitative research
[Diagram relating the real world and the theoretical world: data, scientific models, statistical models, θ, conclusions. Based on R.E. Kass (2011): the big picture]

Bootstrap method
Quantitative research: testing and estimation require information about the population distribution.
What if the usual assumptions about the population cannot be made (normality assumptions, or small samples)? Then the traditional approach does not work.
Bootstrap: estimates the population distribution by using the information from a number of resamples from the sample.

Bootstrap
Wikipedia: Boots may have a tab, loop or handle at the top known as a bootstrap, allowing one to use fingers or a tool to provide greater force in pulling the boots on.
Impossible task: to pull oneself up by one's bootstraps.
A metaphor for 'a self-sustaining process that proceeds without external help'.

Bootstrap method
Use the information of a number of resamples from the sample to estimate the population distribution.

Given a sample of size n:
Procedure
• Treat the sample as the population
• Draw B samples of size n with replacement from your sample (the bootstrap samples)
• Compute for each bootstrap sample the statistic of interest (for example, the mean)
• Estimate the sampling distribution of the statistic by the bootstrap sample distribution

[Diagram, based on Efron and Tibshirani (1993). Real world: unknown probability distribution/model P, observed data X = (x1, x2, …, xn), statistic (sample mean). Bootstrap world: estimated probability model P̂, bootstrap sample X* = (x1*, x2*, …, xn*), bootstrap statistic (mean of X*)]

Example
Population: standard normal distribution
Sample with sample size = 30, 1000 bootstrap samples.

[Table: case numbers 1 to 30 with the original data and bootstrap samples 1, 2 and 3, followed by the mean and sd of each column]

Results sample (traditional analysis)
Sample mean: 0.1698
95% confidence interval: (-0.12, 0.46)

Bootstrap results
Sample mean: 0.1689
95% confidence interval: (-0.10, 0.45)

R-script
Based on Crawley: The R book, p. 320

```r
set.seed(146035)              # start seed for simulation
sampleW <- rnorm(30, 0, 1)    # draw sample of size 30
meansW <- numeric(1000)       # declaration of the bootstrap mean vector

# bootstrap procedure: resample with replacement and store each mean
for (i in 1:1000) {
  meansW[i] <- mean(sample(sampleW, replace = TRUE))
}

# bootstrap results
hist(meansW, 30)
summary(meansW)
quantile(meansW, c(0.025, 0.975))   # 95% percentile interval

# traditional approach
summary(sampleW)
mean(sampleW)
CI_up = mean(sampleW) + 1.96 * sd(sampleW) / sqrt(30)
CI_lo = mean(sampleW) - 1.96 * sd(sampleW) / sqrt(30)
```

Bootstrap results
Results example
Population: standard normal distribution
Sample with sample size = 30, 1000 bootstrap samples.

Results sample
Sample mean: 0.1698
95% confidence interval: (-0.12, 0.46)

Bootstrap results
Sample mean: 0.1689
95% confidence interval: (-0.10, 0.45)

R-script: alternative
Based on Crawley: The R book, p. 321

```r
library(boot)
set.seed(146035)              # start seed for simulation
sampleW <- rnorm(30, 0, 1)    # draw sample of size 30

# statistic for boot(): the mean of the resampled observations
mymean = function(sampleW, i) mean(sampleW[i])
myboot = boot(sampleW, mymean, R = 1000)
myboot
```

R-script: output

```
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = sampleW, statistic = mymean, R = 1000)

Bootstrap Statistics :
     original         bias    std. error
t1* 0.1698585  7.697321e-05   0.1484104
```

Why does bootstrapping work?
Basic idea: If the sample is a good approximation of the population, bootstrapping will provide a good approximation of the sampling distribution of the statistic.
Justification:
1. If the sample is representative of the population, the distribution of the sample (empirical distribution) approaches the population (theoretical) distribution as n increases.
2. If the number of resamples (B) from the original sample increases, the bootstrap distribution of the statistic approaches its sampling distribution.

Bootstrapping
• Estimation of the standard error of your statistic of interest
• Construction of confidence intervals
• Estimation of bias
• Bootstrap testing (for small samples, closely related to permutation tests)
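Justification 1 can be checked numerically. The sketch below is an added illustration, not part of the original slides. It draws standard normal samples of increasing size and uses `ks.test` from base R to measure the largest gap between the empirical distribution function and the true normal distribution function. The seed, the sample sizes and the helper name `ecdf_gap` are arbitrary assumptions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Largest vertical distance between the empirical distribution function
# of a standard normal sample of size n and the true normal CDF.
ecdf_gap <- function(n) {
  x <- rnorm(n)
  unname(ks.test(x, "pnorm")$statistic)
}

sizes <- c(30, 300, 3000, 30000)   # assumed sample sizes
gaps <- sapply(sizes, ecdf_gap)
print(data.frame(n = sizes, max_gap = round(gaps, 4)))
```

The gap should shrink as n grows, which is why a larger representative sample gives the bootstrap a better stand-in for the population.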

Comparison of two independent groups
Buddy project (De Boer)
Study aim: To determine the effect of a buddy program intervention on the attitude of typically developing students towards children with severe intellectual and/or physical disabilities.
Comparison of two groups:
Buddy intervention: 4 typically developing students, each being a buddy for a child with disabilities
Control group: 4 typically developing students without being a buddy for a child with disabilities
Dependent variable: difference in attitude (post - pre), measured by the Attitudes Survey towards Inclusive Education (ASIE)

Manipulated data for the Buddy project
Research question: What is the effect of the buddy intervention on the attitude compared to the control group?

Manipulated data Buddy project
Design: two independent groups
Test statistic: difference in means = 6.25

Manipulated data Buddy project
How to bootstrap for estimation of the standard error of the sample statistic?
General
1. Select B independent bootstrap samples (replications): sample with replacement x1*,…,x8*, B times (for estimation of standard errors, B = 25-200 is OK)
2. Compute the bootstrap sample statistic for each replication, θ*(b), b = 1,…,B
3. Determine the bootstrap sample distribution of the bootstrap sample statistic
4. Estimate the standard deviation of the bootstrap sample statistic (standard error of the estimate)

Manipulated data Buddy project
In our Buddy project:
1. Select B independent bootstrap samples (replications) with replacement from the data of the children in the buddy intervention, x1*,…,x4*
2. Select B independent bootstrap samples (replications) with replacement from the data of the children in the control intervention, x5*,…,x8*
3. Compute the difference in means in both interventions for each bootstrap sample, difmean*(b), b = 1,…,B
4. Determine the bootstrap sample distribution of difmean*
5. Estimate the standard deviation of difmean* (standard error)
We will do this for B = 25, 200, 1000 and 10,000.
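The Buddy project steps above can be sketched in R as follows. This is a minimal illustration, not the author's script. The scores in `buddy` and `control` are hypothetical values made up so that the difference in means is 6.25; they are not the manipulated data from the slides. The helper name `boot_difmean` and the seed are also assumptions.

```r
set.seed(2012)  # arbitrary seed, only for reproducibility

# Hypothetical attitude change scores (post - pre); NOT the Buddy project data.
buddy   <- c(10, 12, 7, 9)
control <- c(3, 5, 2, 3)

# Steps 1-3: resample each group separately with replacement and
# return the B differences in means, difmean*(b).
boot_difmean <- function(g1, g2, B) {
  replicate(B, mean(sample(g1, replace = TRUE)) - mean(sample(g2, replace = TRUE)))
}

# Steps 4-5: the standard deviation of difmean* estimates the standard error.
for (B in c(25, 200, 1000, 10000)) {
  difmean_star <- boot_difmean(buddy, control, B)
  cat("B =", B, " bootstrap SE =", round(sd(difmean_star), 2), "\n")
}
```

Resampling within each group keeps the design of two independent groups intact, which is what steps 1 and 2 prescribe.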

Frequency distribution of the bootstrap sample statistic (difmean*)
[Figure: four histograms of the bootstrap statistic (x-axis: ms, y-axis: frequency), one each for B = 25, B = 200, B = 1000 and B = 10000]

Buddy project: statistics of the bootstrap distribution of difmean*
Number of replications (B): 25, 200, 1000, 10000
Mean: 5.88, 6.33, 6.24, 6.26
Median: between 6.0 and 6.5 for B = 25, 200 and 1000; 6.25 for B = 10000
sd (the standard error): 3.11 for B = 25; between 2.46 and 2.48 for B = 200, 1000 and 10000

Buddy project percentiles
All percentiles can be computed. Estimation of the confidence intervals: percentile method.
Number of replications (B): 25, 200, 1000, 10000
2.5% (lower boundary): at most 0.30 for B = 25; between 1.0 and 1.30 for B = 200 and 1000; 1.0 for B = 10000
50%: between 6.0 and 6.5 for B = 25, 200 and 1000; 6.25 for B = 10000
97.5% (upper boundary): 10.25 or 10.5 for B = 25, 200 and 1000; 10.5 for B = 10000

Bootstrap for bias estimation
Bias of an estimator: the difference between the expected value of that estimator and the population value θ.
In formula: $\text{bias} = E(\hat{\theta}) - \theta$
So is the estimator on average correct?
If the theoretical assumptions about distributions are not satisfied, then the estimators may be biased.
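The percentile method can be sketched in a few lines of R. This is an added illustration under stated assumptions: the scores in `buddy` and `control` are hypothetical values, not the manipulated Buddy project data, and the seed and B = 10000 are arbitrary choices.

```r
set.seed(2012)  # arbitrary seed, only for reproducibility

# Hypothetical attitude change scores (post - pre); NOT the Buddy project data.
buddy   <- c(10, 12, 7, 9)
control <- c(3, 5, 2, 3)

# Bootstrap distribution of the difference in means (resampling per group).
B <- 10000
difmean_star <- replicate(
  B,
  mean(sample(buddy, replace = TRUE)) - mean(sample(control, replace = TRUE))
)

# Percentile method: the 2.5% and 97.5% percentiles of the bootstrap
# distribution are the lower and upper boundary of the 95% interval.
ci <- quantile(difmean_star, probs = c(0.025, 0.5, 0.975))
print(ci)
```

No normality assumption is needed here: the interval boundaries are read directly from the bootstrap distribution.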

Bootstrap for bias estimation
The difference between the original sample statistic and the bootstrap sample statistic can be regarded as an estimate of the bias.
Sample mean of difference was: 6.25
Estimated bias = bootstrap mean - sample mean
Corrected mean = sample mean - estimated bias

B        Mean θ*   Bias     Corrected mean
25       5.88      -0.37    6.62
200      6.33       0.08    6.17
1000     6.24      -0.01    6.26
10000    6.26       0.01    6.24

Buddy project: Testing the difference between two interventions
1. Draw B independent bootstrap samples (replications): sample with replacement x1*,…,x8*.
2. Assign the first 4 observations, x1*,…,x4*, to the buddy project and the last 4 observations, x5*,…,x8*, to the control group.
3. Compute for each bootstrap sample the difference in means, θ*(b), b = 1,…,B. Note: this is the distribution under the null hypothesis of equal distributions.
4. Approximate the p-value by the proportion of bootstrap sample differences larger than the original sample difference in means (here 6.25).
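Both the bias correction and the test can be sketched in R. The arithmetic of the correction is simple: with a bootstrap mean of 6.33 and a sample statistic of 6.25, the estimated bias is 0.08 and the corrected estimate is 6.17. The code below is a minimal illustration under stated assumptions: the scores in `buddy` and `control` are hypothetical values, not the manipulated Buddy project data, and the seed and variable names are arbitrary.

```r
set.seed(2012)  # arbitrary seed, only for reproducibility

# Hypothetical attitude change scores (post - pre); NOT the Buddy project data.
buddy   <- c(10, 12, 7, 9)
control <- c(3, 5, 2, 3)
observed <- mean(buddy) - mean(control)   # original sample statistic
B <- 10000

# Bias: resample within each group, then compare the bootstrap mean
# with the original sample statistic.
difmean_star <- replicate(
  B,
  mean(sample(buddy, replace = TRUE)) - mean(sample(control, replace = TRUE))
)
bias_hat  <- mean(difmean_star) - observed
corrected <- observed - bias_hat

# Test: resample all 8 observations together (null hypothesis of equal
# distributions); the first 4 go to the buddy group, the last 4 to control.
pooled <- c(buddy, control)
null_diff <- replicate(B, {
  x_star <- sample(pooled, replace = TRUE)
  mean(x_star[1:4]) - mean(x_star[5:8])
})
p_one_sided <- mean(null_diff > observed)
p_two_sided <- mean(abs(null_diff) > abs(observed))

print(c(bias = bias_hat, corrected = corrected))
print(c(one_sided = p_one_sided, two_sided = p_two_sided))
```

Note the two different resampling schemes: within groups for estimation, and from the pooled data for testing, because the test needs the distribution of the statistic when there is no group difference.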

Buddy project: Testing the difference between two interventions
Results
• 282 cases of 10,000 give a sample mean difference > 6.25
• One-sided p-value: 0.0282
• 287 cases of 10,000 give a sample value < -6.25
• Two-sided p-value: 0.057

Sample size
Bootstrapping works well for samples > 30-50 (if the sampling procedure is random).
For smaller sample sizes there might be problems with accuracy and validity: be careful with the interpretation.

Number of bootstrap samples
The number of bootstrap samples depends on what you would like to do.
For estimation of the standard error: 25-200 resamples should be sufficient.
For all other applications: take B > 1000.
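The effect of the number of bootstrap samples can be explored by repeating the whole bootstrap several times and seeing how much the standard error estimate itself varies. The sketch below is an added illustration. It reuses the seed and the simulated sample of size 30 from the example script; the helper name `boot_se` and the choice of 100 repetitions are assumptions.

```r
set.seed(146035)                  # same seed as the example script
sampleW <- rnorm(30, 0, 1)        # sample of size 30

# One bootstrap estimate of the standard error of the mean from B resamples.
boot_se <- function(x, B) {
  sd(replicate(B, mean(sample(x, replace = TRUE))))
}

# Repeat the whole bootstrap 100 times per B (100 is an assumed number)
# and look at how much the standard error estimate varies between runs.
B_values <- c(25, 200, 1000)
spread <- sapply(B_values, function(B) sd(replicate(100, boot_se(sampleW, B))))
print(data.frame(B = B_values, sd_of_se_estimate = round(spread, 4)))
```

The run-to-run variation drops as B increases. Tail quantities such as percentile boundaries and p-values depend on fewer resamples in the tails, which is one reason to take a larger B for those applications.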

Comments about bootstrapping
• Only the basic principles are introduced here
• The sampling method depends on the stochastic element one would like to generate (it depends on the model!). Only the simplest cases are demonstrated here.
• The strength of bootstrapping lies in estimation rather than in testing (certainly for small samples)
• The validity of bootstrap testing is questionable for small sample sizes

Comments about bootstrapping (cont.)
• There are more advanced ways to construct confidence intervals with bootstrapping
  – BCa method (bias-corrected and accelerated)
  – ABC method (approximate bootstrap confidence intervals)
• Testing with the bootstrap method in small samples is closely connected with randomization tests: randomization tests are the exact versions.
• The bootstrap can be applied to more complex designs and in more situations (there are several ways to bootstrap in regression models)
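A BCa interval can be obtained with `boot.ci` from the boot package, the same package as in the alternative script. This is a minimal sketch, not part of the original slides. It reuses the seed and simulated sample from that script; R = 2000 is an assumed value, chosen larger than in the script because BCa intervals need more resamples than a standard error does.

```r
library(boot)

set.seed(146035)                  # same seed as the example script
sampleW <- rnorm(30, 0, 1)        # sample of size 30

# Statistic in the form boot() expects: data plus a vector of indices.
mymean <- function(data, i) mean(data[i])
myboot <- boot(sampleW, statistic = mymean, R = 2000)

# Percentile and BCa 95% intervals from the same bootstrap replicates.
ci <- boot.ci(myboot, conf = 0.95, type = c("perc", "bca"))
print(ci)
```

In the returned object, `ci$percent` and `ci$bca` each hold the interval boundaries in their last two entries, so the two methods can be compared directly.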

Literature
Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. London: Chapman & Hall.
Kass, R.E. (2011). Statistical inference: The big picture. Statistical Science 26(1), 1-9.
Mooney, C.Z. and Duval, R.D. (1993). Bootstrapping. A nonparametric approach to statistical inference. London: Sage publications.
Ter Braak, C.J.F. (1992). Permutation versus bootstrap significance tests in multiple regression and anova. In K.H. Jöckel, G. Rothe & W. Sendler (eds.), Bootstrapping and related techniques, 79-86.
