You are on page 1of 451
Monographs on Statistics and Applied Probability 57 An Introduction to the Bootstrap Bradley Efron Robert J. Tibshirani CHAPMAN & HALL/CRC E = | Beatie corm ee (© 1993 Chapman & Hi LIne. Printed sn the United States of America All ight eserved. No pan of thie book may be reprinted or feproduced or ulized tn any form or by any electron, mechanical or other means, now known oF hereafter vented, tcluding phooeapying and recording o by an information ‘torageor retrieval sYstem, without permission in writing from the publishers. Library of Congress Cataloging in-Publication Data tion, Bradley ‘An introdvctio tothe bootstrap / Brad Bron, Rob Teshian pom Includes bibliographical references ISBN D12.04231-2 {Bootstrap (Statics) 1 Tsturam, Robert QArT6R ESTES 1993 519.5144 —de20 93-4689 cr Briah Library Cataloguing in Publication Daa also available “This poo was typeset by the authors sing a PostSenpt (Adobe Systems ine.) based phototypesete(Vsmotrome 200P) The figures were gested n PostScript using th ‘Scuia analysis language (Becker ¢, al 1988), Aldus Freehand (Aldus Corporation an “Matiemanea (Wolfram Research Inc.) They were directly mcorprsedintothetypese document The text was formated using the LATEX language (Lanpor., 1986), version of TEX (Kauth, 1984), 4S3 4S0 2 eed Cate To (CHERYL, CHARLIE, RYAN AND JULIE AND TO THE MEMORY OF RUPERT @, MILLER, JR. Contents Preface xiv 1 Introduction 1 LL An overview of this book o 12 Information for instructors 5 1.3 Some of the notation used in the book 9 ‘The accuracy of a sample mean 10 24. Problen 15 ‘Random samples and probabilities ar 3A. Introduction 7 32 Random samples v 3.3 Probability thaoty 2 34 Problems 28 ‘The empirical distribution function and the plug-in principle aL 41. Introduction an 42 The empirical distribution function 3 43° The plugein principle 35 44 Problems aT Standard errors and estimated standard errors 39. Ba lutroduction 39 5.2. The standard error of a mean 39 5.3 Estimating the standard error of the mean 2 94 Problems 43 al ‘CONTENTS 45 5 ‘The bootstrap estimate of standard error 6 Example: the correlation coefficient 49 ‘The number of bootstrap replications B 50 ‘The parametric bootstrap 53 Bibliographic notes 56 Problems aT 7 Bootstrap standard errors: some examples 60 7. Introduction oo 7.2 Example J: test score date a 73. Example 2: curve fitting 70 TA Au example of bootstrap failure 8 7.5 Bibliograplue notes st 76 Problems a 8 More complicated data structures 86 81 Tatroduction 86 8.2. One-sammple problems 86 83 The two-sample problem 88 84 More general data structures 90 85 Example: ltenizing hormone 2 86 The moving blocks bootstrap 9 87 Bibliographic notes 102 88 Problems 103 8. Regression models 105 9.1 Introduction 105 9.2. The linear regression model 108 9.3 Example: the hormoue data 107 9.4 Application of the buotstrap a 9.5. 
Bootstrapping pairs vs bootstrapping residuals 113, 9.6 Bearaple: tho eel survival data 15 9.7 Least median of squares a7 9.8 Bibliographic notes rat 99° Problems 321 aaa 10.1 Introduction 124 5 conrents| 10.2 The bootstrap estimate of bias 10.3 Example: the patch data 10.4 An improved estimate of bias 10.5 ‘The jackknife estimate of biss 106 Bins correction 10,7 Bibliographic notes 10.8 Problems 11 Phe jackknife 11.1 Introduction 11.2 Definition of the jackknife 11.3 Bxample: lest score data 114 Peeudovalues 11.5 Relationship between the jackkwife 126 Failure of the jacket ILT Tho deleted jaekkeuife 118 Bibliographic noves 119 Problems ut bootstrap 12 Confidence intervals based on bootstrap “tables’ 22.1 Introduction 12.2 Some background on confidence intervals 123 Relation between confidence intervals and hy pothe- sis teste 124 Student's ¢ interval 425 The bootstrap-t interval 12.6 Transforinations and the bootstrap-t 127 Bibliographic notes 128 Problems 18 Confidence intervals based on bootstrap percentiles 13.1 Tntroduetion 192 Standard normal intervals 13.3 The percentile interval 13.4 Is the percentile interval backwards? 13.5 Coverage performaace 19.6 ‘The transformatiou-respecting property 234.7 The range-preerving property 13.8 Discussion 14 126 130 133 139 139 at at ML 143 145, 145 us 4 9 150 153, 153 155; 156 18s 160, 162 160 166 * contents 13.9 Bibliographic notes 176 13.10 Problems wr 14 Better bootstrap confidence intervals, 1738 14.1 Introduction 178 14.2 Bxampie: the spatial test data 179 148 The BC, method 184 MA The ABC method 8s 14.5 Example: the tooth data 190 146 Bibliographic notes 199 147 Problems ios 15 Permutation tests 202 15.1 Introduetion ae 45.2 The two-sample problem 202 15.8 Other tose statistics 210 15.4 Relationship of hypothesis tests +o confidence intervals and the bootstrap au 15.5 Bibliographic notes 18 156 Problems 218 16 Hypothesis testing with the bootstrap 220 16.1 Introduction 220 362 The evo sample problem 20 16.3 Relationship between the permutation test and the bootstrap 228 164 The one-sample problem 24 16.5 Testing multimodality of a popalation nr 166 Discussion 232 367 Bibliogeaphie notes 233 16.8 Problems 2st X17 Crose-validation and other estimates of pre 17-1 Introduction 37.2 Bxample: hormone data 17.3 Cross-validation 17-4 Gy and other estimates of prediction error 17.5 Etample: cissifeation trees 17.6 Bootstrap estunates of prediction error cownenas 17.6.1 Overview 17.6.2 Some details 17.7 The 632 bootstrap estimator 178 Discussion 17.9 Bibliographic notes 17.10 Problems 18 Adaptive estimation and calibration 258 18.1 Introduetion 25% 18.2 Example: smoothing parameter selection for curve fitting 258 18.3 Example: calibration of a confidence point 263 18.4 Some general considerations 266 185 Bibliographic notes 288 186 Problems 269 10 Assessing the error in bootstrap estimates m 19 totroduction 2 19.2 Standard error estimation a2 19.3 Percentile estimation 23 19.4 Tho juckknife-after boots 275 19.5 Derivations 290 19.6 Bibliographic noves 281 197 Problems 281 20.4 geometrical representation for the hootstrap and Jackkanife 283 20.1 Introduction 283 20.2 Bootstrap sampling 285 20.3 The jackknife as an approsomation to the bootstrap 287 20.4 Other jackknife appraximmations 280 2055 Estimates of bias 200 206 An example 208 20.7 Bibtogeapinc notes 205 | 2.8 Problems 205 X 21 An overview of nonparametric and parametric inference 206 21,1 Inteoduetion 206 21.2 Distributions, densities and likelihood functions 296 ca conrents 21.3 Functional 
statisties and influence functions 214 Parametric maximum likelihood inference 21.5 The parametric bootstrap 21.6 Relation of parametie miaxinim likelihood, boot= strap and jackknife approaches 21.6.1 Example: influence components for the mean 21.7 The empirical cif as a maximum likelihood estimate 21.8 The sandwich estimator 218.1 Example: Mouse data 21.9 The delta method 21.9.1 Example: delta method for the mesa 21.9.2 Example: delta method for the correlation coolirsent 21.10 Relationsliip between the delta method and = Sinitestmal jackknife 21.11 Exponential families 21.12 Bibliographic notes 21.13 Problems 22 Further topics in bootstrap confidence intervals 22.1 Introduction 22.2 Correctness and accuracy 22.3 Confidence points based on approximate pivots 22.4 ‘The BC, interval 22.5 The underlying bass for the BC, inierval 22.6 The ABC approximation 22.7 Least favorable families 22.8 The ABC, method and transformations 22.9 Discussion 22.10 Bibliographic notes 22.11 Problems 28 Bificient bootstrap computations 28.1 Introduction 28.2 Post-sampling adjustments 23.3 Application to bootstrap bias estimation 23.4 Application to bootstrap variance estimation 23.5 Pre- and post-sampling adjuetmuents 23.6 Importance sampling for tail probabilities 23.7 Application to bootstrap tail probabilities 208 02 307 309 310 310 ai w3 315 ans 35 316 319 320 sat 321 32 322 335 326 328 a 333 334 335 335 338 338 40 342 346 18 349 352 conrENts 28.8 Dibliographie notes 28.9 Problems 24 Approximate likelihoods 24.1 Introduction 24.2 Empirical likelihood 24.3 Approximate pivot methods 24.4 Bootstrap partial likeliliod 245 Implied likelihood 246 Discussion 24.7 Bibliographic notes 248 Problems 25 Bovtstrap bloequivalence 25.1 Introduction 25.2 A Vioequivalence problem 25.3 Bootstrap eoufidence intervals 25.4 Bootstrap power calculations 25.5 A moze careful power calculation 25.6 Peller's intervals 25.7 Bibliographic aotes 25.8 Problems 26 Discussion and further topics 26.1 Discussion 26.2 Somme questions abaat the bootstrap 26.3 References on further topics Appendix: software for bootstrap computations: Tntrodnetion Some available software XS language functions References Author index Subject index 386 387 358 358 360 362 208 3607 370 a7 371 ara 372 372 ara 379 881 384 380 380 302 392 04 596 398 398 99 300 13, 426 430 Preface Dear friend, theory 18 all gray, tnd the golden tree of life xs green, Goethe, from "Paust” ‘The ability to sumplify means to elimmate the unnecessary so that the necessary may speak Hans Hoffmann Statistics is a subject of amazingly many uses and surprisingly few effective practitioners. The traditional road to statistical kuowl- edge is blocked, for most, by a formidable wall of mathematics, ur approach here avoids that wall. The bootstrap ws a eomputer- ‘based method of statistical inference that can answer many teal statistical questions without forntulas. Our goal in this book is to farm sewntists and engineers, as well as statisticians, with compu- {ational techniques that they can use to analyze and understand complicated data sets ‘The word “understand” iv an important one in the previous sen- tence. This is not a statistical cookbook, We aim to give the reader a good intuitive wderstauing of statistical mference ‘One of the charms of the bootstrap is tie direct appreciation it -rves of variance, bias, coverage, aud other probabilistic phenotn- ‘ena. What does i mean that a oonfideace interval contains the tue value with probability 20? 
The usual textbook answer an- pears formidably abstract to most beginning students. Bootstrap ‘confidence intervals are directly constructed from real data sets, tusing a simple computer algorithm, ‘This doesn’t necessarily make it easy to understand confidence intervals, but at least. the diff culties are the appropriate conceptual ones, and not mathematical rules PREFACE ” ‘Much of the exposition in our book is based on the analysis of real data sets. The mouse data, the stamp data, the tooth data, the hormoue dala, and othar amall but genuine examples, are ‘important part of the presentation. These are especially valuable if the reader ean try his own computations on them. Pefsonal com. puters are sufficient to handle moat bootstrap computations for ‘these small data sets. "This book doct wot give a rigorous Lecical treatment of the bootstrap, and we concentrate on the ideas rather tas thear math ematical justifieation. Many of these ideas are quite sophisticated, hhowever, and this book is not just for begmners. ‘The presenta- tion starts of slowly but builds in both its soope and depth. More mathematically advanced accounts of the bootstrap may be found im papers and books by many researchers that are listed in the Bibliographic notes at the end of the chapters. ‘We woul like to thank Andreas Buja, Anthony Davison, Peter Hall, Trevor Hastie, Jobn Rice, Bernard Silverman, James Stafford and Sanm Tibshirani for making very helpful comments and sugges tions on the manuscript. We especially thank Timothy Hesterberg ‘and Cliff Liutueborg for the great deal of time and effort that they spent on reading and preparing commests, Thanks to Maria-Luis Gardner for providing expert advice on te “rules of punctuation ‘We would also like to thank numerous students at both Stanford University and the University of Toronto for pointing out errors in earlier drafts, and colleagues and staff at, our universities for theie support. Thanks to Toin Glinos of the University of Toronto for maintauning a healthy computing environment Karola DeCleve typed taveh of the frst draft of Wis book, and maintained vigi- lance against errors during its entire history, All of ts was done cheerfully aral m a most helpful manner, for which we are truly Brateful, Trovr Haslic provided expert “S? and TEX advice, crucial sages i the project. ‘We were lucky to have not one but two superb editors working ‘on this project. Bea Schube got us going, before starting her 1e- tirement; Bea lias dane a great deal for the atatisties profession land we wish her all the best. John Kimmel warried the ball after Bea left, and did an excellent job, We thank our copy-editor Jim. Geronimo for his thorough correction of the manuscript, and take responsibility for any errors that remain. "The first author was supported by the National Institutes of Health and the National Science Foundation. Both groups have wi PRERACE, supported the development of statistical theory at Stanford. cluding much of the theory bebind this book. The second author would like to thank his wife Cheryl for her understanding and support durmg this entire project, and his parents for a Kfetime of encourngement. He gratefully acknowledges the support of the Natural Sciences snl Phgsvoorny Research Conncil of Casal, Palo Alto and Toronto tune 1999 weapTo ~_ CHAPTERL Introduction Statistics isthe Sconce of learning ftom experience, especally ex perience that acrves a ite bit ata time, The earliest formation Seience was statistics, originating in about 1690. 
This century has seen statistical techniques beeotae the analytic methods of choice In biomedical scence, payehology, education, economies, commanie cations theory, sociology, genetic studies, epidemiology, and other arens. Recently, traditional sciences like geology, physics, and as- {ronomy have begun to make increasing wse of statistical methods a hey focus on areas that cenand informational eciency, such as the study of rare and exotic particles ot extremely distant galaxies, ‘Most people are not natura-born statisticians. Left to our own deviees we are not very good at picking out paiterss from a sea of noisy data. ‘To put it another way, we are all too good at pice jg out non-existent pattems that happea to sut one purposes. Statistical theory attacks the problem from both ends. t provides cptimal methods for finding a zeal signal in a noisy backgroud and also provides strict checks against the overinterpretation of random paterns Statistical theory altempts to answer thrce basic questions (1) How should 1 collect may data? (2) How should I analyze and summacize the data that l've eol- lected? (9) How accurate are my data surnmaries? Question 3 constitutes part of the process known as statistical n- ference. The bootstrap isa recently developed techaiaue for making certain kinds of statistical inferences, It is only recently developed because it requires modern computer power to sunplify the often | Intricate caleilations of traditional statistical theo "The explanations that we will give for the bootstrap, and other a meron 10% computer-based methods, involve explanations of trstional ideas in statistical inference. The basic ideas of statistics haven’s chaned, Dut their implementation has, ‘The modem computer lets us ap~ ply these ideas fexibly, quicklyy easily, and with a mmimom of ‘mathematical assumptions, Our primary purpose in the book is to explam when and why bootstrap methods work, and how they eax be applied in a wide variety of real data-analytic situations. All three baste statistical wntcepts; data collection, summary and inference, are illustrated in the New York Tunes excerpt of Figure LL. A study was done to see if small aspirin doses would prevent heart attacks in healthy middle-aged men. ‘The data for the as- prin study were collected in a particulary efficient way" by a con- trolled, rindomized, double-blind study. One half of the subjects rocerved aspirin and the other half received a control substance, or placebo, with no active mgredients, The subjects were randomly assigned to the aspirin or placebo groups. Both the subjects and the supervising physicians were blinded to the assigamaents, with the statisticians keeping a secret code of who received which substance, Scientists, ike evesyone else, want the project they are working on to succor. The elaborate precautions of controlled, randomized, nt guard against sisiug benefits that don’t exist, the chance of detecting a geste positive effect. ‘The summary slatoties in the newspaper aruicle are very simple heart attacks subjects (fatal plus non-fatal) aspirin group: 104 11037 plaoebo group: 189 11034 We will see examples of much more covnplicated sutnmeries in later chapters. One advantage of using a gcod experimental design is & simplification ofits results, What stzikes the eye here s the lower rale of heart attacks in the spins group. 
The ratio of the two rates is rowiiost _ ; iso/inosi ~ °° ta) IF this study can be believed, ad its slid desig makes t¢ very ‘eleva the apirn-takers only ave 55% ax many bear tack placebo taker Of coure we are not really intrested in , the estimated ratio, What we would keto know is, the tre ratio, chat the ratio iwrRopUCTION HEART ATTACK RISK FOUND 10 BE GUT BY TAKING ASPIRIN LIFESAVING EFFECTS SEEN Study Finds Benefit of Tablet Every Other Day Is Much Greater Than Expected ‘ByMAnoLDM scHEECKEA. 4 pernopuction wwe would see if we could treat all subjects, and not just a sample of them. The value @ = .55 is only an estimate of 9. The sample seems large here, 22071 subjects in all, but the conclusion that aspirin works is really based on a sualler uumnber, the 298 observed heart attacks. How do see know that 6 might not ome out rauck less favorably if the experiment were run again? ‘This is where statistical inference comes in. Statistical theory allows us to make the following mference: the true valu of 0 lies ti he mtaval 49<0<.70 a with 959 confidence. Statement (1.2) is a classical confidence in terval, ofthe type discuserd in Chapters 12-14, and 22. It says that if we ran a much bigger experiment, with millions of subjects, the atio of rates probably woulda’t be too much different than (1.1) We almost certainly wouldn't deeice that @ exceeded 1, that is thnt aspirin was actually harmiul. It is realy rather amazing that the same data that give us an estimated value, 6 = .56 in this ease, also can give us a good iden of the estimate’s accuracy. Statistical inference is serious business. A lot can ride on the decaion of whether or not an observed effect is real. The aspirin study sacked atrokes as well as heart attacks, with the following results: strokes subjects aspire group: 1911037 placebo group: 9811084 3) For strokes, the ratio of rates is 5 lgyuost (ay “98/1103 It now looks like (aking aspirin is actually harmful Howowre the interval for tue true stroke ratio @ turns out te be 98<0< 1.59 (1s) with 99% confidence. This incluxles che ueuteal value @ = 1, at which aspinm would be no better or worse than placebo vie vis strokes. In the language of statistial hypothesis testing, aspion ‘was found to he significantly beneficial for preventing heart attacks, Dut not significantly barraful for eausing strokes. The opposite con- clusion had been reached in an older, smaller study concerning men irropuction 5 ‘who had experienced previous heart attecks, The aspirin treatment remains mildly controversial for such patients ‘The bootstrap is a data-based simulation method for statistical Inference, which can be used to produce inferences like (1.2) and (1.5). The use of the term bootstrap derives from the phrase to pull oneself up by one's bootstrap, widely thought to be based on ‘one of the eightoenth century Adventures of Raron Munchassen, by Rudolph Erie Raspe. (The Baon had fallen 0 the bottom of 8 deep lake. Jist when it looked Tike af was Toot, he thought 1 Pick hiniself up by his own bootstraps.) Kas ns the same as the term “bootstrap” used in computer seience meaning to “boot” a computer from a set of core instructions, though the derivation is similar. Here is how the bootstrap works in the stroke example. We ero- ate two populations: the first consisting of 119 ones and 11037, 119=19918 zer0¢s, and the second ennristing of 98 ones and 11034. 98=10996 zeroes, "We draw sith replacement a sample of 11037 items from the first. 
population, and a sample of 11034 items from the second population, Exch of these is called a bootstrap sample. Frou Utes we derive the bootstrap replicate of 8 Proportion of ones in bootstrap sample #1 Tootstrap sampiegs” (8) We repeat this process a large mumber of times, say 1000 vines ‘and obtain 1000 bootstrap replicates 6°. Ths process i easy to ssa plement on a computer, as we wil see later. ‘These 1000 replicates ‘ontain information that can be used to make inferences from our data. Yor example, the standard deviation turned out to be 0.17 Jn a batch of 1000 replicates that we generated. The valye 0.17 is an estimate of the standard error of the ratio of rates. This indicates that the observed ratio 6 = 1.21 is only a litte more than one stbndard ovsor lager ta ado the nontral vale @ = 1 cannot be ruled out. rough 95% confslence interval lke (1.5) fa be derived by taksng the 26th and 9750 largest of the LOM) replicates, which m tis case turned out to be (93, 180) In this simple example, the confidence interval derived from the bootstrap agrees very closely with the one derived from statistical theory, Bootstrap methods are mended to smmplify the calculation of inferences like (1:2) and (1.8), producing them 48 an automatic ‘way sven in situations much more complicated than the agpirin study. Proportion of ones ‘The termmology of statistical summaries and inferences, like re gression, correlation, analysis of varianee, discriminant. analysis, standard error, significance level and confidence interval, has be- ‘come the lingua franca ofall disciplines that deal with noisy data. ‘We will be examining what thus language means and how it works 'n practice. The particular goal of bootstrap theory is a compute ‘base implementation of base atatstial concepts. Ip ome ways 2s easier to understand these concepts in computer-based contexts ‘han through traditional mathematical exposition, LL An overview of this book ‘This book describes the bootetrap and other methods for assessing statistical accuracy. The bootstrap does nct work in isolation but rather is applied to a wide variety of statistical procedures, Part fof the objective of this book ls expose the reader to snaiy exciting, ‘and useful statistical techniques through real-data examples. Some ‘of the techniques described inelise nonparametric regression, den sity estimation, classiieation tzees, and least median of squares regression Here 6 a chapter-by-ehapter synopsis of the book. Chapter 2 intreluces the hootsteap extimate af rianwlardl cere Torn si mican, Chapters 3-6 contan some basic background matertal, and may be skimmed by readers eager to get to the details of the bootstrap in Chapter 6. Random samples, populations, and ‘basic probability theory are reviewed in Chapter 3. Chapter 4 defines the empirical distribution function estimate of the popula- ‘on, which simply estimates the probability of each of n data items to be I/n. Chapter 4 also shows that many familiar statistics ean be viewed as “plugin” estimates, that ie, estimates obtained by plugging im the empirical distribution function for the unknown Sistribution of the popblation. Chapter & reviews standard error festinialion for a mean, and shows how the usual textbook forrmula can be derived as a siraple plugein estimate. ‘The bootstrap is defined in Chapter 6, for estimating the stan ard ertor ofa statistic from a siagle sample. 
The bootstrap stan {dard error estimate is a plugin estimate that rarely can be com puted exactly; instead a simulation (“resampling") method is used for approximating it. ‘Chapter 7 describes the application of bootstrap standard or zors in two complicated examples: a principal components analysis AN OVERVIEW QF HHS HOOK 7 ‘and a curve fitting problem. Up (o this point, only one-sample data problems have been dis ‘cussed. The application of the bootstrap to more complicated data structures is discussed im Chapter 8. A two-sample problem and ‘a time-seties analysis are described. Regression analysis and the bootstrap are discussed and illus trated in Chapter 9. The bootstrap estimate of standard error 1¢ applied in a mitaber of different ways and the results are discussed in two examples ‘The use of the bootstrap for estimation of bias is the topie of Chapter 10, and the prot and cos of bias corection are dis. cussed. Chapter 11 describes the jackknife method m some detail, We see that the jeckknife isa simple closed-foria approximation to the bootstrap, in the context of standard error and bias estimation. ‘The use of the bootstrap for construction of confidence intervals is described in Chapters 12, 13 and 14. There are a aumber of different approaches to this important topic and we devote quite abit of space to them. Ia Chapter 12 we discuss the bootstrap-¢ approach, which generalizes the usual Student's t method for cot structing confidence intervals. The percentile method (Chapter 13) utes instead the percentiles of the bootstrap distribution to fine confidenes lita. ‘The BC, (hivw-correeted seeclernted sn terval) makes mportant corrections to the percentile miterval and is described in Chapter 14 Chapter 15 covers periuutation tests, a time-honored and use ful set of tools for hypothesis testing. Their close relationship with the bootstrap is discussed; Chapter 16 shows hiow the bootstrap can be uted in more general hypothesis testing problews. Prediction error estimation anees mn regression and classification problems, aud we describe some approaches for it in Chapter 17 Crone-validation and bootstrap methods are described abd illus- trated. Extending this idea, Chapter 18 shows hiow de bool: strap and cross-validation can be used to adapt estimators to a set of data, Like any statistic, bootstrap estimates are random variables ancl ‘ have inherent error associated with them. When using the boot- strap for making inferences, it is important to get an idea of the magnitude of this eor. In Chapter 19 we discuss the jackknife- after-bootstrap method for estimating the standard error ofa boot strap quantity. ‘Chapters 20-25 contam more advanced iuatesial ou selected ‘ berRopucriox topics, and delve nore deeply into some of the rnaterial introduced ive previews canptors. The relabeniiy between the oats and Jackkuife is studied via the “resampling pieture” in Chapter 20. Chapter 21 gives au overview of worrparametrie and para ‘uetric inference, and relates the bootstrap to a number of other techniques for extiating standard errors. ‘These include the deka ‘method, Fisher information, infinittsinal sackknife, and the sand- ‘wich estimator. Some advanced topics in bootstrap confidence intervals are dis ‘cussed in Chapter 22, providing some of the underlying basis for the techniqties introduced in Chapters 12-14. Chapter 23 de- scribes methods for effcient computation of bootstrap estates including control variaies aud importance sampling. 
In Chapter 24 the constiwetion of approximate likelihoods is discussed. The Dootstrap and other related metheds are used to construct a “non parametric” likelihood in situations where a parametric model is not specified, Chapter 26 describes m detail a bioequivalence study in which the bootstrap is used to estimate power and sample size. In Chap- ter 26 we discs some general istes concerning the bootstrap and its role in statistical inference, Finally, he Appendix contains a deseription of a number of dif. ferent computer programs Tor the methods discussed in this book 1.2 Information for instruct We envision that this book can provide the basis for (at least) too different one semester courses. An upper-year undergraduate or first-year graduate course could be taught from some or all of the first 19 chapters, possibly covering Chapter 25 as well (both authors havo done this), ts addition, a more advanced graduate ‘course could be tanght from a selection of Chapters 8-19, and a so- lection of Chapters 20-26, For an advanced course, supplementary material might be used, such as Peter Hall's book The Bootstrap ‘and Edgeworth Expansion or journal papers on selected technical topics. The Dibliogsaphie notes in the book conta many sugges- tious for background reading, ‘Wa bave provided nuinerows exercises at the end of each chap- ter. Some of these invalve computing, since it 8 important for the student to get hands-on experience for learning the material. The ‘bootstrap is most effectively used in « high-level language for dat, SOME OF THE NOTATION USED IN THLE BOOK ° analysis and graphies. Our language of eioie (al. present) is (or *S-PLUS"), suid ntnaber of S progroans aypear in thie Ap- pendix. Most of these progeais could be easily translated inte uber languages such as Gauss, Lisp-Stat, or Matlab. Details on the availability of S and SPLUS are given m the Appendix. 3.8 Some of the notation used in the book ower case bold lesiers such as x refer Uo vectors, that is) x (@1,22,..-tn). Matrices are denoted by upper case boid letters such as X, while a plain uppercase leter like X refers to 9 random variable, The transpose ofa vector is written as x A superscript “=” indicates a bootstrap random variable: for example, x* mdi cates a bootstrap data sot generated from a data set x. Parameters are denoted by Greek letters such as #. A hat on a letter indicates In estimate, such ae 0. The esters P and G rete 0 populations fs Chapter 21 the same symbols are used for the cumulative disteibu tion function of a population. Ic is the indieator function equal to 1 if condition € i true and 0 otherwise. For example, Ty2ca) = 1 itz < 2 and 0 otherwise. The notation tx(A) cefers to the trace of the matrix A, that is, the sum of the diagonal elements, ‘The Gervasives of a function g(x) are denoted Uy o'(2).9"(z) and s0 ‘The notation Payee 20) Indicates an independent and identically distributed sample drasea, from F. Equivalently, we also write 2,"%'F for 2 = 1,2,...n. Notation sucl as (1% > 3} means the number of 2:6 greater ‘has 3. log. refers to the natural logarthae of x CHAPTER 2 The accuracy of a sample mean ‘The bootstrap is a computer-based method for assigning measures ‘of accuracy to statistical estimates. The basic idea behind the boot- strap as very simple, and goes back at, least two centuries. Afier reviewing, soa background material, this book describes the boot= strap method, sts implementation on the computer, and its appliea- tion to some real data analysis problems. 
First though, ths chapter focuses on the one example of a statistical estimator where we re- ally don't need a computer to assess accuracy: Uhe saraple sean. In addition to previewing the bootstrap, this gives us a chance to review some fundamental ideas fiom elementazy statistis, We be- gn with a simple example concersing means and (heir estimated “Table 2. shows the results of a small experiment, m which 7 eut fof 16 auce were randomly selected to recrive a new medical treat- ment, while the remaining ® were assigned to the non-treatment (control) group. The treatment was mttended to prolong survival after a test surgery. The table shows the survival tite following surgery, in days, for all 16 mice. Did the treatment prolong survival? A comparison of the means for the two groups olfers preliminary grounds for optiaisin. Let reodts> sy dicate the ffelinves ie Che naan Poupy sy = ion, Tikewise let fay so indicate the control group Statimes. The group means are #= Lox /7= 86.86 and 7= 0/9 = 6.22, (21) 0 the difference 2 ~ 7 equals 30.63, suggesting « considerable life. ‘prolonging effet for the treatment. Bt how accurate are these estimates? Afterall, the means (2.1) are based on stall samples, only 7 and 9 mice, respectively. In ‘THE ACCURACY OF A SAMOLE MEAN a “Table 24. The mouse data. Sixteen mice were randomly assign! to a treatment group oF a control group. Shown are thes survival ines, 1m days, falioweng atest surgery. Did the treatment prolong sural? Estimated, (Sample Standard Group Data Size) Mean Error Treatment? 91 to? 16 a 99 al 2 (86 25.24 Contiol: 520k 146 10 51 30 48 5622 ata 3063 28.93 ‘order to answer this question, we need an estimate of the accuracy of the sample means 2 and J. For sample means, and essentially only for sample meons, an accuracy formula is easy to cbtain, "The estimated stendard error of @ mean Z based on n indepen- dent data poms 21,22,°> stay 2 = Dy ts/ny is given by the formula ye @2) wheve o? = D7 y(2, — 2)*/(n 1). (Thin formula, and standard errors in general, are discussed more careflly © Chapter 5.) The standard error of any estimator 16 defined to be the square root of fis sariancr, tint i, the eslimator's root mean square vatiabilly around its expeclation, ‘This is the moet comoon mesure of an | cstimator’s accuracy. Roughly speaking, an estimator will be Toss than one standard exox away (rom its expectation about 68% of the time, and les than two standard errors eway about 96% of the tie, If the estimated standard errs in the mouse experiment were very onal, say les than 1, then we would Keow that and wee ‘lous to their expected values, and that the observed difference of 2 ‘THE ACCURACY OF A SAMPLE MEAN capability of the treatment. On the other hand, if formula (2.2) gave big estimated standard errors, say 50, then the difference e=- ‘mate would be too maceurate to depend on, ‘The actual situation 1s shown at the right of ‘Ixble 2.1. The ‘estimated standaed errors, ealevlaled from (222), ate 25.24 for tue .14 for he standard exeor for the difference — j equals 28.98 = VH.LTF TUT (since the varvauce of the differen of two independent quantities 1s the sum of their vaviances). We see tat the observed difference 30.68 is only $0.63/28:93 ~ 1.05 as timated standard errors greater than zero. 
Readers faailiar with hypothesis testing theory will recognize this as an ansignificantre- sult, one that could easily arise by chance even if the treatment really bad no effect at all There are more precise ways to verify thie disappointing result, (ea the perinatation test of Chapter 15), but usually, as in tins ‘case, estimated standard errors are an excellent first step toward ‘thinking critically about statistical estimates, Unfortunately stan ard errors have a major disalvontage: for most statistical estima- tors other than the mean theee i no formula like (2.2) to provide festimated standard errors. In other words, i is hard to assess the accuracy of an estimate other than the mean, ‘Suppose for exanmple, we want to compaie the two groups in Ta ble 2.1 by their medians rather than their meaus. The two medians are Q1 for treatment and 46 for control, giving an estimated dif ference of 48, considerably more than the difference of the means. [But how accurate are these medians? Answering such guestions is where the bootstrap, and other computer-based techniques, come in, The remainder of this chapter gives a brief preview of the oot strap estimate of standard error, a method which will be fuily discusted im succeeding chapters, Suppose we observe independent data points 21,22, 2m, for ‘convenience denoted by the vector x = (ri, 322,- stu) from which ‘we compute a statistic of interest s(x). Por example the data might be the n = 9 control group observations in ‘Table 2.1, and s(x) right be the sample mean. ‘The bootstzap estimate of standard error, invented by Efron in 1979, looks completely different than (2.2), but in fact itis closely related, as we shall see. A bootstrap sample x" = (21,23, obtained by randomly sampling n times, with replacement, from the original data points 23,22,- ~,2 For instance, with n= 7 we might obtain x" ~ (29,2, 9 2427.21). tase Cmte ta) Figure 2.1, Schematic of the Bootstrap process for estimating the stan dard ervor of a statistic a(). Bf bootatvap samp. are generated frm the orginal data set. Each bootatrep sample has n elements, generated by sampling with replacement m times from the orgrnat date et, Boot abap replicates s(x°"), (2°), .-.9(x°#) are obtamned by enloalating the tale of the statistic #(x) of each tootstrap sample- Finally, the stan= dard devnation of the yalues 3{2°"),a(3°)y---s(X°8) 18 our estrmate of the stondosd error of (8) Figure 2 is schematic of the bootstrap process. The boot= strap algorithm begins by generating a large number of sepa Gant bootsteap samples x°13¢2, x"! each of size n. Typical values for B, the niimber of bootseap sasnples, range fram 80 to 200 for standard error estisnation. Corresponding to each bootstrap sample i¢ a Bootstrap replication of 2, naucly 6(X"), the vale of the statistic + erated for x". If s(x) isthe sample median, for + instance, then 3(¥*) i the median ofthe bootstrap sample. ‘The bootstrap eatiniate of standard errr i the standard deviation of © the bootstrap replications, oor = {Dveer) — 28 v}, @) where o(-) = DZ, (x"*)/B- Suppose s(x) is the mean #. In this “4 ‘TH ACCURACY OF A SAMPLE MEAN Table 2.2. Bootstrap estimates of standard ervor for the mean and me dian; treatment group, mouse data, Table 2.1. 
The median ws less accu rte (has larger standard error) than the mean for ths dataset B: 100250 500 1000 mean: 10-72 9363 2232 2379 23.02 23.36 median: $2.21 96.35 $448 36.72 S648 37.85 cease, standard probability theory tells us (Problem 2.5) that as B ‘gets very large, formula (2.3) approaches Oa 24) ‘This ns almost the same as formula (2.2). We could male 1b ex actly the same by multiplying definition (2.3) by the factor jn/(n— 1)]3, but there is no real advantage in doing 20. ‘Table 2.2 shows bootstrap estimated standard errors for the ‘mean and the median, for the treatment group mouse data of Ta- ble 2.1. the estinated standard ervors settle down to limiting vale ues as the number of bootstrap samples B increases. The limiting value 23.36 for the mean is obtained from (2.4). ‘The formula for the limiting value $7.88 for the standard error of the median is ‘quite complicated: see Problem 2.4 for a derivation. We are now mm a position to assess the precision of the differ- ence in medians between the two groups. The bootstrap procedure 02> 0) (a) (This includes values like x = (10°, 10°), but it doesn't hurt to let SS, be too hig.) Fara subset A of S, we woald etill write Peob( 4} Eto indicate the probably that occurs in A, {For example, we could take e A= {(v2):0- ~ a = N Figuee 32. The frequencsee fo fins fu Jom the bynormat distributions Bin,p), n = 25 and p — 25,-30, and 90. The poms have been con- nected by lines to enhance vib. Here the idealized proportion Prob{A} 18 an actual proportion, Only in cases where we have a complete census of the population 1s it possible to directly evaluate probabilities as proportions The probability distribution F of a i still defined to be any complete desesiption of 2s prabaiilites. In the law echool example, P cau be desesibed ns follows: for any subset AV of &, — 2 Probie © A} = #1X, € A}/82, (an) where #{X, € A} is the nuinber of the 82 points in the left panel of Figure 2.1 that lie in A. Another way to say the same thing 15 that F is.a discrete distribution putting probability (or frequency) 1/82 on each of the indicated 82 points. Probabilities cau be defiged contimmousl, rather than discretely 1s in (8:6) of (3.11). The moot famous example is the normal (oF Goussian, or de-shaped) disteibution. A reavalued random vari- alle is defined to have the normal distribution with mean y and PROBABILITY THEORY % variance 02, written 2~Nuwot) of P= Nino?) (a) reantce al» ff a for any subset A of the yen! line RE ‘Lhe wntegeal a (3-13) 1 over the values of « € A, {There are higher dimensional versions of he normtal distribu. Ey tion, which involve taking integrals similar to (3,18) over malti- |) dimensional sets A, We won't need continuous distributions for @ewiopwent of the bootstrap (though they will appear later in + some of the applications) and will avoid mathematical denvations + pase out calculus. As we shall see, one the main incentives for the | development of the bootstrap is the slesire to substitute computer | power for theoretical calculations volving special distributions. [og The expectation of real-valued random variable x, written E(2). Fe its average value, here Ube average is taken over the possible “outcomes of « weighted according to its probability distribution F Thus APs (s.r Es) S2(tra-or for 2~Bilnp), (8.14) SE % de for 2 Myo). (15) ele Uo show that 12) = mp for 2 ~ Bin. and infor 2~ Nino). (Soe Problema 2.6 and 3.7.) 
Ei We sometines write the expectation as Ep(a), to udicate thet foe average is taken with respect tothe dstebuten. F HSuppose r (2) is some function of the random vanable x. Eten E(0), the expectation ofr 8 the theoretical average of (2) Fieighted according tothe probability distribution of. Po ex Bible if x ~ N(y,0%) and r= x*, then sd eae, : [logget caro) ‘Probabilities area special ease of expectations. Let A be a subset Bory ea RANDOM SAMPLES AND PROBABILITIES of S_, and take Tiseay whece Izeay if the mdicator function ten = {LBS (aan reer eee econ FUjeca}) = Probfe € A} (a8) For example if 2 ~ N(q,02), the 1 o Lae, (a9) whic ie Probl € A) according t (3.13) ‘The notion of an expectation a8 a theoretical average is very general, and includes eases where the random variable 2 is not real-valued. In the law school situation, for instance, we might be interested in the expectation of the ratio of LSAT and GPA Whiting 2 = (y.2) as m (38), then r= y/2, and the expectation ofris E(LSAT/GPA) a Lowrey (20) where 2) = (ys, 4,) is the jth point in Table 3.2. Numer ation of (8.20) gives E(LSAT/GPA) Tat je = P(@), for 7 a real-valued random variable with dist bution F. The varsonce of 2, indicated by o3 or just 0, is defined to be the expected value of y = (2 ~ 1), In other words, ? isthe theoretical average squared distance of a random variable 2 from its expectation se, ob = Ep(e— me)? (a2) ‘The variance of « ~ N(y1,07) equals 0”; the variance of x ~ Dip) exes np(A— p) see Problem 3.9. The standard dev tion of reorom vasiale i dof to be the equare cool of Is “Two random variables y and £ are said to be independent if Bloty)h(2)] — Ebo(w)}P1A(2)] (a2) | E prosamasry THEORY ” || for all fonctions 9(y) and h{2). Independence 1s well named: (3.22) | implies that the random outcome of y doesn't affect the random levine of z, nnd vice-versa "To sce this let B and C be subsets of Sy and S, respecively, the sample spaces of y and 2, and take g and i to be the indicator functions o(9) = Tye) and M3) = Teecy. Notice that i TyeusTeee 1 if yeBand 26 (ote (29) 80 Tiycaytteccy ithe indicator fancton ofthe imersction {y € BY: €C). Then by (8.18) and the independency definition (22), Prob{(v2) BAC) = llyeu)Msecy) = Blliyem EU seca) probly € BWPrabl € C} (3.24) “Looking at Figure 3.1, we can see that (3.24) does not hold for ‘the law school example, see Problem 3.10, so LSAT and GPA are not independent, ‘Whether or not y and # are independent, expectations follow the simple addition rule Eloty) + W(=)] = Bloty)] + BU(2)) (325) BS ated = Yo Placed (3.26) n vides 2435-2 jm sampling with replacement guavantees independence: i (2122, yn) ism random sazmpe of size n from a popula tion then all w observations , are «dentically distributed and /- mutually mdependent of each other. 
In other words, all of the 2, fave the same probability distribution F, and Erlo(zs)g2l22)y-sgaltn)] = Erlox(i))Erlge(22) Erlon(tn)} (3.27) for any functions 91,99, dn: (This 8 almost 2 definition of what random swnpling means) We will wte ie Fs (tua) (2.28) 8 RANDOM SAMPLES AND PROBABILITIES to indicate that x = (21,22; — rq) is a randosn sample of size frost a population with probability distribution P, This is some: times written ac om (9.29) ‘where 1d, stands for independent and identically disteibuted, 4.4 Problems 34 A tanwlom samupte of size 1 is Gakou ath replacement from A papulation of axe N Show that the probability of having | 1o repetitions in the sample is given by the product 3.2 Why might you suspect that the sample of 15 law schools in ‘Table (3.1) was obtained by sampling without replacement, rather than with replacement? 38 The mean GPA for all 82 law schools if 3.13, How does this compare with the uean GPA for the observed sample of 18 law schools in Table 3.1? Is ths difference compatible with the estimated standard error (2:2)? 344 Denote the mean and standard deviation of a set of numbers Xi Nays Xy by X and § respectively, where RS yy s= (S50 - Re? (a) Asaunpte.rs. ry wvelected fous Ny, Xay Xn by radon sampling wi replneement. Dele the s dud deviation of Ute sample average =" zany ‘usually called the standard error of £, by so(z). Use @ asic result of probability theory to show that s ste) = 5 (b) f Suppose instead that 21,22; -.tn J selocted by ‘anudom sampling arthoul replacement (So we must have sens » n 200, ‘Table 6.2 compares cv(sz) with ev(s.) for various choices of B, assuming B= 0, Vry often we can expect to have evn) no Staller thn 1, in whieh case B = 100"ves sate satactry oul Tiere ac two rules of thumb, gathered fro tke authors exper {1) Bven a sll number of bootstrap repations, say B = 25, is usually informative, B = 50 is often enough to give a good estimate of sep(6), (2) Very seldom ace more than B= 200 replications needed for ‘olimating a standard exon. (Much Digger vale of Bare co Gite for bootstrap confidence inervals, awe Chaptes 12-14 and'19) Approximations obtained by random sampling or simulation are called Monte Carlo estimates. We wi see in Chapter 28 that com putational methods other than sizaightforvard Monte Caslo simu- lation ean sometimes reduce manyfold the number of replications Lat bp be the hurts of b* = a(x"), bes bp = Elbe — ay /(Eg(br — 1)82 3, where = Eg(6"). Then Ase the expected ylue of bp, where F fo the cmpureal dsrbtion fue on andor sample of sis fcam FE 3 then eenls about ifm Uns the karan of Fal, See Section 9 at Eton and Tiel (2080). (6.9) {THE PARAMETRIC BOO TRAP 9 ‘Table 62. The coefictent of variation offen as a function of the conf cent of variatvon of the ede! bootstrap estimate den» and the nwnber 9f bootstrap samples B, from formula (6.9) assuming b= 0. Bo 2550 100200 co Cn) 229TH 1 20 24 22 at 2120 15 2. 8 7 16 5 1047 M12 M40 05 15 09.07.05 00 sd 10 07.05.00 B needed to attain a prespecified accuracy. Meanwhile * pays to ‘exnember that bootstrap data, like real data, deserves close nok, 4m particular, itis almost never a waste of time to display the his- ‘gram of the bootstrap replications. 6.5 The parametric bootstrap Jt might seem strange to use a resampling algorithm to estimate standard errors, when a textbook forma could be used. 
Ia fact, bootstrap sampling can be earried out pasametnically and when it is used in that way, the results are closely related to textbook standard error formulae, ‘The parametric bootstrap estimate of standard error is defined 6,6) (6.0) where Far i6 an estimate of F dened ftom a parametrte madel Tor the data. Parametric models are discussed in Chapter 2 here we will give a simple example to ilustrate the idea. For dhe law school data, instead of estimating F by the empincal distribution F, we could assume that the population has a bivariate normal: distribution. Reasonable estimates of the mean and covariance of this ponmation are given by (3) and 2S ¢ Diu Dw ale HDG Me -9 De 2) 79) aay oe ‘THE BOOTSTRAP ESTIMATE OF STANDARD ERROR Deuote the byvariate normal population with this mean and co- variance by Phorm; it 8 an example of a parametric estimate of the population F Using this, the parametric bootstrap estimate of standard error of the correlation 9 is seg, (6"). As im the non- parametric car, the Weal paramettie boot ti estimateeannot be easily evaluated except when @ isthe mean, Therefore we apprexi- mate the ideal bootatrap estimate by bootsteap sampling, but. in a different manner than before, Instead of sampling with replacernent from the data, we draw B samples of size m from the parametric estimate of the population Fa Fone — (ahs, 2) Alter generating the bootstrap vauples, we proceed exactly aa ia steps 2 and 3 of the bootstrap algorithm of Section 6.2: we evali- ate our statistic on each bootstrap sample, and then compute the faudatd deviation ofthe I! bootstrap seplentons In the coveelation evict exp, sein ivan nor ‘mal population, we draw? souples of size 15 (ont Fyn aid com [nite the correlation coeficint for each bootstsap sample. (Prob- Jeni 6 Sshows how to generate bivariate normal random variables) ‘The left panel of Figare 6 shows the histograee of B = 3200 boot- slrap replicates obtained in ths way, It looks quite similar to the histogratss of Figure 6.2. The parametric booksteap estisate of sandard error from these replicates was -124, close tothe value of 131 obtained from nosparampetric bootstrap sampling, “The textook formula for the standard erzor of the correlation coefficient is (1 ~ 62)/V/m—3. Substituting @ = .776, this gives a value of 115 for the law school data. We can mabe a further comparison (0 our parametric bootstrap result, Textbook results also state tha! Fisher's transformation of é (12) is approximately normally dist ated with mean ¢ = 3:log (84) Bf ‘and standard deviation 1/V/m=3, 0 being the population correla ‘on coofficient. From this, one typically carries out inference for ¢ ‘and then transforms back to make an inference about the corre- {ation sooicrent. Te compare dir wath ent paraineteie bootstray falysin, we ealetatod ¢ rathor (utd for eel of oUF S200 boots ‘THE PARAMETRIC BOOTSTRAP 55 Figuce 63. Left pone: hustogram of 9200 poramatreehootsrmy repli tions of ore") rom the aw schoo date, n= 18. Right panel: hs togram of 3200 repintions of &, Fishers trasformation of the corre tebon coffe, defined (18). The eff hatin too meh ike the hstogrens of (62), ate the rght hog fakes guile normal a8 preity satistial theory. swap samples, A histogram of the & values ts shown in the right panel of Figure 6.3, and lool quite normal. Furthermore, the stan ard deviation of the 3200 ¢* values was 200, very close to the value 1/V/i5 —3 = 20. This agreement holds quite genecally. 
Most textbook formulae for standard errors are approximations based on normal theory, and will typically gives answers close to tho parametric bootstrap that drsus samples from @ norinal distribution, ‘The relationship between the bootstwap and traditional statistical theory is a more advanced topic mathematically, and is explored m Chapter 24, The bootstrap has two somewhat different advantages aver tea ditional textbook methods: 1) when used m nonparametric mode, it relieves the analyst from having to make parametric assump tions about the form of the underlying population, and 2) when used in parametric mode, it provides more accurate answers thas, textbook formulas, and can provide answers m problems for whic no vextbook formulae exist “Mont of this baok concentrates on the nonparametric application ofthe bootstrap, with same exceptions being Chapler 21 and exas piles is Chapters LA aul 25. The paeametsie bootstrap a5 useful i 56 ‘THE BOOTSTRAP ESTIMATE OF STANDARD ERROR problems where some knowledge about the form of the underlying poptllation is available, and for comparison to nonparametric anal- ‘yses. However, a main reagon for making parametric assumptions in traditional statistical analysis is to facilitate the derivation of texthook formulas for standard errors. Since we don’t need form- las in the bootstrap approach, we ean avoid restrietive parsmettic sextinptions. Phally, we mention that 1 Chapters 13 and 14 wo describe bootstrap niethods for constriction of eutfidene intervals a which transformations such a8 (6.12) age Incorporated in an automati way. 6.6 Bibliographic notes ‘The bootstrap was introduced by Eom (107%), with further gene ‘eral developmenta given in Bfvon (19810, 1981), The monogeaph ‘of Bron (1982) expands on many of the topics in the 1079 pa- per and discusses some new ones. Expositions of the bootstrap for ‘statistical audience include Bfton and Gong (1983), Efron and ‘Tibshirani (1986) and Hinkley (1988). Efron (1982) outlines some statistical questions that arose from bootstrap research, ‘The lec- ture notes of Beran and Ducharme(1991) and Hall's (1982) mona- raph give @ mathematically sophisticated treatment of the boo!- strap. Nor-technieal descriptions may be found in Diaconis and Efron (1983), Lunneborg (1983), Rasmussen (1987), and Efron and ‘Tibsturan (1901), A general discussion of computers and statistics ray be found m Biron (19790). Young (1988a) studies bootstrap- ring, of the coreelation coefficient While EGon's 1979 paper fermally mtroduced and studied the Dootstzap, similar ideas had been suggested wm diferent contexts, ‘These include the Moute Calo hypothesis testing methods of Barnard (1963), Hope (1968) and Marriott (1979). Particnlarly notable contributions were wade by Hartigan (1969, 1971, 1975) im his typical value theory for constructing confidence interval JL. Simon discussed computational methods very similar to the bootstrap in a sociometrzs textbook of the 1960's; see Simon and Bruce (1991). “The jackkaife and exoes-validation techniques predate the boo! strap and are closely related to t, References to these methods are ‘ven in the bibliogrephic notes ia Chapters 11 and 17, : 1 PROBLEMS 7 6.7 Problems 6.1 We might have divided by 1 mstend of B— 1 in definition (6.6) of the bootstrap otandard error estimate. How would that change Table 6.1? 6.2 With sy defined as in (6.6), show that Bplieh) = #2, (6.13) where 62, equals the ideal bootstrap estimate sep(@*). 
In other words, the variance estimate 3, based on boot strap replications has bootstrap expectation equal to the ‘deal bootstrap variance 3, 6.31 Show that Ep (}) = Er(@,), but vare(Sh) > vare (#2). In other words &%, has the samme expectation as g#., but larger varianoe, (Notice that these results involve the usual expectation and variance Ey and vary, not the bootstrap ‘quantities Ep and vary.) 6.4 The data in Table 3.2 allow us to compute the quantities Cv{sine) and Ain formula (6.9) for the law sekoo! data cvlgin.) ~ sll, A= 4. What value of B makes ev(s¢n) only 10% larger than cv(S#.0)? 5%? 1%? 6.5" Given m dataset of n distinct values, show that the number of distinct bootstrap samples is Cy ‘) (oa) How many are there for n= 15? 6.6 A biased but more robust estimate of the bootstrap standard : ol — §uti-a) ) Bn9- (615) where 6° in the 100a¢h quantite of the bootsteap repli- cations (Je. the 100ath largest value in an ordered list of the 6°(0)}, and =) as the 100ath percentile of a standaed normal distribution, 2°) = 1645 ete. Here 1s a table of the quantiles for the 3200 bootstrap replications of 8° in "Table 6.1 and the left panel of Figure 6.2: “me woors RAP ESTIMATE OF STANDARD ERROR e065 1016 50st 8, Sa 596 GAT 703906 927.948 (0) Compute p,q for @ = .05,.0, and $4 (b) Suppose that a transcription error caused one of the (0) values to change from 42 to ~#200, Approzimately how much would this change sey? S.? 6.7 Suppose a bootstrap sample of size m, drawn with replace ‘ment from 2,225... contains jy copies of £3, 32 copies oF 22, and 0 0n, Up Lo jn copies of zy, with j,4J2 tia = 1 Show that the probability of obtaining this sample 48 the ihinomnial probability (con) 1G)". (6.16) Guna) 6.8 Generation of bivartate normal random suriables, Suppose wwe have a random number generator that produces inde- pendent standard normal varates? my and rz and we wish to generate bivariate randem variables y and = with means dys dts and covariance matrix (2%) Let p = oy/(oyoe) and define where (6.17) al V= By boys Het Goeg i tee Hs ia (3 ) where e= Y/(7pT=T. Show that y and s have the requised bivariate normal distribution. 6.9 Generate 100 bootstrap replicates of the correlation coef ficient for the law school data. From these, empute the 2 Mout saul pacage ve he elly fc gneatng independent ane ‘rd normal vatiees. Fort compyehenive flere on the subject, eo eeroe (1086) PROBLEMS so bootstrap estimate of standard error for the correlation co- clicient, Compare your results to those in ‘Table 6.1 and Figure 6.2 6.10" Conaider an artificial data cot conai g of the 8 numbers 1,2,85,4,7,78,86, 124,138, 18.1. Let 6 be the 25% trimmed mean, computed by deleting the uvallst (wo nonabers andl largest bwo numbers, and thea ‘aking the average of the remaining four numbers. (} Caleulate sy for B = 25, 100,200, 500, 1000,2000. From ‘these results estimate the ideal bootstrap estimate Sea, (b) Repeat part (a) using ten different random number seeds and hence aseess the variability in the estimates. How large should we take B to provide satisfectory aceu- () Caleulate the ideal bootstrap estimate @x. directly us- ng formula (6.8). Compare the answer to that obtained in past (a). | Tadicates a difficult or more advanced problem. CHAPTER T Bootstrap standard errors: some examples 7-1 Introduction Before the computer age statisticians ealeulated standard errors using a combination of mathematical analysis, distributional as ssinuptions, and, often, a lot of hard work on mechanical calla tors. 
One classical result was given in Section 6.5: it concerns the sample correlation coefficient corr defined in (4.6). If we are willing to assume that the probability distribution F giving the n data points (y_i, z_i) is bivariate normal, then a reasonable estimate for the standard error of corr is

ŝe_normal = (1 - corr²) / (n - 3)^(1/2).   (7.1)

An obvious objection to ŝe_normal concerns the use of the bivariate normal distribution. What right do we have to assume that F is normal? To the trained eye, the data plotted in the right panel of Figure 3.1 look suspiciously non-normal: the point at (576, 3.39) is too far removed from the other 14 points. The real reason for considering bivariate normal distributions is mathematical tractability. No other distributional form leads to a simple approximation for se(corr).

There is a second important objection to ŝe_normal: it requires a lot of mathematical work to derive formulas like (7.1). If we choose a statistic more complicated than corr, or a distribution less tractable than the bivariate normal, then no amount of mathematical cleverness will yield a simple formula. Because of such limitations, pre-computer statistical theory focused on a small set of distributions and a limited class of statistics. Computer-based methods like the bootstrap free the statistician from these constraints. Standard errors, and other measures of statistical accuracy, are produced automatically, without regard to mathematical complexity.¹

¹ This is not a pure gain. Theoretical formulas like (7.1) can help us understand a situation in a different way than the numerical output of a bootstrap program. (Later, in Chapter 21, we will examine the close connection between formulas like (7.1) and the bootstrap.) It pays to remember that methods like the bootstrap free the statistician to look more closely at the data.

Bootstrap methods come into their own in complicated estimation problems. This chapter discusses standard errors for two such problems, one concerning the eigenvalues and eigenvectors of a covariance matrix, the other a computer-based curve-fitting algorithm called "loess." Describing these problems requires some matrix terminology that may be unfamiliar to the reader. However, matrix-theoretic calculations will be avoided, and in any case the theory isn't necessary to understand the main point being made here: that the simple bootstrap algorithm of Chapter 6 can provide standard errors for very complicated situations.

At the end of this chapter, we discuss a simple problem in which the bootstrap fails, and look at the reason for the failure.

7.2 Example 1: test score data

Table 7.1 shows the score data, from Mardia, Kent and Bibby (1979); n = 88 students each took 5 tests, in mechanics, vectors, algebra, analysis, and statistics. The first two tests were closed book, the last three open book. It is convenient to think of the score data as an 88 × 5 data matrix X, the ith row of X being

x_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5}),   (7.2)

the 5 scores for student i = 1, 2, ..., 88.

The mean vector x̄ = Σ_{i=1}^{88} x_i / 88 is the vector of column means,

x̄ = (Σ_i x_{i1}/88, Σ_i x_{i2}/88, ..., Σ_i x_{i5}/88)
  = (38.95, 50.59, 50.60, 46.68, 42.31).   (7.3)

The empirical covariance matrix G is the 5 × 5 matrix with (j, k)th

Table 7.1.
The score data, from Mardia, Kent and Bibby (1979); = 88 students each took fve tess, wn mechanics, vectors, algebra, analysts, dard statistics; c" and "0" andicate closed and open book, respectively Bree wee alg ana sta | we © 6 © ol” “oe Oo TW _s @ a 8l[@ aor 3% mon 6 ale 4 40% 2 6 7 ola 2 ue 5 6 8 6 wm Bla ow 6 om a om ot male a Be SL GF 6s 6 es/si tt so 6s 2 56/52 17 om m@ ale a 24 bls oF & of 9 @ tle oa ®t of ss alsr gs os 6 63 58 56 Stl ki ns 6 oe oot 86. o ” » oo 8 a @ 4 2 6 6 are 2 6 6h oa 2 1 @ 1 a ws m0 al Be % Bo 2 oa oO co a 2 wo 2s oo o 3 0% o 2 o es oR © Bow 56 o om a 7 m4 oe a a a0 2 5 a % 15 30 26 6 " 6 8 & fo 8 % a oo” n a 4 0 22 8 & 2% 0 36s 1 ” 3% a 5 40 56 ~ 3 M6 oT a a8 @ 2 3 8 a ou om 2 0 & 8 a 8 is o 8 8 rr @ or a 23 55 39 s 3s 2 8 & o % 3 a 6 2s rf ao EXAMPLE 1: TEST SCORE DATA 5 element G Mazin — 2) 7k =1,2,3,45. (7.4) va abe Notice that the diagonal clement Ge i the plugein estimave (5.11) for the variance of the scores on test 7. We compute 3023 1258 1004 105.1 116.1 1258 1709 $4.2 936 97.9 104 842 LLG 110.8 1205 (75) 105.1 936 Los 2170 1538 16.1 979 1205 1888 20h4 Educational testing theory 1s often concerned with the exgen= talues and exgensectors of the covariance matrix G. A. 5 x 5 6O- variance matrix has 5 positive eigenvalues, labeled in devreasing order My > Ap 235 2 Aq > Ag. Corresponding to each Ay 18 a 5 dimensional eigenvector ¥ = (iis, tay aay buy fis). Readers not familiar with eigenvalues and vectors may prefer to think of a fune- tioneigen”, a black box? which inputs the matrix G and outputs the 4 and corresponding i, Here are the eigenvectors and values for matrix (7.5) 505, .368, 346, 451, 535) =.749,—207, 076, 801,548) ¥q = (300, 416.145, 597, —.600) = (296,783, —.008, 518, -.176) (079,89, ~.024, 286, 151), (70) Of what interest are the eigenvalues and eigenvectors of m ¢o- variance matcix? They help explain the stracture of multivariate ate like that in Table 7.1, data for which we have many inde- pendent units, the n = $8 students in this case, but correlated ‘measurements within each unit. Notice that the § test scores are bighly correlated with each other. \ etucdent who did well on the mechanics testis likely to have done well on vectors, ete. A very 2th eigenvalues and caenveciore of matrix are actually computed by & complicated seres of algebrve raniplions requiring onthe fede of p= sitlations whon Gs pp mated Chapter 3 uf Golub snd Yas Loan, 1983, describes the alr. a BOOTSTRAP STANDARD ERRORS: SOME FKAMPLES simaple model for corelated scores is x= Qn r= 12,88 @ Here Q, isa single number representing the capability of stadt i, while v= (01, 02,0, 4%) 18 fixed vector of S numbers, applying to all students. Qj can be thought of as student's scientific Intel- ligence Quotient (1Q). IQ were originally motivated by a mode! Just slightly more complicated than (7.7) Tf model (7.7) were trae, then we would find this cut from the eigenvalues: only 41 would be positive, Az = Jy = Jy = dg = 0; also the first eigenvector % would equal v. Let @ be the ratio of the largest eigenvalue to the total, WA (3) ‘Model (7.7) is equivalent to d = 1. Of course we don’t expect (7:7) to be exactly true for noisy data like test score, even ifthe model is basicaly correc. Figace 7.1 gives a stylized illustration, We have taken just two ofthe scores, sad on the let depicted what their seatterplot would look ike if single nazabec Q, eaptured both scores. The scores lie exactly on a line: Q, could be defined asthe distance along the ine of each pont from the origin. 
The right panel shows a more realistic Bituation. The points do uot ie exactly ou a Tine, but are fatly collinear. ‘Te lie shown in the plot points nthe diection given by thefts eigenvector of the covariance matrix. Iti sometines called the first principal component line, and has the property that. _ninimlae the stm of squared orthogonal distances fc the points to the line (in contrast to the least-squares line which minimizes the sum of vertical distances from the points to the line). The crthogoual disances are shown by the short live segments in the right panel, [5 dificult to make suel a geaph for the score data: the principal component line would be a line in fve dimensional space lying closest to the data. If we consider the projection of ‘each data point onto the line, the prineipal component line so ‘maximizes the sample variance of the collection of projected points For the score dala 6 or. way wey pare i) EXAMPLE I: TEST SCORE DATA 6s - igure 7.1 Hypothetical plot of mechanics and vector scones. On the lef, the pares tine exactly on a straight line (that 18, have carvelation 1) and hence single measure explares the two scores. On the righ, the scores hhave correlation less than one. The principal component line minumzes the tum of orthogonal distances tothe line and has direction given by the largest exgenvectr ofthe covariance mats. In many situations this would be considered an interestingly large value of, iadicating a high dageee nf explanatory power for tel (7.2). The value of @ measures the percentage of the variance ex plained by the fest principal component. The closer the points lie to the principal component tine, the higher the valve of @ How accurate is 7 'Thiss the kind of question thatthe bootstrap vas designed to answer. ‘The mathematical complexity’ going into the computation of 9 Is relevant, as long as we can compute O° for any bootstrap data set. In this case a bootstrap data set 18 an £88 x 5 matrix X*, The rows x} of X" are a random satmple of sive {8 from the rows ofthe actual data matetx X, Nod he = Xin (719) as in (6.4). Some of the rows of X appear zero times as rows of X", some once, some twice, ete, for a total of 88 rows, Having generated X", we calculate its covariance matrix G* as ra BOOTSTRAP STANDARD ERRORS: SOME EXAMPLES ‘Table 7.2. Quantiles ofthe botstrap distribution of defined im (7.12) a 05101650908 quantile 54) 557 576 629 670678 698 ‘We then compute the eigenvalues of G*, namely 3j,. finally A/D, (7.12) {he hootstrap cetiation of @ ; igure 7.2 i a histogram of B = 200 bootstrap replications “These gave estimated standard error apg =.O47 for 8. The mean ofthe 200 replieations wa 625, ony slightly larger than @ = 619. ‘This indicates that 8 lose to unbierd. ‘The histogram looks reasonably normal, but B ~ 200 is not enough replications to sce the distributional shape cleaey. Some quantiles of the empirical disteibution of the & valies are shown in Table 7.2. [The ath treantle isthe number qa sich that 100% ofthe 6s are less than g(a). Te 30 quad the median “The stondord confidence snterel for the true value of 6, (the value of we would see ifn — 0) 8 66220) ce (with probability 120) (7.19) wvnte °°) ithe 100(1 ~ oth porceuie of a standard nora Titration 2027) = 1.960, 205" = Lots, sb = T.0, ee "This 1s based on an asymptotic theory which extend (5.6) to gon eral summary statisties 4. 
In our case @€ 619.047 = [.572,.666] with probability 683 € 619+ 1.645, 047 = [542, 696] with probability 90 EXAMPLE 1: TEST SCORE DATA @ 080 055 ag 085 07 am gute 72, 200 tote epltions of he attic B= Ay A he Sevsrp standad ror sD. The athe ine yates fh observed tee 3 el. ‘Chapters 12-14 discuss improved bootstrap confidence intervals that are less reliant on asymptotic normal distribution theory. ‘The eigenvector % corresponding to the largest eigenvalue is called the first principal component of G. Suppase we wanted to summarize esch student’s performance by a single mumber, rather than 5 aumbers, perhaps for grading purposes. % ean be shown that the best single linear combination of the scores i= w= Diure (ray ‘that is, the linear combination vkat uses the components of @ a ‘weight, This linear combination is “best” in the eens that it eap ‘res the largest amount of the variation in the suigival five scores among all possible choices of v If we want a two-niimber summery os BOOTSTRAP STANDARD ERRORS: SOME EXAMPLES for each student, say ( be the second linear combination should (rs) with weights given by the second principal component Vo, the sec- fond eigenvector of G. ‘The weights assigned by the principal composenuts often give rusight into the structure of a multivariate data set. Far the seore sta the interpretation might go as follows: the first principal com- ponent Vy = (51, 87,85, 45, 54) puts positive weights of apprax- imately equal szo on each test score, soy, io roughly equivalent to taking student :s total (or average) score. The second principal ‘component ¥2 = {~.75, —21,.08, 20,55) puts negative weights on the twa closed-book teats and positive weights on the thive open book tests s £36 a contest between a student's open and closed book performances. (A student with a high = ecore did much better ‘on the open book tests than the closed book tests.) "The principal component vectors Vy and Vp are summary ties, just like 6, even though they have several components each, We can use a bootstrap analysis to learn how variable they are ‘The same 200 bootstrap samples that gave tle 6's also gave boot- strap replications 6 and V3. These are calculated as the fist two ‘eigenvectors of G*, (7.11). Table 7.3 shows sjo9, for each component of @ and #2. The first thing we notice is the greater accuracy of #4; the bootstrap standaed error for the components of ¥ ave less than half those of %, Table 7.3 also gives the robust percantile-based bootstrap standard errors 699.4 of Problem 6.6 caleulated for a = 84.00, and .95. For the components of, Sipo.q nearly equals Sapo. This tsn’t tho case for ¥2, particularly vot for Uke first and fh compo- nionls, Figure 7.3 shows what the toute is. ‘his figure (lie cuaptical distaibution of the 200 bootstrap repeat separately for 1 = 1,2, k= 1,2, - ,5. The empirical distributions are indieatod by hozplots. The canter line of the box indicates the ‘median of the distribution; the lower and upper ends of the bax are tha 25th and 75th percentiles; the whiskers extend from the lower land upyer ends of the bax to caver the entire range of the dist Dution, except for points deemed outlier according to a certain definition: these outliers are individually indicated by stars. 
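A bootstrap analysis of this kind is easy to program. The Python sketch below is a rough illustration, not the code used for Table 7.3 or Figures 7.2-7.4: the 88 × 5 score matrix is replaced by a randomly generated placeholder, and each replicated first eigenvector is sign-aligned with v̂1, since an eigenvector is only defined up to sign.

import numpy as np

def eigen_summary(X):
    """theta-hat of (7.8) and the first eigenvector of the plug-in covariance matrix (7.4)."""
    G = np.cov(X, rowvar=False, bias=True)       # divide by n, as in (7.4)
    eigvals, eigvecs = np.linalg.eigh(G)         # eigenvalues in increasing order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals[0] / eigvals.sum(), eigvecs[:, 0]

rng = np.random.default_rng(1)
X = rng.normal(size=(88, 5))                     # placeholder for the score matrix of Table 7.1
theta_hat, v1_hat = eigen_summary(X)

B = 200
thetas, v1s = np.empty(B), np.empty((B, 5))
for b in range(B):
    rows = rng.integers(0, 88, size=88)          # resample the 88 students with replacement
    thetas[b], v1 = eigen_summary(X[rows])
    if v1 @ v1_hat < 0:                          # resolve the arbitrary sign of the eigenvector
        v1 = -v1
    v1s[b] = v1

print(thetas.std(ddof=1))                        # bootstrap standard error for theta-hat
print(v1s.std(axis=0, ddof=1))                   # componentwise standard errors for v1-hat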
PXAMPLE TEST SCORE DATA ‘Table 2, Bootstmp standard errors forthe components of the frst and second principal components, ¥1 and V2 sano 14 the usual bootstrap standard error estimate based on B= 200 bootstrap replications Sen 18 the standard error estimate Sép.9 of Problem 6.0, mith B= 200,0 A; likewrse saxo,s0 and seayaos. The wales of Stage for far and Bay fare greatly anflated by a few ontyeng bootstrap replications See Figures 19 and 7. us | fay ion ion fot ts Baw OST 045 02794019 [1898 9G0 129150 Somes 988 DL O28 aT] Ov “am “Wot “IU “Lg Feaosso 055 0M 027 owe Ke | ORL ABD "O67 “ILL “12s Eoores 051 048 029 ta0 047 | 080 “190 “U6 “384 “tam ‘The large values of Sno for dar and fg are soon to be caused by 1a few extreme values of #3, The approximate confidence interval 6 € d+ 20-56 will be more accurate with # equaling S000 rather than Sapa, at least for moderate values of a lke 843. A histogram of the 63, values shows @ normal-shaped central bulge swith mean at ~.74 and slandard deviation 075, with a few points fac away from the bulge. This indicates a small probability, perhaps 19 or 2%, that in, is grossly wrong as an estimate ofthe true value ‘a1 I this geoss error hasn't happened, then 8a, 18 probably within ‘one oF (Wo Sop Units oft Figure 74 graphs the bootstrap replications ¥(8) and 93(0), > = 1,2,---,200, connecting the components of each vector by straight lines, This is los precise than Teble 7.3 or Figure 7.3, but fives a nice visual impression of the increased variability of V2 ‘Three particulae replications labeled “1", "2", “3". ace seen to be coatliers on severe! components, ‘A reader familiar with principal components may now sve that part of the diffeulty with the second eigenvector 38 definitional TTeclosicaly, thw definition of an eigenvector applies as well Lo —v as to v. The computer routine thst calculates eagenvalues and eigen vectors maker a somewhat arbiteasy choice of the signs given to 1,2," . Replications “I” and °2" gave X* matrices Tor which the sign convention of ¥j was reversed. This type of definitional stability i ususlly not important in determining the statistical properties of an estimate (though it 18 nice to be reminded of it by the bootsteap results). Throwing away “I and “2°, ns Sapo does, we sev that € ts atill much loss accurate than 9. » BOOTSTRAP STANDARD ERRORS: SOME EXAMPLES 5 Figure 78. 200 Bootstrap replications ofthe first two prencpal component rectors 1 (left panel) and va (rt panel): for each component of the tin wetons, the tpt muicalen the enapsrd dstrhation of the 210 boatteap replication fy. We aoe thal Wa 18 less unreal than Ws, ham fgrentr boatsteyp vnrasiiy for each consponent. A fw ofthe boatlrap amples gave completely dierent results then the others for v2. 7.8 Example 2: curve fitting In this example we will be estimating a regression function in to ‘ways, by a standard least-squares curve and by a modem curve- fitting algorithan called “loess.” We bogin with a brief review of regressiot theory. Clupler 9 looks at the regression problem again, and gives an alternative bootstrap method for estimating regres sion standard errors, Figure 7.5 shows a typical data set for whieh regression methods are used: n = 164 men took part in an exper~ ‘ment to see if the drug cholostyramine lowered blood cholesterol lovels. The men were supposed to take six packets of cholostyra- mine per day, but many of them actually took much less. 
The horizontal axis, which we will eall 2", measures Cusnplianee, a a percentage of the intended dose actually taken, ‘2¢= percentage compliance for man 1, 1 = 1,2,--- 164. Compliance was measured by counting the number of uncon- sumed packets that each mau returned. Men who took 0% of the Gove are at the extreme left, those who took 100% are at the ex- treme right. The horizontal axis, labeled “y", is Improvement, the decrease in total blood plasina cholesterol level frm the beginning, EXAMPLE 2: CURVE FITTING n compara component Figure 74. Graphs of tie 200 moter replications of V+ (lft panel) and Va fright panel). The nurnbers 1, 2% en te raght panel follow three of Use repatons 93(6) that gave the most discrepant values fon the iat component. We sce that these replications were also discrepant for other components, porteslarly component 5 to the end of the experiment, ‘y= decrease in blood cholesterol for man t, += 1,2, 184, ‘The full dataset is given an Table 7.4 he gure shows that men who took mae cholostyramive tended to got bigget improvements in their cholesterol level, Just as We right hope. What we see in Figure 7.5, oat let what we think te seo, ie an increase im the average response y as = sreases from Oo 100%. Figure 7.6 shows the data along with wo curves, Feuna(2) and fioca(2) (736) Boch of these is an estimated regression curve. Hore is brief re view of rgression curves and their estimation. By definition the regression of a response variable y on an explanatory variable = is the conditional expectation of y given 2, written *(2) = Bul). any Suppose we ha availnle the entine population 2 of men eligible for the cholostyramine experiment, and obtained the population X= (Xi Xa Xu) of their Compliance-Improverment scores, Ky AY), 7 hy. VN. Then for each value of 2 ay n BOOTSTRAP STANDARD ERRORS: SOME EXAMPLES ‘Table TA. The cholostyramne data, 164 men were supposed to take 6 packets per day ofthe choesterot-owering drug cholastyramine. Compl- ‘ance "z"16 the percentage ofthe yntended dose actually laken. Improve. ment “y" ts the decrease total plasina cholesterol from the begnnang Lil the end of treatment 0825 [a7 180, ean | os o 725] a aso| tar | os tos 0 62529 s300| 7 es00} a5 iss 0 ula 425} 2 “noo | os r600 2 ao} a ws78| 7 a0] 05 2575 2 as00/ 32 850] 74 aia fos 77s 2 “ers|as 335 | ts ae2s | os tars 3 aas[ ss ants | 78 caso | os 77.0 3 fs] se dozs| 7 airs | oe caun 4 a5) iso] rao} — T300 41025 { 34 “L00| 7 ae00 | bane 7 tos} st 775] t ans0 | os 267 8 tors] 35 2575) 1 ‘io | oe t800 3 950] 36 sso] = saan} oe 4750 8 a2s}as 625] se aan | ae 3098 3 625] ar 550 soo | 9% 21.00, 3 175 | a8 25.00 39.00 | 9 re.00 © asa] 4 2036 025 | 97 9.00, 9 aras| 42 33.25 too |r $100 380 | 45 5675 sens | 97 S600 nis | 4s “tas, 1% Burs aso | 47 nao 2m | 98 835 2400 | 50 sso | ss as76 | a — ano0 250s sa2s| a0 S65 | 98 anze soos a7s| 90 2028) a “Son sso | ate} ar 7250 98 wo4re aia} sz aa] or ats | oe ‘aka zor | 33 so2s| 2 tas0 | oe tae va0| a 147s | o2 612s | oe 408 i 1680 | st tras | oe aan | oo th. 2 450] 5¢ ton] 02 sn75 | 99 S275, 2» 900] st is75| 09 7100] 9 ss.00 n ar5| st ta7s os ar75 | 9% F000 ni -2100| s dren | an ston | too 200 2 025 | 60 ars] as 978 | to) 375 m nas) 2 iso | 93 save] iar su00 2 050| 6 mo} er aan] too ens 25 900} e¢ -1450| 9 soco | 100 so1s0 z ists | 61 “a7 | oa 335 | im “enon a 600] 67 “ae25 | 0% oo.00 | 100 75 BXAMPLE 2: CURVE FITTING n groaned copia Figure 7.5, The cholostyramme date. 
164 men were supposed to take ‘6 packets per day of the cholesterotlowermg drag cholostyramne; hor zontal azis measures Compliance, percentage of assigned dose actually token; vertical ans measures Improvement, an terms of Blood cholesterol decrease over the course of the expersment. We see that better compliers tended to hae greater Hnprowernent = = 0%, 1%, 2%,» 100%, the regression wuald be the conditional expectation (7.17), ‘sum of ¥, values for men mn with Z, = 2 fa) = Som EY values for mon in ¥ with Zp s oy (2) =" umber of men an a” with Z, (ras) In other words, r(2) is the expectation of ¥ for the subpopulation of men having Z Of course we do nol have available he eutire population 2. We heave the sample 2 = (£1y%2y--- Xie), Where X= (35,4) 8 show in Figure 7.5 and Table 7.4, How can we estimate r(z}? The u BOOTSTRAP STANDARD ERRORS: SOME EXAMPLES : : i : zg : | oa Figare 7.6, Eelsmatod rgression curves of y = faprivement on z= Compliars, The sashid curves Fama(2)y the erinary least-square ‘rundrate regeession of y on 2; the solid curve 48 Finan(2), a computer- Used Tocal linear vegresnion. We are particularly wterestd mi estsmating the true regression r{z) af = = 60%, the averuge Compliance, and at 2 = 100%, full Compliance ‘obvious plug-in estimate som of yi values for men im x with 24 = He) = (719) Tnumber of men in x with = One can imagine drawing vertical strips of width 1% over Fig. sre 7.5, and averaging the y, values within each strip to gat #(2) “The results are shown in Figure 7.7 ‘This ie our frst example where the plug-in prineiple doesn't work. very well. The estimated regression #(z) is much rougher than we ‘expect the population regression r(z) to be. The problem is that EXAMPLE 2: CURVE FITTING ™ — = ao compte Figure £7. Solid curve 1 plug-in estate #2) for the soqression of improvement on compliance; averages of Ye Jor sinps of wu 1% on the aris, as in (7.19). Some strips 2 art not represented emase no ofthe 164 men had 21 = 2. The function #(2) 18 much rougher than we Ee pan amon corte) bs The dase re Fawn?) thre aren't enovgh points in each strip of width 1% r0 estimate r(2) very well. In some strips, lke that for = = 5%, there are no poinis at all. We could make the steip width larger, say’ 10% Instead of 19%, hut this leaves us with only’ a few points plat, and, perhaps, with problems of veriabilty still remaining. A more logant and efficient solution is availabe, based on the method of least-squares. "The method begins by assuming that the population regression fonction, whatever it may be, belongs toa family R of smoot Fune- 6 BOOTSTRAY STANDARD EIRONS: SOME EXAMPLES tions indexed by a vector parameter f = (8p, 14°*- Bp)". For the cholestyramine example we will consider the family of quadratic functions of =, say Reus Rant ral) = Bot Bas + Poe? (720) (821, 82)". Later we wil discuss the choice ofthe quadkatic farnily Rgyas, bt for now we wil just accel a8 given. "The reader can imagine choos teal value of B, say 9 = (0,.75,.005)", and plotting r9{z) on Figure 7.5. We would like the curve rp(2) to be near the data points (2,44) in some overall sense. It 8 particularly convenient for mathematical calculations to measure the closeness of the enrve to the data points in terms ofthe residual squared err, RSE(B) = fyi — role)? 
(7.21) The residual squared error is obtained by dropping a vertial line from each pomt (sj,y) to the curve rg(s,), and suenming the squaced lengths ofthe veetieals "The method of leat-equares, orguated by Legendre and Gaess in the early 1800's, choowes among the curves in R by muiming the residual equared ceror. The best ting curve in Ria declaced to be r(z), where 9 minimizes RSE(A), RSB(B) = myo RELA) (7.22) “The curve uns) Figure 7.618 r3(s) ~ Ay + Bis + Bre, the best ting tadratic curve forthe elilstyramine date Tuegendre and Gauss discovered wondertal mathematical for- mula for the least squares solution 3. Let C be the 164 x 3 matrix toto ih row is eo= (Lene), (723) and let y be the vector of 164 y, values. ‘Then, in standard matrix notation, B=(Clortety, (724) We will exammme this formula more closely in Chapter 9. For our bootstrap purposes here all we neo to know is that a data set ofr. pairs x= (1,%2)- Xn) Produces a quadratic least-squares curve PEAMPLE 2: CURVE HTETING 7 19(2) via the mapping x —~ r9(z) that happens to be deseibed by (7.28), (7.24) and (7.20) (Ose can think of r4(2) as a staoothed version of the piug-in esti- snate (2)- Suppose that we introased the family R of smooth fune- tions under consideration, sty to Rut the elass of eubye polyno- mils ins, ‘Then the least-squares ealtion r9(+) would come closer to the data points, but would be humpier tan the quadratic last squates curve. AS we considered higher and highe: degree polyno mins, r9(2) would more and more resemble the plug-in estimate (3). Our choice of a quadratic regresion function 13 implicitly a dhoice of how smooth we belive the true regression 7(3) to be Looking at Figure 7.7, we cat see directly that fquu(2) 18 much smoother than ¥(2), bat generally follows F(2) as» function of =. lis cary to believe that the true regression r(2) is a smooth faction of = I is harder to beliove that ts « qvadate function of acres the entie range of = values. ‘The smonthing function “loess, pronounced “Low”, attempts to compromise besween a alobal assumption of frm, like quadratiity, and the purely local averaging of (2). ‘A user of lone at asked to provide a number “a” that will be the proportion of then data points used at each pot of the cot struction, The curve frem(2) 8 Figure 7.6 ysed a = 30. For onrh value of 2, the value of Fog (2) 18 obtained as follows (1) The m points x4 = ( — 100%. Bootstrap standard terrors are given for each value. These were obtained from B = 50 bootstcap ceplications of the algorithm shown in Figure 6. Tn thie case F is the distribution putting probebilty 3/764 on each of the 164 points %, = (2,14). A bootstrap data sets x° (log, -y3Rea), where each x7 equals any one of the 164 mem- bers of x with equal probability. Having obtained x*, we calculated Faui() and Fgga(2), the quadratic and loess repression curves, based on x, Finally, we read of the values fy(60) Fen (60), Foq4(100), aiding (100). The B = 50 values Of Fyyg (GO) bad sample standard error 808, ete, as reported in ‘Table F'. ‘Table 7.5 shows that ficew(2) is substantially les accurate than Fowl EXAMPLE 2: CURVE FYFEING ” 8 TT [ 7 g ® 8 : Ee (: | i " yf pe a a Figure 7.9, ‘The frst 25 bootstrap replications of fail?) left panel, and oe), right panel the mereased vartebilty of onl 2) 48 etndent ousa(2). This is not surprising since fica(#) 18 based on less data than Fauou(z), only a a¢ much, See Problem 7.10, The overall ‘eater variability of fs) is evident m Figure 7.9. 
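The resampling loop behind Table 7.5 is equally simple to sketch. The Python fragment below is illustrative only: the Compliance and Improvement values are stand-ins for the data of Table 7.4, only the quadratic fit is shown, and a loess fit would be handled in exactly the same way given a loess routine.

import numpy as np

def quad_fit_predict(z, y, z0):
    """Least-squares quadratic fit (7.20)-(7.24), evaluated at the points z0."""
    C = np.column_stack([np.ones_like(z), z, z ** 2])   # rows c_i = (1, z_i, z_i^2)
    beta, *_ = np.linalg.lstsq(C, y, rcond=None)        # beta-hat = (C'C)^{-1} C'y
    z0 = np.asarray(z0, dtype=float)
    return beta[0] + beta[1] * z0 + beta[2] * z0 ** 2

rng = np.random.default_rng(2)
z = rng.uniform(0, 100, size=164)                       # stand-in Compliance values
y = 0.3 * z + rng.normal(scale=20, size=164)            # stand-in Improvement values

B = 50
preds = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, len(z), size=len(z))          # resample the 164 (z_i, y_i) pairs
    preds[b] = quad_fit_predict(z[idx], y[idx], [60.0, 100.0])

print(preds.std(axis=0, ddof=1))   # bootstrap standard errors for the quadratic curve at z = 60 and z = 100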
tis usofol\a plot the bootstrap curves to see if interesting fea- ‘tures of the original curve maintain themselves under bootstrap sampling. For example, Figure 7.6 shows fioan increasing, much more rapidly from 2 = 80% to 2 ~ 100% than from 2 = 0% to = 80%, The difference in the average slopes is 5 = Focg(100) —Ftomel80) _ Fion(80) — fraes(60) D 0 278 37.50 32.50 — 2403 20 20 = 1.84, (728) ‘The corresponding number for Fauud #8 only 6.17. Most of the: bootstrap loess curves Faqs(2) showed a similar sharp upward bend fat about x = 80%. None of the 50 bootsteap values 0° were leas than 0, the naiituum being 23, with most of the values > 1, se Figure 7.10. ‘At this poist we may legitimately worry that Fayaa(s) 18 loo Py BOOTSIRAP STANDARD ERRORS: SOME EXAMPLES ‘Table 7.5. Values of jyas(2) and Fioon(2) ats = 60% and 2 = 100%, leo bootstrap standard errors base on = 50 bootstrap replications Feuna(60) _fioen(G0) _Funa(100) fese(100) value «27723088. TR.T 0 303 4d 3.85 6a smooth an estimate of the true regression rz). If the value of the true slope difference (10) = r(sv) _ r(s0) — r(60) 0 30 (729) is anywhere near @ = 1.50, then »(2) wil look mote like fis than Funi(s) for = between 60 and 10C, Estimates based of cea(2) tend ta be highly variable, ax in Table 7.5, hut thy alan tend to Ihave smal bias. Both ofthese properties come from the local nature ofthe loess algorithm, which estimates »(2) using only data points with 2 nee 2 ‘The estimate 6 = 1.59 based on Fons bas considerable vari ity, a0 = .61, but Figure 7-10 strongly suggests that the tue 6, whatever it may bo, i greater thas the value 8 = .17 based cn Faas. We will examine this type of argument more closely in Chapters 12-14 on bootstrap confidence intervals ‘Table 7.5 suggests that we should also worry about the esti Fawn(60) 20 Fyyns(100), which may be substantially on low. One pion is 19 cose higher palynonnil models sue a cubic, quartic, tc. Blaborate theories of model building have been prt forth, in an effor to say when to go on to a bigger model land when to stop. We will consider regrssion mde further in Chapter 0, where the cholesterol data will be looked at again. The simple bootatrap estimates of varinbility discussed sa this chapter are often » useful step toward understanding resssion model, particulasly nontraditional ones Hike Foca). AN EXAMPLE OF BOOTSTRAP FAILURE TA An example of bootstrap failure * Suppose we have data Xy,X2,...X, ftom a uni tion on (0,0). The macimur likelihood estimate 4 is the largest sample value Xiq). We generated a sample of 50 uniform muro bets in the range (0,1), and computed = 0.988, The left panet of Figure 7.11 shows a histogram of 2000 bootstrap replications of O° obtained by sampling with replacement from the data. The right panel shows 2000 parametric bootstrap replications obtained by sampling Grom the uniform distribution on (0,4). It is evident, thal the left histogram is a poor approximation to the right his- ‘togram. In pacticular, the left histogram has a lacge probability mass at @: 695 of the values 6* equaled 6, In general, it is easy to show that Prob(@" = 6) = 1 (1~1/n)" —» 16°? = 632 fas n+ co. However, iu the parametric seting of the night panel, Prob(@" = 6) "What goes wrong with the aouparametric bootstrap? ‘The dili- culty occuts because the empirical distribution function F is not ‘ good estimate ofthe true distribution F in the extreme tail. Bi- ther parametric knawledge of F ot some smoothing of Fis needed to rectify matters. 
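A small simulation makes the failure easy to see. The Python sketch below (with an arbitrary seed and the same sample size n = 50) is not the code used for Figure 7.11, but it reproduces the qualitative behavior just described.

import numpy as np

rng = np.random.default_rng(3)
n, B = 50, 2000
x = rng.uniform(0, 1, size=n)         # a sample from the uniform distribution on (0, theta), theta = 1
theta_hat = x.max()                   # maximum likelihood estimate: the largest sample value

# nonparametric bootstrap: resample the data with replacement
nonpar = np.array([rng.choice(x, size=n, replace=True).max() for _ in range(B)])
# parametric bootstrap: sample from the uniform distribution on (0, theta_hat)
par = rng.uniform(0, theta_hat, size=(B, n)).max(axis=1)

print(np.mean(nonpar == theta_hat))   # large: close to 1 - (1 - 1/n)^n, about .636 for n = 50
print(np.mean(par == theta_hat))      # essentially zero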
Details and references on this problem may’ be found in Beran and Ducharme (1991, page 23). "The nonparanel- ric bootstrap can fail in other examples in which @ depends on the smoothness of F. For example, if 9 is the number of atoms of F, then 9 = n is a poor estimate oft 7.5 Bibliographic notes Principal components analysis is deecribed in most books ot smal tivariate analysis, for example Anderson (1958), Mardia, Kent and Bibby (1979), or Morrison (1975). Advanced statistical aspects of the bootstrap analysis of a covariance matrix may be found fn: erat ann Stnstava (1985). Corvette is descr bed in Es bank (1988), Iiedle (1990), sue Hastie aud Tibshiran (1990), ‘The loess method 18 due to Cleveland (1979), and is described Chambers and Hastie (1991), Hardle (1990) and Hall (1902) dis ‘cuss methods for bootstrapping curve estimates, and give a mune ber of further references. Efron and Feldman (1091) discuss the cholortyramine data and the use of compliance es an explanatory reading ee OOTSTRAP STANDARD ERRORS: SOME EXAMPLES NE tha Soper Figuee 7.29. Fifty hotstmp replications of the slope difference statistic (7.28). All of the yatwes were pontive, and most were greater than 1. ‘Dae bootstrap standard ervor estimate ts fyo(@) 61. The vetial bine ve drawn af = 1.03, variable, Leger, Politis and Romano (1999) give a number of ex- anmples illustrating the use of the bootstrap. 1.6 Problems TA The sample covariance matrix of multivariate data XigGy 14Xq) when each x, 18 a peimensional vector, 8 often defined tobe the p xp matin having th element ay =, Eu Liew a) —t) b= 1,2,- here #5 = D2 ay/n for y= 1,2) p. This ites from ‘he empirical covariance madre G, (7.4) in dividing by = father tian (0) What ote feat row of $ forthe seore data? PROBLEMS, ss alll) __all ots 05750985 095 ase 087 098 ase Parana enor Figure 711. The lft pane! shows histogram of £000 bootstrap epic tions of — Xo) oblamed by sampling with replacement from a sample 2130 sor onter. The rag pan howe 2000 paranetre utr Fencatons oblaced by snp fr th naar dar aton om (0). () The following fact is proved in linear algebra: the cigonvalues of matrix cM oqual c times the eigenvalses of M for any coustant e. (The eigenvectors of €M equal those of M.) What are the eigenvalues of for the sore data? What is 6, (7.8)? 72 (a) Whatis the sample correlation coefficient between the imechaaies and vectors test scores? Between vectors and algebra? (b). Whatis the sample correlation coefficient between the algebra test score and the sum of the mechanics and vee- tors tost secres? (Hint: Bf(x+-y)2] = B(z=)-+ (y=) and Bile + u)"] = Ba") + 28(2y) + By").) 7.3 Calculate the probability that any particular row ofthe 88 x 5 data matrix X appears exactly & times in # bootsteap matrix X, for k 74 A random varinhle 2 is said to have the Potsson distribution. swith expectation parameter ex Pod), (7.30) 16 uw BOOISTRAP STANDARD ERROMS: SOME EXAMPLES If the sample apace of 2 the non-negative integees, and Haye Proble=}= "SP for k= 01,2, (731) A useful approximation for a binomial distribution Bi(np) is the Poisson distribution with A= np, Po(nr) (73) ‘The approximation in (7.82) becomes more accurate & n gots large and p gets small Bi(n,) (8). Suppose x* = (x}.x},-+-423) ie a bootstrap sample obtained from x = (Xp,X2,""-, Xn). What is the Pois- son approximation for the probability that any particular member of x appears exactly k times in x"? 
(b) Give @ numerical comparison with your answer to Problets 7.3} Notice viat mn the right panel of Figure 7.4, the main bundle ‘of bootstrap curves notably narrower half way between “I” ‘sud °2" on the horizontal exis. Suggest a reason Why. "The sample correlation matrer corresponding, to G, (7.4), is the matrix C having jkth element GilGiy Gael? b= 120° 5. (788) Principal cosaponent analyses are often done in terms of the exgenvalues and vectors of C rather than G. Carry out a bootstrap analysis of the prinsipal components based on C, and produce the corvesponiing plots to Figures 7.3 and 7.4 Diseuss any diflerencas between the two analyses A generalized version of (7.20), called the linear regression. ‘motel, assumes that yy the sth observed value of the re- sponse variable, depends ona covarsate wectoy. = (e1se4a,2569) and a parameter vector (8:,Bay-"- sy). The covariate €, is observable, but 8 is not. ‘The expectation of y is assumed to be the linear fonetion 8 = ¥ esi (7.34) PROBLEMS, 6 18 79 710 {tn (7.20), ©. = (142422), 8 = (Bohs 8)", and p = 3] Legendre and Gauss showed that @ = (C7C)"'C"y mun izes 50h ~ 648} that is B as given by (7.24), he Ieas-eares cotimate of 8. Here Cis the mp matsix with lr row ey, assumed to be of fll rank, and y is the vector ‘responses. Use this result to prove that j= 9 mininiges Schelde ny? among all choles of 1 For conventent notation, let 7 equal Raves equal the family of cubic functions of, Ry Gf quattie fanetions, ete. Define A(j) asthe least-squares es timate off in the class R, 0 Aj) in @ 7+ 1 dimensional vector, nd Jet RSEY(B) = 52yuy(04~raun(@0)? (2) Why is RSE,(9) « non-mereasing function of j? (6) Suppose that all n of the = values are distines. What isthe limiting value of RSE,(P), and for what value of it reached. (Hint: consider the potysomial ny, TTS (x all (c)' Suppose the 2, are not distinct, as in Table 7.4. What isthe lating value of RSE, (A)? Problem 7.80 says that increasing the class of polynomials decreases the residual errer of the fit. Give an intuitive ar sument why’ rj,5(2) might be a poor estimate of the true regression Func ons rf) if ws tthe y He very Ines ‘The estimate Fows(2) In Table 7.5 has greater standard et- ror than Faua(2), but it only uses 30% of the available data. Suppose we randamky selected! 205 of the (2, ys) pais fox Table 7-4, fit a quadratic least-squares regression to this data, and called the eurve Fayg(3). Make a reasonable guess ‘6 to what 5p would be for Fapx (3), = = 60 and 100. j Indicates a dificult or mote advanced problem. CHAPTERS More complicated data structures 8.1 Introduetion The bootsteap algorithm of Figure 6.1 is based on the simplest possible probability model for random data: the ove-sample model, where a single unknown probability distribution F produces the data x by random sampling Pox alentn 20) (sa) ‘The individual data points 2, in (8.1) can themselves be quite ‘complex, perhaps being sumbers of vectors or maps oF images or anything at all, but the probability mechanism is simple, Many data analysis problems involve more complicated data structues, These structures have names like time series, analyst of variance, regression models, multi sample problems, censored data, stratified sampling, and so on. The bootstrap algorithm ean be adapted to general data structures, as is diseussed here and in Chapter 9, 8.2 One-sample problems Figave 8.1 is a schematic diagram of the bootstrap method as it applies to one sample problems. 
On the left 1s the real werd, ‘where an unknown distribution F hae given the observed data x = (#122 tq) by random sampling. We have calculated a statitie of interest from x, @ = a(x), and wish to know something ‘about és statistical behavior, perhaps its standard error ser (0) ‘On the right side of the diagram is the bootstrap world, to use David Freedman's evocative temiaology: In the bootstrap world, the empirical distribution” F gives bootatrap » samples (aj, 2,---.2}) by random sarmpling, from which we ealeu- ONE-SAMPLE PROBLEMS co — a REAL WORLD BOOTSTRAP WORLD, SSE neers nano si Oana Sao Fe ety 22 ag Pe eds ow | eee) Booty Ropleston FFiguee 8.1. A schematic diagrom of the bootstrap ab it applies to onc- tammple protlems. In the real world, the unknown probability distribution F gues the data x = (8i,23,"+ x) by random sampling: from x we calculate the statetic of tnterest 6 = s(x), In the Boolstrap world, F generates x* by random sampling, ging 6° = a(x"). There x8 only one ‘observed value of 8, but we can generale 06 many bootstrap replications & as affordable. The eructal step nthe bootetrap process ug “—>", the process by which we construct from x an estimate P of the unknown population F late boctstrap replications of the statistic of mterest, & = s(x") The big advantage of the bootstrap world is that we can calculate 8 many replications of as we want, of at leat ae many as we fan alford, Ths allows us to do probabilistic calculations dzectly, for exaraple using the observed tatiability ofthe 8" to estimate the unobservable quantity sep(@) ‘The double arrow in Figure 8.1 indicates the esleuation of F from F- Conceptually, this is the crucial step inthe bootsteap pro cess, even (hough its computationally imple. Every othr patt of the bootstrap pictuce i defined by analogy: F gives x by random sampling, s0 F gives x° hy randoun sampling 6 obtained fom via the function s(x), s0 9 i obtained feom x" in the same wa Bootstrap calculations for move complex probability mechanisms turn out to be straightforward, onus we Kuow how to earry out {he double artow proces estimating the eive probability mech ais from the data, Torvanately this is easy to do forall of the 88 MORE COMPLICATED DATA STRUCTURES common data smcures "To foclitate the study of more complicated data structures, we will use the notation Pox (8.2) fo mdicate Chat an unkao observed data sot X. a probability model I has yielded the 8.8 ‘The two-sample problem ‘To understand the notation of (8.2), consider the mouse data of Table 2.1. The probability model P ean be thought of as a pair of probability distributions F and G, the frst for the TeoaLment group and the second for the Control group, FG) (3) Let n= (21.20). sp) indicate the Treatment observations, and y= (Muda stn) fadiate the Control observations with m= 7 and m9 (different notation than on page 10) Then the observed data comprises # and y, zy) (64) We can think of x as a 16 dimensional vector, as long as we ro- member thatthe first seven coordinates come from F and the last nine come from C. The mapping, P — x is deseribed by F—2 independently of G—y es) In other words, 2 isa randota sanple uf size 7 from Fy is a cancom sample of size 9 from G, with 2 and y mutually dependent of each ther. This setup is called a two-sample problem, In this cave it is enry to estimate the probebility mechanism P Let F and G be the empirical distributions based on and y, respectively. 
Then the natural estimate of P= (F,G) is P= (RG) (60) Having obtained P, the definition of a bootstrap sample x* is obvious: the arrow in Pax ws) rnust mean the same thing as the arrow in P + x, (8.2). In the ‘THE TWO.SAMPLE PROBLEM 80 two-sample problem, (8.5), we have x* = (2",y*) where Pa" independently of G—y* (88) ‘The sample swes for 2* and y* are the same as thove for z aud y 5 dhe hislogran of = 1400 bootsizap replica dons of the statistic 6 = fa fyae B.86'~ 56.22 = 80.68, 69) the difference of the means between the Treatment and Contsol froups for the mouse data, This statistic estimates the paraneter Er(2) ~ Baty). (8.10) £0 is really much greater than 0, a8 (8.0) seems to indicate, then the Treatment is a big improveinent aver the Control, However the bootstrap estimate of standard error for @ = 30.63 1s Bro = (1) — HOP /1399}4? he Hy 6.85, (B.AL) so d is only 1.14 standard errors above zero, 1.14 = 3063/2635. “This would uot usualy Ve eousideredetrong evideuce Us the rue value of @ is greater than 0. ‘The bootstrap replications of 6* were obtained by using a ran dom number generator to carry out (8.8). Each bootstrap sample se was computed a X= (BLY) = (Zar tiae Fey Myer Yaad (8.12) where (4,42:-'.t7] was a random sample of ize 7 from the inte- gers 1,2, -»,7, and (71,32, °-,J9) was an independently selected atom seme of sie 0 from the integers 1,2, 0. For instance, the frst bootstrap sample had (tytn. svi) = (682,1.6,3) and (1538-38) = (2.8 2,9,6,7,344,2). “The standard error of@ can be written as sop(@) to indicate its dependence on the unknown probability mechanism P = (FG). ‘The bootstrap estimate of sep (6) is the plug-in estimate 20 (0°) = (varp(s* — 9°}? (8.13) ‘As in Chapter 6, we approximate the ideal bootstrap estimate sep(0") by sp of equation (6.6), m this ease with B= 1400. The ~ MORE COMPLICNSED DATA STRUCTURES = UA *° ° o 100 Figure §2. 1400 bootstrap replications of the difference between the Treatment and Control means fr the mouse data of Table 2.1; Boot- strop estimated standard error was Seyo0 = 26.85, s0 the observed value 6 "30.63 (troken line) ws only 11M standand errors ahove sero, 13.1% of the 1400 0° values wore tess than zero. This 1s not srall enbugh to be considered convincing evens that ie Treatment worked beter than ‘he Contr fact that 6” is compted from two samples, 2” and y*, doesu't affect Uofiuition (66), namely Sp = (A606) - EAB — YEP. 8.4 More general data structures Figure 83 38 a version of Figure 8.1 that applies to genczal data structures P — x. There is not much coneeptual difference between the two figures, except for the level of generality involved. In the real world, an unknown probability mechanism P gives an observed data set X, according to the rule of construction indieased by the MORE GENERAL DATA STRUCTURES o — — peace (eee — SEE comnesom | Si ‘ice Pe wet tpt | fa) Bm vag asa | ant) 2onetap Recon, Figure 8. Schemetie diagram of the bootstrap applied to proHlerns wath fa general data structure P—» x. The crucial step "=e" praiuces an fstomate P of the entire probability mechanssm P from the observe data 2X, The rest af the tootsimp piewure 18 determmned by the real world: SB ie the some as “Po x", the mapping from x” + 9, 300), 1 the same as the mapping from x —» 8, s(x) arrow “—." In specific applications we need to define the arrow ‘more carefull, ag in (8.5) for the two-sample problem. The data ot 2¢ may a0 longer be a single vector. 
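For the two-sample structure just illustrated, the entire calculation leading to (8.9) and (8.11) takes only a few lines of code. The Python sketch below is illustrative only; it is not the authors' program, and the Treatment and Control values typed in should be checked against Table 2.1.

import numpy as np

rng = np.random.default_rng(4)
z = np.array([94, 197, 16, 38, 99, 141, 23], dtype=float)            # Treatment group
y = np.array([52, 104, 146, 10, 51, 30, 40, 27, 46], dtype=float)    # Control group
theta_hat = z.mean() - y.mean()                                      # difference of means, (8.9)

B = 1400
reps = np.empty(B)
for b in range(B):
    z_star = rng.choice(z, size=len(z), replace=True)   # F-hat -> z*, as in (8.8)
    y_star = rng.choice(y, size=len(y), replace=True)   # G-hat -> y*, drawn independently
    reps[b] = z_star.mean() - y_star.mean()

print(theta_hat)           # about 30.63
print(reps.std(ddof=1))    # compare with the estimate of about 26.85 in (8.11)

We return now to the form of the data set x in the general case.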
It has a form dependent fon the data structure, for example x = (zy) in the two-sample problem, Having observed x, we calculate a statistic of interest 6 fiom x according to the funetion s(:). The bootstrap side of Figure 8.3 1s defined by the analogous ‘quantities in the real world: the arrow in P — x* is defined to mean the same thing as the arrow in P — x. And the function ‘mapping x" to 6° i the same function s(-) as from x to 0 ‘Two practical problems arise n actually carrying out a bootstrap analysis based on Figure 8.3 (1) We need to estimate the eutire probability mechanin P from the observed data x. This is the step indicated by the double arrow, x =» P. It is surprisingly easy to ds for most familiar data siructurea. No general prescription is posible, but quite natural ad hhoc solutions are available in each ease for example P = (FG) for the two-sample problem, More exemples are given in this chapter a MORE COMPLICATED DATA STRUCTURES "Table 8.1. The lutenszing hormone dete, period level | period _Jovel | period level | period level T 3a; ie 22f 5 23) aT Os 2 24! ou is] 26 20) 33 1a 3 24] 15 32) 2% 20| 39 24 422] ww a2] 2% 29) 0 a3 s 21) i 27) 2 29/ 4 35 o 1s} as 22] go 27) 42 35 7 23| 19 22) a 27] 43 a1 8 23) m i9| 32 23) 44 26 9 25) mn 19) 33 26) 4 24 wo 20] 2 1s) 3 24] 46 34 no1o] 23 27] 3 Is} a7 30 wou] oa so] 36 a7] ae ao and the nest (2) We need to simulate bootstap data fom P according to fhe relevant dats structte, This sth step P — x" in Figure 8 “This step is comeeptually sraghtorward, boing Phe seme ns P — x, but can require sme cae in the progrsnming if eormpoatnnal ciency is meossry. (We will soa example in the hieniing hormone analyss below.) Usually the generation of the bootstrap dale P +3 roqite less time, offen mc let tine, than the calculation of = a0). 8.5 Example: lutenizing hormone Figure 8.4 shows a set of levels yy of « hutenizing hormone for each ‘of 48 time periods, talon from Diggle (1000); the dataset i ised in ‘Table 8.1. These are hormone levels ineasored! ta healthy woman 1m 10 minute intervals over a period of & hours. The lutenizing hormone is one of the hormones that orchestrate the menstrual eyele and hence itis important to understand its daily variation, 1Wis clear that the hormone levels are not a random sample from. any distribution. There 1s much toa much structure in Figute 8.4 These data are an example of a time senses a data strocture for which nearby values of the time paranioterf ndieate elowely related EXAMPLD: LUTENIZING HORMONE a elo A APY eV MIN “Y ’ = » « 18.4. The Iutentzeng hormone data. Level of hatemssing hormones plated versus time period t fort from 1 to 48. In Uns plot and other lots the pores are connected by tines to enhance vsbiily. The average alue ji 24 18 mdicted by a dashed Hine. Table 8.1 sts the data values of the measured. quantity yy. Many interesting probabilistic models have been used to analyze time series, We will begin here with the simplest model, a first arder autoregressive scheme. Lot be the expectation of ys, assumed to be the same for all times ¢, and define the centered messueemeuts Eun (es) All ofthe # have expectation 0. 
A first-order autoregressive scheme is one in which each z_t is a linear combination of the previous value z_{t-1} and an independent disturbance term ε_t,

z_t = β z_{t-1} + ε_t    for t = U, U+1, U+2, ..., V.   (8.15)

Here β is an unknown parameter, a real number between -1 and 1. The disturbances ε_t in (8.15) are assumed to be a random sample from an unknown distribution F with expectation 0,

F → (ε_U, ε_{U+1}, ..., ε_V)    [E_F(ε) = 0].   (8.16)

The dates U and V are the beginning and end of the time period under analysis. Here we have

U = 2 and V = 48.   (8.17)

Notice that the first equation in (8.15) is

z_U = β z_{U-1} + ε_U,   (8.18)

so we need the number z_{U-1} to get the autoregressive process started. In our case, z_{U-1} = z_1.

Suppose we believe that model (8.15), (8.16), the first-order autoregressive process, applies to the lutenizing hormone data. How can we estimate the value of β from the data? One answer is based on a least-squares approach.¹ First of all, we estimate the expectation μ in (8.14) by the observed average ȳ (this is 2.4 for the lutenizing hormone data), and set

z_t = y_t - ȳ   (8.19)

for all values of t. We will ignore the difference between definitions (8.14) and (8.19) in what follows; see Problem 8.4.

¹ For simplicity of exposition we use least-squares rather than normal-theory maximum likelihood estimation. The difference between the two solutions is usually small.

Suppose that b is any guess for the true value of β in (8.15). Define the residual squared error for this guess to be

RSE(b) = Σ_{t=U}^{V} (z_t - b z_{t-1})².   (8.20)

Using (8.16), and the fact that E_F(ε) = 0, it is easy to show that RSE(b) has expectation

E{RSE(b)} = (b - β)² E(Σ_{t=U}^{V} z_{t-1}²) + (V - U + 1) var_F(ε).

This is minimized when b equals the true value β. We are led to believe that RSE(b) should achieve its minimum somewhere near the true value of β.

Given the time series data, we can calculate RSE(b) as a function of b, and choose the minimizing value to be our estimate of β,

RSE(β̂) = min_b RSE(b).   (8.21)

The lutenizing hormone data has least-squares estimate

β̂ = .586.   (8.22)

How accurate is the estimate β̂? We can use the general bootstrap procedure of Figure 8.3 to answer this question. The probability mechanism P described in (8.15), (8.16) has two unknown elements, β and F, say P = (β, F). (Here we are considering μ in (8.14) as known and equal to ȳ.) The data x consist of the observations y_t and their corresponding time periods t. We know that the rule of construction P → x is described by (8.14)-(8.16). The statistic of interest is β̂, so the mapping s(·) is given implicitly by (8.21).

One step remains before we can carry out the bootstrap algorithm: the double-arrow step x ⇒ P̂, in which P̂ = (β̂, F̂) is estimated from the data. Now β has already been estimated by β̂, (8.21), so we need only estimate the distribution F of the disturbances. If we knew β, then we could calculate ε_t = z_t - β z_{t-1} for every t, and estimate F by the empirical distribution of the ε_t's. We don't know β, but we can use the estimated value β̂ to compute approximate disturbances

ε̂_t = z_t - β̂ z_{t-1}    for t = U, U+1, U+2, ..., V.   (8.23)

Let T = V - U + 1, the number of terms in (8.23); T = 47 for the choice (8.17). The obvious estimate of F is F̂, the empirical distribution of the approximate disturbances,

F̂:  probability 1/T on ε̂_t, for t = U, U+1, ..., V.   (8.24)

Figure 8.5 shows the histogram of the T = 47 approximate disturbances ε̂_t = z_t - β̂ z_{t-1} for the first-order autoregressive scheme applied to the lutenizing hormone data, time periods 2 through 48. We see that the distribution F̂ is not normal, having a long tail to the right. The distribution has mean 0.006 and standard deviation 0.454.
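The fitting step and the computation of the approximate disturbances are short enough to sketch in code. The Python fragment below is illustrative only: the hormone series is replaced by a randomly generated placeholder, so the printed numbers will not match the values 0.586, 0.006 and 0.454 quoted for the real data.

import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(loc=2.4, scale=0.5, size=48)   # placeholder for the 48 hormone levels of Table 8.1

z = y - y.mean()                              # centered measurements, (8.19)
z_now, z_prev = z[1:], z[:-1]                 # z_t and z_{t-1}, for t = 2, ..., 48

# least-squares estimate (8.21): beta-hat minimizes the sum over t of (z_t - b * z_{t-1})^2
beta_hat = (z_prev @ z_now) / (z_prev @ z_prev)

eps_hat = z_now - beta_hat * z_prev           # approximate disturbances, (8.23)
print(beta_hat)                               # 0.586 for the actual hormone series
print(eps_hat.mean(), eps_hat.std())          # mean near zero, as in Figure 8.5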
1¢ no accident that the mean of Fis neas tt, see Prablera 855. IFiL wast, we could honor the defination Bp(6} = 0 (8.16) by centering Fs Usa hy caus enc probability yom a (8:23) froin és o & ~€, where ¢ = 34 &/T. [Now we are ready to carry out a bootstrap accuracy analysis of the estimate ) = 0.586. A bootstrap daa set P — x" is generated by following through definitions (8.15)-(8.16), except with P = (8,2) replacing P = (8, ). We began sit the initial valve n=, whichis considered to be a fixed constant. (ike the sample size nin the one-samaple problem). The bootstrap time series =f 1s caleulated recursively 96 MORE COMPLICATED DATA STRUCTURES L [L Jit Figure 85. Histogram of the Si appmzamate disturbances é = 24 ~ Bry, for t= 2 through 48; B equals 0.536 the least-squares estimate {for the first-order qutoregessie scheme. The distrtation 18 longtaled to the right. The disturtances averaged 0006, with a standar# demation of 0.484, and so are neoly centered et ser. a= fata B= Bl+g a= bate Ba = baiy te (28) ‘The bootstrap disturbance terms ¢? are a random sample from F PG, cial (820) In other words, each ¢f equals any one of the T approximate dis. tuurbances (8.23) with probability 1/T- ‘The bootstrap procos: (8.25)-(8.28) was run B ~ 200 times, ving 200 bootstrap time-series, Each Uf these gave # bootstrap. EXAMPLE: LUTENIZING HORMONE or replication 9° forthe least-squares estimate 9 (8.21) Figure 86 shows the histogram of the 20, §* values. The bootstrap standard escort fori san9 = O16 The histogram ty oral in shape Tn frsorder autoregressive seliome, each 24 depends on its predesotors only rough tne value of 5-1 (lst kd of depen dence 1s known as @ first-order Markov process.) A second-order Sttoreesie scheme extends the dependence back toa Bini thnate for C= UU 41042, Vi (827) Here 9 = ((,.2)* is a two-dimensional unkuown parameter vec- tor. The e are independent random disturbances as in (8.16). Cor- responding to (8.18) ate initial equations Bvt Beate Biz ~ Bastiat vss (628) so we need the numbers zip and =y-1 £0 get started, Now 7 3,V =, andT=V—U+1=46 ‘The leastquares approach leads directly to an estimate of tte vector 9. Let 2 be the T-dimensional veetor (2u.u44.° sv)" and let Z be the T2 matrix with fist coluran (277-1, >, 21-1)" second colunan (21-2, u—1420,~°*52v=a)". Then the least-squares estimate of f is Baa (329) For the lutensing hormone data, the second-order autoregressive scheme had least-squares estimates = (0.771, -0.202)" (630) Figurn 8.7 shows histograms of JB = 200 bootstrap replications of the two components of f= (1,45). The bootstrap standard Fron(Gh) = OAT, senn( ja) = 0.148. 831) Both histograms age roughly normal in shape. Problem 8.7 asks the seader to describe the steps leading to Fgete 8.7. "A second-order autoregresive acheme with Bz = 0 isa Srst- order autoregressive scheme In doing the accutary analtis forthe fecond-ccder scheme, we check lo soe If yi owe than 2 marae os MORE COMPLICATED DATA STRUCTURES Figure 8.6. Histagram of B = 200 bootstrap replications of B the first- ‘order autoregressive parameter estimate forthe lytonting hormone date; {from (8.25), (3.26) the lotstra estimate of standard erro w Seo0 (0.116. The broken tine 9 drawn atthe atserved value = 0.58. 
sre away fron whic would usally be iulerpreted a8 fs bexag uot significatly alleen tas zero, Hlee by is about 1.5 standard ‘eczors say from 0, n which caso we have no ston evidence that 8 Grst-order autoregressive scheme dos not give a reasonable rep resentation of the lntenizing hormone dats Do we now for sure that the fstorder scheme gives & good representation ofthe lutenizing hormone series? We cannot defi {Svely answer ths question without considering still more general models such as higher-order autoregressive schemes. A rough aa ‘wer can be obtained by comparison of the bootstrap tine series with the aftual sores of Figure 8. Figure 88 shows the fst four bootstrap seties from the fst order acer, left pase, and four ealization® obtained by sampling wth replacement from the orig inal time series, right pauel. ‘The orginal data of Figite 8 looks Guite abit like the left panel realizations, and not at all like the Felt panel realizations Further analywis shows that the AR(1) model provides a rea THE MOVING BLOCKS booTsTRAP ~ Ms, ln a Figure 87. B = 200 bootstrap replications of = (0.771, ~0:222), the second-order autoregressive parameter vector entimate forthe Iutensing hormone data, As the other nstegrams, a iroken tine a drawn at the porameter evimate. The fastograms are roughly normal sm shape. sonable fit to these data. However, we would need longer a time series to discriminate effectively between different models for Uhis hormone eueral, i pays lo remember that mathematical models are conveniently simplified representations of complicated real-world phenomena, and are usually not periectly correct. Often some cot promise is necessary between the complication ofthe model and the scientific needs of the investigation. Bootstrap methods are partic- ularly useful if complicated models soem necassary, since mathe- matical complication is no impediment to a bootstrap analysis of ‘aceuracy. 8.6 The moving blacks bootstrap 1a this lst section we briefly describe a dillerent method for boot: strappang time seri. Rather than fitting a model and then sam. pling feom the residuals this method take an approach closes 0 that zed for one-sample problems. ‘The idea is ilustrated in Fig. xe 89. The oripual tne series is represented by the black czelen ‘To gonerate a bootstrap realization of the fime series (white ci- cles} we choose a block length (*3” in the diagram) and consider All possible contigeous blocks of Une length. We sample with re- en MORE COMPLICATED DATA STRUCTURES LA oh Ph aA V Vv Moa Melted Herd ya A Ala\ leh pray ey ‘THE MOVING BLOCKS BOOTSTRAP. 0 eeececresecececeoe Figure 8.9. A schematic diagram of the moving blacks bootstrap forte series. The black circles are the orginal tune serves. A boostrop rect tsation of the time series flute ceces) +8 generated by choosing a Block length (°9" onthe diagrayn) anu Sampling with replacement from all pos sible contiguous block of ths length placement froin these blacks and paste them together Lo form the bootstrap time series. Just enough blocks are sampled to obtain a series of roughly the sane length a8 the orignal series, I tke block length is &, then we choose k blocks so that n= kf ‘To lustrate this we eared 1 out forthe luvenizng hormone date. 
To illustrate the method, we carried it out for the lutenizing hormone data. The statistic of interest was the AR(1) least-squares estimate β̂. We chose a block length of 3, and used the moving blocks bootstrap to generate a bootstrap realization of the lutenizing hormone data. A typical bootstrap realization is shown in Figure 8.10, and it looks quite similar to the original time series. We then fit the AR(1) model to this bootstrap time series, and estimated the AR(1) coefficient β̂*. This entire process was repeated B = 200 times. (Note that the AR(1) model is being used here to estimate β, but is not being used in the generation of the bootstrap realizations of the time series.) The resulting bootstrap standard error was ŝe_200(β̂) = 0.120.² This is approximately the same as the value 0.116 obtained from the AR(1) generated samples in the previous section. Increasing the block size to 6 caused this value to decrease.

² When n is not exactly divisible by ℓ, we need to multiply the bootstrap standard error by a factor to account for the difference in length of the bootstrap series. This factor was 1 for ℓ = 3 and 0.97 for ℓ = 6 in our example, and hence made little difference.

Figure 8.10. A bootstrap realization of the lutenizing hormone data, using the moving blocks bootstrap with block length equal to 3.

What is the justification for the moving blocks bootstrap? As we have seen earlier, we cannot simply resample from the individual observations, as this would destroy the correlation that we're trying to capture. (Using a block size of one corresponds to sampling with replacement from the data, and gave 0.139 for the standard error estimate.) With the moving blocks bootstrap, the idea is to choose a block size ℓ large enough so that observations more than ℓ time units apart will be nearly independent. By sampling the blocks of length ℓ, we retain the correlation present in observations less than ℓ units apart.

The moving blocks bootstrap has the advantage of being less "model dependent" than the bootstrapping of residuals approach used earlier. As we have seen, the latter method depends on the model that is fit to the original time series (for example an AR(1) or AR(2) model). However, the choice of block size ℓ can be quite important, and effective methods for making this choice have not yet been developed.

In the regression problem discussed in the next chapter, we encounter different methods for bootstrapping that are analogous to the approaches for time series that we have discussed here.

8.7 Bibliographic notes

The analysis of time series is described in many books, including Box and Jenkins (1970), Chatfield (1980) and Diggle (1990). Application of the bootstrap to time series is discussed in Efron and Tibshirani (1986); the moving blocks method and related techniques can be found in Carlstein (1986), Künsch (1989), Liu and Singh (1992) and Politis and Romano (1992).
8.8 Problems

8.1 If z̄ and ȳ are independent of each other, then var(ȳ - z̄) = var(z̄) + var(ȳ).
(a) How could we use the one-sample bootstrap algorithm of Chapter 6 to estimate se(θ̂) for θ̂ = ȳ - z̄?
(b) The bootstrap data going into ŝe = 26.85, (8.11), consisted of a 1400 x 16 matrix, each row of which was an independent replication of (8.12). Say how your answer to (a) could be implemented on these data. Would the answer equal 26.85?

8.2 Suppose the mouse experiment was actually conducted as follows: a large population of candidate laboratory mice was identified, say U = (U_1, U_2, ..., U_N); a random sample of size 16 was selected, say u = (u_1, u_2, ..., u_16); finally, a fair coin was independently flipped sixteen times, with u_j assigned to Treatment or Control as the jth flip was heads or tails. Discuss how well the two-sample model (8.5) fits this situation.

8.3 Assuming model (8.14)-(8.16), show that

    E{RSE(b)} = (β - b)² Σ_{t=U}^{V} E(z²_{t-1}) + (V - U + 1) var_F(ε).

8.4 The bootstrap analysis (8.25) was carried out as if μ̂ = 2.4 was the true value of μ = E(y_t). Carefully state how to calculate ŝe(β̂) if we take the more honest point of view that μ is an unknown parameter, estimated by ȳ.

8.5 Let ȳ_U equal Σ_{t=U}^{V} y_t / T and ȳ_{U-1} equal Σ_{t=U-1}^{V-1} y_t / T, and define β̂ as in (8.19)-(8.21), except with z_t = y_t - ȳ_U.
(a) Show that the average of the approximate disturbances (8.23) equals (ȳ_U - μ̂) - β̂ (ȳ_{U-1} - μ̂).    (8.32)
(b) Why do we expect F̂, (8.26), to have expectation near 0?

8.6 Many statistical languages like "S" are designed for vector processing. That is, the command c = a + b to add two long vectors is carried out much more quickly than the loop

    for (i = 1 to n) { c_i = a_i + b_i }.    (8.33)

This fact was used to speed the generation of the B = 200 bootstrap replications of the first-order autoregressive scheme for the lutenizing hormone data. How?

8.7 Give a detailed description of the bootstrap algorithm for the second-order autoregressive scheme.

CHAPTER 9

Regression models

9.1 Introduction

Regression models are among the most useful and most used of statistical methods. They allow relatively simple analyses of complicated situations, where we are trying to sort out the effects of many possible explanatory variables on a response variable. In Chapter 7 we used the one-sample bootstrap algorithm to analyze the accuracy of a regression analysis for the cholostyramine data of Table 7.4. Here we look at the regression problem more critically. The general bootstrap algorithm of Figure 8.3 is followed through, leading to a somewhat different bootstrap analysis for regression problems.

9.2 The linear regression model

We begin with the classic linear regression model, or linear model, going back to Legendre and Gauss early in the 19th century. The data set x for a linear regression model consists of n points x_1, x_2, ..., x_n, where each x_i is itself a pair, say

    x_i = (c_i, y_i).    (9.1)

Here c_i is a 1 x p vector c_i = (c_{i1}, c_{i2}, ..., c_{ip}) called the covariate vector or predictor, while y_i is a real number called the response. Let μ_i indicate the conditional expectation of the ith response y_i given the predictor c_i,

    μ_i = E(y_i | c_i)    (i = 1, 2, ..., n).    (9.2)

The key assumption in the linear model is that μ_i is a linear combination of the components of the predictor c_i,

    μ_i = c_i β = Σ_{j=1}^{p} c_{ij} β_j.    (9.3)

The parameter vector, or regression parameter, β = (β_1, β_2, ..., β_p)^T is unknown, the usual goal of the regression analysis being to infer β from the observed data x = (x_1, x_2, ..., x_n). In the quadratic regression (7.20) for the cholostyramine data, the response y_i is the improvement for the ith man, the covariate c_i is the vector (1, z_i, z_i²), and β = (β_0, β_1, β_2)^T. Note: The "linear" in linear regression refers to the linear form of the expectation (9.3).
There is no contradiction in the fact that the linear model (7.20) is a quadratic function of z.

The probability structure of the linear model is usually expressed as

    y_i = c_i β + ε_i    for i = 1, 2, ..., n.    (9.4)

The error terms ε_i in (9.4) are assumed to be a random sample from an unknown error distribution F having expectation 0,

    F → (ε_1, ε_2, ..., ε_n),    E_F(ε) = 0.    (9.5)

Notice that (9.4), (9.5) imply

    E(y_i | c_i) = E(c_i β + ε_i | c_i) = E(c_i β | c_i) + E(ε_i | c_i) = c_i β,    (9.6)

which is the linearity assumption (9.3). Here we have used the fact that the conditional expectation E(ε_i | c_i) is the same as the unconditional expectation E(ε_i) = 0, since the ε_i are selected independently of c_i.

We want to estimate the regression parameter vector β from the observed data (c_1, y_1), (c_2, y_2), ..., (c_n, y_n). A trial value of β, say b, gives residual squared error

    RSE(b) = Σ_{i=1}^{n} (y_i - c_i b)²,    (9.7)

as in equation (7.28). The least-squares estimate of β is the value β̂ of b that minimizes RSE(b),

    RSE(β̂) = min_b RSE(b).    (9.8)

Let C be the n x p matrix with ith row c_i (the design matrix), and let y be the vector (y_1, y_2, ..., y_n)^T. Then the least-squares estimate is the solution to the so-called normal equations

    C^T C β̂ = C^T y,    (9.9)

and is given by the formula¹

    β̂ = (C^T C)^{-1} C^T y.    (9.10)

¹ Rule (9.10) assumes that C is of full rank, as will be the case in all of our examples. We will not be using matrix-theoretic derivations in what follows. A reader unfamiliar with matrix notation can think of (9.10) simply as a function which inputs the responses y_1, y_2, ..., y_n and the predictors c_1, c_2, ..., c_n, and outputs the least-squares estimate β̂. Similarly, the bootstrap estimate β̂* can be thought of as the same function applied to the bootstrap data; see (9.28).

Table 9.1. The hormone data. Amount in milligrams of anti-inflammatory hormone remaining in 27 devices, after a certain number of hours of wear. The devices were sampled from 3 different manufacturing lots, called A, B, and C. Lot C looks like it had greater amounts of remaining hormone, but it also was worn the least number of hours. (A regression analysis clarifies the situation.) [The table lists lot, hours worn, and amount remaining for each of the 27 devices.]

9.3 Example: the hormone data

Table 9.1 shows a small data set which is a good candidate for regression analysis. A medical device for continuously delivering an anti-inflammatory hormone has been tested on n = 27 subjects. The response variable y_i is the amount of hormone remaining in the device after wearing,

    y_i = remaining amount of hormone in device i,    i = 1, 2, ..., 27.

There are two predictor variables,

    z_i = number of hours the ith device was worn,  and  L_i = manufacturing lot of device i.

The devices tested were randomly selected from three different manufacturing lots, called A, B, and C. The left panel of Figure 9.1 is a scatterplot of the 27 points (z_i, y_i) = (hours_i, amount_i), with the lot symbol L_i used as the plotting character. We see that longer hours of wear lead to smaller amounts of remaining hormone, as might be expected. We can quantify this observation by a regression analysis.

Consider the model where the expectation of y_i is a linear function of z_i,

    μ_i = E(y_i | z_i) = β_0 + β_1 z_i,    i = 1, 2, ..., 27.    (9.11)

This model ignores the lot L_i; it is of form (9.3), with covariate vectors of dimension p = 2,

    c_i = (1, z_i).    (9.12)

The unknown parameter vector has been labeled (β_0, β_1) instead of (β_1, β_2) so that subscripts match powers of z_i as in (7.20).
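Any least-squares routine will produce the fit; it can also be computed directly from formula (9.10). Here is a short S-style sketch (also valid in R), assuming vectors hours and amount hold the 27 values of z_i and y_i from Table 9.1; the variable names are ours.

    # Least-squares fit of model (9.11) via the normal equations (9.9)-(9.10).
    C <- cbind(1, hours)                        # n x p design matrix, rows c_i = (1, z_i)
    y <- amount                                 # response vector
    beta.hat <- solve(t(C) %*% C, t(C) %*% y)   # beta-hat = (C'C)^{-1} C'y, formula (9.10)

In S or R the same numbers are returned by a packaged routine such as lsfit(hours, amount).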
The normal equations (9.10) give least-squares estimate

    β̂ = (34.17, -0.0574)^T.    (9.13)

The estimated least-squares regression line

    μ̂_z = c β̂ = β̂_0 + β̂_1 z    (9.14)

is plotted in the right panel of Figure 9.1. Among all possible lines that could be drawn, this line minimizes the sum of the 27 squared vertical distances from the points to the line.

Figure 9.1. Left panel: scatterplot of the hormone data points (z_i, y_i) = (hours_i, amount_i), labeled by lot. It is clear that longer hours of wear result in lower amounts of remaining hormone. The right panel shows the least-squares regression line μ̂_z = β̂_0 + β̂_1 z, where β̂ = (34.17, -0.0574).

How accurate is the estimated parameter vector β̂? An extremely useful formula, also dating back to Legendre and Gauss, provides the answer. Let G be the p x p inner product matrix,

    G = C^T C,    (9.15)

the matrix with element g_{hj} = Σ_{i=1}^{n} c_{ih} c_{ij} in row h, column j. Let σ_F² be the variance of the error terms in model (9.4),

    σ_F² = var_F(ε).    (9.16)

Then the standard error of the jth component of β̂, the square root of its variance, is

    se(β̂_j) = σ_F (G^{jj})^{1/2},    (9.17)

where G^{jj} is the jth diagonal element of the inverse matrix G^{-1}. The last formula is a generalization of the formula for the standard error of a sample mean, se_F(x̄) = σ_F / n^{1/2}.

In practice, σ_F is estimated by a formula analogous to (5.11),

    σ̂_F = {Σ_{i=1}^{n} (y_i - c_i β̂)² / n}^{1/2} = {RSE(β̂)/n}^{1/2},    (9.18)

or by a bias-corrected version of σ̂_F,

    σ̄_F = {RSE(β̂)/(n - p)}^{1/2}.    (9.19)

The corresponding estimated standard errors for the components of β̂ are

    ŝe(β̂_j) = σ̂_F (G^{jj})^{1/2}    or    s̄e(β̂_j) = σ̄_F (G^{jj})^{1/2}.    (9.20)

The relationship between ŝe(β̂_j) and s̄e(β̂_j) is the same as that between formulas (5.12) and (2.2) for the mean.

Most packaged linear regression programs routinely print out ŝe(β̂_j) along with the least-squares estimate β̂. Applying such a program to model (9.11) for the hormone data gives the results in Table 9.2.

Table 9.2. Results of fitting model (9.11) to the hormone data.
              Estimate      ŝe
    β̂_0        34.17
    β̂_1        -.0574      .0045

Look at the right panel of Figure 9.1: most of the points for lot A lie below the fitted regression line, while most of those for lots B and C lie above the line. This suggests a deficiency in model (9.11). If the model were accurate, we would expect about half of the points in each lot to lie above and half below the fitted line. In the usual terminology, it looks like there is a lot effect in the hormone data.

It is easy to incorporate a lot effect into our linear model. We assume that the conditional expectation of y_i given L_i and z_i is of the form

    E(y_i | L_i, z_i) = β_{L_i} + β_1 z_i,    (9.21)

where β_{L_i} equals one of three possible values β_A, β_B, β_C, depending on which lot the device comes from. This is similar to model (9.11), except that (9.21) allows a different intercept for each lot, rather than the single intercept β_0 of (9.11). A least-squares analysis of model (9.21) gave the results in Table 9.3.

Table 9.3. Results of fitting model (9.21) to the hormone data.
              Estimate      ŝe
    β̂_A        32.13
    β̂_B
    β̂_C        35.00
    β̂_1        -.0601      .0032

Notice that β̂_A is several standard errors less than β̂_B and β̂_C, indicating that the devices in lot A contained significantly less hormone.
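Readers who wish to reproduce calculations like those in Tables 9.2 and 9.3 can do so with a few more lines of the same S-style code; lot is assumed to be a character vector giving the manufacturing lot of each device, and all other names are again ours. The sketch fits the lot-effect model (9.21) and computes the estimated standard errors from (9.19)-(9.20).

    # Least-squares fit of the lot-effect model (9.21): one intercept per lot, common slope.
    C2 <- cbind(lot == "A", lot == "B", lot == "C", hours)      # indicator columns plus hours
    beta.hat2 <- solve(t(C2) %*% C2, t(C2) %*% amount)          # (beta.A, beta.B, beta.C, beta.1)
    n <- length(amount); p <- ncol(C2)
    sigma.bar <- sqrt(sum((amount - C2 %*% beta.hat2)^2) / (n - p))   # bias-corrected sigma, (9.19)
    se.bar <- sigma.bar * sqrt(diag(solve(t(C2) %*% C2)))             # se-bar(beta_j), (9.20)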
9.4 Application of the bootstrap

None of the calculations so far require the bootstrap. However, it is useful to follow through a bootstrap analysis for the linear regression model. It will turn out that the bootstrap standard error estimates are the same as ŝe(β̂_j), (9.20). Thus reassured that the bootstrap is giving reasonable answers in a case we can analyze mathematically, we can go on to apply the bootstrap to more general regression models that have no mathematical solution: where the regression function is non-linear in the parameters β, and where we use fitting methods other than least squares.

The probability model P → x for linear regression, as described by (9.4), (9.5), has two components,

    P = (β, F),    (9.22)

where β is the parameter vector of regression coefficients, and F is the probability distribution of the error terms. The general bootstrap algorithm of Figure 8.3 requires us to estimate P. We already have available β̂, the least-squares estimate of β. How can we estimate F? If β were known we could calculate the errors ε_i = y_i - c_i β for i = 1, 2, ..., n, and estimate F by their empirical distribution. We don't know β, but we can use β̂ to calculate approximate errors

    ε̂_i = y_i - c_i β̂,    i = 1, 2, ..., n.    (9.23)

(The ε̂_i are also called residuals.) The obvious estimate of F is the empirical distribution of the ε̂_i,

    F̂:  probability 1/n on ε̂_i for i = 1, 2, ..., n.    (9.24)

Usually F̂ will have expectation 0 as required in (9.5); see Problem 9.5.

With P̂ = (β̂, F̂) in hand, we know how to calculate bootstrap data sets for the linear regression model: P̂ → x* must mean the same thing as P → x, the probability mechanism (9.4), (9.5) for the actual data set x. To generate x*, we first select a random sample of bootstrap error terms

    F̂ → (ε*_1, ε*_2, ..., ε*_n).    (9.25)

Each ε*_i equals any one of the n values ε̂_i with probability 1/n. Then the bootstrap responses y*_i are generated according to (9.4),

    y*_i = c_i β̂ + ε*_i,    i = 1, 2, ..., n.    (9.26)

The reader should convince himself or herself that (9.24), (9.25), (9.26) is the same as (9.4), (9.5), except with P̂ = (β̂, F̂) replacing P = (β, F). Notice that β̂ is a fixed quantity in (9.26), having the same value for all i.

The bootstrap data set x* equals (x*_1, x*_2, ..., x*_n), where x*_i = (c_i, y*_i). It may seem strange that the covariate vectors c_i are the same for the bootstrap data as for the actual data. This happens because we are treating the c_i as fixed quantities, rather than random. (The sample size n has been treated this same way in all of our examples.) This point is further discussed below.

The bootstrap least-squares estimate β̂* is the minimizer of the residual squared error for the bootstrap data,

    Σ_{i=1}^{n} (y*_i - c_i β̂*)² = min_b Σ_{i=1}^{n} (y*_i - c_i b)².    (9.27)

The normal equations (9.10), applied to the bootstrap data, give

    β̂* = (C^T C)^{-1} C^T y*.    (9.28)

In this case we don't need Monte Carlo simulations to figure out bootstrap standard errors for the components of β̂. An easy calculation gives a closed-form expression for ŝe_∞(β̂*_j), the ideal bootstrap standard error estimate:

    var_*(β̂*) = (C^T C)^{-1} C^T {var_*(y*)} C (C^T C)^{-1} = σ̂_F² (C^T C)^{-1} = σ̂_F² G^{-1},    (9.29)

since var_*(y*) = σ̂_F² I, where I is the identity matrix. Therefore

    ŝe_∞(β̂*_j) = σ̂_F (G^{jj})^{1/2}.    (9.30)

In other words, the bootstrap estimate of standard error for β̂_j is the same as the usual estimate ŝe(β̂_j), (9.20).²

² This implies that ŝe_B(β̂_j) ≈ {(n - p)/n}^{1/2} s̄e(β̂_j), essentially the same situation we encountered for the mean x̄, (5.12) and (2.2). We could adjust the bootstrap standard errors by the factor {n/(n - p)}^{1/2} to get the familiar estimates s̄e(β̂_j), but this is rarely necessary in regression situations. The point gets worrisome only if p is a large fraction of n.
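Although no simulation is needed in the linear least-squares case, the algorithm (9.24)-(9.26) is worth writing out explicitly, since it applies unchanged to models where no closed form is available. A minimal S-style sketch (also valid in R) follows; C is the design matrix and y the response vector as above, and the function name and default B = 200 are ours.

    # Bootstrapping residuals, (9.24)-(9.26), for a least-squares fit.
    boot.resid <- function(C, y, B = 200) {
        n <- nrow(C); p <- ncol(C)
        beta.hat <- solve(t(C) %*% C, t(C) %*% y)
        eps.hat <- c(y - C %*% beta.hat)                     # residuals, (9.23)
        beta.star <- matrix(0, B, p)
        for (b in 1:B) {
            eps.star <- sample(eps.hat, n, replace = TRUE)   # draw from F-hat, (9.25)
            y.star <- C %*% beta.hat + eps.star              # bootstrap responses, (9.26)
            beta.star[b, ] <- solve(t(C) %*% C, t(C) %*% y.star)   # bootstrap estimate, (9.28)
        }
        sqrt(apply(beta.star, 2, var))                       # one bootstrap standard error per coefficient
    }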
9.5 Bootstrapping pairs vs bootstrapping residuals

The reader may have noticed an interesting fact: we now have two different ways of bootstrapping a regression model. The method discussed in Chapter 7 bootstrapped the pairs x_i = (c_i, y_i), so that a bootstrap data set x* was of the form

    x* = {(c_{i*_1}, y_{i*_1}), (c_{i*_2}, y_{i*_2}), ..., (c_{i*_n}, y_{i*_n})},    (9.31)

for (i*_1, i*_2, ..., i*_n) a random sample of the integers 1 through n. The method discussed in this chapter, (9.24), (9.25), (9.26), can be called "bootstrapping the residuals." It produces bootstrap data sets of the form

    x* = {(c_1, c_1 β̂ + ε*_1), (c_2, c_2 β̂ + ε*_2), ..., (c_n, c_n β̂ + ε*_n)}.    (9.32)

Which bootstrap method is better? The answer depends on how far we trust the linear regression model (9.4). This model says that the error between y_i and its mean μ_i = c_i β doesn't depend on c_i; it has the same distribution "F" no matter what c_i may be. This is a strong assumption, which can fail even if the model for the expectation μ_i = c_i β is correct. It does fail for the cholostyramine data of Figure 7.4.

Figure 9.2 shows regression percentiles for the cholostyramine data. For example, the curve marked "75%" approximates the conditional 75th percentile of improvement y as a function of the compliance z. Near any given value of z, about 75% of the plotted points lie below the curve. Model (9.4), (9.5) predicts that these curves will be the same distance apart for all values of z. Instead the curves separate as z increases, being twice as far apart at z = 100 as at z = 0. To put it another way, the errors ε_i in (9.4) tend to be twice as big for z = 100 as for z = 0.

Figure 9.2. Regression percentiles for the cholostyramine data of Figure 7.4. For example, the curve labeled "75%" approximates the conditional 75th percentile of the improvement y given the compliance z, plotted as a function of z. The percentile curves are twice as far apart at z = 100 as at z = 0. The linear regression model (9.4), (9.5) cannot be correct for this data set. (Regression percentiles calculated using asymmetric maximum likelihood, Efron, 1991.)

Bootstrapping pairs is less sensitive to assumptions than bootstrapping residuals. The standard error estimate obtained by bootstrapping pairs, (9.31), gives reasonable answers even if (9.4), (9.5) is completely wrong. The only assumption behind (9.31) is that the original pairs x_i = (c_i, y_i) were randomly sampled from some distribution F, where F is a distribution on (p + 1)-dimensional vectors (c, y). Even if (9.4), (9.5) is correct, it is no disaster to bootstrap pairs as in (9.31): it can be shown that the answer given by (9.31) approaches that given by (9.32) as the number of pairs n grows large. The simple model (9.11) for the hormone data was reanalyzed bootstrapping pairs. B = 800 bootstrap replications gave

    ŝe_800(β̂_0) = .77,    ŝe_800(β̂_1) = .0045,    (9.33)

not much different than Table 9.2.
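The pairs calculation behind (9.33) proceeds along the same lines as before; here is a minimal S-style sketch (also valid in R), with C and y as in the earlier sketches and the function name ours.

    # Bootstrapping pairs, (9.31): resample whole cases (c_i, y_i) of the data.
    boot.pairs <- function(C, y, B = 200) {
        n <- nrow(C); p <- ncol(C)
        beta.star <- matrix(0, B, p)
        for (b in 1:B) {
            idx <- sample(1:n, n, replace = TRUE)           # random sample of the integers 1, ..., n
            Cb <- C[idx, , drop = FALSE]; yb <- y[idx]
            beta.star[b, ] <- solve(t(Cb) %*% Cb, t(Cb) %*% yb)
        }
        sqrt(apply(beta.star, 2, var))
    }

For the hormone model (9.11), a call such as boot.pairs(cbind(1, hours), amount, B = 800) gives values close to (9.33), the exact numbers depending on the random resampling.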
The reverse argument can also be made. Model (9.4), (9.5) doesn't have to hold perfectly in order for bootstrapping residuals as in (9.32) to give reasonable results. Moreover, differences in the error distributions, as in the cholostyramine data, can be incorporated into model (9.4), (9.5), leading to a more appropriate version of bootstrapping residuals; see model (9.42). Perhaps the most important point here is that bootstrapping is not a uniquely defined concept. Figure 8.3 can be implemented in different ways for the same problem, depending on how the probability model P → x is interpreted.

When we bootstrap residuals, the bootstrap data sets x* = {(c_1, y*_1), (c_2, y*_2), ..., (c_n, y*_n)} have covariate vectors c_i exactly the same as those for the actual data set x. This seems unnatural for the hormone data, where c_i involves z_i, the hours worn, which is just as much a random variable as is the response variable y_i, the amount remaining.³

³ Even when covariates are generated randomly, there are reasons to do the analysis as if they are fixed. Regression coefficients have larger standard errors when the covariates have smaller standard deviation. By treating the covariates as fixed constants we obtain a standard error that reflects the precision associated with the sample of covariates actually observed.

However, as (9.33) shows, the difference between c_i fixed and c_i random usually doesn't affect the standard error estimate very much.

9.6 Example: the cell survival data

There are situations where the covariates are more naturally considered fixed rather than random. The cell survival data in Table 9.4 show such a situation. A radiologist has run an experiment involving 14 bacterial plates. The plates were exposed to various doses of radiation, and the proportion of surviving cells measured. Greater doses lead to smaller survival proportions, as would be expected. The question mark after the response for plate 13 reflects some uncertainty in that result, expressed by the investigator. The investigator was interested in a regression analysis with
