
RESAMPLE.XLS
Michael Wood, February 2003

The approach in this workbook is explained in more detail in Making sense of statistics - a non-
mathematical approach (Palgrave, August 2003). There are other worksheets at
http://userweb.port.ac.uk/~woodm/nms .

This workbook is for resampling, or for implementing the "two bucket model". Please note the two
general points below, then I'd suggest working through one of the examples to see how it works.
The maximum sample and resample sizes are both 100: there are some notes on increasing these
limits after the two examples.
Green cells are for keying in data and formulae. There are formulae in many cells that look blank,
so check you are not overwriting anything if you key anything into a cell which is not green. (You
must leave A4 on the Lots of resamples sheet blank; otherwise the table on the left hand side will
not work.) If a cell is green but has something in it, you can either accept what's there, or change it:
eg you can give Variable 1 a name of your own if you want to.
The workbook is set to recalculate automatically - except for tables. The Lots of resamples sheet
uses a table, so you need to press F9 to calculate (or recalculate) the statistics and the graph in this
sheet. If you have a lot of data, or your computer is slow, it may be a good idea to set it to manual
calculation (Tools-Options-Calculation then tick the box for Manual and delete the tick for
recalculating before saving). You then have to press F9 to do the calculations.

Example 1 - the UK National Lottery


First click on the 'Sample (Bucket 1)' tab at the bottom of the screen. Enter 1 in the first six cells of
the green column under the heading Variable 1, and 0 in the next 43 cells. The 1s represent the six
numbers you have chosen; the 0s represent the other 43 numbers. The sample size of 49 should
then appear at the top. (You'll also see there's room for Variable 2. This is for where you've got two
things stamped on each ball. The random number column helps the computer with the resampling -
ignore this.)

Next click on the 'Single resample' tab at the bottom. You need to tell the spreadsheet how many
balls you will be drawing from Bucket 1. This should be six - the number of balls chosen by the
lottery machine. Enter this in the green cell at the top.
You will also notice here that there are two kinds of resampling - without replacement, and with
replacement. This refers to whether or not you replace each ball in Bucket 1 before drawing the
next. Which do we want here - sampling with or without replacement?
We want to sample without replacement because that is what the lottery machine does. Balls come
out and stay out. They are not replaced, so the same ball is never chosen twice.
This means that we want to use the block headed Resampling without replacement. The next step
is to enter the 'Resample statistic' we want to use in the other green cell. This should be the sum of
all the values of variable 1. Enter the formula
=SUM(C7:C12)
in this cell. In my case 2 appears in the cell, but if you are doing it yourself another number will
probably appear. If you press F9 the worksheet will be recalculated producing another set of
random numbers, another resample (draw from Bucket 1), and another score representing the
number of numbers correct.
You can also name this resample statistic. Type Number correct in Cell A4 (replacing resample
statistic).
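
What the Single resample sheet does in this example can be sketched in Python. This is only an illustration, not part of the workbook: the function name single_resample_sum and the variable names are assumptions made for the sketch. Bucket 1 holds six 1s and 43 0s, six balls are drawn without replacement, and the sum of the drawn values is the number correct.

```python
import random

# Bucket 1 as keyed into the Sample sheet: six 1s (your numbers) and 43 0s.
variable_1 = [1] * 6 + [0] * 43


def single_resample_sum(values, resample_size, rng):
    """Draw resample_size balls without replacement and return their sum."""
    # random.sample never picks the same ball twice, like the lottery machine.
    resample = rng.sample(values, k=resample_size)
    return sum(resample)  # plays the role of =SUM(C7:C12)


rng = random.Random()  # unseeded, so each run behaves like pressing F9
number_correct = single_resample_sum(variable_1, 6, rng)
print("Number correct:", number_correct)
```

Running the script again gives another resample and usually another score, just as pressing F9 does in the sheet.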
Now click on the 'Lots of resamples (Bucket 2)' tab. There is a table on the left hand side which
contains the results of repeated resamples from the previous sheet. Here you need to fill in the
Values of interest in the green box - enter 0, 1, 2, 3, 4, 5, 6 in the top of this column, and then press
F9 to calculate the probabilities.
There is also a graph here. The scale is automatic; you can change it by keying the middles of the
bottom two bars into the green cells - I would suggest 0 and 1. Also, if you enter 3 for the cut value
the spreadsheet will work out the probabilities of getting a value less than 3, equal to 3, and more
than 3.
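
The Lots of resamples step can be sketched in the same way. Again this is an illustration under stated assumptions rather than the workbook's own method: the names bucket_2, probabilities and cut are made up for the sketch, and the seed is arbitrary. It repeats the single resample 200 times, then estimates the probability of each value of interest and the probabilities below, equal to and above a cut value of 3.

```python
import random
from collections import Counter

variable_1 = [1] * 6 + [0] * 43   # Bucket 1, as in the Sample sheet
n_resamples = 200                 # the workbook's default number of resamples
rng = random.Random(49)           # arbitrary seed so the run is repeatable

# Bucket 2: one resample statistic (number correct) per resample,
# each resample being six balls drawn without replacement.
bucket_2 = [sum(rng.sample(variable_1, k=6)) for _ in range(n_resamples)]

# Estimated probability of each value of interest (0 to 6).
counts = Counter(bucket_2)
probabilities = {value: counts[value] / n_resamples for value in range(7)}

# Estimated probabilities relative to a cut value of 3.
cut = 3
p_below = sum(s < cut for s in bucket_2) / n_resamples
p_equal = sum(s == cut for s in bucket_2) / n_resamples
p_above = sum(s > cut for s in bucket_2) / n_resamples

for value, p in probabilities.items():
    print(value, p)
print("below:", p_below, "equal:", p_equal, "above:", p_above)
```

With only 200 resamples the estimates are rough, and rare outcomes such as four or more correct may well show a probability of 0.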

Example 2 - a bootstrap confidence interval


Put the data in the top of the Variable 1 column in the Sample sheet. Let's say you've got a sample
of 12 numbers representing weights. (You'll also see there's room for Variable 2. This is for where
you've got two things stamped on each ball. The random number column helps the computer with
the resampling - ignore this.)
It's helpful, but not essential, to work out the mean of your sample: type Sample mean in Cell A4,
and in B4 the formula
=AVERAGE(C7:C18)
Next, in the Single resample sheet enter the resample size of 12 in B3, and in B4 the formula
=AVERAGE(F7:F18)
as you want to resample with replacement. If you want another statistic (eg stdev) then use this
function instead here and in the Sample sheet. Type Mean weight in Cell A4. If you press F9
another set of random numbers and another resample will be produced.
In the Lots of resamples sheet, the 2.5th and 97.5th percentiles will give you the 95% confidence
interval. You can change the percentiles if you like - eg 10 and 90 for an 80% interval.
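
The bootstrap procedure in this example can be sketched in Python. This is a minimal sketch, not the workbook's implementation. The 12 weights are hypothetical values invented for illustration, the seed is arbitrary, and the percentile helper uses linear interpolation between ranked values, which is an assumption about how the percentiles are worked out.

```python
import random
import statistics


def percentile(sorted_values, p):
    """p-th percentile (0-100) by linear interpolation between ranked values."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Hypothetical sample of 12 weights (made up for this sketch).
weights = [61.2, 58.4, 72.9, 66.0, 70.3, 55.8, 63.5, 68.1, 59.7, 74.6, 65.2, 62.0]

rng = random.Random(12)   # arbitrary seed so the run is repeatable
n_resamples = 200         # the workbook's default number of resamples

# Each resample draws 12 values WITH replacement and records their mean.
resample_means = sorted(
    statistics.mean(rng.choices(weights, k=len(weights)))
    for _ in range(n_resamples)
)

lower = percentile(resample_means, 2.5)
upper = percentile(resample_means, 97.5)
print("Sample mean:", statistics.mean(weights))
print("95% interval:", lower, "to", upper)
```

Changing 2.5 and 97.5 to 10 and 90 gives the 80% interval, and swapping statistics.mean for statistics.stdev gives an interval for the standard deviation instead.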

Increasing sample sizes, etc


In order to keep the workbook to a reasonable size, and to ensure it runs reasonably quickly, the
sizes of the sample, resample and table of resamples are limited. However, it's easy to increase all
of these.
To increase the maximum sample (Bucket 1) size (from 100) you need to go to (say) A50, drag the
mouse down to select the number of rows by which you want to increase the sample, and insert this
number of rows (Insert-Rows). (The advantage of inserting the rows in the middle rather than the
end is that this will automatically extend the green block and several named ranges.) Next you need
to copy the formulae in A8:B8 down to the bottom of the data block. The formulae in the Single
resample sheet will also need adjusting: copy A8:G8 down to the bottom of the resample block.

To increase the maximum resample size (from 100) you need to copy the cells A106:G106 further
down, and make sure that the resample statistic formula refers to the right range.
To increase the number of resamples (balls in Bucket 2) from 200, you need to go to the Lots of
resamples sheet, then select A4:B204 and extend it as far down as you want to go, then Data-
Table-Column input A4 (leave Row input blank) and click OK. Then you need to extend the named
block Resamvals (Insert-Name-Define). Then copy the formulae in cells A204 and C204 down as far
as the table extends, and extend the range named cut (Insert-Name-Define).
Sample (Bucket 1)
If your data is in another spreadsheet file, open this, and then copy and paste the data into the green cells.
Sample size: 0

(The sample reference numbers and random numbers will appear automatically. They are for the resampling process.)
Sample reference no | Random number | Variable 1 | Variable 2


Single Resample

Resample size
Resample statistic
RESAMPLING WITHOUT REPLACEMENT: Resample reference no | Sample ref no | Variable 1 | Variable 2
RESAMPLING WITH REPLACEMENT: Sample ref no | Variable 1 | Variable 2
Lots of resamples (Bucket 2) - Press F9 to calculate
Number of resamples: 200
Resample number | Resample statistic | (unlabelled column)
(blank) | 0 | (blank)
1 to 200 | 0 in each row | eq in each row

Standard statistics
mean: 0
median: 0
sd: 0
ave dev from mean: 0
lower quartile: 0
upper quartile: 0
interquartile range: 0
percentiles:
2.5: 0
97.5: 0

Cut value: (blank)
Probability below cut: (blank)
probability equal to cut: (blank)
probability above cut: (blank)
Total: 0.00%

Values of interest | Probability
(blank rows)
Total | 0.00%
[Chart: Resample Frequencies. Horizontal axis: Resample statistic. Vertical axis: frequency, scaled from 0 to 250.]

Table for graph


Middle of bottom bar 0
Middle of next bar 0
Bottom | Top | Mid | Frequency
below | 0 | | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 200
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | 0 | 0 | 0
0 | upwards | | 0
Total | | | 200
