FINAL REQUIREMENT IN STATISTICS AND PROBABILITY
Random sampling is a sampling method in which (a) the selection of a sample unit is based on chance and (b) every element of the population has a known, non-zero probability of being selected.
Random sampling helps produce representative samples by eliminating voluntary response bias
and guarding against undercoverage bias. Probability sampling methods rely on random
sampling.
Examples:
1. In a medical study, the population might be all adults over age 50 who have high blood
pressure.
2. In another study, the population might be all hospitals in the U.S. that perform heart bypass
surgery.
3. If we are studying whether a certain die is fair or weighted, the population would be all
possible tosses of the die.
In Example 3, it is fairly easy to get a simple random sample: just toss the die n times and
record the outcomes.
Selecting a simple random sample in Examples 1 and 2 is much harder. A good way to select a
simple random sample of hospitals in Example 2 is as follows.
First, obtain or make a list of all hospitals in the U.S. that perform heart bypass surgery. Number
them 1, 2, ..., up to the total number M of hospitals in the population. (Such a list is called a
sampling frame.)
Then use a random number generating process to obtain a simple random sample (SRS) of
size n from the population of integers 1, 2, ..., M. The simple random sample of hospitals
consists of the hospitals in the list that correspond to the numbers in this SRS of numbers.
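The numbering procedure above can be sketched in Python. This is a minimal illustration, not a prescribed method: `frame` is a hypothetical list of placeholder hospital names rather than real data, and `simple_random_sample` is a helper name chosen here for illustration. It relies on the standard library's `random.sample`, which draws n distinct numbers so that every set of n hospitals is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return a simple random sample of size n from a sampling frame (a list of units)."""
    rng = random.Random(seed)
    M = len(frame)                            # total number of units in the frame
    # Units are numbered 1, 2, ..., M; draw n distinct numbers at random.
    numbers = rng.sample(range(1, M + 1), n)
    # The sample consists of the units whose numbers were drawn.
    return [frame[k - 1] for k in numbers]

# Hypothetical sampling frame of M = 10 hospitals (placeholder names, not real data).
frame = [f"Hospital {i}" for i in range(1, 11)]
sample = simple_random_sample(frame, n=3, seed=1)
print(sample)
```

Passing a `seed` makes the draw reproducible, which is convenient for checking work; omit it to get a fresh random sample each time.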
In theory, the same process could be used in Example 1. However, obtaining the sampling frame
would be much harder, probably impossible. So some compromises may need to be made.
Unfortunately, these compromises can easily lead to a sample that is biased or otherwise not
representative of the population.
Indeed, even the sampling procedure described above is a compromise and may not be suitable
for every study.
Parameters are numbers that summarize data for an entire population. Statistics are numbers that
summarize data from a sample, i.e. some subset of the entire population.
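To make the distinction concrete, the sketch below computes a parameter and a statistic side by side. The population values are hypothetical placeholders, and the sample is drawn with Python's `random.sample`; any other summary (a proportion, a median) would illustrate the same point.

```python
import random
import statistics

# Hypothetical population of five values (placeholders, not real data).
population = [2, 4, 6, 8, 10]

# Parameter: a number summarizing the entire population.
mu = statistics.mean(population)

# Statistic: a number summarizing a sample (a random subset of size 3).
rng = random.Random(0)
sample = rng.sample(population, 3)
x_bar = statistics.mean(sample)

print("parameter (population mean):", mu)   # 6
print("statistic (sample mean):", x_bar, "from sample", sample)
```

Rerunning with a different seed can change the statistic but never the parameter, which is why a statistic has a sampling distribution.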
Example: A population might be all Americans who played golf at least once in the past year. But
populations can refer to things as well as people, for example all hospitals in the U.S. that
perform heart bypass surgery.
A sampling distribution is the probability distribution of a statistic obtained through repeated
sampling from a larger population.
It describes the range of possible values of a statistic, such as the mean or mode, and how likely
each value is.
Consider a population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or 18
potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly
one sack with each number. So the whole population has seven sacks. If I sample two with
replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I
replace it. Then I pick another. Every one of them still has 1/7 probability of being chosen. And
there are exactly 49 different possibilities here (assuming we distinguish between the first and
second). They are: (12,12), (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,13),
(13,14), etc.
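The 49 ordered pairs can be listed mechanically with `itertools.product`. The sketch below also tallies the mean of each pair with `collections.Counter`, which gives the sampling distribution of the sample mean for samples of size 2 drawn with replacement; that tally is an illustration added here, not part of the original example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sacks = [12, 13, 14, 15, 16, 17, 18]

# Ordered samples of size 2 drawn WITH replacement: every (first, second) pair.
with_replacement = list(product(sacks, repeat=2))
print(len(with_replacement))          # 49 = 7 * 7
print(with_replacement[:9])           # (12, 12), (12, 13), ..., (13, 13)

# Each ordered pair is equally likely: (1/7) * (1/7) = 1/49.
p_pair = Fraction(1, 7) * Fraction(1, 7)
print(p_pair)                         # 1/49

# Sampling distribution of the sample mean: the probability of each possible mean.
mean_counts = Counter((a + b) / 2 for a, b in with_replacement)
for m in sorted(mean_counts):
    print(m, Fraction(mean_counts[m], len(with_replacement)))
```

The tally peaks at a mean of 15 (7 of the 49 pairs) and tapers toward 12 and 18, even though every individual sack is equally likely.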
Consider the same population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or
18 potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly
one sack with each number. So the whole population has seven sacks. If I sample two without
replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I
pick another. At this point, there are only six possibilities: 12, 13, 15, 16, 17, and 18. So there are
only 42 different possibilities here (again assuming that we distinguish between the first and the
second). They are: (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,14), (13,15),
etc.
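The without-replacement case can be checked the same way with `itertools.permutations`, which never repeats a sack within a pair. After the first pick, each of the six remaining sacks has a 1/6 chance, so each ordered pair has probability (1/7)(1/6) = 1/42.

```python
from fractions import Fraction
from itertools import permutations

sacks = [12, 13, 14, 15, 16, 17, 18]

# Ordered samples of size 2 WITHOUT replacement: the second sack must differ.
without_replacement = list(permutations(sacks, 2))
print(len(without_replacement))       # 42 = 7 * 6
print(without_replacement[:8])        # (12, 13), ..., (12, 18), (13, 12), (13, 14)

# First pick: 1/7; second pick from the six remaining sacks: 1/6.
p_pair = Fraction(1, 7) * Fraction(1, 6)
print(p_pair)                         # 1/42

# No pair repeats a sack, unlike sampling with replacement.
assert all(a != b for a, b in without_replacement)
```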
6. Sampling Distributions
Distribution of Sample Means
Sampling with Replacement
Example 1: The population from which samples are selected is {1,2,3,4,5,6}.
This population has a mean of 3.5 and a standard deviation of 1.70783. The next display shows a
histogram of the population.
[Figure: Histogram of population {1,2,3,4,5,6}]
A computer was programmed to take all samples of size 4 (there are 1296) with replacement
from this population. A few of the samples are {1,1,1,1}, {1,1,1,2}, {1,1,1,3}, {1,1,1,4},...,
{6,6,6,3}, {6,6,6,4}, {6,6,6,5}, and {6,6,6,6}.
For each of these samples a statistic, the sample mean (i.e. the average of the numbers in the
sample), was computed. The sample means for the first few samples shown above are 1, 1.25,
1.5, 1.75,...,5.25, 5.5, 5.75, and 6. A histogram of all 1296 sample means is shown next.
[Figure: Histogram of all 1296 sample means for samples of size 4 taken with replacement from population {1,2,3,4,5,6}]
The mean of these 1296 sample means is 3.5 and the standard deviation of these 1296 sample
means is 0.853913.
From the histogram of sample means it appears that the sample means for samples of size 4
taken with replacement from the population {1,2,3,4,5,6} are normally distributed, at least
approximately.
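The computer enumeration described in Example 1 can be reproduced with a short Python sketch. It assumes that "standard deviation" means the population standard deviation (dividing by the number of values), which is what `statistics.pstdev` computes and which matches the figures 1.70783 and 0.853913 quoted in the text.

```python
import statistics
from itertools import product

population = [1, 2, 3, 4, 5, 6]
n = 4

# Every ordered sample of size 4 drawn with replacement: 6**4 = 1296 of them.
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

print(len(samples))                    # 1296
print(statistics.pstdev(population))   # population SD, about 1.70783
print(statistics.mean(means))          # mean of the sample means: 3.5
print(statistics.pstdev(means))        # SD of the sample means, about 0.853913
```

Enumerating every sample is feasible here because 1296 is small; the same approach becomes impractical for larger samples.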
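Example 2: The population from which samples are selected is {1,2,3,3,3,10}. This population has a
mean of 3.66667 and a standard deviation of 2.92499. Again, all 1296 samples of size 4 were taken
with replacement, and the mean of each sample was computed.
[Figure: Histogram of all 1296 sample means for samples of size 4 taken with replacement from population {1,2,3,3,3,10}]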
This histogram resembles a normal curve, but it has some gaps and is skewed to the right. If a
larger sample size had been used, the curve would look more like a normal curve. This is
suggested by the following histogram, showing 400 sample means for samples of size 36 taken
with replacement from the same population. There are 6^36 (about 10^28) possible samples
altogether; it would take too long to compute the means of all of them, so only 400 samples were
taken and the mean of each was computed.
[Figure: Histogram of 400 sample means for samples of size 36 taken with replacement from population {1,2,3,3,3,10}]
The mean of the 400 sample means is 3.65278 and their standard deviation is 0.498121.
The mean of these sample means is very close to the population mean, 3.66667, and the standard
deviation is close to the population standard deviation divided by the square root of the sample
size: $2.92499/\sqrt{36} = 2.92499/6 = 0.487498$.
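The 400-sample simulation can be sketched with `random.choices`, which draws with replacement. The seed is arbitrary, so the printed values will differ from the 3.65278 and 0.498121 reported above; they should, however, land near the population mean 3.66667 and near 0.487498.

```python
import random
import statistics

population = [1, 2, 3, 3, 3, 10]
n, num_samples = 36, 400

rng = random.Random(2024)                  # arbitrary fixed seed for reproducibility
means = []
for _ in range(num_samples):
    sample = rng.choices(population, k=n)  # one sample of size 36, with replacement
    means.append(sum(sample) / n)          # its sample mean

print(statistics.mean(means))              # should be near 3.66667
print(statistics.pstdev(means))            # should be near 2.92499 / 6 = 0.487498
```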
These few examples suggest the following about the sampling distribution of sample means, that is,
the collection of sample means from all random samples of size n taken from a population:
1. In sampling with replacement, the mean of all sample means equals the mean of the population:
$\mu_{\bar{x}} = \mu$.
2. In sampling with replacement, the standard deviation of all sample means equals the standard
deviation of the population divided by the square root of the sample size:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
3. Whatever the shape of the population distribution, the distribution of sample means is
approximately normal, with better approximations as the sample size, n, increases. (This is the
Central Limit Theorem.)
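The first two statements can be checked exactly, rather than only suggested, by enumerating every sample. The sketch below (the function name `check_sample_mean_properties` is hypothetical) does this for the population {1,2,3,3,3,10} with n = 4, treating the three 3s as distinct units so that there are 6^4 = 1296 samples, matching the 1296 sample means described above. It does not test the third statement about approximate normality.

```python
import math
import statistics
from itertools import product

def check_sample_mean_properties(population, n):
    """Enumerate all ordered samples of size n drawn with replacement and
    compare the mean and SD of the sample means with mu and sigma / sqrt(n)."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    mu = statistics.mean(population)
    sigma = statistics.pstdev(population)     # population SD (divide by N)
    return {
        "mean of sample means": statistics.mean(means),
        "mu": mu,
        "SD of sample means": statistics.pstdev(means),
        "sigma / sqrt(n)": sigma / math.sqrt(n),
    }

result = check_sample_mean_properties([1, 2, 3, 3, 3, 10], n=4)
for name, value in result.items():
    print(f"{name}: {value:.5f}")
```

The first pair of printed values and the second pair agree to all displayed digits, apart from floating-point rounding.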