Final Requirement IN Statistics and Probability: Miranda ST, Angeles City

SYSTEMS PLUS COLLEGE FOUNDATION
Miranda St, Angeles City
FINAL REQUIREMENT
IN
STATISTICS AND PROBABILITY
Mary rose L. Almimurung

11- ICT B – Earth
Random sampling is a procedure for sampling from a population in which (a) the selection of
a sample unit is based on chance and (b) every element of the population has a known, non-zero
probability of being selected.
Random sampling helps produce representative samples by eliminating voluntary response bias
and guarding against undercoverage bias. Probability sampling methods rely on random
sampling.
Examples of populations:
1. In a medical study, the population might be all adults over age 50 who have high blood
pressure.
2. In another study, the population might be all hospitals in the U.S. that perform heart bypass
surgery.
3. If we are studying whether a certain die is fair or weighted, the population would be all
possible tosses of the die.
In Example 3, it is fairly easy to get a simple random sample: just toss the die n times and
record each outcome.
Selecting a simple random sample in examples 1 and 2 is much harder. A good way to select a
simple random sample for Example 2 would proceed as follows:
First, obtain or make a list of all hospitals in the U.S. that perform heart bypass surgery. Number
them 1, 2, ... up to the total number M of hospitals in the population. (Such a list is called a
sampling frame.)
Then use some sort of random number generating process to obtain a simple random sample (SRS)
of size n from the population of integers 1, 2, ..., M. The simple random sample of hospitals
would consist of the hospitals in the list that correspond to the numbers in the SRS of numbers.
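The two steps above can be sketched in Python. This is only an illustration: the list of hospital names is a made-up placeholder for a real sampling frame, and the function name `simple_random_sample` and the seed value are assumptions introduced here, not part of the original procedure.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Pick n distinct entries from a sampling frame, every subset equally likely."""
    rng = random.Random(seed)
    M = len(frame)
    # Step 2: draw an SRS of size n from the integers 1, 2, ..., M (no repeats).
    numbers = rng.sample(range(1, M + 1), n)
    # Map each drawn number back to the entry it labels in the frame.
    return [frame[k - 1] for k in numbers]

# Step 1: a hypothetical sampling frame, numbered 1..M by list position.
hospitals = [f"Hospital {i}" for i in range(1, 501)]

sample = simple_random_sample(hospitals, n=10, seed=1)
print(sample)
```

Fixing the seed only makes the run repeatable; in a real study the draw would normally be left unseeded or use a documented seed.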
In theory, the same process could be used in Example 1. However, obtaining the sampling frame
would be much harder -- probably impossible. So some compromises may need to be made.
Unfortunately, these compromises can easily lead to a sample that is biased or otherwise not
close enough to random to be suitable for the statistical procedures used.
Indeed, even the sampling procedure described above is a compromise and may not be suitable
in some situations, described in the next section.
Parameters are numbers that summarize data for an entire population. Statistics are numbers that
summarize data from a sample, i.e. some subset of the entire population.
Example
All registered voters in Crawford County
All members of the International Machinists Union
All Americans who played golf at least once in the past year
But populations can refer to things as well as people.
Example
All widgets produced last Tuesday by the Acme Widget Company
All daily maximum temperatures in July for major U.S. cities
All basal ganglia cells from a particular rhesus monkey
A sampling distribution is the distribution of a statistic that is arrived at through repeated
sampling from a larger population.
It describes the range of possible outcomes of a statistic, such as the mean or mode of some
variable, as it truly exists in a population.

The majority of data analyzed by researchers are actually drawn from samples, and not
populations.
Sampling With Replacement and Sampling Without Replacement
Sampling with replacement:
Consider a population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or 18
potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly
one sack with each number. So the whole population has seven sacks. If I sample two with
replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I
replace it. Then I pick another. Every one of them still has 1/7 probability of being chosen. And
there are exactly 49 different possibilities here (assuming we distinguish between the first and
second). They are: (12,12), (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,13),
(13,14), etc.
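As a quick check of the count, the ordered pairs can be listed with Python's `itertools.product`. The variable names below are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

sacks = [12, 13, 14, 15, 16, 17, 18]

# Sampling two sacks WITH replacement: the second draw sees all seven sacks again.
with_replacement = list(product(sacks, repeat=2))

print(len(with_replacement))   # 49 ordered pairs
print(with_replacement[:3])    # [(12, 12), (12, 13), (12, 14)]

# Each ordered pair has probability (1/7) * (1/7).
pair_probability = Fraction(1, 7) * Fraction(1, 7)
print(pair_probability)        # 1/49
```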
Sampling without replacement:
Consider the same population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or
18 potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly
one sack with each number. So the whole population has seven sacks. If I sample two without
replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I
pick another. At this point, there are only six possibilities: 12, 13, 15, 16, 17, and 18. So there are
only 7 x 6 = 42 different possibilities here (again assuming that we distinguish between the first
and the second). They are: (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,14),
(13,15), etc.
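The same kind of check works here with `itertools.permutations`, which never repeats an item within a pair. Again, the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

sacks = [12, 13, 14, 15, 16, 17, 18]

# Sampling two sacks WITHOUT replacement: the first sack is not available again.
without_replacement = list(permutations(sacks, 2))

print(len(without_replacement))   # 42 ordered pairs
print(without_replacement[:3])    # [(12, 13), (12, 14), (12, 15)]

# Each ordered pair has probability (1/7) * (1/6).
pair_probability = Fraction(1, 7) * Fraction(1, 6)
print(pair_probability)           # 1/42
```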
6. Sample Distributions
Distribution of Sample Means
Sampling with Replacement
Example 1: The population from which samples are selected is {1,2,3,4,5,6}.
This population has a mean of 3.5 and a standard deviation of 1.70783. The next display shows a
histogram of the population.
Histogram of Population {1,2,3,4,5,6}
A computer was programmed to take all samples of size 4 (there are 1296) with replacement
from this population. A few of the samples are {1,1,1,1}, {1,1,1,2}, {1,1,1,3}, {1,1,1,4},...,
{6,6,6,3}, {6,6,6,4}, {6,6,6,5}, and {6,6,6,6}.
For each of these samples a statistic, the sample mean (i.e. the average of the numbers in the
sample), was computed. The sample means for the first few samples shown above are 1, 1.25,
1.5, 1.75,...,5.25, 5.5, 5.75, and 6. A histogram of all 1296 sample means is shown next.
Histogram of All Sample Means for Samples of Size 4 with Replacement Taken from Population
{1,2,3,4,5,6}
The mean of these 1296 sample means is 3.5 and the standard deviation of these 1296 sample
means is 0.853913.
From the histogram of sample means it appears that the sample means for samples of size 4
taken with replacement from the population {1,2,3,4,5,6} are normally distributed, at least
approximately.
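The figures in Example 1 can be reproduced with a short enumeration. This is a sketch, not the program used originally; it assumes the standard deviations quoted are population standard deviations (dividing by the count, not the count minus one), which is what `statistics.pstdev` computes.

```python
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]
n = 4

# All ordered samples of size 4 drawn with replacement, and the mean of each.
sample_means = [sum(s) / n for s in product(population, repeat=n)]

print(len(sample_means))                  # 1296
print(mean(population))                   # 3.5
print(round(pstdev(population), 5))       # about 1.70783
print(mean(sample_means))                 # 3.5
print(round(pstdev(sample_means), 6))     # about 0.853913
```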

Example 2: The population from which samples are selected is {1,2,3,3,3,10}.
The observations made in Example 1 may have been true because the population had a uniform,
symmetric shape. This example shows a population that is neither uniform nor symmetric.
Histogram of Population {1,2,3,3,3,10}
This population has a mean of 3.66667 and a standard deviation of 2.92499. Then a computer
found all 1296 samples of size 4 with replacement from this population and calculated the mean
of each of these samples. The mean of these 1296 sample means is 3.66667 and the standard
deviation is 1.46249.
A histogram of these sample means is shown next.
Histogram of All 1296 Sample Means for Samples of Size 4 Taken with Replacement from
Population {1,2,3,3,3,10}
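A rough text version of this histogram can be rebuilt by enumeration. The sketch below treats the three 3s as separate population members and uses population standard deviations; the bar scaling (one `#` per four samples) is an arbitrary choice made here for display.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3, 3, 3, 10]
n = 4

# Each of the six entries is a distinct member, so the three 3s are counted separately.
sample_means = [sum(s) / n for s in product(population, repeat=n)]

print(len(sample_means))                # 1296
print(round(mean(sample_means), 5))     # close to the population mean
print(round(pstdev(sample_means), 5))   # close to 2.92499 / 2

# Text histogram: one row per distinct sample mean, one '#' per 4 samples.
counts = Counter(sample_means)
for value in sorted(counts):
    print(f"{value:5.2f} {'#' * (counts[value] // 4)}")
```

Sample means that no combination of four values can produce simply do not appear as rows.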
This histogram resembles a normal curve, but it has some gaps and is skewed to the right. If a
larger sample size had been used, the curve would look more like a normal curve. This is
suggested by the following histogram showing 400 sample means for samples of size 36 taken
with replacement from the same population. There are 6^36 possible samples altogether. It would
take too long to compute the means of all of them, which is why only 400 samples are taken and
the mean computed for each of them.
Histogram of 400 Sample Means for Samples of Size 36 Taken with Replacement from
Population {1,2,3,3,3,10}
The mean of the 400 sample means is 3.65278 and the standard deviation of them is 0.498121.
The mean of these sample means is very close to the population mean, 3.66667, and the standard
deviation is close to 2.92499/sqrt(36) = 2.92499/6 = 0.487498.
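A simulation along these lines can be sketched as follows. The seed is arbitrary and introduced here only for repeatability, so the numbers printed will differ from the 3.65278 and 0.498121 reported above, which came from a different set of 400 random samples.

```python
import random
from math import sqrt
from statistics import mean, pstdev

population = [1, 2, 3, 3, 3, 10]
n = 36
num_samples = 400

rng = random.Random(2024)  # arbitrary seed, chosen only so the run is repeatable

# Draw 400 samples of size 36 with replacement and record each sample mean.
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(num_samples)]

print(mean(sample_means))            # should land near the population mean
print(pstdev(sample_means))          # should land near sigma / sqrt(n)
print(mean(population))              # population mean, 22/6
print(pstdev(population) / sqrt(n))  # sigma / sqrt(36), about 0.4875
```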
These few examples suggest the following concerning the sampling distribution of sample means,
that is, the collection of sample means from all random samples of size n taken from a population.
Here $\mu$ and $\sigma$ denote the population mean and standard deviation, and $\mu_{\bar{x}}$
and $\sigma_{\bar{x}}$ denote the mean and standard deviation of all sample means.
In sampling with replacement, the mean of all sample means equals the mean of the population:
$\mu_{\bar{x}} = \mu$
In sampling with replacement, the standard deviation of all sample means equals the standard
deviation of the population divided by the square root of the sample size:
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Whatever the shape of the population distribution, the distribution of sample means is
approximately normal, with better approximations as the sample size, n, increases.
Sampling without Replacement

A computer selected all samples of size 4 without replacement from the population {1,2,3,4,5,6}
of Example 1. There are 6 x 5 x 4 x 3 = 360 such samples. Then the mean of each sample was taken.
The mean of all of these sample means is 3.5, and the standard deviation is 0.540062. So the mean
of the sample means equals the mean of the population from which the samples are selected.
However, the standard deviation does not follow the rule stated for sampling with replacement.
Dividing the population standard deviation (found in Example 1 in the section on sampling with
replacement), 1.70783, by the square root of the sample size, 2, results in the number 0.853915,
which is not the standard deviation of the sample means, 0.540062.
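These numbers can be checked by listing every ordered sample drawn without replacement. The sketch assumes the population is {1,2,3,4,5,6}, as in Example 1, and uses population standard deviations throughout.

```python
from itertools import permutations
from math import sqrt
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]
n = 4

# All ordered samples of size 4 in which no member is drawn twice.
sample_means = [sum(s) / n for s in permutations(population, n)]

observed_sd = pstdev(sample_means)
naive_sd = pstdev(population) / sqrt(n)   # the with-replacement rule

print(len(sample_means))       # 360
print(mean(sample_means))      # 3.5
print(round(observed_sd, 6))   # about 0.540062
print(round(naive_sd, 6))      # about 0.853913, noticeably larger
```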
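In sampling without replacement, the formula for the standard deviation of all sample means for
samples of size n must be modified by including a finite population correction. The formula
becomes:
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$
where $\sigma$ is the population standard deviation, N is the population size (N=6 in this
example), and n is the sample size (n=4 in this case). The finite population correction is the
second square root in this formula. Using this formula, you get the correct standard deviation of
the sample means:
$\dfrac{1.70783}{\sqrt{4}} \sqrt{\dfrac{6-4}{6-1}} \approx 0.54006$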
