
SYSTEMS PLUS COLLEGE FOUNDATION

Miranda St, Angeles City

FINAL REQUIREMENT
IN
STATISTICS AND PROBABILITY

Mary rose L. Almimurung


11- ICT B – Earth
Random sampling is a procedure for sampling from a population in which (a) the selection of

a sample unit is based on chance and (b) every element of the population has a known, non-zero

probability of being selected.

Random sampling helps produce representative samples by eliminating voluntary response bias

and guarding against undercoverage bias. Probability sampling methods rely on random

sampling.

Examples of populations:

1. In a medical study, the population might be all adults over age 50 who have high blood pressure.

2. In another study, the population might be all hospitals in the U.S. that perform heart bypass surgery.

3. If we are studying whether a certain die is fair or weighted, the population would be all possible tosses of the die.

In Example 3, it is fairly easy to get a simple random sample: Just toss the die n times, and

record each outcome.

Selecting a simple random sample in examples 1 and 2 is much harder. A good way to select a

simple random sample for Example 2 would proceed as follows:

First, obtain or make a list of all hospitals in the U.S. that perform heart bypass surgery. Number

them 1, 2, ... up to the total number M of hospitals in the population. (Such a list is called a

sampling frame.)

Then use some sort of random number generating process to obtain a simple random sample of

size n from the population of integers 1, 2, ..., M. The simple random sample of hospitals would

consist of the hospitals in the list that correspond to the numbers in the SRS of numbers.
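The numbering procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame is a made-up list of placeholder names, the values M = 500 and n = 20 are hypothetical, and `simple_random_sample` is a helper name introduced here, not something from the original text.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct entries from a sampling frame, every subset equally likely."""
    rng = random.Random(seed)
    M = len(frame)
    # Number the frame 1..M and draw n distinct numbers without replacement.
    numbers = rng.sample(range(1, M + 1), n)
    # Map each drawn number back to the corresponding entry in the frame.
    return [frame[i - 1] for i in numbers]

# Hypothetical sampling frame: placeholder names standing in for a real hospital list.
frame = [f"Hospital {i}" for i in range(1, 501)]  # M = 500 (assumed)
sample = simple_random_sample(frame, n=20, seed=1)
print(len(sample), "hospitals selected, for example:", sample[:3])
```

Fixing the seed only makes the draw repeatable for checking; an actual study would normally leave it unset.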

In theory, the same process could be used in Example 1. However, obtaining the sampling frame

would be much harder -- probably impossible. So some compromises may need to be made.
Unfortunately, these compromises can easily lead to a sample that is biased or otherwise not

close enough to random to be suitable for the statistical procedures used.

Indeed, even the sampling procedure described above is a compromise and may not be suitable

in some situations, described in the next section.

Parameters are numbers that summarize data for an entire population. Statistics are numbers that summarize data from a sample, i.e. some subset of the entire population.

Example

All registered voters in Crawford County

All members of the International Machinists Union

All Americans who played golf at least once in the past year

But populations can refer to things as well as people.

Example

All widgets produced last Tuesday by the Acme Widget Company

All daily maximum temperatures in July for major U.S. cities

All basal ganglia cells from a particular rhesus monkey

A sampling distribution is the probability distribution of a statistic that is arrived at through repeated sampling from a larger population.

It describes the range of possible outcomes for a statistic, such as the mean or mode of some variable, as it truly exists in a population.


The majority of data analyzed by researchers are actually drawn from samples, and not

populations.

Sampling With Replacement and Sampling Without Replacement

Sampling with replacement:

Consider a population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or 18

potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly

one sack with each number. So the whole population has seven sacks. If I sample two with

replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I

replace it. Then I pick another. Every one of them still has 1/7 probability of being chosen. And

there are exactly 49 different possibilities here (assuming we distinguish between the first and

second.) They are: (12,12), (12,13), (12, 14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,13),

(13,14), etc

Sampling without replacement:

Consider the same population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or

18 potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly

one sack with each number. So the whole population has seven sacks. If I sample two without

replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I

pick another. At this point, there are only six possibilities: 12, 13, 15, 16, 17, and 18. So there are

only 42 different possibilities here (again assuming that we distinguish between the first and the

second.) They are: (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,14), (13,15),

etc.
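The two counts above (49 and 42 ordered pairs) can be checked by listing every possibility. The short Python sketch below does this with the standard `itertools` module; the variable names are chosen here for illustration.

```python
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]  # one sack for each potato count

# With replacement: the same sack may be picked twice, so order matters and repeats are allowed.
with_replacement = list(product(sacks, repeat=2))

# Without replacement: the second pick must differ from the first.
without_replacement = list(permutations(sacks, 2))

print(len(with_replacement))     # 49
print(len(without_replacement))  # 42
print(with_replacement[:3])      # [(12, 12), (12, 13), (12, 14)]
print(without_replacement[:3])   # [(12, 13), (12, 14), (12, 15)]
```

The seven pairs that appear only in the first list are the repeats (12,12), (13,13), and so on, which is why 49 - 7 = 42.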
Sampling Distributions
Distribution of Sample Means
Sampling with Replacement
Example 1: The population from which samples are selected is {1,2,3,4,5,6}.
This population has a mean of 3.5 and a standard deviation of 1.70783. The next display shows a
histogram of the population.
Histogram of Population {1,2,3,4,5,6}

A computer was programmed to take all samples of size 4 (there are 1296) with replacement
from this population. A few of the samples are {1,1,1,1}, {1,1,1,2}, {1,1,1,3}, {1,1,1,4},...,
{6,6,6,3}, {6,6,6,4}, {6,6,6,5}, and {6,6,6,6}.
For each of these samples a statistic, the sample mean (i.e. the average of the numbers in the
sample), was computed. The sample means for the first few samples shown above are 1, 1.25,
1.5, 1.75,...,5.25, 5.5, 5.75, and 6. A histogram of all 1296 sample means is shown next.
Histogram of All Sample Means for Samples of Size 4 with Replacement Taken from Population
{1,2,3,4,5,6}

The mean of these 1296 sample means is 3.5 and the standard deviation of these 1296 sample
means is 0.853913.
From the histogram of sample means it appears that the sample means for samples of size 4
taken with replacement from the population {1,2,3,4,5,6} are normally distributed, at least
approximately.
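The figures quoted in Example 1 can be reproduced with a short program. The sketch below is not the program used for the original example; it is one possible way to list all ordered samples of size 4 with replacement, assuming that the standard deviations quoted are population-style (dividing by the count, not by the count minus one).

```python
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]
n = 4  # sample size

# Every ordered sample of size n drawn with replacement: 6**4 of them.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

print(len(samples))                             # 1296
print(mean(sample_means))                       # 3.5
print(round(pstdev(sample_means), 6))           # 0.853913
print(round(pstdev(population) / n ** 0.5, 6))  # 0.853913
```

The last two lines agree: the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size.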

Example 2: The population from which samples are selected is {1,2,3,3,3,10}.


The observations made in Example 1 may have been true because the population had a uniform
symmetric shape. This example shows a population that is neither uniform nor symmetric.
Histogram of Population {1,2,3,3,3,10}
This population has a mean of 3.66667 and a standard deviation of 2.92499. Then a computer
found all 1296 samples of size 4 with replacement from this population and calculated the mean
of each of these samples. The mean of these 1296 sample means is 3.66667 and the standard
deviation is 1.46249.

A histogram of these sample means is shown next.

Histogram of All 1296 Sample Means for Samples of Size 4 Taken with Replacement from
Population {1,2,3,3,3,10}

This histogram resembles a normal curve but it has some gaps and is skewed to the right. If a
larger sample size had been used the curve would look more like a normal curve. This is
suggested by the following histogram showing 400 sample means for samples of size 36 taken
with replacement from the same population. There are 6^36 sample means altogether--it would
take too long to compute all of them, and that is why only 400 samples are taken and the means
computed for each of them.

Histogram of 400 Sample Means for Samples of Size 36 Taken with Replacement from
Population {1,2,3,3,3,10}

The mean of the 400 sample means is 3.65278 and the standard deviation of them is 0.498121.
The mean of these sample means is very close to the population mean, 3.66667, and the standard
deviation is close to 2.92499/Sqrt[36] = 2.92499/6 = 0.487498.
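A simulation like the one described can be sketched as follows. The seed is arbitrary and the random draws differ from those behind the figures quoted above, so the printed mean and standard deviation will be close to, but not equal to, 3.65278 and 0.498121. Using the population-style standard deviation (`pstdev`) is an assumption.

```python
import random
from statistics import mean, pstdev

population = [1, 2, 3, 3, 3, 10]
n = 36             # sample size
num_samples = 400  # number of random samples to draw

rng = random.Random(2024)  # arbitrary seed so the run is repeatable

# rng.choices draws with replacement, so each sample is n independent picks.
sample_means = [mean(rng.choices(population, k=n)) for _ in range(num_samples)]

print("mean of sample means:", round(mean(sample_means), 4))
print("SD of sample means:  ", round(pstdev(sample_means), 4))
print("sigma / sqrt(n):     ", round(pstdev(population) / n ** 0.5, 6))  # 0.487498
```

Changing the seed gives slightly different results each time, but they stay near the population mean 3.66667 and near 0.487498.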

These few examples suggest the following concerning the collection of sample means from all
random samples of size n taken from a population, the sampling distribution of sample means:

In sampling with replacement, the mean of all sample means equals the mean of the population: $\mu_{\bar{x}} = \mu$.

In sampling with replacement, the standard deviation of all sample means equals the standard deviation of the population divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.

Whatever the shape of the population distribution, the distribution of sample means is
approximately normal with better approximations as the sample size, n, increases.


Sampling without Replacement


Example 1: The population from which samples are selected is {1,2,3,4,5,6}.
A computer selected all samples of size 4 without replacement from this population. There are
360 such samples. Then the mean of each sample was taken. The mean of all of these sample
means is 3.5, and the standard deviation is 0.540062. So the mean of the sample means equals
the mean of the population from which the samples are selected. However, the standard
deviation does not follow the rule expressed above. Dividing the population standard deviation
(found in example 1 in the section on sampling with replacement), 1.70783, by the square root of
the sample size, 2, results in the number 0.853915, which is not the standard deviation of the
sample means, 0.540062.
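The comparison above can be checked by listing all 360 ordered samples directly. The sketch below is one possible way to do it, not the original program, and it assumes population-style standard deviations.

```python
from itertools import permutations
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]
n = 4  # sample size

# Every ordered sample of size n drawn without replacement: 6*5*4*3 of them.
samples = list(permutations(population, n))
sample_means = [sum(s) / n for s in samples]

print(len(samples))                             # 360
print(mean(sample_means))                       # 3.5
print(round(pstdev(sample_means), 6))           # 0.540062
print(round(pstdev(population) / n ** 0.5, 6))  # 0.853913
```

The last line prints 0.853913 where the text has 0.853915 because the text divides the already rounded value 1.70783 by 2. Either way, the simple rule overstates the spread of the sample means when sampling without replacement.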
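In sampling without replacement, the formula for the standard deviation of all sample means for samples of size n must be modified by including a finite population correction. The formula becomes:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

where $\sigma$ is the population standard deviation, N is the population size, N=6 in this example, and n is the sample size, n=4 in this case. The finite population correction is the second square root in this formula. Using this formula, you get the correct standard deviation of the sample means:

$$\sigma_{\bar{x}} = \frac{1.70783}{\sqrt{4}} \sqrt{\frac{6-4}{6-1}} \approx 0.54006$$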
