
Sampling, Sampling Distributions and Confidence Interval


Sampling
• A sample is a part of the population from which it is
selected. The process of selecting a sample is known
as sampling. Thus, sampling theory is the study of the
relationship that exists between the population and
the samples drawn from the population.
• There are three main reasons for selecting a sample:
1. Selecting a sample is less time-consuming than
selecting every item in the population.
2. Selecting a sample is less costly than selecting
every item in the population.
3. Analyzing a sample is less cumbersome and more
practical than analyzing the entire population.
Non Probability Sample
• In a non probability sample, items included are chosen
without regard to their probability of occurrence.
- In convenience sampling, items are selected based only on
the fact that they are easy, inexpensive, or convenient to sample.
- In judgment sampling, you get the opinions of preselected
experts in the subject matter.
- Quota sampling: for example, with a population of 4000 UG and
3000 PG and a total sample of 100, the quotas are
4000/7000 × 100 ≈ 57 UG and 3000/7000 × 100 ≈ 43 PG (the same
proportional split that stratified random sampling uses).
- Snowball sampling
Non Probability Sampling
• In convenience sampling, items selected are easy,
inexpensive, or convenient to sample. For example, if you
were sampling tires stacked in a warehouse, it would be
much more convenient to sample tires at the top of a
stack than tires at the bottom of a stack.
• In judgment sampling, you get the opinions of
preselected experts in the subject matter. For example, if
one is studying the performance of sales staff in a
marketing organisation, the people are classified into
top-grade, medium-grade, and low-grade performers.
Having specified the qualities that are important in the study,
the expert (possibly here the Vice-President, Sales)
indicates the people who, to his or her knowledge, would
be representative of each of the three categories.
Non Probability Sampling
• Quota Sampling

• Snowball Sampling
Probability Sampling

• In simple random sampling, the drawing of elements from the
population is random, and the choice of an element is made
in such a way that every element has the same probability
of being chosen.
• For example, suppose the warden of a students' hostel with
200 occupants wants to constitute a five-member welfare committee
with the members randomly selected. The lottery method of
selecting these five members from a group of 200 would be
first to prepare 200 slips of identical shape and size and
write the name of each student on a slip. Fold these 200
slips identically and mix them well in a container. Then
select five folded slips from the container at random. The
five students so selected would constitute the welfare
committee of the hostel.
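The lottery method above can be sketched in a few lines of Python. This is only an illustration: the student labels, the helper name `lottery_sample`, and the seed are assumptions made for the example, not part of the original slides.

```python
import random


def lottery_sample(population, k, seed=None):
    """Draw k distinct elements; every element is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    return rng.sample(population, k)


# Hypothetical labels standing in for the 200 named slips.
occupants = [f"student_{i:03d}" for i in range(1, 201)]

# Five slips drawn without replacement, as in the hostel example.
committee = lottery_sample(occupants, 5, seed=42)
print(committee)
```

Drawing without replacement mirrors the physical lottery: once a slip is taken out of the container it cannot be drawn again.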
• In stratified random sampling, the population
is sub-divided into strata before the sample is
drawn. The strata are designed so that they do
not overlap.
• For example, if a sample of 500
elementary units has to be drawn from a
population of 10,000 units divided into four
strata in the following way:
Probability Sample: Stratified Sample
• Divide population into two or more subgroups (called
STRATA) according to some common characteristic.
• A simple random sample is selected from each group
with sample sizes proportional to strata sizes.
• Samples from subgroups are combined into one.
• This is a common technique when sampling a population
of voters, stratifying along racial or socio-economic
lines.
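The steps on this slide (divide into strata, draw a simple random sample from each in proportion to its size, then combine) might look like the following sketch. The strata sizes reuse the 4000 UG / 3000 PG figures mentioned earlier; the function names and unit labels are hypothetical. Simple rounding is assumed for the allocation, which can miss the target total by one or two units for other inputs.

```python
import random


def proportional_allocation(strata_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size.

    Uses plain rounding, so the parts may not sum exactly to n in general.
    """
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum and combine them."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    combined = []
    for name, units in strata.items():
        combined.extend(rng.sample(units, allocation[name]))
    return allocation, combined


# Hypothetical sampling frame: 4000 UG and 3000 PG units.
strata = {
    "UG": [("UG", i) for i in range(4000)],
    "PG": [("PG", i) for i in range(3000)],
}
allocation, sample = stratified_sample(strata, 100, seed=1)
print(allocation)  # {'UG': 57, 'PG': 43}
```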
• Under the cluster sampling method, a random selection is made of
primary, intermediate, and final (or ultimate) units from a given
population or stratum. The sampling process is carried out in
several stages. At first, the first-stage units are
sampled by some suitable method, such as simple random
sampling. Then, a sample of second-stage units is selected from each
of the selected first-stage units, again by some suitable method,
which may be the same as or different from the method employed for
the first-stage units. Further stages may be added as required.
• Suppose we want to take a sample of 5,000 households from the
State of Haryana. At the first stage, the state may be divided into a
number of districts and a few districts selected at random. At
the second stage, each selected district may be sub-divided into a number of
villages and a sample of villages taken at random. At the
third stage, a number of households may be selected from each of
the villages selected at the second stage.
Probability Sample-Cluster Sampling
• The population is divided into several clusters, each
representative of the population.
• A simple random sample of clusters is selected.
• All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique.
• A common application of cluster sampling involves election
exit polls, where certain election districts are selected and
sampled.
• Note: Within clusters there is heterogeneity, and across
clusters there is homogeneity.
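A rough sketch of the cluster procedure, under stated assumptions: the frame below (10 districts of 50 households each) is invented purely for illustration, and `two_stage_cluster_sample` is a hypothetical helper name. Passing `n_per_cluster=None` keeps every item in the chosen clusters; passing a number applies simple random sampling within each chosen cluster.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Select clusters at random, then take all or some units from each."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of cluster names.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None:
            sample.extend(units)  # use every item in the cluster
        else:
            # Stage 2: simple random sample within the chosen cluster.
            sample.extend(rng.sample(units, n_per_cluster))
    return chosen, sample


# Hypothetical frame: 10 districts, 50 households each.
clusters = {
    f"district_{d}": [f"d{d}_household_{h}" for h in range(50)]
    for d in range(10)
}
chosen, sample = two_stage_cluster_sample(
    clusters, n_clusters=3, n_per_cluster=20, seed=7
)
print(chosen, len(sample))
```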
• In systematic sampling, another form of sampling, the members of the
population to be sampled are arranged in order, the order
corresponding to consecutive numbers. The arrangement of
names in a telephone directory or of income-tax returns in the
income tax department are illustrations of such orderings.
A sample of suitable size is obtained by taking every kth unit,
say every seventh unit, of the population: one of the first seven
units in this ordered arrangement is chosen at random, and the sample
is completed by selecting every seventh unit from the rest of
the list. If the first unit selected is the fifth, the researcher will
include the 12th, 19th, 26th, 33rd, etc. units in the sample.
Probability Sample-Systematic Sample
• Decide on Sample Size: n
• Divide frame of N individuals into groups of k
individuals: k=N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter
Example (figure): N = 56, n = 8, k = N/n = 7
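The systematic procedure can be sketched as follows, using the N = 56, n = 8 figures from the slide. The helper name `systematic_sample` and the numbered frame are assumptions for illustration, and the sketch assumes N is an exact multiple of n.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group of k, then every kth item."""
    k = len(frame) // n          # group size, k = N / n
    rng = random.Random(seed)
    start = rng.randrange(k)     # 0-based position inside the first group
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 57))       # individuals numbered 1..56
sample = systematic_sample(frame, 8, seed=3)
print(sample)
```

Only the starting position is random; once it is fixed, the remaining seven selections are fully determined.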
The Unbiased Property of the Sample Mean
The sample mean is unbiased because the mean of all the possible sample means (of a
given sample size, n) is equal to the population mean, $\mu$.

Because the mean of the 16 sample means is equal to the population mean,
the sample mean is an unbiased estimator of the population mean.
Therefore, although you do not know how close the sample mean of any
particular sample selected comes to the population mean, you are assured that
the mean of all the possible sample means that could have been selected is
equal to the population mean.
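The unbiased property can be checked by brute force on a small population. The sketch below assumes a hypothetical population of four values and samples of size 2 drawn with replacement, which yields 4 × 4 = 16 possible samples; the values themselves are illustrative and not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# HYPOTHETICAL population of N = 4 values (not from the original slides).
population = [3, 2, 1, 4]
n = 2

# All possible ordered samples of size n drawn with replacement: 4**2 = 16.
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Exact arithmetic avoids any floating-point rounding in the comparison.
mean_of_sample_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

print(len(samples), mean_of_sample_means, population_mean)
```

Individual sample means range from 1 to 4 here, yet their average matches the population mean exactly.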
The following equation defines the standard error of the mean when sampling with replacement or
sampling without replacement from large or infinite populations:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

If you randomly select a sample of 25 boxes without replacement from the
thousands of boxes filled during a shift, the sample contains much less than
5% of the population. Given that the standard deviation of the cereal-filling
process is 15 grams, compute the standard error of the mean.
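For the cereal example, the computation is a single division: the standard deviation divided by the square root of the sample size. A minimal sketch, with `standard_error` as a hypothetical helper name:

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)


# Values from the cereal-filling example: sigma = 15 grams, n = 25 boxes.
se = standard_error(15, 25)
print(se)  # 3.0
```

So sample means of 25 boxes vary much less (3 grams) than individual boxes do (15 grams).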
Optional slide: derivation of the standard error
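The slide's own derivation did not survive in the text. One standard argument, sketched here under the assumption that the n observations $X_1, \dots, X_n$ are drawn independently, each with variance $\sigma^2$, runs as follows:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\,\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\sigma_{\bar{X}} = \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
```

The second equality is where independence is used: the variance of a sum equals the sum of the variances only when the covariance terms vanish.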
Central Limit Theorem
The Central Limit Theorem states that as the sample size (i.e., the number of values
in each sample) gets large enough, the sampling distribution of the mean is
approximately normally distributed. This is true regardless of the shape of the
distribution of the individual values in the population.
• Panel A shows the sampling distribution of the mean
selected from a normal population. When the
population is normally distributed, the sampling
distribution of the mean is normally distributed for any
sample size.

• Panel B depicts the sampling distribution from a
population with a uniform (or rectangular) distribution.
When samples of size n=2 are selected, there is a
peaking, or central limiting, effect already working. For
n=5 the sampling distribution is bell-shaped and
approximately normal. When n=30 the sampling
distribution looks very similar to a normal distribution. In
general, the larger the sample size, the more closely the
sampling distribution will follow a normal distribution.
As with all other cases, the mean of each sampling
distribution is equal to the mean of the population, and
the variability decreases as the sample size increases.

• Panel C presents an exponential distribution. This
population is extremely right-skewed. When n=2 the
sampling distribution is still highly right-skewed but less
so than the distribution of the population. For n=5 the
sampling distribution is slightly right-skewed. When n=30
the sampling distribution looks approximately normal.
Again, the mean of each sampling distribution is equal to
the mean of the population, and the variability
decreases as the sample size increases.
Using the results from the normal, uniform, and exponential distributions, you
can reach the following conclusions regarding the Central Limit Theorem:
• For most population distributions, regardless of shape, the sampling
distribution of the mean is approximately normally distributed if samples of at
least size 30 are selected.
• If the population distribution is fairly symmetrical, the sampling distribution
of the mean is approximately normal for samples as small as size 5.
• If the population is normally distributed, the sampling distribution of the
mean is normally distributed, regardless of the sample size.

The Central Limit Theorem is of crucial importance in using statistical inference
to reach conclusions about a population. It allows you to make inferences
about the population mean without having to know the specific shape of the
population distribution.
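The behaviour described for the exponential panel can be reproduced with a small simulation. This is a sketch under assumptions: the population is taken to be exponential with mean 1, 20,000 samples are drawn per sample size, and skewness is measured by the average cubed standardized deviation. None of these settings come from the slides.

```python
import random


def sampling_distribution_summary(n, reps=20000, seed=0):
    """Simulate sample means of size n from an exponential population (mean 1).

    Returns the mean, standard deviation, and skewness of the simulated means.
    """
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    centre = sum(means) / reps
    spread = (sum((m - centre) ** 2 for m in means) / reps) ** 0.5
    skew = sum(((m - centre) / spread) ** 3 for m in means) / reps
    return centre, spread, skew


results = {n: sampling_distribution_summary(n) for n in (2, 5, 30)}
for n, (centre, spread, skew) in results.items():
    print(f"n={n:2d}  mean={centre:.3f}  sd={spread:.3f}  skewness={skew:.3f}")
```

The simulated means should stay centred near the population mean while both the spread and the right skew shrink as n grows, which is the pattern the three panels illustrate.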
Confidence Interval
Consider the following business situations.
1. Tourism is a major source of income for many Caribbean countries, such as Barbados.
Suppose the Bureau of Tourism for Barbados wants an estimate of the mean amount
spent by tourists visiting the country. It would not be feasible to contact each tourist.
Therefore, 500 tourists are randomly selected as they depart the country and asked in
detail about their spending while visiting the island. The mean amount spent by the
sample of 500 tourists is an estimate of the unknown population parameter. That is, we
let the sample mean, $\bar{X}$, serve as an estimate of the population mean, $\mu$.
2. Centex Home Builders, Inc., builds quality homes in the southeastern region of the
United States. One of the major concerns of new buyers is the date on which the home
will be completed. In recent times Centex has been telling customers, "Your home will
be completed 45 working days from the date we begin installing drywall." The customer
relations department at Centex wishes to compare this pledge with recent experience.
A sample of 50 homes completed this year revealed the mean number of working days
from the start of drywall to the completion of the home was 46.7 days. Is it reasonable
to conclude that the population mean is still 45 days and that the difference between
the sample mean (46.7 days) and the proposed population mean is sampling error?
3. Recent medical studies indicate that exercise is an important part of a person's overall
health. The director of human resources at OCF, a large glass manufacturer, wants an
estimate of the number of hours per week employees spend exercising. A sample of 70
employees reveals the mean number of hours of exercise last week is 3.3. The sample
mean of 3.3 hours estimates the unknown population mean, the mean hours of exercise
for all employees.

A point estimate is a single statistic used to estimate a population parameter.

Suppose Best Buy, Inc. wants to estimate the mean age of buyers of high-definition
televisions. They select a random sample of 50 recent purchasers, determine the age
of each purchaser, and compute the mean age of the buyers in the sample. The mean
of this sample is a point estimate of the mean of the population.

The sample mean, $\bar{X}$, is a point estimate of the population mean, $\mu$, and the sample standard
deviation, $s$, is a point estimate of the population standard deviation, $\sigma$.

A point estimate, however, tells only part of the story. While we expect the point estimate
to be close to the population parameter, we would like to measure how close it really is. A
confidence interval serves this purpose.
The 95 percent confidence interval is computed as follows, when the number of observations
in the sample is at least 30:

$$\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

Similarly, the 99 percent confidence interval is computed as follows. Again we assume
that the sample size is at least 30:

$$\bar{X} \pm 2.58\,\frac{\sigma}{\sqrt{n}}$$

CONFIDENCE INTERVAL : A range of values constructed from sample data so that the
population parameter is likely to occur within that range at a specified probability. The
specified probability is called the level of confidence.

For reasonably large samples, the results of the central limit theorem allow us to
state the following:

1. Ninety-five percent of the sample means selected from a population will be within
1.96 standard deviations of the population mean.

2. Ninety-nine percent of the sample means will lie within 2.58 standard deviations
of the population mean.

The standard deviation discussed here is the standard deviation of the sampling
distribution of the sample mean. It is usually called the "standard error."

Intervals computed in this fashion are called the 95 percent confidence interval and
the 99 percent confidence interval.
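A minimal sketch of how such intervals might be computed. The sample mean (46.7 days) and sample size (50) are taken from the Centex example above, but no standard deviation is given there, so the value `sigma = 6.0` is purely hypothetical, as is the helper name `z_confidence_interval`. The multiplier comes from the standard normal distribution rather than the rounded 1.96 and 2.58.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval: sample mean plus or minus z times the standard error."""
    # Two-sided critical value, e.g. about 1.96 for 95% and 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


sample_mean, n = 46.7, 50   # from the Centex example
sigma = 6.0                 # HYPOTHETICAL: the slides give no standard deviation

ci95 = z_confidence_interval(sample_mean, sigma, n, 0.95)
ci99 = z_confidence_interval(sample_mean, sigma, n, 0.99)
print(ci95, ci99)
```

Raising the confidence level widens the interval: more confidence is bought with less precision.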
Can You Ever Know the Population Standard
Deviation?
To use the confidence interval equation for the mean, you must know the value of the population standard
deviation, $\sigma$. To know $\sigma$ implies that you know all the values in the entire population.
(How else would you know the value of this population parameter?) If you knew all
the values in the entire population, you could directly compute the population mean.
There would be no need to use the inductive reasoning of inferential statistics to
estimate the population mean. In other words, if you knew $\sigma$, you would not really have
a need to use the equation to construct a "confidence interval estimate of the
mean ($\sigma$ known)."
More significantly, in virtually all real-world business situations, you would never know
the standard deviation of the population. In business situations, populations are often
too large to examine all the values. So why study the confidence interval estimate of
the mean ($\sigma$ known) at all? This method serves as an important introduction to the
concept of a confidence interval because it uses the normal distribution. In the next
slides, you will see that constructing a confidence interval estimate when $\sigma$ is not known
requires another distribution (the t distribution).

Reference Slide
Student’s t-Distribution
• In situations where the population is normal
but its standard deviation is unknown, the
Student's t distribution should be used instead
of the normal z distribution.
• This is particularly important when the sample
size is small. When $\sigma$ is unknown, the formula
for a confidence interval resembles the
formula for $\sigma$ known, except that t replaces z
and s replaces $\sigma$.
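Written out as a sketch, the two intervals differ only in the critical value and in the standard deviation used. The notation $t_{n-1}$ for the critical value with n - 1 degrees of freedom is an assumption of this sketch, not notation recovered from the slides.

```latex
\bar{X} \pm z\,\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ known})
\qquad\longrightarrow\qquad
\bar{X} \pm t_{n-1}\,\frac{s}{\sqrt{n}} \quad (\sigma \text{ unknown})
```

For the same confidence level the t critical value is larger than the z value, so the t interval is wider; the gap narrows as n grows.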
At the start of the twentieth century, William S.
Gosset was working at Guinness in Ireland,
trying to help brew better beer less
expensively. As he had only small samples to
study, he needed to find a way to make inferences
about means without having to know $\sigma$. Writing
under the pen name "Student," Gosset solved this
problem by developing what today is known as
the Student's t distribution, or the t distribution.
Sample Size Determination for the Mean
To develop an equation for determining the appropriate
sample size needed when constructing a confidence interval
estimate for the mean, recall the confidence interval equation
for the mean ($\sigma$ known), where $Z$ is the critical value from the
normal distribution (1.96 for 95 percent confidence):

$$\bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$$

The amount added to or subtracted from $\bar{X}$ is equal to half
the width of the interval. This quantity represents the amount
of imprecision in the estimate that results from sampling error.
The sampling error, e, is defined as

$$e = Z\,\frac{\sigma}{\sqrt{n}}$$

Solving for n gives the sample size needed to construct the
appropriate confidence interval estimate for the mean:

$$n = \frac{Z^2 \sigma^2}{e^2}$$

"Appropriate" means that the resulting interval will have an
acceptable amount of sampling error.
If a quality control manager wants to estimate, with
95% confidence, the mean life of light bulbs to within
±20 hours and also assumes that the population standard
deviation is 100 hours, how many light bulbs need to be
selected?

Solution: n = (1.96² × 100²) / 20² = 96.04, which rounds up to 97.
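The light bulb calculation can be sketched as below. The tolerance is taken as e = 20 hours, an assumption that is consistent with the stated solution of 97; the confidence multiplier 1.96 and the standard deviation of 100 hours come from the problem. The helper name `required_sample_size` is hypothetical.

```python
from math import ceil


def required_sample_size(z, sigma, e):
    """Smallest n whose sampling error z * sigma / sqrt(n) does not exceed e."""
    # Always round up: rounding down would give an error slightly above e.
    return ceil((z * sigma / e) ** 2)


# z = 1.96 (95% confidence), sigma = 100 hours, e = 20 hours (assumed tolerance).
n = required_sample_size(z=1.96, sigma=100, e=20)
print(n)  # 97
```

Because e is squared in the denominator, halving the acceptable sampling error roughly quadruples the required sample size.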
A survey is planned to determine the mean annual family medical expenses of employees
of a large company. The management of the company wishes to be 95% confident that the
sample mean is correct to within of the population mean annual family medical
expenses. A previous study indicates that the standard deviation is approximately $400.
a. How large a sample is necessary?
b. If management wants to be correct to within how many employees need to be
selected?
