
Sampling Distribution

Definition: The sampling distribution helps determine how much the means of different samples differ from each other and from the population mean, and therefore how close a particular sample mean is likely to be to the population mean.

In other words, the sampling distribution constitutes the theoretical basis of inferential statistics, which involves determining the extent to which sample statistics vary from each other and from the population parameter. Here, the sample statistic is the sample mean, and the population parameter is the population mean.

The concept of a sampling distribution is perhaps the most basic concept in inferential
statistics. It is also a difficult concept because a sampling distribution is a theoretical
distribution rather than an empirical distribution. The introductory section defines the
concept and gives an example for both a discrete and a continuous distribution. It also
discusses how sampling distributions are used in inferential statistics.

Sampling Distributions and Inferential Statistics

As we stated in the beginning of this chapter, sampling distributions are important for inferential statistics. In the examples
given so far, a population was specified and the sampling distribution of the mean and
the range were determined. In practice, the process proceeds the other way: you
collect sample data, and from these data you estimate parameters of the sampling
distribution. This knowledge of the sampling distribution can be very useful. For
example, knowing the degree to which means from different samples would differ
from each other and from the population mean would give you a sense of how close
your particular sample mean is likely to be to the population mean. Fortunately, this
information is directly available from a sampling distribution. The most common
measure of how much sample means differ from each other is the standard deviation
of the sampling distribution of the mean. This standard deviation is called the standard
error of the mean. If all the sample means were very close to the population mean,
then the standard error of the mean would be small. On the other hand, if the sample
means varied considerably, then the standard error of the mean would be large. To be
specific, assume your sample mean were 125 and you estimated that the standard
error of the mean were 5 (using a method shown in a later section). If you had a
normal distribution, then it would be likely that your sample mean would be within 10
units of the population mean since most of a normal distribution is within two
standard deviations of the mean. Keep in mind that all statistics have sampling
distributions, not just the mean. In later sections we will be discussing the sampling
distribution of the variance, the sampling distribution of the difference between
means, and the sampling distribution of Pearson's correlation, among others.
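The "within 10 units" reasoning in the example above can be checked numerically. This is a minimal Python sketch using the standard library's statistics.NormalDist; the sample mean of 125 and standard error of 5 are the illustrative values from the text, and treating the sampling distribution of the mean as exactly normal is an assumption.

```python
from statistics import NormalDist

sample_mean = 125.0    # illustrative sample mean from the text
standard_error = 5.0   # assumed estimate of the standard error of the mean

# "Within two standard deviations" of the sampling distribution = within 2 * SE.
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

# Probability that a normal variable lies within 2 standard deviations of its mean.
z = NormalDist()  # standard normal: mean 0, standard deviation 1
prob_within_2sd = z.cdf(2) - z.cdf(-2)

print(f"Interval: {lower} to {upper}")
print(f"P(|Z| < 2) = {prob_within_2sd:.4f}")
```

The probability printed is about 0.95, which is what "most of a normal distribution" means in practice.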

• A sampling distribution is the probability distribution of a statistic obtained through repeated sampling from a larger population.
• It describes the range of possible outcomes of a statistic, such as the mean or mode of some variable, across all the samples that could be drawn from the population.
• The majority of data analyzed by researchers are actually drawn from samples, not populations.
Types of Sampling Distribution
#1 – Sampling Distribution of Mean

• This can be defined as the probability distribution of the means of all random samples of a fixed size drawn from a particular population. When samples are drawn from a normal population, the distribution of the sample means is also normal, with the same mean as the population and a standard deviation equal to the population standard deviation divided by the square root of the sample size.

• If the population is not normal, the distribution of the means still tends toward the normal distribution, provided that the sample size is large enough.

The sampling distribution of the mean was defined in the section introducing sampling distributions. This section reviews some important properties of the sampling distribution of the mean.

Mean

The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean μ, then the mean of the sampling distribution of the mean is also μ. The symbol μM is used to refer to the mean of the sampling distribution of the mean. Therefore, the formula for the mean of the sampling distribution of the mean can be written as:

$$\mu_M = \mu$$

Variance

The variance of the sampling distribution of the mean is computed as follows:

$$\sigma_M^2 = \frac{\sigma^2}{N}$$

That is, the variance of the sampling distribution of the mean is the population variance divided by N, the sample size (the number of scores used to compute a mean). Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean.

The standard error of the mean is the standard deviation of the sampling distribution of the mean. It is therefore the square root of the variance of the sampling distribution of the mean and can be written as:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

The standard error is represented by σ because it is a standard deviation. The subscript M indicates that the standard error in question is the standard error of the mean.
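The formulas μM = μ and σ²M = σ²/N can be verified exactly on a small example. In this Python sketch the population {1, 2, 3, 4, 5} and the sample size N = 2 are hypothetical choices. Every ordered sample drawn with replacement (independent draws, the setting these formulas assume) is enumerated, so the mean and variance of the sample means are computed over the whole sampling distribution rather than estimated by simulation.

```python
import math
from itertools import product
from statistics import fmean, pvariance

population = [1, 2, 3, 4, 5]  # hypothetical population
N = 2                         # hypothetical sample size

mu = fmean(population)            # population mean
sigma2 = pvariance(population)    # population variance (divides by population size)

# Every ordered sample of size N drawn with replacement is equally likely,
# so this list of means is the complete sampling distribution of the mean.
sample_means = [fmean(s) for s in product(population, repeat=N)]

mu_M = fmean(sample_means)            # mean of the sampling distribution
sigma2_M = pvariance(sample_means)    # variance of the sampling distribution
sigma_M = math.sqrt(sigma2_M)         # standard error of the mean

print(mu, mu_M)                                   # both equal the population mean
print(sigma2 / N, sigma2_M)                       # both equal sigma^2 / N
print(sigma_M, math.sqrt(sigma2) / math.sqrt(N))  # both equal sigma / sqrt(N)
```

Sampling without replacement from a finite population would shrink the variance slightly, so the with-replacement enumeration is the one that matches the formulas exactly.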

#2 – Sampling Distribution of Proportion


This is primarily associated with statistics of attributes, that is, characteristics that each item either possesses or does not possess. Here the binomial distribution comes into play: the number of successes in a sample follows the binomial law, but as the sample size increases, the distribution of the sample proportion approaches the normal distribution.

The sampling distribution of proportion describes the proportion of successes, i.e. occurrences of a certain event, obtained by dividing the number of successes x by the sample size n. Thus, the sample proportion is defined as p = x/n.

The sampling distribution of proportion obeys the binomial probability law if the random sample of size n is obtained with replacement. For example, if the population is infinite and the probability of occurrence of an event is P, then the probability of non-occurrence of the event is (1 − P). Now consider all the possible samples of size n drawn from the population and compute the proportion p of successes for each. Then the mean (μp) and the standard deviation (σp) of the sampling distribution of proportion can be obtained as:

$$\mu_p = P$$

$$\sigma_p = \sqrt{\frac{P(1-P)}{n}}$$

where

μp = mean of the sampling distribution of proportion
P = population proportion, defined as P = X/N, where X is the number of elements that possess a certain characteristic and N is the total number of items in the population
σp = standard error of proportion, which measures the variation of sample proportions from sample to sample
n = sample size

If the sample size is large (n ≥ 30), the sampling distribution of proportion is approximately normal.
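One way to check μp = P and σp = √(P(1 − P)/n) is to average over the exact binomial distribution of the number of successes x. The values P = 0.4 and n = 10 below are hypothetical, and the sketch assumes sampling with replacement (or from an infinite population), as in the text.

```python
import math

P = 0.4  # hypothetical population proportion
n = 10   # hypothetical sample size

# Binomial probability of exactly x successes in n independent trials.
pmf = [math.comb(n, x) * P**x * (1 - P) ** (n - x) for x in range(n + 1)]

# Each outcome x gives a sample proportion p = x / n; weight it by its probability.
mean_p = sum(prob * (x / n) for x, prob in enumerate(pmf))
var_p = sum(prob * (x / n - mean_p) ** 2 for x, prob in enumerate(pmf))
sigma_p = math.sqrt(var_p)

print(round(mean_p, 6))                                          # matches P
print(round(sigma_p, 6), round(math.sqrt(P * (1 - P) / n), 6))   # matches the formula
```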

THE CENTRAL LIMIT THEOREM

In the preceding section we discussed the sampling distributions of sample means and of sample ranges. The mean is the most commonly used sample statistic and thus it is very important. The central limit theorem is about the sampling distribution of sample means of random samples of size n. Let us establish what we are interested in when studying this distribution:

1) Where is the center?
2) How wide is the dispersion?
3) What are the characteristics of the distribution?

The central limit theorem gives us an answer to all these questions.

Theorem 2.11.1.

The central limit theorem. Let µ be the mean and σ the standard deviation of a population variable. If we consider all possible random samples of size n taken from this population, the sampling distribution of sample means will have the following properties:
a) the mean µx̄ of this sampling distribution is µ;
b) the standard deviation σx̄ of this sampling distribution is σ/√n;
c) if the parent population is normally distributed, the sampling distribution of the sample means is normal; if the parent population is not normally distributed, the sampling distribution of the sample means is approximately normal for samples of size 30 or more. The approximation to the normal distribution improves with samples of larger size.
In short, the central limit theorem states the following:

1) µx̄ = µ, where x̄ is the sample mean;
2) σx̄ = σ/√n, i.e. the standard deviation of the sample mean is equal to the standard deviation of the population divided by the square root of the sample size;
3) for sufficiently large n, the sampling distribution of the sample means is approximately normal regardless of the shape of the parent population.
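The theorem can be illustrated by simulation. This Python sketch draws repeated samples of size n = 30 from an exponential population with mean 1 and standard deviation 1, a strongly right-skewed distribution chosen here as a hypothetical non-normal parent. If the theorem holds, the sample means should center at µ = 1, have a standard deviation close to σ/√n ≈ 0.183, and about 95% of them should fall within 1.96 standard errors of µ, as for a normal distribution. The exact figures vary slightly with the random seed.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(12345)       # fixed seed so the run is reproducible
mu, sigma = 1.0, 1.0     # exponential with rate 1 has mean 1 and sd 1
n = 30                   # sample size
num_samples = 20_000     # number of repeated samples

# One sample mean per repeated sample of size n.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

se = sigma / math.sqrt(n)  # standard error predicted by the theorem
within = sum(abs(m - mu) < 1.96 * se for m in sample_means) / num_samples

print(f"mean of sample means: {fmean(sample_means):.4f} (theory {mu})")
print(f"sd of sample means:   {pstdev(sample_means):.4f} (theory {se:.4f})")
print(f"share within 1.96 SE: {within:.3f} (normal theory 0.950)")
```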

Point and Interval Estimation


To estimate an unknown parameter of the population, the theory of estimation is used. There are two types of estimation, namely:

1. Point estimation

2. Interval estimation

1. Point Estimation

When a single value is used as an estimate, the estimate is called a point estimate of the population parameter. In other words, estimating a population parameter by a single number is called point estimation.

For example

(i) The mean mark of 55 obtained by a sample of 5 students randomly drawn from a class of 100 students is taken as the mean mark of the entire class. This single value, 55, is a point estimate.

(ii) The average weight of 50 kg for a sample of 10 students randomly drawn from a class of 100 students is taken as the average weight of the entire class. This single value, 50 kg, is a point estimate.
Note

The sample mean (x̄) is the sample statistic used as an estimate of the population mean (μ).

Instead of considering the estimated value of the population parameter to be a single value, we might consider an interval for estimating the value of the population parameter. This concept is known as interval estimation and is explained below.

2. Interval Estimation

In many situations a point estimate is not sufficient, and we are interested in finding limits within which the parameter would be expected to lie. This is called interval estimation.

For example,

If T is a good estimator of θ with standard error s, then, making use of the general property of standard deviations for an approximately normal estimator, the uncertainty in T as an estimator of θ can be expressed by statements like “We are about 95% certain that the unknown θ will lie somewhere between T − 2s and T + 2s” or “We are almost sure that θ will lie in the interval (T − 3s, T + 3s)”. Such intervals are called confidence intervals and are explained below.

Confidence interval
After obtaining the value of the statistic t from a given sample, can we make some reasonable probability statements about the unknown population parameter θ? This question is answered by the technique of confidence intervals. Let us choose a small value α, known as the level of significance (usually 1% or 5%), and determine two constants, say c1 and c2, such that P(c1 < θ < c2 | t) = 1 − α.

The quantities c1 and c2 so determined are known as the confidence limits, and the interval [c1, c2] within which the unknown value of the population parameter is expected to lie is known as the confidence interval. (1 − α) is called the confidence coefficient.

 
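Confidence Interval for the population mean for Large Samples

(when σ is known)

If we take repeated independent random samples of size n from a population with an unknown mean but known standard deviation σ, then the probability that the following random interval contains the true population mean μ is (1 − α), i.e.

$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

So, the confidence interval for the population mean (μ) when the standard deviation (σ) is known is given by

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$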

For the computation of confidence intervals and for testing of significance, the critical values Zα at different levels of significance are given in the following table:

Normal Probability Table

The calculation of a confidence interval is illustrated below.

In statistics, a confidence interval is an interval estimate, computed from the observed data, that is likely to contain the true value of an unknown parameter. It is associated with a confidence level, which quantifies how confident we can be that the interval captures the fixed (deterministic) parameter. The intervals discussed here are based on the standard normal distribution, where the Z value is the z-score. Let us look at the definition, formula, table, and calculation of the confidence interval in detail.
Confidence Interval Definition
The confidence level represents the long-run proportion (frequency) of confidence intervals that contain the true value of the unknown parameter. In other words, if confidence intervals were constructed at the given confidence level from an unlimited number of independent samples, the proportion of those intervals containing the true value of the parameter would equal the confidence level.
The confidence level is usually chosen before examining the data. The most commonly used confidence level is 95%, but other confidence levels, such as 90% and 99%, are also used.
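This frequency interpretation can be checked by simulation. In this Python sketch, a hypothetical normal population with µ = 50 and known σ = 10 is sampled repeatedly with n = 25. For each sample, the 95% interval X̄ ± 1.96 σ/√n is built, and the fraction of intervals that contain the true µ is counted. That fraction should be close to 0.95, though not exactly 0.95 for a finite number of repetitions.

```python
import math
import random
from statistics import fmean

random.seed(2024)        # reproducible run
mu, sigma = 50.0, 10.0   # hypothetical population mean and known sd
n = 25                   # sample size
z = 1.960                # critical value for 95% confidence
repetitions = 10_000

covered = 0
for _ in range(repetitions):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    margin = z * sigma / math.sqrt(n)   # margin of error
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1                    # this interval captured the true mean

coverage = covered / repetitions
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

Note that µ stays fixed throughout; it is the interval that changes from sample to sample.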

Confidence Interval Formula


The confidence interval is based on the sample mean and the standard deviation. Thus, the formula to find the CI is
X̄ ± Zα/2 × [ σ / √n ]
where
X̄ = sample mean
Zα/2 = critical value of the standard normal distribution for the chosen confidence level
α = level of significance (the confidence level is 1 − α)
σ = standard deviation
n = sample size
The value after the ± symbol is known as the margin of error.
Note: This interval is exact only when the population distribution is normal. For large samples from other population distributions, the interval is approximately correct by the central limit theorem.

Confidence Interval Table

The Z values for common confidence levels are given in the following table:

Confidence Level    Z Value
80%                 1.282
85%                 1.440
90%                 1.645
95%                 1.960
99%                 2.576
99.5%               2.807
99.9%               3.291
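The Z values in the table are two-sided critical values: for confidence level 1 − α, Zα/2 is the point with cumulative probability 1 − α/2 under the standard normal curve, leaving α/2 in each tail. This Python sketch recomputes the table with the standard library's statistics.NormalDist().inv_cdf (available from Python 3.8).

```python
from statistics import NormalDist

confidence_levels = [0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999]
std_normal = NormalDist()  # mean 0, standard deviation 1

critical_values = {}
for c in confidence_levels:
    alpha = 1 - c
    # Z_{alpha/2} leaves alpha/2 in the upper tail: cumulative probability 1 - alpha/2
    critical_values[c] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"{c:.1%}  {critical_values[c]:.3f}")
```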

How to Calculate a Confidence Interval?

To calculate the confidence interval, go through the following procedure.
Step 1: Find the number of observations n (sample size), the mean X̄, and the standard deviation σ.
Step 2: Choose the confidence level; 95% and 99% are the most common choices. Then find the Z value for the corresponding confidence level in the table.
Step 3: Finally, substitute all the values in the formula.
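The three steps can be wrapped in a small helper. This is a sketch, not a library API: the function name z_confidence_interval is hypothetical, it assumes σ is known (or, for large samples, that the sample standard deviation is an acceptable substitute), and it computes the Z value with statistics.NormalDist instead of reading it from the table, so results may differ from table-based answers in the third decimal place.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for mean ± Z_{alpha/2} * sd / sqrt(n).

    Hypothetical helper: assumes a known (or large-sample) standard deviation.
    """
    if n <= 0 or not 0 < confidence < 1:
        raise ValueError("n must be positive and confidence must be in (0, 1)")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Step 2: critical value
    margin = z * sd / math.sqrt(n)           # margin of error
    return mean - margin, mean + margin      # Step 3: substitute into the formula


# Usage with the light-bulb figures of Example 8.13: mean 1350 h, s = 100 h, n = 169, 90%.
low, high = z_confidence_interval(1350, 100, 169, confidence=0.90)
print(f"({low:.2f}, {high:.2f})")
```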

Confidence Interval Example


Question: A tree has hundreds of apples. You randomly choose 46 apples and find a mean size of 86 with a standard deviation of 6.2. Find a 95% confidence interval for the mean size of the apples on the tree, to judge whether they are big enough.
Solution:
Given: Mean, X̄ = 86
Standard deviation, σ = 6.2
Number of observations, n = 46
Take the confidence level as 95%. Therefore, the value of z = 1.960 (from the table)
The formula to find the confidence interval is
X̄ ± Zα/2 × [ σ / √n ]
Now, substitute the values in the formula, we get
86 ± 1.960 × [ 6.2 / √46 ]
86 ± 1.960 × [ 6.2 / 6.78]
86 ± 1.960 × 0.914
86 ± 1.79
Here, the margin of error is 1.79
Therefore, with 95% confidence, the mean size of all the apples on the tree lies between 84.21 and 87.79.
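The hand calculation above can be reproduced step by step. This minimal Python sketch uses the table value z = 1.960 and rounds only for display, so the limits it prints agree with the worked answer to two decimal places.

```python
import math

x_bar = 86.0  # sample mean
sigma = 6.2   # standard deviation
n = 46        # number of apples sampled
z = 1.960     # table value for 95% confidence

root_n = math.sqrt(n)            # about 6.78
standard_error = sigma / root_n  # about 0.914
margin = z * standard_error      # margin of error, about 1.79

lower, upper = x_bar - margin, x_bar + margin
print(f"sqrt(n) = {root_n:.2f}, SE = {standard_error:.3f}, margin = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```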

Example 8.11
A machine produces a component of a product with a standard deviation of 1.6 cm in length. A random sample of 64 components was selected from the output, and this sample has a mean length of 90 cm. The customer will reject the part if it is either less than 88 cm or more than 92 cm. Does the 95% confidence interval for the true mean length of all the components produced ensure acceptance by the customer?

Solution:

Here μ is the mean length of the components in the population.

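The formula for the confidence interval is

X̄ − Zα/2 × (σ/√n) ≤ μ ≤ X̄ + Zα/2 × (σ/√n)

Here X̄ = 90 cm, σ = 1.6 cm, n = 64, and Zα/2 = 1.96 for 95% confidence, so σ/√n = 1.6/√64 = 1.6/8 = 0.2.

Therefore, 90 − (1.96 × 0.2) ≤ μ ≤ 90 + (1.96 × 0.2)

(89.61 ≤ μ ≤ 90.39)

Hence we can be 95% confident that the true mean length of the components lies in the interval (89.61, 90.39). Since this interval lies entirely within the customer's limits of 88 cm and 92 cm, we conclude that the 95% confidence interval ensures acceptance of the components by the customer.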
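The acceptance decision in Example 8.11 amounts to checking that both confidence limits fall inside the customer's tolerance band of 88 cm to 92 cm. A minimal Python sketch of that check follows; the function name interval_within_limits is hypothetical, and, as in the text, the interval concerns the mean length of the components.

```python
import math


def interval_within_limits(lower, upper, low_limit, high_limit):
    """Return True if the whole interval [lower, upper] lies inside [low_limit, high_limit]."""
    return low_limit <= lower and upper <= high_limit


x_bar, sigma, n, z = 90.0, 1.6, 64, 1.96  # Example 8.11 data and 95% table value
standard_error = sigma / math.sqrt(n)     # 1.6 / 8 = 0.2
lower = x_bar - z * standard_error
upper = x_bar + z * standard_error

accepted = interval_within_limits(lower, upper, 88.0, 92.0)
print(f"CI = ({lower:.2f}, {upper:.2f}); accepted: {accepted}")
```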

Example 8.12

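A sample of 100 measurements of the breaking strength of cotton thread gave a mean of 7.4 and a standard deviation of 1.2 gms. Find 95% confidence limits for the mean breaking strength of cotton thread.

Solution:

Given: n = 100, X̄ = 7.4, s = 1.2 gms. Since the sample is large, the sample standard deviation s is used in place of σ, and for 95% confidence Zα/2 = 1.96.

Standard error = s/√n = 1.2/√100 = 0.12

95% confidence limits = 7.4 ± (1.96 × 0.12) = 7.4 ± 0.2352, i.e. 7.165 ≤ μ ≤ 7.635

Hence we can be 95% confident that the true mean breaking strength of the cotton threads lies in the interval (7.165, 7.635).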

Example 8.13

The mean life time of a sample of 169 light bulbs manufactured by a company
is found to be 1350 hours with a standard deviation of 100 hours. Establish
90% confidence limits within which the mean life time of light bulbs is
expected to lie.

Solution:

Given: n = 169, X̄ = 1350 hours, s = 100 hours. Since the level of significance is (100 − 90)% = 10%, α = 0.1, and hence the critical value at the 10% level is Zα/2 = 1.645.

Standard error = s/√n = 100/√169 = 100/13 ≈ 7.692

Hence 90% confidence limits for the population mean are

1350 ± (1.645 × 7.692) = 1350 ± 12.65

Hence the mean lifetime of the light bulbs is expected to lie in the interval (1337.35, 1362.65).
