
CHAPTER ONE

Sampling and Sampling Distributions
Under this chapter:
• Introduction
• Basic Terminologies in Sampling Theory
• The Need for Sampling
• Sampling Error
• Probability Sampling
• Methods of Probability Sampling
• Sampling Distribution
• Central Limit Theorem
• Sampling Distribution of the Standardized Statistic
• Sampling Distribution of the Mean and Proportion
• Sampling Distribution of the Difference between Two Means and Two Proportions
Introduction
Statistics is a science of inference. It is the science of drawing general conclusions about an entire group (the population) based on information obtained from a small group, or sample.
Sampling is as common and important in statistics as salt is in food. At home, a cook tastes one teaspoonful to judge the quality of what is being cooked. In medical science, a few drops of blood are taken and tested microscopically or chemically to find out whether the blood contains any abnormalities.
Cont..d
Nowadays, sampling methods are extensively used in
socio-economic surveys to know the living condition,
cost of living index etc. of a class of people.
In biological studies, experiments are conducted on
some units (persons, animals or plants) and
inferences are drawn about the breed or variety to
which the units belong.
In industry, sampling procedures are used predominantly for quality control.
Sampling theory is the study of the relationships between a population and the samples drawn from it.

Basic Concepts and Terminologies in sampling
theory
Population: an aggregate of objects (animate or inanimate) under study. It is a collection of individuals, or of their attributes or the results of operations, that can be numerically specified.
Sample: a subset selected from a population; the portion chosen for study.
Census: a complete enumeration (full count) of all the units in the population with respect to the parameter of interest.
Sampling: drawing a sample from a given population and using the sample data to gain knowledge about the population parameters.
WHY SAMPLING?
• The Destructive Nature of Certain Tests
Many tests, especially in quality control, destroy the item being tested. Consider the following tests:
Testing wine or coffee
Blood test for a patient
Testing the strength of light bulbs
Seed test for germination, etc.
Without sampling, the wine taster would have to drink all the wine, all of the patient's blood would have to be drawn, and every light bulb produced would have to be destroyed, so that nothing would remain for sale. Here sampling is a must.
• The Physical Impossibility of Checking All Items in the Population
Populations of fish, birds and other wildlife are large and constantly changing as animals move, are born and die. There is no mechanism for contacting every individual member of such a population.

• The Cost of Studying All the Items in a Population Is Often Prohibitive
Public opinion polls and consumer testing organizations usually contact only a small number of families out of millions. Consider a multinational corporation with 50 million customers worldwide that plans to undertake a market survey. If it samples 2,000 of the 50 million customers and it costs Br. 20 per customer to mail the questionnaire and tabulate the response, the total survey will cost Br. 40,000. The same survey covering all 50 million customers would cost about Br. 1 billion.
The Adequacy of Sample Results
Even if funds were available, it is doubtful whether the additional accuracy of a 100% sample (that is, studying the entire population) is essential in most problems.
To determine a monthly index of food prices (bread, beans, milk, etc.), it is unlikely that including every grocery store and shop would significantly affect the index, since the prices of such commodities usually do not vary by more than a few cents from one store to another. Moreover, 100% accuracy cannot always be guaranteed even by studying the entire population, because collecting and analyzing bulk data carries its own risk of error.
• To Contact the Whole Population Would Often Be Time-Consuming
A market survey of a sample of 2,000 customers may take two or three days of field interviews. Using the same staff and interviewers, and working seven days a week, it would take nearly 200 years to contact all 50 million customers.
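The cost and time figures quoted for the 50-million-customer survey can be checked with a few lines of arithmetic. The sketch below assumes that both cost and field time scale linearly with the number of customers contacted, and it uses the upper end (three days) of the field time for the sample; the constant names are only illustrative.

```python
# Figures taken from the survey example; linear scaling is an assumption.
SAMPLE_SIZE = 2_000
POPULATION_SIZE = 50_000_000
COST_PER_CUSTOMER_BR = 20      # Birr per customer (mailing and tabulation)
FIELD_DAYS_FOR_SAMPLE = 3      # upper end of "two or three days"

scale = POPULATION_SIZE / SAMPLE_SIZE              # a census is this many times larger
sample_cost = SAMPLE_SIZE * COST_PER_CUSTOMER_BR
census_cost = POPULATION_SIZE * COST_PER_CUSTOMER_BR
census_years = FIELD_DAYS_FOR_SAMPLE * scale / 365  # working seven days a week

print(f"Sample cost: Br. {sample_cost:,}")
print(f"Census cost: Br. {census_cost:,}")
print(f"Census field time: about {census_years:.0f} years")
```

Under these assumptions the census costs 25,000 times as much as the sample and takes roughly two centuries of field work.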

SAMPLING ERRORS
• A very important consideration in sampling is to select the sample in such a way that it is likely to have characteristics similar to those of the population as a whole. Otherwise, the sample could have characteristics quite different from the population, and you could draw erroneous conclusions about the population on the basis of an improperly chosen sample.
Sample surveys study only a small portion of the population, so there will naturally be a certain amount of inaccuracy in the information collected. This inaccuracy is termed sampling error or error variance.
The discrepancy between a population parameter and its estimate (statistic) derived from a random sample is also called the sampling error (and sometimes, loosely, sampling bias).
• In short, sampling error is the difference between a sample statistic and its corresponding population parameter.
Errors can be sampling or non-sampling errors. Sampling error is related to the sampling technique and approach, while non-sampling error is related to the administration of the survey.
• Sampling errors can be identified and rectified using mathematical techniques.
• Non-sampling errors are very difficult to identify and rectify before drawing conclusions, and they arise in every survey, whether it is a sample survey or a census.
Method of Sampling
A) A probability (random) sample is a sample selected in such a way that each item or person in the population being studied has a known (nonzero) likelihood of being included in the sample.
B) A non-probability sample is a sample selected on the basis of contingency, judgment, experience, convenience, etc.
• If non-probability methods are used, not all items or people have a chance of being included in the sample. In such instances the results may be biased, and the sample may not be representative of the population.
• Panel sampling and convenience sampling are non-probability sampling methods; they are based on convenience to the statistician. The statistical procedures used to evaluate sample results are based on probability sampling.
METHODS OF PROBABILITY SAMPLING

• All probability sampling methods have one goal: to allow chance to determine the items or persons to be included in the sample.
• There are different types of sampling techniques; however, there is no one best method of selecting a probability sample.
• A technique that is best in one circumstance may fail in another.
Commonly used probability sampling techniques are the following:

A) Simple Random Sampling
• A simple random sample is formed in such a manner that each item or person in the population has the same chance of being included in the sample. One way is to write the name or identification of every item in the population on a slip of paper, fold the slips, mix them thoroughly, and draw from the lot until we have the required sample size. This method is time-consuming and awkward.
• A more convenient method of selecting a random sample is to use a table of random numbers. It is first necessary to give an identification number to every element in the population. We then select a starting point in the table arbitrarily and continue taking numbers until we have the required sample size. This method may be difficult to use in certain research situations, mostly when the population is very large.
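In practice, a pseudo-random number generator usually replaces both the folded slips and the random number table. The following is a minimal sketch, assuming the population is available as a list of identification numbers; the function name and the population of 100 IDs are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every item has the same chance of selection."""
    rng = random.Random(seed)          # seeded only so the draw can be reproduced
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: identification numbers 1..100.
ids = list(range(1, 101))
sample = simple_random_sample(ids, 10, seed=1)
print(sample)
```

Fixing the seed is optional; it is used here only so that the same sample can be drawn again when checking results.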
B) Systematic Random Sampling
The items or individuals of the population are arranged in some order (alphabetically or by some other method). A random starting point is selected, and then every k-th member of the population is selected for the sample.
A systematic random sample should not be used if there is a predetermined pattern in the population, as in inventory control, or if values are listed in ascending or descending order.
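The procedure can be sketched as follows. The interval k is taken here as the population size divided by the sample size (rounded down), which is a common convention but an assumption, since the slides do not say how k is chosen; the function name and data are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k items, then take every k-th item."""
    k = len(population) // n            # sampling interval (assumed: N // n)
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)            # random starting index: 0 .. k-1
    return population[start::k][:n]

# Hypothetical ordered list of 100 members.
members = list(range(1, 101))
chosen = systematic_sample(members, 10, seed=3)
print(chosen)
```

If the list has a periodic pattern whose period matches k, every selected item falls at the same point of the cycle, which is why the method is discouraged for patterned populations.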

C) Stratified Random Sample
• A population is first divided into subgroups called strata, and a sample is selected from each stratum. The sample taken from each stratum can be
- proportional to the stratum's share of the population, or
- non-proportional.
Example: studying the advertising expenditure of 352 large companies. Profitability (as a percentage) is used to stratify this population, and we need to select a sample of 50 companies.

Stratified sampling has the advantage, in some cases, of more accurately reflecting the characteristics of the population than simple random or systematic random sampling.

Stratum (0)  Profitability (1)  Number of companies (2)  % of total (3)  Number sampled (4) = 50 x (3)
1            30% and over       8                        2%              1
2            20-30%             35                       10%             5
3            10-20%             189                      54%             27
4            0 up to 10%        115                      33%             16
5            Deficit            5                        1%              1
Total                           352                      100%            50
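The proportional allocation in the table can be reproduced by giving each stratum a share of the 50 sampled companies equal to its share of the 352 companies. This is a sketch under the assumption that each share is simply rounded to the nearest whole number; with other data, plain rounding may not add up exactly to the target sample size.

```python
# Stratum sizes from the profitability table.
strata = {
    "30% and over": 8,
    "20-30%": 35,
    "10-20%": 189,
    "0 up to 10%": 115,
    "Deficit": 5,
}
SAMPLE_SIZE = 50

total = sum(strata.values())
# Each stratum gets a share of the sample proportional to its size.
allocation = {name: round(SAMPLE_SIZE * count / total)
              for name, count in strata.items()}

for name, k in allocation.items():
    print(f"{name:<14} {k}")
print("Total sampled:", sum(allocation.values()))
```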
D) Cluster Sampling
In cluster sampling, the elements of the population are divided into a number of clusters or groups. One first chooses at random a sample of these clusters, called primary units; then either all of the elements or a simple random sample of the elements in each chosen cluster is selected. The latter case is sometimes referred to as two-stage cluster sampling.
This technique is often employed to reduce the cost of sampling a population scattered over a large geographic area.
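A two-stage cluster sample can be sketched as two successive random draws: first of clusters, then of elements within each chosen cluster. The districts, households and function name below are hypothetical and serve only to illustrate the mechanics.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # primary units
    sample = {}
    for name in chosen:
        members = clusters[name]
        size = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, size)         # simple random sample
    return sample

# Hypothetical population: 6 districts with 20 households each.
clusters = {f"district_{d}": [f"d{d}_h{h}" for h in range(20)] for d in range(6)}
picked = two_stage_cluster_sample(clusters, n_clusters=2, n_per_cluster=5, seed=7)
for district, households in picked.items():
    print(district, households)
```

Only the chosen districts need to be visited, which is where the cost saving for geographically scattered populations comes from.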
Comparison of Stratified and Cluster Sampling
Students often have difficulty differentiating stratified sampling from cluster sampling. The main distinction is that in stratified sampling the population is divided into well-defined groups, where each group is homogeneous within itself but there is wide heterogeneity (variation) among the groups.
• In cluster sampling the situation is the reverse of stratified sampling: the different clusters are similar to one another, but the elements within each cluster are heterogeneous.
Non-Random (Non-Probability / Judgment) Sampling
• Accidental, Haphazard or Convenience Sampling
• Purposive Sampling
• Modal Instance Sampling
• Expert Sampling
• Quota Sampling
• Snowball Sampling

Accidental, Haphazard or Convenience Sampling
• One of the most common methods of non-probability sampling is convenience sampling. This category includes the traditional "man on the street" (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although non-representative) reading of public opinion.
It is argued that most researchers use, in their process of
data collection, primarily convenience sampling. In many
research contexts, we sample simply by asking for volunteers.
Clearly, the problem with all of these types of samples is that
we have no evidence that they are representative of the
populations we are interested in generalizing to and in many
cases we would clearly suspect that they are not.
Purposive Sampling
In purposive sampling, we sample with a purpose in mind.
We usually would have one or more specific predefined groups
we are seeking. With a purposive sample, you are likely to get
the opinions of your target population, but you are also likely
to overweight subgroups in your population that are more
readily accessible.
Modal Instance Sampling: - In statistics, the mode is the
most frequently occurring value in a distribution. In sampling,
when we do a modal instance sample, we are sampling the
most frequent case, or the "typical" case. In a lot of informal
public opinion polls, for instance, they interview a "typical"
voter.
There are a number of problems with this sampling approach. First, how
do we know what the "typical" or "modal" case is? We could say that
the modal voter is a person who is of average age, educational level, and
income in the population. But, it's not clear that using the averages of
these is the fairest (consider the skewed distribution of income, for
instance). And how do you know that those three variables -- age,
education, income -- are the only or even the most relevant for classifying
the typical voter? What if religion or ethnicity is an important
discriminator?
Clearly, modal instance sampling is only sensible for
informal sampling contexts.
Expert sampling involves the assembling of a sample
of persons with known or demonstrable experience and
expertise in some area. Often, we convene such a
sample under the support of a "panel of experts." There
are actually two reasons you might do expert sampling.
First, it would be the best way to elicit the views of
persons who have specific expertise. In this case,
expert sampling is essentially just a specific sub-case of
purposive sampling.
But the other reason you might use expert sampling is
to provide evidence for the validity of another sampling
approach you've chosen.

Quota sampling: In quota sampling, you select samples non-randomly according to some fixed quota. There are two types of quota sampling: proportional and non-proportional.
Proportional quota sampling: you want to represent the major characteristics of the population by sampling a proportional amount of each.
For instance, if you know the population has 40% women
and 60% men, and that you want a total sample size of 100,
you will continue sampling until you get those percentages
and then you will stop.

Non-proportional quota sampling is a bit less
restrictive. In this method, you specify the minimum
number of sampled units you want in each category.
Here, you're not concerned with having numbers that
match the proportions in the population. Another term
for this is sampling for diversity. In many
brainstorming or nominal group processes (including
concept mapping), we would use some form of
heterogeneity sampling because our primary interest
is in getting broad spectrum of ideas, not identifying
the "average" or "modal instance" ones.
In effect, what we would like to be sampling is not
people, but ideas.
• We imagine that there is a universe of all possible
ideas relevant to some topic and that we want to
sample this population, not the population of
people who have the ideas. Clearly, in order to
get all of the ideas, and especially the "outlier" or
unusual ones, we have to include a broad and
diverse range of participants.
• Heterogeneity sampling is, in this sense, almost
the opposite of modal instance sampling

Snowball Sampling: In snowball sampling, you
begin by identifying someone who meets the criteria
for inclusion in your study. You then ask them to
recommend others who they may know who also
meet the criteria.
•Although this method would hardly lead to
representative samples, there are times when it may
be the best method available.
•Snowball sampling is especially useful when you are
trying to reach populations that are inaccessible or
hard to find.

SAMPLING DISTRIBUTION
Basic Terminologies
- Parameter: a numerical characteristic of the population.
- Statistic: a numerical characteristic of a sample.
- Population distribution: the distribution of the individual measurements of a population.
- Sampling distribution: the probability distribution of a sample statistic.
- Sample mean: an estimate of the population mean, which becomes more accurate as larger samples are taken.
• The sampling distribution of the mean is the probability distribution of all possible sample means of a given size selected from a population. The number of samples that can be drawn from a population depends on whether we sample with replacement or without replacement.
A) Sampling with replacement: a selected element is returned to the population before another selection is made. As a result, the number of (ordered) samples of size n that can be drawn from a population of size N is $N^n$, where N is the population size and n is the sample size.
Example: N = 3, n = 2. Number of samples = $3^2$ = 9.
If A, B and C are the elements of the finite population, then the possible samples are:
AA, AB, AC, BA, BB, BC, CA, CB and CC
B) Sampling without replacement: in sampling without replacement it is impossible to get a sample in which an element appears more than once.
• Thus, the number of simple random samples of size n that can be drawn without replacement from a population of size N is given by:
$$ {}^{N}C_{n} = \frac{N!}{n!\,(N-n)!} $$
Example: N = 3 and n = 2. Number of samples = ${}^{3}C_{2}$ = 3.
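Both counting rules can be checked by listing the samples for the three-element population A, B, C. The sketch below uses Python's standard `itertools` functions; the variable names are illustrative.

```python
from itertools import combinations, product
from math import comb

population = ["A", "B", "C"]
n = 2

# With replacement: ordered samples, N**n of them.
with_replacement = ["".join(s) for s in product(population, repeat=n)]
# Without replacement: unordered samples, C(N, n) of them.
without_replacement = ["".join(s) for s in combinations(population, n)]

print(with_replacement, len(with_replacement))
print(without_replacement, len(without_replacement), comb(len(population), n))
```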

Population parameter: a numerical measure of a population, such as the population mean $\mu$, population variance $\sigma^2$, population standard deviation $\sigma$, population proportion $p$, etc.
Sample statistic: a numerical measure of a sample, such as the sample mean $\bar{x}$, sample variance $S^2$, sample standard deviation $S$, sample proportion $\bar{p}$, etc.

• In establishing the relationship between the population and the sampling distribution of the mean, the following symbols will be used:
N = population size
n = sample size
$\mu$ = population mean
$\mu_{\bar{x}}$ = mean of the sample means
$\bar{x}$ = sample mean
$\sigma$ = population standard deviation
$\sigma_{\bar{x}}$ = standard deviation of the sample means

Sampling Distribution of the Mean ($\bar{x}$)
• The sampling distribution of the sample mean is the probability distribution consisting of a list of all possible sample means of a given sample size selected from a population, together with the probability of occurrence associated with each sample mean.

Example 2. The following distribution is the hourly wage of
seven employees
Employee Hourly wage

A 7
B 7
C 8
D 8
E 7
F 8
G 9
This population has a mean hourly wage of 54/7 ≈ 7.71.
If we plan to take samples of two employees, there are 21 (${}^{7}C_{2}$) possible samples, each with a corresponding sample mean. The 21 possible samples and their means are the following:
Possible sample   Sample mean ($\bar{x}$)
AB 7.0
AC 7.5
AD 7.5
AE 7.0
AF 7.5
AG 8.0
BC 7.5
BD 7.5
BE 7.0
BF 7.5
BG 8.0
CD 8.0
CE 7.5
CF 8.0
CG 8.5
DE 7.5
DF 8.0
DG 8.5
EF 7.5
EG 8.0
FG 8.5
Total: $\sum \bar{x}$ = 162
Summary of the sampling distribution of the mean for n = 2:
Sample mean   Number of means   Probability
7.0           3                 0.1429
7.5           9                 0.4286
8.0           6                 0.2857
8.5           3                 0.1429
Total         21                1.0000
The mean of the distribution of sample means is obtained by summing the various sample means and dividing the sum by the number of samples. The mean of all the sample means is usually written $\mu_{\bar{x}}$. The $\mu$ reminds us that it is a population value, because we have considered all possible samples; the subscript $\bar{x}$ indicates that it is a sampling distribution of means.
$$ \mu_{\bar{x}} = \frac{7 + 7.5 + \dots + 8.5}{21} = \frac{162}{21} = 7.71 $$
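The whole hourly-wage example can be reproduced by enumerating the 21 samples of two employees. The sketch below uses exact fractions so that the mean of the sample means can be compared with the population mean without rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hourly wages of the seven employees.
wages = {"A": 7, "B": 7, "C": 8, "D": 8, "E": 7, "F": 8, "G": 9}

# Every sample of two employees (without replacement) and its mean.
sample_means = {a + b: Fraction(wages[a] + wages[b], 2)
                for a, b in combinations(wages, 2)}

# Sampling distribution: how often each sample mean occurs.
counts = Counter(sample_means.values())
for mean, count in sorted(counts.items()):
    print(float(mean), count, round(count / len(sample_means), 4))

mean_of_means = sum(sample_means.values()) / len(sample_means)
population_mean = Fraction(sum(wages.values()), len(wages))
print(float(mean_of_means), float(population_mean))
```

The two printed means agree, which illustrates that the mean of the sampling distribution equals the population mean.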
THE CENTRAL LIMIT THEOREM
It is one of the most important theorems in statistics. In selecting simple random samples of size n from a population of N elements, the sampling distribution of the sample mean ($\bar{x}$) can be approximated by a normal probability distribution as the sample size becomes large. The significance of the central limit theorem is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population other than what we can get from the sample. That is, the crucial additional information provided by the central limit theorem is that, whatever the distribution of the $X_i$ (a certain random variable), provided that its variance is finite, as the number of terms n in the sum becomes large, the distribution of the standardized sum Z tends to the standard normal distribution.
For a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed, with the mean of the sampling distribution equal to $\mu$ and the variance equal to $\sigma^2/n$, assuming that the sample size is sufficiently large.
The important facets of the central limit theorem bear repeating.
• If the sample size n is sufficiently large, the sampling distribution of the means will be approximately normal regardless of the distribution of the population from which the random sample is drawn.
• If a population is large and a large number of samples are selected from the population, then the mean of the sample means will be close to the population mean.
• The variance of the distribution of sample means is $\sigma^2/n$. This implies that as the sample size increases, the variation of $\bar{x}$ about its mean decreases. Note that a sample of 30 or more elements is generally considered sufficiently large for the central limit theorem to take effect.
A larger minimum sample size may be required for a good normal approximation when the population distribution is very different from a normal distribution, while a smaller minimum sample size may suffice when the population distribution is close to a normal distribution.
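A small simulation can make these facets concrete. The sketch below assumes, purely for illustration, a strongly skewed exponential population with mean 1 and variance 1, draws 5,000 samples of size 30, and compares the mean and variance of the sample means with $\mu$ and $\sigma^2/n$. The seed and the number of samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)       # fixed seed so the run is reproducible
N_SAMPLES, n = 5_000, 30

# Skewed population (assumed): exponential with mean 1 and variance 1.
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(N_SAMPLES)]

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print(f"mean of sample means: {mean_of_means:.4f} (population mean 1)")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2 / n = {1 / n:.4f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from normal.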

One important application of the central limit theorem is in the area of quality control. A manufacturing process is variable and must be monitored to be sure that the variability does not get beyond acceptable levels.
A control chart is used to assist in monitoring the process; the chart described here is used to control variation in the sample means.
The chart has two limits about the mean $\mu$:
Upper control limit (UCL)
Lower control limit (LCL)
The center line is the desired mean, $\mu$.
[Figure: control chart. Vertical axis: sample mean; horizontal axis: sample number (1, 2, 3, ...). The center line is the desired mean $\mu$, with the upper control limit (UCL) above it and the lower control limit (LCL) below it.]
If a point is observed above the UCL or below the LCL, the process is stopped and the problem is investigated.
The upper and lower control limits are generally located one, two, or three standard errors ($\sigma_{\bar{x}}$) above and below $\mu$, depending on the nature of the product and the process.
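The control limits can be sketched as $\mu \pm k\,\sigma/\sqrt{n}$, where k is 1, 2 or 3. The process mean, standard deviation, subgroup size and observed sample means below are hypothetical values chosen only to show the calculation.

```python
from math import sqrt

def control_limits(mu, sigma, n, k=3):
    """Return (LCL, UCL) placed k standard errors around the desired mean."""
    se = sigma / sqrt(n)               # standard error of the sample mean
    return mu - k * se, mu + k * se

# Hypothetical process: desired mean 50, sigma 2, samples of 4 items.
lcl, ucl = control_limits(mu=50.0, sigma=2.0, n=4, k=3)

# Hypothetical observed sample means, numbered from 1.
sample_means = [50.4, 49.1, 51.2, 53.5, 49.8]
out_of_control = [i for i, m in enumerate(sample_means, start=1)
                  if not lcl <= m <= ucl]

print(f"LCL = {lcl}, UCL = {ucl}")
print("Samples outside the limits:", out_of_control)
```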

Properties of the distribution of the sample mean ($\bar{x}$)
1. The mean of the sampling distribution of the mean is always equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.

2. Standard error of the mean ($\sigma_{\bar{x}}$)
The standard error is a measure of dispersion of the distribution of sample means. It is similar to the standard deviation of a frequency distribution, and it measures the likely deviation of a sample mean from the grand mean of the sampling distribution.
$$ \sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x} - \mu_{\bar{x}})^2}{N}} $$
where the sum runs over all possible samples and N here denotes the number of sample means.
a) If sampling is made without replacement from a finite population and n > 0.05N:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} $$
where $\sigma$ = population standard deviation
N = population size
n = sample size

b) If n < 0.05N, or if sampling is made with replacement:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
3. Form of the distribution of $\bar{x}$
• There are two important theorems with respect to the shape of the sampling distribution of the mean.
i. If $\bar{x}$ is the mean of a random sample taken from a population whose values are normally distributed, the sampling distribution of $\bar{x}$ is also normally distributed, regardless of sample size, with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
ii. When the population values are not normally distributed, we take advantage of the central limit theorem.
The central limit theorem states that even if the population is not normal, the distribution of the sample means will approximate a normal distribution with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ if the sample size is sufficiently large. This approximation is considered good for n ≥ 30 with n < 0.05N.
4. Determination of the Z score:
$$ Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} $$
where $\bar{x}$ is the sample mean (the normally distributed random variable), $\mu_{\bar{x}}$ is the mean of the sample means, which is equal to $\mu$, and $\sigma_{\bar{x}}$ is the standard deviation of the sample means, which is equal to $\sigma/\sqrt{n}$.
Self Checking Exercise
A population consists of the following ages;
10,20,30,40 and 50. A random sample of two is to
be selected from this population without
replacement
Calculate:
A.The mean of the population
B.Standard deviation of the population
C.Mean of sampling distribution of the means
D.Standard deviation of sampling distribution of the
means
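One way to check your answers to the exercise above is to enumerate every sample of two ages and compare the result with the finite-population formula for the standard error. This is only a checking sketch; try the calculation by hand first.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, pstdev

ages = [10, 20, 30, 40, 50]
n = 2
N = len(ages)

mu = fmean(ages)                       # A. population mean
sigma = pstdev(ages)                   # B. population standard deviation

# All samples of size 2 drawn without replacement, and their means.
sample_means = [fmean(s) for s in combinations(ages, n)]
mean_of_means = fmean(sample_means)    # C. mean of the sampling distribution
se_enumerated = pstdev(sample_means)   # D. by direct enumeration

# D. by formula, with the finite population correction factor.
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(mu, round(sigma, 4), mean_of_means)
print(round(se_enumerated, 4), round(se_formula, 4))
```

The enumerated and formula values of the standard error coincide, since n = 2 exceeds 5% of N = 5 and the correction factor applies.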

Example 1: The distribution of annual earnings of all economics graduates with zero years of experience is negatively skewed.
This distribution has a mean of 19,000 Birr and a standard deviation of 2,000 Birr. If we draw a random sample of 30 fresh economics graduates, what is the probability that their earnings will average more than 19,750 Birr annually?
Solution: given $\mu$ = 19,000, $\sigma$ = 2,000, n = 30 and $\bar{x}$ = 19,750.
To answer this question, first calculate the standard error of the mean:
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2000}{\sqrt{30}} \approx 365.15 $$
Next, convert the sample mean into a standard normal value (Z):
$$ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{19750 - 19000}{365.15} \approx 2.05 $$
Z = 2.05 corresponds to an area of 0.4798 between Z = 0 and Z = 2.05. We are interested in the area above Z = 2.05, which is obtained by subtracting 0.4798 from the area to the right of Z = 0, which is 0.5. Thus, the area to the right of Z = 2.05 is 0.5 – 0.4798 = 0.0202, so P($\bar{x}$ > 19,750) = 0.0202. In this problem we assumed that the sampling distribution of the mean is approximately normal, by the central limit theorem, since n = 30.
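The graduate-earnings calculation can be reproduced with the standard library's `statistics.NormalDist`. The result differs slightly from the table value of 0.0202 because the table lookup rounds Z to 2.05, while the code keeps the unrounded value.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, x_bar = 19_000, 2_000, 30, 19_750

se = sigma / sqrt(n)               # standard error of the mean
z = (x_bar - mu) / se              # standardized sample mean
p = 1 - NormalDist().cdf(z)        # area to the right of z

print(f"standard error = {se:.2f}")
print(f"Z = {z:.2f}")
print(f"P(sample mean > 19,750) = {p:.4f}")
```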
Homework 1: If the number of miles per gallon achieved by all cars of a particular model has a mean of 25 and a standard deviation of 2, what is the probability that, for a random sample of 20 such cars, the average miles per gallon will be less than 24? Assume that the population distribution is normal.
DISTRIBUTION OF THE STANDARDIZED STATISTICS FOR THE
SAMPLE MEAN
To use the central limit theorem, we need to know the population standard deviation. When it is not known, the standard deviation of the sample, designated by S, is used to approximate it. The standardized statistic for the sample mean is Z, where
$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$
if the population standard deviation is known, or
$$ Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
if the population standard deviation is unknown.


Example 2: The annual wages of all employees of a company have a mean of 20,400 per year with a standard deviation of 3,200. The personnel manager is going to take a random sample of 36 employees and calculate the sample mean wage. What is the probability that the sample mean will exceed 21,000?
n = 36, $\mu$ = 20,400 and $\sigma$ = 3,200
$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21000 - 20400}{3200/\sqrt{36}} = 1.125 \approx 1.13 $$
P($\bar{x}$ > 21,000) = P(Z > 1.13) = 0.1292
Example 3
Hourly wages of workers in an industry have a mean of Br. 5 and a standard deviation of Br. 0.60. What is the probability that the mean wage of a random sample of 50 workers will be between Br. 5.10 and Br. 5.20?
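Both wage examples above follow the same pattern: treat the sample mean as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, then read off the required area. The sketch below is one way to do this with `statistics.NormalDist`; the helper name `sample_mean_dist` is hypothetical, and the unrounded Z values give answers that differ slightly from table lookups.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Normal approximation to the sampling distribution of the mean."""
    return NormalDist(mu, sigma / sqrt(n))

# Annual wages: mu = 20,400, sigma = 3,200, n = 36.
annual = sample_mean_dist(20_400, 3_200, 36)
p_exceeds = 1 - annual.cdf(21_000)

# Hourly wages: mu = 5, sigma = 0.60, n = 50.
hourly = sample_mean_dist(5.0, 0.60, 50)
p_between = hourly.cdf(5.20) - hourly.cdf(5.10)

print(f"P(mean > 21,000) = {p_exceeds:.4f}")
print(f"P(5.10 < mean < 5.20) = {p_between:.4f}")
```

For the hourly-wage question the two limits standardize to about Z = 1.18 and Z = 2.36, so the answer is the area between those two values.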

Home Work- 2
A company makes engines used in speedboats. The company's engineers believe that the engine delivers an average power of 220 horsepower (HP) and that the standard deviation of power delivered is 15 HP. A potential buyer intends to sample 100 engines (each engine to be run a single time). What is the probability that the sample mean will be greater than 217 HP?

Sampling Distribution of the proportion
The sampling distribution of the proportion is the probability distribution of all possible values of the sample proportion, $\bar{p}$. For the sampling distribution of sample proportions the following symbols are used: $p$ = population proportion, $q = 1 - p$, $\bar{p}$ = sample proportion, $\mu_{\bar{p}}$ = mean of the sampling distribution of the proportion, and $\sigma_{\bar{p}}$ = standard deviation of the sampling distribution of the proportion.
Properties:
1) The true population proportion and the mean of the sampling distribution of the proportion are equal:
$$ \mu_{\bar{p}} = p $$
2) The standard deviation of the sampling distribution of the proportion is computed in two ways:
a) When the population is infinite or very large, or sampling is with replacement:
$$ \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} $$
b) When sampling is done without replacement and the sample size exceeds 5% of the population size:
$$ \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} \cdot \sqrt{\frac{N-n}{N-1}} $$
3) The sampling distribution of the proportion will be approximately normal:
a) when samples of a fixed size are drawn from a normally distributed population;
b) when the sampling distribution of the proportion conforms with the central limit theorem, i.e. n > 30, or np > 5 and nq > 5.
4) The Z value is computed as:
$$ Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} $$
Example 1:
A manufacturer of screws has noticed that on average two percent of the screws produced are defective. A random sample of 400 screws is examined for the proportion of defective screws. Find the probability that the proportion of defective screws in the sample is between 0.01 and 0.03.
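A sketch of the screw example, assuming the population of screws is large enough that no finite population correction is needed and that the normal approximation applies (here np = 8 and nq = 392, both above 5):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.02, 400

se = sqrt(p * (1 - p) / n)              # standard error of the proportion
dist = NormalDist(p, se)                # normal approximation for the sample proportion
prob = dist.cdf(0.03) - dist.cdf(0.01)  # area between the two limits

print(f"standard error = {se:.4f}")
print(f"Z limits = +/-{(0.03 - p) / se:.2f}")
print(f"P(0.01 < sample proportion < 0.03) = {prob:.4f}")
```

The standard error works out to 0.007, so the limits 0.01 and 0.03 lie about 1.43 standard errors on either side of p.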

Home work 3
Assume that Test I results for second-year Accounting and Finance students in the course Cost and Management Accounting reveal that, on average, 7% of students score less than the pass mark. Assume also that the department would like to investigate why this rate has occurred by taking 80 students from a previous batch as a sample. What is the probability that the proportion of sampled students who scored below the pass mark is between 5% and 9%?

Sampling distribution of the difference
between two means
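Given two populations, population 1 and population 2, with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$, and independent samples of sizes $n_1$ and $n_2$:
Properties
a) The mean of the difference between sample mean 1 and sample mean 2 is equal to the difference between the population means:
$$ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 $$
b) The standard deviation of the difference between the two means is calculated as:
$$ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$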

Example
Consider two manufacturers of electric bulbs, A and B. Bulbs manufactured by A have a mean life of 900 hours and a standard deviation of 30 hours. Those manufactured by B have a mean life of 860 hours and a standard deviation of 20 hours. Find the probability that the mean life of a random sample of 40 bulbs from manufacturer A exceeds the mean life of a random sample of 30 bulbs from manufacturer B by less than 28 hours.

Required: P($\bar{x}_1 - \bar{x}_2$ < 28)
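A sketch of the bulb example, assuming the two samples are independent and that the difference between the sample means is approximately normal with mean $\mu_1 - \mu_2$ and standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$:

```python
from math import sqrt
from statistics import NormalDist

mu_a, sigma_a, n_a = 900, 30, 40    # manufacturer A
mu_b, sigma_b, n_b = 860, 20, 30    # manufacturer B

mean_diff = mu_a - mu_b                                  # mean of the difference
se_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)      # its standard deviation
z = (28 - mean_diff) / se_diff
prob = NormalDist().cdf(z)                               # area to the left of z

print(f"mean difference = {mean_diff}")
print(f"standard error = {se_diff:.3f}")
print(f"Z = {z:.2f}")
print(f"P(difference < 28) = {prob:.4f}")
```

Because 28 hours is about two standard errors below the expected difference of 40 hours, the probability is small.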