
CHAPTER ONE

1. SAMPLING THEORY AND SAMPLING DISTRIBUTION
1.1 Sampling Theory
Introduction
• The population under study is usually very large or infinite,
which makes studying it in full very difficult or impossible.
• Under such circumstances we take a sample, that is, a subset of
the population, and use it to study the population.
• Statistics is a science of inference: it draws general
conclusions about an entire group (the population) from
information obtained from a small group (the sample).
1.1.1 Basic Definitions: Sampling Versus Census
• In collecting information from a population, there are two alternative
approaches: census and sampling.
• Census is an approach in which every element of the population
is included in the investigation.
• Sampling is an approach in which only part of the population is
examined and the information gained from that part is used to make
judgments about the whole population.
• Sampling is something we all do in our day-to-day activities,
consciously or unconsciously.
• A housewife judges the quality of the wheat she buys for
domestic use by examining only a handful of grains taken
from the lot offered for sale.
• A production manager does the same when he closely
examines only a few items of raw material to be sure of their
quality before placing a bulk purchase order.
• A doctor examines a few drops of blood as a sample and draws
conclusions about the blood constitution of the whole body.
• The ultimate purpose of any sampling process is to make a
judgment about the totality (the whole) by examining a part.
• Population is the totality about which the judgment is to be made.
• Sample is the part of the totality that is used to make the
judgment.
• Sampling, in effect, consists of selecting a sample and using
the sample data to gain knowledge about the parameters of a
given population.
• Investigators need to choose between these two alternatives
based on their merits.
• In principle, a census is more reliable than sampling as far as
the validity and accuracy of the collected data are concerned,
because every element is observed.
• More often, however, investigators prefer sampling to a census,
especially when the population is very large.
1.1.2 The Need for Samples
Major reasons why sampling is necessary:
1. Feasibility aspect: in some circumstances, it may not be
possible to include every element of the population
in the investigation.
• This holds for infinite populations, for populations that are
in constant movement, and for very large populations,
• e.g. populations of insects, fish, birds and the like.
• The same objective can be achieved through the use of
samples.
2. Destructive aspect:
• In some cases, the investigation involves a test that damages
or destroys the items tested.
• In such cases, a census would be not only costly but also
self-defeating.
• For example, suppose that the quality control manager of Kalit is
contemplating a test on the factory’s products.
• In the test, products are exposed to increasing heat until they
start melting.
• Including every unit of the product in the test would therefore
end in the destruction of the entire output.
3. Efficiency aspect:
• In some cases, information is needed within a prescribed
time limit.
• Contacting the whole population through a census would then
be so time consuming that it delays the needed information.
• Hence, to deliver the required information on time, sampling
is preferable to a census, as it takes much less time.
4. Cost aspect:
• Because it covers every element of the population, a census
is usually more costly than sampling.
• This is especially so when the population is very large or
the cost of inspection per unit is high.
• In such circumstances, the cost of a census often far exceeds
the budget, which rules out complete enumeration; in some
cases the cost of studying the whole population may be far
greater than the value of the research.
5. Manageability:
• All survey results, whether based on a sample or a census, are
subject to some kind of error.
• These errors may arise from poor planning, inefficient
execution, and lack of the desired control and coordination over
the survey staff.
• Owing to the sheer number of elements included, census results
are more vulnerable to these errors than sample survey results.
1.1.3 Sampling and Non-Sampling Errors
• Sampling error is the difference between a sample statistic and
the population parameter that results from observing a sample
rather than the whole population.
• It is related to the sampling technique and approach used.
• Non-sampling error is related to administering the survey, for example:
– incorrect enumeration of population members,
– non-random selection of samples,
– use of an incomplete, vague or faulty questionnaire for data
collection,
– wrong editing, coding, and presenting of the responses
received through the questionnaire.
• Population mean = (sum of the elements in the population) /
(number of elements in the population): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Sample mean = (sum of the elements in the sample) /
(number of elements in the sample): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• When a sample is chosen, the observed difference between a
sample statistic and the population parameter consists of both
sampling and non-sampling errors.
• Example: Suppose that you have a population consisting of
5 households, A, B, C, D and E. Their monthly incomes (MI) are as
follows:

Household    A       B      C       D       E
MI (birr)    10000   7400   12000   20000   5600

• Assuming that you take a sample of 3 households, A, C and
D, find the sampling and non-sampling errors.
Solution:
• Mean MI of the population = (10000 + 7400 + 12000 + 20000 + 5600) / 5
= 55000 / 5 = 11000
• Mean MI of the 3 households = (10000 + 12000 + 20000) / 3
= 42000 / 3 = 14000
• Thus, the difference between the sample mean and the
population mean = 14000 – 11000 = 3000
• Now suppose the income of household C has wrongly
been recorded as birr 15000. The sample mean then becomes
(10000 + 15000 + 20000) / 3 = 45000 / 3 = 15000
• The difference between this sample mean and the population
mean, 15000 – 11000 = 4000, is not due to sampling error alone.
There is also a non-sampling error, so the error can be split
into two components:
1. Sampling error = 14000 – 11000 = 3000
2. Non-sampling error = (15000 – 11000) – sampling error
(3000) = 1000

Total error = sampling error + non-sampling error = birr 4000
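The arithmetic of this example can be reproduced with a short script. This is only a sketch: the variable names and the small `mean` helper are illustrative choices, not part of the original notes.

```python
# Monthly incomes (birr) of the five households in the example.
incomes = {"A": 10000, "B": 7400, "C": 12000, "D": 20000, "E": 5600}
sample = ["A", "C", "D"]

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

population_mean = mean(incomes.values())
true_sample_mean = mean(incomes[h] for h in sample)

# Household C is wrongly recorded as 15000 instead of 12000.
recorded = {**incomes, "C": 15000}
recorded_sample_mean = mean(recorded[h] for h in sample)

# Split the total error into its two components.
sampling_error = true_sample_mean - population_mean
total_error = recorded_sample_mean - population_mean
non_sampling_error = total_error - sampling_error

print(sampling_error, non_sampling_error, total_error)
```

Running it gives the same split as the solution: 3000 of sampling error, 1000 of non-sampling error, 4000 in total.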


1.1.4 Types of Samples
• Probability (random)
o Each item or person in the population being studied has a
known (nonzero) likelihood of being included in the
sample.
• Non-probability (non-random)
o The sample is selected based on convenience and
judgment.
I. Random (Probability) Sampling Methods
• Every member of the population must have a known
nonzero chance of being selected.
a) Simple Random Sampling - Every member of the population
has an equal and independent chance of being selected.
Simple random sampling without replacement:
• An element can enter the sample only once, i.e. a unit once
selected is not returned to the population before the next draw.
Simple random sampling with replacement:
• A population unit may enter the sample more than once.
Methods of selecting a simple random sample:
Lottery method - list the identification of every item, write each
identification number on a card, place the cards in a bowl, mix
them, and then draw the needed number of cards (the sample).
• This method is time consuming and awkward.
Example: suppose a population consists of 400 employees in
small industries, and a sample of 50 employees is to be selected
from it. The procedure is:
• first write the name of each employee on a small slip of paper,
• deposit all of the slips in a box,
• mix them thoroughly,
• make the first selection by drawing a slip out of the box
without looking at it, and repeat until 50 slips have been drawn.
Table of random numbers - a more convenient method of selecting a
random sample. The procedure is:
• first give an identification number to every element in the
population,
• select a starting number by closing your eyes and placing your
finger on a number in the table at random,
• then proceed downward until you have selected the needed
sample.
Example: In an area there are 500 families. Using the
following extract from a table of random numbers, select a
sample of 15 families to find out the standard of living of
the families in that area.

4652 3819 8431 2150 2352 2472 0043 3488
9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950

Solution:
• In the random number table above we can start from any row
or column and read three-digit numbers continuously, row-wise
or column-wise.
• Starting from the third row and continuing into the fourth, the
numbers are:
203 023 277 353 600 794 109 179
272 284 450 641 148 908 280
• Since some numbers are greater than 500, we subtract 500
from those numbers and rewrite the selected numbers as
follows:
203 023 277 353 100 294 109 179
272 284 450 141 148 408 280
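The reading of the table can be checked mechanically. The sketch below assumes the same conventions as the solution (start at the third row, read three digits at a time row-wise, subtract 500 from any number above 500); the variable names are illustrative.

```python
# Third and fourth rows of the random number table extract.
rows = [
    "2030 2327 7353 6007 9410 9179 2722 8445",
    "0641 1489 0828 0385 8488 0422 7209 4950",
]

# Treat the rows as one continuous stream of digits.
digits = "".join(rows).replace(" ", "")

# Read the first 15 consecutive three-digit numbers.
sample_size = 15
raw = [int(digits[i:i + 3]) for i in range(0, 3 * sample_size, 3)]

# Map numbers above 500 back into the range of family labels.
selected = [x - 500 if x > 500 else x for x in raw]

print(raw)
print(selected)
```

The second list gives the family numbers to visit; all 15 are distinct, so no redraw is needed in this case.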
b) Systematic Random Sampling
• The items or individuals of the population are arranged in
some order (alphabetical or otherwise), and every kth (third,
fifth, tenth, etc.) member of the population is selected for
the sample.
• This is done after the first member has been selected at random.
The value k is called the sampling interval:
k (sampling interval) = N/n
where N is the population size and n is the sample size.
• For example: suppose a sample of size 50 is desired from a
population of 1000 accounts receivable. The sampling interval
is k = N/n = 1000/50 = 20. Thus, a sample of 50 accounts is
identified by moving systematically through the population
and picking every 20th account after the first randomly
selected account.
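A minimal sketch of the accounts-receivable example follows. It assumes that the random start is drawn from the first k accounts and that N is an exact multiple of n; the function name `systematic_sample` and the numbering of accounts from 1 to 1000 are illustrative.

```python
import random

def systematic_sample(population, n, rng):
    """Select n units by taking every k-th unit after a random start."""
    k = len(population) // n      # sampling interval k = N/n
    start = rng.randrange(k)      # random position within the first k units
    return [population[start + i * k] for i in range(n)]

# Hypothetical account numbers 1..1000.
accounts = list(range(1, 1001))
sample = systematic_sample(accounts, 50, random.Random(0))

print(sample[:5], len(sample))
```

Whatever the random start, consecutive selected accounts are exactly 20 apart and the sample contains 50 accounts.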
c) Stratified Random Sampling
• This method is useful when the population consists of a
number of heterogeneous subpopulations and the
members within a given subpopulation are relatively
homogeneous compared to the population as a whole.
• Thus, the population is first divided into subgroups called
strata, and a sample is selected from each stratum.
• E.g. we can divide a human population into different
strata (subgroups) on the basis of age, sex, occupation,
education, region, religion, etc.
Types of Stratified Sampling:
• There are two types of stratified sampling: proportional and
non-proportional.
• In proportional sampling, each stratum is represented in
proportion to its size: a stratum with many items contributes
a larger sample, and vice versa.
• With population size N and sample size n, the sample is
allocated so that the sampling fraction is the same constant
for every stratum.
• That is, ni/Ni = n/N = c for each stratum i, where Ni is the
size of stratum i and ni is the sample size taken from it. So in
this method each stratum is represented according to its size.
• In non-proportional sampling, equal representation is given to
all strata regardless of their share of the population.
Example:
• A sample of 50 students is to be drawn from a population
of 500 students belonging to two institutions, A and B.
Institution A has 200 students and institution B has 300.
How will you draw the sample using proportional allocation?
Solution:
• There are two strata in this case, with sizes N1 = 200 and N2 = 300,
and the total population is N = N1 + N2 = 500.
• The sample size is n = 50.
• If n1 and n2 are the sample sizes taken from A and B, then
• n1 = n × N1/N = 50 × 200/500 = 20
• n2 = n × N2/N = 50 × 300/500 = 30
• The sample sizes are 20 from A and 30 from B. The units from
each institution are then selected by simple random sampling.
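Proportional allocation is a one-line calculation per stratum. The sketch below uses an illustrative function name, `proportional_allocation`, and rounds each share to the nearest whole unit, which is an assumption: with awkward stratum sizes the rounded shares may not add up exactly to n and would need a manual adjustment.

```python
def proportional_allocation(n, strata_sizes):
    """Allocate a total sample of n across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())   # population size N
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

allocation = proportional_allocation(50, {"A": 200, "B": 300})
print(allocation)
```

For the two institutions this reproduces the allocation of 20 students from A and 30 from B.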
d) Cluster Sampling
• The population is divided into small units, called primary
units or clusters.
• Certain clusters are then selected at random. Once a cluster
has been selected, all of the elements in it are included in
the sample.
• The remaining clusters are ignored. Precision will suffer if
the ignored clusters are not similar to the sampled clusters.
• It therefore works best when the clusters are similar to one
another, so that the population is homogeneous from cluster
to cluster. This technique is often employed to reduce the
cost of sampling a population scattered over a large
geographic area.
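The two steps of cluster sampling (pick whole clusters at random, then take every element inside them) can be sketched as follows. The village and household labels are invented purely for illustration, as is the function name `cluster_sample`.

```python
import random

def cluster_sample(clusters, m, rng):
    """Pick m clusters at random and include every element of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: households grouped by village.
villages = {
    "village_1": ["h1", "h2", "h3"],
    "village_2": ["h4", "h5"],
    "village_3": ["h6", "h7", "h8", "h9"],
    "village_4": ["h10", "h11"],
}

chosen, units = cluster_sample(villages, 2, random.Random(1))
print(chosen, units)
```

Only the two chosen villages need to be visited, which is where the cost saving comes from.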
II. Non-random (Non-probability) Sampling Methods
• A random sample is ideal for statistical analysis, but
non-random sampling methods have also been devised for cases
in which random sampling is not feasible.
i) Quota sampling
• With random sampling, the investigator plays no part in the
selection of respondents; he/she merely works from a list of
the names and addresses of the respondents.
• In quota sampling, by contrast, the selection of respondents
lies with the investigator, although in making the selection
he/she must ensure that each respondent satisfies certain
criteria which are essential for the study.
ii) Judgment / purposive sampling
• Judgment sampling can also be called sampling by opinion.
• In this procedure, the investigator selects the units that
he/she feels are most representative of the population
with respect to the characteristics under study.
• For example, a teacher may select two or three students
from his class, judging that these students would reflect the
general opinion of all students in the class on certain issues.
iii) Convenience sampling
• In this procedure, units are included in the sample at the
convenience of the investigator rather than according to
any pre-specified or known probabilities of selection.
• Convenience samples make it easy to collect data on a
particular issue.
• However, a convenience sample can hardly be said to be
representative of the population. The results obtained are
therefore generally biased and unsatisfactory.
• Even so, convenience sampling is commonly used for pilot
studies, particularly for testing a questionnaire and
obtaining preliminary information about the population.
1.2 Sampling Distribution
• To arrive at a statistical inference, samples of a given
size are drawn repeatedly from the population and a
statistic is computed for each sample.
• The computed value of a particular statistic will
differ from sample to sample.
• Theoretically or by observation, it is possible to
construct a frequency table showing the values assumed
by the statistic and their frequencies of occurrence.
• This distribution of the values of a statistic is called a
sampling distribution, because the values are the outcome
of a process of sampling.
• A sampling distribution is a probability distribution
consisting of all possible values of a sample statistic.
• In a population containing N observations, the number of
possible samples each containing n observations is
$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ if sampling is done
without replacement. If sampling is done with replacement,
the number of samples (counting the order of selection) is
$N^n$.
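To make the two counts concrete, the sketch below enumerates every sample of size 3 from the five households of section 1.1.3 (N = 5, n = 3). Reusing that data set here is an illustrative choice. The script also averages the means of all samples drawn without replacement, which lets the reader check the first property listed in section 1.2.1 numerically.

```python
import math
from itertools import combinations, product
from statistics import mean

# Monthly incomes (birr) of the five households from section 1.1.3.
incomes = {"A": 10000, "B": 7400, "C": 12000, "D": 20000, "E": 5600}
N, n = len(incomes), 3

# All samples without replacement (order ignored) and with replacement (ordered).
samples_without = list(combinations(incomes, n))
samples_with = list(product(incomes, repeat=n))

print(len(samples_without), math.comb(N, n))   # both counts agree
print(len(samples_with), N ** n)               # both counts agree

# Sampling distribution of the mean for samples drawn without replacement.
sample_means = [mean(incomes[h] for h in s) for s in samples_without]
mean_of_sample_means = mean(sample_means)
population_mean = mean(incomes.values())
print(mean_of_sample_means, population_mean)
```

There are 10 samples without replacement and 125 ordered samples with replacement, and the average of the 10 sample means coincides with the population mean of 11000 (up to floating-point rounding).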
Two important terms in sampling distributions:
• Population parameter - a numerical measure of a
population: population mean (μ), population variance (σ²),
population standard deviation (σ), population proportion
(p), etc.
• Sample statistic - a numerical measure of a
sample: sample mean (x̄), sample variance (S²), sample
standard deviation (S), sample proportion (p̂), etc.
1.2.1 Sampling Distribution of the Mean (x̄)
• A sampling distribution of sample means is the
distribution of the means computed from all
possible random samples of a specific size taken from a
population.
• Sampling error is the difference between the sample
measure and the corresponding population measure,
due to the fact that the sample is not a perfect
representation of the population.
Properties of the Distribution of Sample Means
1. The mean of the sample means is the same as the
population mean.
2. The standard deviation of the sample means is
smaller than the standard deviation of the population
(for samples of more than one element).
3. The sampling distribution of the sample mean from a
normally distributed population is normal for samples
of all sizes.
• If the sampling distribution of x̄ is normal, the standard
error of the mean, σ_x̄, can be used together with the
normal distribution to determine the probabilities of
various values of the sample mean.
• For this purpose the value of the sample mean is first
converted into a Z value on the standard normal
distribution, which shows how far any single sample mean
deviates from the mean of all sample means.
• The Z value is the number of standard deviations (here,
standard errors) that a particular value lies away from the
mean. The variable Z is also called the standardized normal
random variable.
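In symbols, the standard textbook forms of these two quantities, for a population with mean μ and standard deviation σ and a sample of size n (without a finite-population correction, which is an assumption here), are:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \qquad Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

For instance, with σ = 27 and n = 36 the standard error is 27/6 = 4.5, so a sample mean of 170 from a population with μ = 165 corresponds to Z = 5/4.5 ≈ 1.11.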
Example 1: The average IQ score of students in a school for
gifted children is 165, with a standard deviation of 27. A
random sample of 36 students is taken. What is the
probability that:
a) the sample mean will be greater than 170
b) the sample mean will be less than 158
c) the sample mean will be between 155 and 160
d) the sample mean will be either less than 170 or more
than 175
Example 2: The mean length of life of a certain cutting tool
is 41.5 hrs, with a standard deviation of 2.5 hrs. What is the
probability that a simple random sample of size 50 drawn
from this population will have a mean between 40.5 hrs
and 42 hrs?
Example 3: The annual wages of all employees of a company
have a mean of 20,400 per year with a standard deviation of
3,200. The personnel manager is going to take a random
sample of 36 employees. What is the probability that the
sample mean will exceed 21,000?
Example 4: A company makes engines used in speedboats.
The company’s engineers believe that the engine delivers an
average power of 220 horsepower (HP) and that the
standard deviation of the power delivered is 15 HP. A potential
buyer intends to sample 100 engines (each engine to be run
a single time). What is the probability that the sample mean
will be less than 217 HP?
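One way to work Examples 1 to 4 numerically is sketched below. It assumes that the sample mean is approximately normal with standard error σ/√n (no finite-population correction) and evaluates the standard normal CDF through `math.erf` instead of a Z table; the helper names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_between(mu, sigma, n, low=-math.inf, high=math.inf):
    """P(low < sample mean < high), assuming a normal sampling distribution."""
    se = sigma / math.sqrt(n)   # standard error of the mean
    return normal_cdf((high - mu) / se) - normal_cdf((low - mu) / se)

# Example 1: mu = 165, sigma = 27, n = 36
ex1_a = prob_sample_mean_between(165, 27, 36, low=170)
ex1_b = prob_sample_mean_between(165, 27, 36, high=158)
ex1_c = prob_sample_mean_between(165, 27, 36, low=155, high=160)
ex1_d = (prob_sample_mean_between(165, 27, 36, high=170)
         + prob_sample_mean_between(165, 27, 36, low=175))

# Example 2: mu = 41.5, sigma = 2.5, n = 50
ex2 = prob_sample_mean_between(41.5, 2.5, 50, low=40.5, high=42)

# Example 3: mu = 20400, sigma = 3200, n = 36
ex3 = prob_sample_mean_between(20400, 3200, 36, low=21000)

# Example 4: mu = 220, sigma = 15, n = 100
ex4 = prob_sample_mean_between(220, 15, 100, high=217)

for name, p in [("1a", ex1_a), ("1b", ex1_b), ("1c", ex1_c), ("1d", ex1_d),
                ("2", ex2), ("3", ex3), ("4", ex4)]:
    print(name, round(p, 4))
```

Answers read from a printed Z table can differ slightly in the third or fourth decimal place, because tables round Z to two decimals.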
1.2.2 Sampling distribution of the sample proportion
• There are many situations in which each individual
member of the population can be classified into one of two
mutually exclusive categories, such as success/failure,
accept/reject, head/tail of a coin, etc. In such cases the
sample proportion p̂ of members having the characteristic
of interest is the best statistic to use for statistical
inference about the population proportion p.
Example 1: 15% of the people in the small community of Sands
Point have type B blood. A random sample of 500 persons
is selected. What is the probability that the sample
proportion of people with blood type B is:
a) more than 17.5%
b) less than 14%
c) between 16% and 18%
Example 2: The quality control department of a paint
manufacturing company discovered, at the time of dispatch of
decorative paints, that 30 percent of the containers are
defective. If a random sample of 500 containers is drawn with
replacement from the population, what is the probability that
the sample proportion of defective containers will be less than
or equal to 25 percent?
Example 3: A manufacturer of screws has found that, on
average, a proportion 0.04 of the screws produced are defective.
A random sample of 400 screws is examined for the proportion of
defective screws. Find the probability that the proportion of
defective screws in the sample is between 0.02 and 0.05.
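The three proportion examples can be worked in the same way. The sketch assumes the usual large-sample normal approximation, in which p̂ has mean p and standard error √(p(1 − p)/n), with no continuity or finite-population correction; the helper names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_proportion_between(p, n, low=-math.inf, high=math.inf):
    """P(low < sample proportion < high) under the normal approximation."""
    se = math.sqrt(p * (1.0 - p) / n)   # standard error of the proportion
    return normal_cdf((high - p) / se) - normal_cdf((low - p) / se)

# Example 1: p = 0.15, n = 500
ex1_a = prob_sample_proportion_between(0.15, 500, low=0.175)
ex1_b = prob_sample_proportion_between(0.15, 500, high=0.14)
ex1_c = prob_sample_proportion_between(0.15, 500, low=0.16, high=0.18)

# Example 2: p = 0.30, n = 500
ex2 = prob_sample_proportion_between(0.30, 500, high=0.25)

# Example 3: p = 0.04, n = 400
ex3 = prob_sample_proportion_between(0.04, 400, low=0.02, high=0.05)

for name, prob in [("1a", ex1_a), ("1b", ex1_b), ("1c", ex1_c),
                   ("2", ex2), ("3", ex3)]:
    print(name, round(prob, 4))
```

As with the means, hand calculations with a Z table may differ from these values in the later decimals.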
1.2.3 Sampling distribution of the difference between two means
• The sampling distribution of the sample mean can also be used
to compare a population of size N1 having mean μ1 and
standard deviation σ1 with another, similar population of
size N2 having mean μ2 and standard deviation σ2.
• n1, n2 = sizes of the independent random samples drawn from
the 1st and 2nd populations respectively.
• The sampling distribution of x̄1 - x̄2 is the probability
distribution of all possible values the random variable x̄1 - x̄2
may take when independent samples of sizes n1 and n2 are
taken from two specified populations.
• When sampling is done from two populations with means μ1
and μ2 and standard deviations σ1 and σ2 respectively, the
sampling distribution of the difference of sample means x̄1 - x̄2
approaches a normal distribution with mean μ1 - μ2 and
standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ as
the sample sizes n1 and n2 increase.

Example 1: The products of car manufacturer A have a mean
lifetime of 1400 hours with a standard deviation of 200 hours,
while those of manufacturer B have a mean lifetime of 1200 hours
with a standard deviation of 100 hours. If random samples of
125 units from each manufacturer are tested, what is the
probability that the sample from manufacturer A will have a mean
lifetime which is at least (a) 160 hours more than that of
manufacturer B, (b) 250 hours more than that of manufacturer B?
Example 2: The makers of Duracell batteries claim that their
size AA battery lasts on average 45 minutes longer than that of
Duracell’s main competitor, the Energizer. Two independent
random samples of 100 batteries of each kind are selected.
Assuming σ1 = 84 minutes and σ2 = 67 minutes, find the
probability that the difference in the average lives of the
Duracell and Energizer batteries in the samples does not exceed
54 minutes.
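Both examples can be worked with a short script. The sketch assumes that the difference of sample means is approximately normal with mean μ1 − μ2 and standard error √(σ1²/n1 + σ2²/n2), and, for Example 2, that the makers' claim of a 45-minute difference is true. The helper names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def se_difference(sigma1, n1, sigma2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Example 1: manufacturers A and B, 125 units each.
mean_diff_1 = 1400 - 1200
se_1 = se_difference(200, 125, 100, 125)
ex1_a = 1.0 - normal_cdf((160 - mean_diff_1) / se_1)   # at least 160 hours more
ex1_b = 1.0 - normal_cdf((250 - mean_diff_1) / se_1)   # at least 250 hours more

# Example 2: Duracell vs Energizer, 100 batteries each.
mean_diff_2 = 45
se_2 = se_difference(84, 100, 67, 100)
ex2 = normal_cdf((54 - mean_diff_2) / se_2)            # difference at most 54

print(se_1, round(ex1_a, 4), round(ex1_b, 4))
print(round(se_2, 3), round(ex2, 4))
```

In Example 1 the standard error works out to exactly 20 hours, so the two thresholds sit at Z = −2 and Z = 2.5.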
Reading Assignment
1.2.4 Sampling distribution of the difference between two proportions
• Let us assume we have two binomial populations labeled 1
and 2, so that
– p1 and p2 denote the two population proportions
– n1 and n2 denote the two sample sizes
– p̂1 and p̂2 denote the two sample proportions
• The sampling distribution of p̂1 - p̂2 is the probability
distribution of all possible values the random variable
p̂1 - p̂2 may take when independent samples of sizes n1 and n2
are taken from two specified binomial populations.
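As a starting point for the reading, the standard large-sample results for this distribution are given below. They are not stated in these notes and assume independent samples that are large enough for the normal approximation to hold:

$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2, \qquad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$

This mirrors the result for the difference between two means in section 1.2.3, with p(1 − p) playing the role of the population variance.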
