
1. SAMPLING AND SAMPLING DISTRIBUTIONS


• Population: the total group of people (or items) that you are researching and about which you want to draw conclusions.
• Sampling element: an individual case in the population (usually a person).
• Sampling frame: the actual list of sampling units from which the sample, or some stage of the sample, is selected.
• Bias: systematic error produced by your sampling procedure.
1.2. REASONS / NEED FOR SAMPLING:
Sampling is a technique of collecting data only on a part of the population to reveal the characteristics of the entire
population.
Here are some of the reasons for sampling:
• Time: contacting each and every individual in the whole population takes too long to be practical.
• Cost: the cost of studying all the items (objects or individuals) in a population may be prohibitive.
• Physical impossibility: some populations are infinite or too large to count, so it is physically impossible to check all the items in the population. Examples are populations of fish, birds, snakes, or mosquitoes.
• Destructive nature of the test: testing the quality of some products may require consuming or destroying them. Under such circumstances a census would destroy every item, so sampling remains the only choice when a test involves the destruction of the items under study.
• Reliability: with a scientific sampling technique the sampling error can be minimized, and the non-sampling error in a sample survey also tends to be small because qualified investigators can be used.
1.3. DESIGNING AND CONDUCTING A SAMPLING STUDY
Five decisions must be made in designing a sample:
• Identifying the relevant population, determining the method of sampling, securing a sampling frame, identifying the parameters of interest, and determining the sample size.
1.4. TYPES OF SAMPLING
1.4.1 PROBABILITY SAMPLES
• Simple Random Sampling
The most widely known type of random sample is the simple random sample (SRS). Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn.
• Stratified Random Sampling
In this form of sampling, the population is first divided into two or more mutually exclusive segments (strata) based on categories of the variables of interest in the research. It is designed to organize the population into homogeneous subsets before sampling and then to draw a random sample within each subset.
• Systematic Sampling
At first glance this method looks very different from SRS. In practice it is a variant of simple random sampling that works from a list of elements: every k-th element of the list is drawn for inclusion in the sample, where k is the population size divided by the desired sample size. Say you have a list of 10,000 people and you want a sample of 1,000.
Creating such a sample involves three steps:
1. Divide the number of cases in the population by the desired sample size. In this example, dividing 10,000 by 1,000 gives a value of 10.
2. Select a random number between 1 and the value obtained in Step 1. In this example, we choose a number between 1 and 10; say we pick 7.
3. Starting with the case number chosen in Step 2, take every tenth record (7, 17, 27, etc.).
• Cluster Sampling
Cluster sampling is a method of surveying a population based on groups (clusters) that occur naturally in it: clusters are selected at random, and the units within the selected clusters are surveyed.
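The four probability methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of 10,000 case numbers, the two strata (odd and even case numbers), the 100 clusters of 100 cases, and all function names are made up for the example and do not come from the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, rng):
    """Group units by stratum, then draw a simple random sample in each."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample

def systematic_sample(population, n, rng):
    """Pick a random start in the first interval, then take every k-th unit."""
    k = len(population) // n             # sampling interval (Step 1)
    start = rng.randint(1, k)            # random start between 1 and k (Step 2)
    return population[start - 1::k][:n]  # every k-th record (Step 3)

def cluster_sample(clusters, m, rng):
    """Select m whole clusters at random and keep all of their units."""
    chosen = rng.sample(list(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)              # fixed seed, only for repeatability
people = list(range(1, 10_001))      # hypothetical list of 10,000 case numbers

srs = simple_random_sample(people, 1_000, rng)
strat = stratified_sample(people, lambda p: p % 2, 0.10, rng)  # two made-up strata
syst = systematic_sample(people, 1_000, rng)
clusters = {c: people[c * 100:(c + 1) * 100] for c in range(100)}  # made-up clusters
clus = cluster_sample(clusters, 10, rng)

print(len(srs), len(strat), len(syst), len(clus))
```

Each call returns 1,000 cases here, but only the simple random sample gives every possible subset of 1,000 the same chance of selection.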
1.4.2 NONPROBABILITY SAMPLING
There are four primary types of non-probability sampling methods:
• Convenience Sampling: Convenience sampling is a method of choosing subjects who are available or easy to find.
• Quota Sampling: Quota sampling requires that representative individuals are chosen out of a specific subgroup.
• Purposive Sampling: Purposive sampling is a sampling method in which elements are chosen based on the purpose of the study.
• Snowball Sampling: Snowball sampling is a method in which a researcher identifies one member of some population of interest, speaks to him/her, and then asks that person to identify others in the population that the researcher might speak to.
1.5. SAMPLING AND NON-SAMPLING ERRORS
• Sampling error: the error that arises in a data collection process as a result of taking a sample from a population rather than using the whole population.
• Non-sampling error: error caused by factors other than those related to sample selection.
Non-sampling error can include (but is not limited to):
• Coverage error: this occurs when a unit is incorrectly excluded from or included in the sample, or is duplicated in the sample.
• Non-response error: this refers to the failure to obtain a response from some unit because of absence, non-contact, refusal, or some other reason.
• Response error: this refers to error caused by respondents intentionally or accidentally providing inaccurate responses.
• Interviewer error: this occurs when interviewers record information incorrectly or are not neutral or objective.
• Processing error: this refers to errors that occur during data collection, data entry, coding, editing, and output.
1.6. SAMPLING DISTRIBUTIONS
1.6.1. Definitions
• A sampling distribution is a probability distribution for the possible values of a sample statistic, such as a sample mean or a sample proportion.

1.6.2. Sampling Distribution of the Mean

• The sampling distribution of the mean is the probability distribution of the means, $\bar{X}$, of all possible simple random samples of a given sample size n that can be drawn from the population.
Properties of the Sampling Distribution of Means

1. The mean of the sampling distribution of the means is equal to the population mean: $\mu_{\bar{X}} = \mu$.
2. The standard deviation of the sampling distribution of the means (the standard error) is equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
• This holds when N is very large and the sample is a small fraction of it (n < 0.05N).
• If N is finite and n ≥ 0.05N, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$. The expression $\sqrt{\frac{N - n}{N - 1}}$ is called the finite population correction factor (or finite population multiplier).
3. The sampling distribution of means is approximately normal for sufficiently large sample sizes (n≥ 30).
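As a minimal sketch of property 2, the function below computes the standard error and applies the finite population correction only when n ≥ 0.05N. The function name and the numbers in the usage lines are hypothetical, chosen only to show the effect of the correction.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size, or None for an infinite / very large population
    """
    se = sigma / math.sqrt(n)
    # Apply the finite population correction only when the sample is
    # at least 5% of a finite population (n >= 0.05 N).
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical numbers, chosen only to show the effect of the correction.
print(standard_error_of_mean(20, 100))           # no correction: 2.0
print(standard_error_of_mean(20, 100, N=500))    # corrected, smaller than 2.0
print(standard_error_of_mean(20, 100, N=10_000)) # n < 0.05 N, so no correction
```

The correction factor is always below 1 for n > 1, so it can only shrink the standard error.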
Example:
Hourly earnings of the production employees of XYZ Industries
Employee          Joe  Sam  Sue  Bob  Jan  Art  Ted
Hourly earnings   7    7    8    8    7    8    9

a. What is the population mean?


b. What is the sampling distribution of the sample mean for sample size 2?
c. What is the mean of the sampling distribution?
d. What observation can be made about the population and the sampling distribution?
Solution:
a. The population mean is 7.71, found by
$\mu = \frac{\sum x}{N} = \frac{7 + 7 + 8 + 8 + 7 + 8 + 9}{7} = \frac{54}{7} = 7.71$
b. To arrive at the sampling distribution of the sample mean, we select all possible samples of size 2 without replacement from the population and then compute the mean of each sample. There are 21 possible samples, found by $\binom{N}{n} = \binom{7}{2} = 21$.
The 21 sample means from all possible samples of size 2 that can be drawn from the population are shown below. These 21 sample means are used to construct a probability distribution, which is the sampling distribution of the sample mean.


Sample   Employees   Earnings   Sum   Mean
1        Joe, Sam    7, 7       14    7
2        Joe, Sue    7, 8       15    7.5
3        Joe, Bob    7, 8       15    7.5
4        Joe, Jan    7, 7       14    7
5        Joe, Art    7, 8       15    7.5
6        Joe, Ted    7, 9       16    8
7        Sam, Sue    7, 8       15    7.5
8        Sam, Bob    7, 8       15    7.5
9        Sam, Jan    7, 7       14    7
10       Sam, Art    7, 8       15    7.5
11       Sam, Ted    7, 9       16    8
12       Sue, Bob    8, 8       16    8
13       Sue, Jan    8, 7       15    7.5
14       Sue, Art    8, 8       16    8
15       Sue, Ted    8, 9       17    8.5
16       Bob, Jan    8, 7       15    7.5
17       Bob, Art    8, 8       16    8
18       Bob, Ted    8, 9       17    8.5
19       Jan, Art    7, 8       15    7.5
20       Jan, Ted    7, 9       16    8
21       Art, Ted    8, 9       17    8.5
Therefore, the sampling distribution of the sample mean is:
Sample mean   Number of means   Probability
7             3                 0.1429
7.5           9                 0.4286
8             6                 0.2857
8.5           3                 0.1429
Total         21                1.0000

c. The mean of sampling distribution of the sample mean is obtained by summing the various sample means and
dividing the sum by the number of samples.

$\mu_{\bar{X}} = \frac{\text{sum of all sample means}}{\text{total number of samples}} = \frac{7 + 7.5 + \dots + 8.5}{21} = \frac{162}{21} = 7.71$
d. The shape of the sampling distribution of the sample mean and the shape of the frequency distribution of the
population values are different. The distribution of the sample mean tends to be more bell-shaped and to approximate
the normal probability distribution.
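The whole example can be reproduced by brute-force enumeration. The sketch below uses exact fractions so that the equality between the population mean and the mean of the sampling distribution can be checked without rounding; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

earnings = {"Joe": 7, "Sam": 7, "Sue": 8, "Bob": 8, "Jan": 7, "Art": 8, "Ted": 9}

population_mean = Fraction(sum(earnings.values()), len(earnings))

# All possible samples of size 2 drawn without replacement.
sample_means = [Fraction(a + b, 2) for a, b in combinations(earnings.values(), 2)]

# Sampling distribution: each distinct mean with its relative frequency.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())}

mean_of_means = sum(sample_means) / len(sample_means)

for m, p in distribution.items():
    print(float(m), counts[m], round(float(p), 4))
print(float(population_mean), float(mean_of_means))
```

Both means come out as 54/7, which is the 7.71 obtained in parts a and c.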
Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:
1. If the population is normally distributed, the distribution of sample means is normal regardless of the sample size.
2. If the population from which samples are taken is not normal, the distribution of sample means will be approximately normal if the sample size (n) is sufficiently large (n ≥ 30). The larger the sample size, the closer the sampling distribution is to the normal curve.
The significance of the Central Limit Theorem is that it permits us to use sample statistics to make inference about
population parameters without knowing anything about the shape of the frequency distribution of that population other
than what we can get from the sample.
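A small simulation can make the theorem concrete. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and draws 5,000 samples of size 30; neither the population nor the number of repetitions comes from the notes.

```python
import random
import statistics

rng = random.Random(0)     # fixed seed, only for repeatability
n = 30                     # sample size
num_samples = 5_000        # number of repeated samples (arbitrary)

# Assumed population: exponential with mean 1 and standard deviation 1,
# which is strongly right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(mean_of_means)       # close to the population mean, 1
print(sd_of_means)         # close to 1 / sqrt(30), about 0.18
```

Plotting a histogram of `sample_means` should show a roughly bell-shaped curve even though the individual observations are far from normal.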
Example:
1. The distribution of annual earnings of all bank tellers with five years of experience is skewed negatively. This
distribution has a mean of Birr 15,000 and a standard deviation of Birr 2000. If we draw a random sample of 30 tellers,
what is the probability that their earnings will average more than Birr 15,750 annually?
Solution:
Steps:
1. Calculate µ and $\sigma_{\bar{X}}$
µ = Birr 15,000
$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 2000 / \sqrt{30}$ = Birr 365.15
2. Calculate Z for $\bar{X}$
$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
$Z_{15{,}750} = \frac{15{,}750 - 15{,}000}{365.15} = +2.05$
3. Find the area covered by the interval
$P(\bar{X} > 15{,}750) = P(Z > +2.05)$
= 0.5 - P(0 to +2.05)
= 0.5 - 0.47982
= 0.02018
4. Interpret the results
There is a 2.02% chance that the average earnings of a group of 30 tellers will be more than Birr 15,750 annually.
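The same calculation can be checked without a Z table. This sketch uses `NormalDist` from Python's standard `statistics` module; because it keeps Z unrounded, the result may differ from the table-based answer in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15_000, 2_000, 30

standard_error = sigma / sqrt(n)        # about 365.15
z = (15_750 - mu) / standard_error      # about 2.05

# Upper-tail area of the standard normal distribution beyond z.
prob = 1 - NormalDist().cdf(z)

print(round(standard_error, 2), round(z, 2), round(prob, 4))
```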
2. Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard
deviation of 21 shoppers. What is the probability of randomly selecting 49 different shopping hours, counting the
shoppers, and having the sample mean fall between 441 and 446 shoppers, inclusive?
Solution:
1. Calculate µ and $\sigma_{\bar{X}}$
µ = 448 shoppers
$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 21 / \sqrt{49} = 3$
2. Calculate Z for $\bar{X}$
$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
$Z_{441} = \frac{441 - 448}{3} = -2.33$ and $Z_{446} = \frac{446 - 448}{3} = -0.67$
3. Find the area covered by the interval
$P(441 \le \bar{X} \le 446) = P(-2.33 \le Z \le -0.67)$
= P(0 to -2.33) - P(0 to -0.67)
= 0.49010 - 0.24857
= 0.24153
4. Interpret the results
There is a 24.153% chance of randomly selecting 49 hourly periods for which the sample mean falls between 441 and 446 shoppers.
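An interval probability like this one can also be read directly from the sampling distribution itself, skipping the conversion to Z. The sketch below models the sample mean as normal with mean 448 and standard deviation 3; since no Z values are rounded, expect a result close to, but not identical with, the table-based 0.24153.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448, 21, 49

# Sampling distribution of the mean: normal with mean mu and
# standard deviation sigma / sqrt(n) = 3.
sampling_dist = NormalDist(mu, sigma / sqrt(n))

prob = sampling_dist.cdf(446) - sampling_dist.cdf(441)
print(round(prob, 4))
```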

1.6.3. Sampling Distribution of Proportions ($\bar{p}$)

Sometimes in statistics it is important to know the proportion of a certain characteristic in a population. The sample proportion is
$\bar{p} = \frac{X}{n}$
where $\bar{p}$ = sample proportion
X = number of items in the sample that possess the characteristic
n = number of items in the sample
Like other probability distributions, the sampling distribution of the proportion is described by two parameters: the mean of the sample proportions, $E(\bar{p})$, and the standard deviation of the sample proportions, $\sigma_{\bar{p}}$, which is called the standard error of the proportion.

Properties of the Sampling Distribution of $\bar{p}$

1. As with the sampling distribution of the mean, the mean of the sample proportions is always equal to the population proportion P, i.e., $E(\bar{p}) = P$.

2. The standard error of the proportion is equal to:
$\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}}$, where P = population proportion, q = 1 - P, and n = sample size.
The CLT states that the normal distribution approximates the shape of the distribution of sample proportions if nP and nq are greater than 5. Consequently, we solve problems involving sample proportions by using a normal distribution whose mean and standard deviation are:
$\mu_{\bar{p}} = P$, $\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}}$, and $Z_{\bar{p}} = \frac{\bar{p} - P}{\sigma_{\bar{p}}}$

NB: The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever the sample size is large, i.e., when nP and nq are both greater than 5.
Example:
1. Suppose that 60% of the electrical contractors in a region use a particular brand of wire. What is the probability of taking a random sample of size 120 from these electrical contractors and finding that 50% or fewer use that brand of wire?
Solution:

n = 120, P = 0.6, q = 0.4, $P(\bar{p} \le 0.5)$ = ?

Steps:
1. Check that nP and nq > 5
120 × 0.6 = 72 and 120 × 0.4 = 48. Both are greater than 5.
2. Calculate $\sigma_{\bar{p}}$
$\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{120}} = 0.0447$
3. Calculate Z for $\bar{p}$
$Z_{\bar{p}} = \frac{\bar{p} - P}{\sigma_{\bar{p}}}$
$Z_{0.5} = \frac{0.5 - 0.6}{0.0447} = -2.24$
4. Find the area covered by the interval
$P(\bar{p} \le 0.5) = P(Z \le -2.24)$
= 0.5 - P(0 to -2.24)
= 0.5 - 0.48745
= 0.01255
5. Interpret the results
The probability that 50% or fewer of the contractors in a random sample of 120 use this particular brand of wire is very low (1.255%).
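As a rough cross-check on the normal approximation, the exact binomial probability of the same event can be summed directly. The sketch assumes the 120 contractors are sampled independently (or from a population large enough that this is nearly true), which the example does not state. The exact figure will not match 0.01255 exactly, because the normal curve ignores the discreteness of the count.

```python
from math import comb

n, P = 120, 0.6
cutoff = 60    # 50% of 120 contractors

# Exact probability that at most 60 of the 120 sampled contractors
# use the brand, from the binomial distribution.
exact = sum(comb(n, x) * P**x * (1 - P)**(n - x) for x in range(cutoff + 1))

print(round(exact, 4))
```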
2. If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12
or more are defective?
Solution:

n = 80
P = 0.1, q = 0.9
$\sigma_{\bar{p}} = \sqrt{\frac{0.10 \times 0.90}{80}} = 0.0335$
X = 12
$\bar{p}$ = X/n = 12/80 = 0.15
$P(\bar{p} \ge 0.15)$ = ?
$Z_{0.15} = \frac{0.15 - 0.1}{0.0335} = +1.49$
$P(\bar{p} \ge 0.15) = P(Z \ge +1.49)$
= 0.5 - P(0 to +1.49)
= 0.5 - 0.43189 = 0.06811
About 6.81% of the time, twelve or more defective parts would appear in a random sample of eighty parts when the population proportion is 0.10.
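The steps used in these proportion problems can be wrapped in one small helper. This is a sketch only: the function name is hypothetical, and it refuses to run when nP or nq is not greater than 5, since the normal approximation is not recommended in that case. Z is kept unrounded, so the result is close to the table-based 0.06811 rather than identical.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_at_least(p_hat, P, n):
    """Normal-approximation probability that the sample proportion is >= p_hat."""
    q = 1 - P
    if n * P <= 5 or n * q <= 5:
        raise ValueError("normal approximation needs nP and nq greater than 5")
    standard_error = sqrt(P * q / n)
    z = (p_hat - P) / standard_error
    return 1 - NormalDist().cdf(z)

# Defective-parts example: 12 or more defectives out of 80 when P = 0.10.
prob = prob_proportion_at_least(12 / 80, P=0.10, n=80)
print(round(prob, 4))
```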
