Unit V Sampling
4/6/2011
Introduction
• A sample is a portion, piece or segment that is representative of the whole.
• A sample is a part or small section selected from the population.
• A population or universe may be defined as the aggregate of items possessing a common trait or traits.
• A population may be finite or infinite.
• The individual units of the population are called items or elements.
• The process of selecting a sample from a population is known as sampling.
Essentials of Sampling
• A sample should possess the following essentials if valid conclusions are to be drawn from the experimental results:
i. The sample should have characteristics similar to those of the population from which it has been selected.
ii. The selected sample should be homogeneous.
iii. The sample should include enough items to make the results reliable. In other words, the size of the sample should be sufficiently large.
Methods of Sampling
There are two methods of
Judgment Sampling
The method is so called because the choice of sample items depends exclusively on the judgment of the investigator. It is a simple method, intended to obtain a more representative sample. It is widely used in solving everyday business problems and in making public policy decisions. The drawback of this method is that it rests solely on the judgment of the individual and hence may be biased: the sample may not be representative and the results may not be accurate.
Convenience Sampling
• This method involves selecting the sample on the basis of convenience and easy accessibility.
• This method is quick and cheap.
• It may not be representative in character and hence may not yield reliable results.
Quota Sampling
• It is a type of judgment sampling.
• In this method, sample quotas are fixed according to characteristics of the population such as income, sex, religion, etc.
• It involves less time and money.
• It may not be representative of the population, as it is based on the personal bias of the selector.
i. Simple Random Sampling
ii. Systematic Sampling
iii. Stratified Sampling
iv. Cluster Sampling
Systematic Sampling
• In systematic sampling, elements are selected from a population at a uniform interval that is measured in time, order or space.
• Systematic sampling differs from simple random sampling in that each element has an equal chance of being selected, but each sample does not have an equal chance of being selected.
• It is a relatively simple and convenient method of sample selection.
• It involves less time and labour.
• The main demerit of this method is that it may not represent the whole population.
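The selection rule can be sketched in a few lines of Python. This is only an illustration, assuming the population is held in a list and the interval is taken as the integer part of the population size divided by the sample size; the function name `systematic_sample` and the numbered population are hypothetical.

```python
import random

def systematic_sample(items, n, seed=None):
    """Pick n items at a uniform interval after a random start (hypothetical helper)."""
    k = len(items) // n            # sampling interval (assumes n <= len(items))
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start within the first interval
    return [items[start + i * k] for i in range(n)]

population = list(range(1, 101))   # hypothetical population: items numbered 1..100
sample = systematic_sample(population, n=10, seed=1)
print(sample)
```

Only the starting point is random; once it is fixed, every other selected item is determined, which is why not every possible sample can occur.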
KOPPAR & ASSOCIATES, CHARTERED ACCOUNTANTS
Stratified Sampling
• In this method, we divide the population into relatively homogeneous groups, called strata.
• We then use one of two approaches: either select at random from each stratum a specified number of elements corresponding to the proportion of that stratum in the population as a whole, or draw an equal number of elements from each stratum and weight the results according to the stratum's proportion of the total population.
• Stratified sampling is appropriate when the population is already divided into groups of different sizes.
• Stratified sampling, if properly designed, reflects the characteristics of the population from which the sample was chosen more accurately than other sampling methods.
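A minimal sketch of the first approach (proportional allocation) follows. The strata names, their sizes and the helper `proportional_stratified_sample` are assumptions made for illustration only.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Draw from each stratum in proportion to its share of the population.

    Hypothetical helper: shares are rounded, so in general the stratum
    sample sizes may not add up exactly to n.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(n * len(units) / total)
        sample[name] = rng.sample(units, share)   # simple random sample within the stratum
    return sample

# Hypothetical population of 1,000 units split into three income strata.
strata = {
    "low": list(range(0, 600)),
    "middle": list(range(600, 900)),
    "high": list(range(900, 1000)),
}
picked = proportional_stratified_sample(strata, n=50, seed=7)
print({name: len(units) for name, units in picked.items()})
```

With strata of 600, 300 and 100 units, a sample of 50 is split 30, 15 and 5, mirroring the 60%, 30% and 10% shares in the population.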
Cluster Sampling
• In this method, the population is divided into recognizable subgroups, which are called clusters.
• A random sample of these clusters is drawn, and all the units belonging to the selected clusters constitute the sample.
• In this method, the clusters should be small and the number of units in each cluster must be more or less the same.
• The method offers flexibility which is lacking in other methods.
• It is less time consuming and less expensive.
• The method is less accurate than any other method of selecting a sample.
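The two stages (pick clusters at random, then take every unit in them) might be sketched as below. The cluster labels, unit names and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly choose m clusters and return every unit in them (hypothetical helper)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)      # random sample of cluster labels
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: 20 city blocks (clusters) of 5 households each.
clusters = {
    f"block{i}": [f"block{i}-house{j}" for j in range(5)]
    for i in range(20)
}
sample = cluster_sample(clusters, m=4, seed=3)
print(len(sample))
```

Because whole clusters are taken, the sample size is fixed by the cluster sizes, which is one reason the slide asks for clusters of roughly equal size.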
Sampling Distribution
• The mean and standard deviation computed from one sample need not be the same as the mean and standard deviation computed from another sample.
• A probability distribution of the means of all possible samples is a distribution of the sample means. This is called a sampling distribution of the mean.
• Similarly, a probability distribution of the medians (or modes, or proportions) of all possible samples is a sampling distribution of the median (or mode, or proportion).
• Such a sampling distribution can be described by its mean and standard deviation.
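For a very small population the sampling distribution of the mean can be listed in full. The sketch below assumes a made-up population of five values and samples of size 2 drawn without replacement; it then compares the result with the finite-population standard error formula.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]      # hypothetical population, N = 5
n = 2                              # sample size

# Every possible sample of size n drawn without replacement, and its mean.
sample_means = [mean(s) for s in combinations(population, n)]

N = len(population)
theoretical_se = pstdev(population) / sqrt(n) * sqrt((N - n) / (N - 1))

print(sorted(sample_means))
print(mean(sample_means), mean(population))      # both equal 6
print(pstdev(sample_means), theoretical_se)      # both are about 1.732
```

The ten sample means differ from one another, yet their average equals the population mean and their spread matches the standard error.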
The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:
• N: the number of observations in the population.
• n: the number of observations in the sample.
• The way that the random sample is chosen.
If the population size is much larger than the sample size, then the sampling distribution has roughly the same sampling error whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/10) of the population size, the sampling error will be noticeably smaller when we sample without replacement.
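The size of this effect can be checked numerically. The sketch below assumes the usual finite population multiplier sqrt((N - n)/(N - 1)) for sampling without replacement; the value of sigma and the two (N, n) pairs are illustrative only.

```python
from math import sqrt

def se_with_replacement(sigma, n):
    """Standard error of the mean when sampling with replacement."""
    return sigma / sqrt(n)

def se_without_replacement(sigma, n, N):
    """Standard error of the mean with the finite population multiplier applied."""
    return sigma / sqrt(n) * sqrt((N - n) / (N - 1))

sigma = 20.0                        # hypothetical population standard deviation
ratios = {}
for N, n in [(10_000, 50), (500, 50)]:
    ratio = se_without_replacement(sigma, n, N) / se_with_replacement(sigma, n)
    ratios[(N, n)] = ratio
    print(f"N={N}, n={n}: without/with replacement = {ratio:.3f}")
```

When the sample is 0.5% of the population the two standard errors are almost identical (ratio about 0.998); when it is 10% of the population, sampling without replacement reduces the standard error by about 5% (ratio about 0.950).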
The central limit theorem states that the sampling distribution of the mean will be normal or nearly normal if the sample size is large enough. How large is "large enough"? As a rough rule of thumb, many statisticians say that a sample size of 30 is large enough. If you know something about the shape of the population distribution, you can refine that rule. The sample size is large enough if any of the following conditions apply:
• The population distribution is normal.
• The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
• The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
• The sample size is greater than 40, without outliers.
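The mean and standard deviation of the sampling distribution of the mean are
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma \sqrt{1/n - 1/N}$
where $\mu$ and $\sigma$ are the population mean and standard deviation, $n$ is the sample size and $N$ is the population size. These relationships apply when the population standard deviation is known.
• Note: When the population size is very large, the factor 1/N is approximately equal to zero, and the standard deviation formula reduces to $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.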
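For a sample proportion, the mean and standard deviation of the sampling distribution are
$\mu_p = P$ and $\sigma_p = \sqrt{PQ/n - PQ/N}$
where $P$ is the population proportion, $Q = 1 - P$, $n$ is the sample size and $N$ is the population size.
• Note: When the population size is very large, the factor PQ/N is approximately equal to zero, and the standard deviation formula reduces to $\sigma_p = \sqrt{PQ/n}$.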
Concept Problem
• Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds, with a standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the average weight of a sampled student will be less than 75 pounds?
Solution
• To solve this problem, we need to define the sampling distribution of the mean. Because our sample size is greater than 40, the Central Limit Theorem tells us that the sampling distribution will be approximately normal.
• To define our normal distribution, we need to know both the mean of the sampling distribution and its standard deviation. Finding the mean of the sampling distribution is easy, since it is equal to the mean of the population. Thus, the mean of the sampling distribution is equal to 80.
Solution
• The standard deviation of the sampling distribution is
$\sigma_{\bar{x}} = \sigma \sqrt{1/n - 1/N} = 20 \sqrt{1/50 - 1/10000} = 20 \sqrt{0.0199} = 20 \times 0.141 = 2.82$
• The sampling distribution of the mean is therefore normally distributed with a mean of 80 and a standard deviation of 2.82. We want to know the probability that a sample mean is less than or equal to 75 pounds. To solve the problem, we calculate $z = (75 - 80)/2.82 = -1.77$; from that, the probability that the average weight of a sampled student is less than 75 pounds is equal to 0.038.
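The arithmetic can be reproduced with Python's standard library. This is a sketch of the same calculation, using `statistics.NormalDist` in place of a printed z table; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

N, n = 10_000, 50                   # population size and sample size
mu, sigma = 80.0, 20.0              # population mean and standard deviation

se = sigma * sqrt(1 / n - 1 / N)    # standard deviation of the sampling distribution
z = (75 - mu) / se                  # z-score of a sample mean of 75 pounds
p = NormalDist().cdf(z)             # area under the standard normal curve to the left of z

print(round(se, 2), round(z, 2), round(p, 3))
```

The result agrees with the slide: a standard error of about 2.82 and a probability of about 0.038.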
Concept Problem
• Find the probability that, of the next 120 births, no more than 40% will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.
Solution
• The Central Limit Theorem tells us that the proportion of boys in 120 births will be approximately normally distributed.
• The mean of the sampling distribution will be equal to the mean of the population distribution. In the population, half of the births result in boys and half in girls. Therefore, the probability of boy births in the population is 0.50. Thus, the mean proportion in the sampling distribution should also be 0.50.
• The standard deviation of the sampling distribution can be computed using the following formula:
$\sigma_p = \sqrt{PQ/n - PQ/N} = \sqrt{(0.5)(0.5)/120} = \sqrt{0.25/120} = 0.04564$
Solution
• In the above calculation, the term PQ/N was equal to zero, since the population size (N) was assumed to be infinite.
• The sample proportion is therefore normally distributed with a mean of 0.50 and a standard deviation of 0.04564. We want to know the probability that no more than 40% of the sampled births are boys. To solve the problem, we calculate $z = (0.40 - 0.50)/0.04564 = -2.19$; from that, the probability that no more than 40% of the sampled births are boys is equal to 0.014.
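The same steps can be checked in Python. This sketch uses the large-population form of the standard deviation (the PQ/N term is dropped) and `statistics.NormalDist` instead of a z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P = 0.5                             # population proportion of boys
Q = 1 - P
n = 120                             # number of births sampled

se = sqrt(P * Q / n)                # PQ/N is taken as zero for a very large population
z = (0.40 - P) / se                 # z-score for a sample proportion of 40%
p = NormalDist().cdf(z)

print(round(se, 5), round(z, 2), round(p, 3))
```

This reproduces the standard deviation of 0.04564 and the probability of about 0.014.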
Suppose that we take all possible samples of size $n_1$ and $n_2$ from two populations of sizes $N_1$ and $N_2$ with proportions $P_1$ and $P_2$. And finally, suppose that the following assumptions are valid.
• The size of each population is large relative to the sample drawn from the population. That is, $N_1$ is large relative to $n_1$, and $N_2$ is large relative to $n_2$. (In this context, populations are considered to be large if they are at least 10 times bigger than their sample.)
• The samples from each population are big enough to justify using a normal distribution to model differences between proportions. The sample sizes will be big enough when the following conditions are met: $n_1 P_1 > 10$, $n_1(1 - P_1) > 10$, $n_2 P_2 > 10$, and $n_2(1 - P_2) > 10$.
• The samples are independent; that is, observations in population 1 are not affected by observations in population 2, and vice versa.
• The expected value of the difference between sample proportions is equal to the difference between population proportions. Thus, $E(p_1 - p_2) = P_1 - P_2$.
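• The variance of the difference between independent random variables is equal to the sum of the individual variances. Thus, $\sigma_d^2 = \sigma_{p_1 - p_2}^2 = \sigma_1^2 + \sigma_2^2$.
• If the populations $N_1$ and $N_2$ are both large relative to $n_1$ and $n_2$, respectively, then $\sigma_1^2 = P_1(1 - P_1)/n_1$ and $\sigma_2^2 = P_2(1 - P_2)/n_2$.
• Therefore, $\sigma_d^2 = [P_1(1 - P_1)/n_1] + [P_2(1 - P_2)/n_2]$.
• And $\sigma_d = \sqrt{[P_1(1 - P_1)/n_1] + [P_2(1 - P_2)/n_2]}$.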
Concept Problem 1
• In one state, 52% of the voters are Republicans, and 48% are Democrats. In a second state, 47% of the voters are Republicans, and 53% are Democrats. Suppose 100 voters are surveyed from each state. Assume the survey uses simple random sampling.
• What is the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state?
Solution
• For this analysis, let $P_1$ = the proportion of Republican voters in the first state, $P_2$ = the proportion of Republican voters in the second state, $p_1$ = the proportion of Republican voters in the sample from the first state, and $p_2$ = the proportion of Republican voters in the sample from the second state. The number of voters sampled from the first state is $n_1 = 100$, and the number of voters sampled from the second state is $n_2 = 100$.
• The solution involves four steps.
• Make sure the samples from each population are big enough to model differences with a normal distribution. Because $n_1 P_1 = 100 \times 0.52 = 52$, $n_1(1 - P_1) = 100 \times 0.48 = 48$, $n_2 P_2 = 100 \times 0.47 = 47$, and $n_2(1 - P_2) = 100 \times 0.53 = 53$ are each greater than 10, the sample size is large enough.
• Find the mean of the difference in sample proportions: $E(p_1 - p_2) = P_1 - P_2 = 0.52 - 0.47 = 0.05$.
Solution
• Find the standard deviation of the difference.
$\sigma_d = \sqrt{[P_1(1 - P_1)/n_1] + [P_2(1 - P_2)/n_2]} = \sqrt{[(0.52)(0.48)/100] + [(0.47)(0.53)/100]} = \sqrt{0.002496 + 0.002491} = \sqrt{0.004987} = 0.0706$
• Find the probability. This problem requires us to find the probability that $p_1$ is less than $p_2$. This is equivalent to finding the probability that $p_1 - p_2$ is less than zero. To find this probability, we need to transform the random variable $(p_1 - p_2)$ into a z-score. That transformation appears below.
$z_{p_1 - p_2} = (x - \mu_{p_1 - p_2})/\sigma_d = (0 - 0.05)/0.0706 = -0.7082$
• The probability of a z-score being -0.7082 or less is 0.24.
• Therefore, the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state is 0.24.
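The four steps can be followed in a short Python sketch. It assumes large populations (so no finite population term) and uses `statistics.NormalDist` for the final probability; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P1, P2 = 0.52, 0.47                 # Republican share in state 1 and state 2
n1, n2 = 100, 100                   # voters sampled in each state

# Step 1: each expected count should exceed 10 for the normal model to apply.
counts = [n1 * P1, n1 * (1 - P1), n2 * P2, n2 * (1 - P2)]
large_enough = all(c > 10 for c in counts)

# Steps 2 and 3: mean and standard deviation of p1 - p2.
mean_d = P1 - P2
sd_d = sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

# Step 4: probability that p1 - p2 falls below zero.
z = (0 - mean_d) / sd_d
p = NormalDist().cdf(z)

print(large_enough, round(sd_d, 4), round(z, 2), round(p, 2))
```

The sketch gives a standard deviation of about 0.0706, a z-score of about -0.71 and a probability of about 0.24, in line with the worked solution.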
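• The standard deviation of the difference between sample means is $\sigma_d = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.
• The variance of the difference between independent random variables is equal to the sum of the individual variances.
• If the populations $N_1$ and $N_2$ are both large relative to $n_1$ and $n_2$, respectively, then $\sigma_{\bar{x}_1}^2 = \sigma_1^2/n_1$ and $\sigma_{\bar{x}_2}^2 = \sigma_2^2/n_2$.
• Therefore, $\sigma_d^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$ and $\sigma_d = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.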
Concept Problem
• For boys, the average number of absences in the first grade is 15 with a standard deviation of 7; for girls, the average number of absences is 10 with a standard deviation of 6.
• In a nationwide survey, suppose 100 boys and 50 girls are sampled. What is the probability that the male sample will have at most three more days of absences than the female sample?
Solution:
• Find the mean difference (male absences minus female absences) in the population.
$\mu_d = \mu_1 - \mu_2 = 15 - 10 = 5$
• Find the standard deviation of the difference.
$\sigma_d = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = \sqrt{7^2/100 + 6^2/50} = \sqrt{49/100 + 36/50} = \sqrt{0.49 + 0.72} = \sqrt{1.21} = 1.1$
• Find the z-score that is produced when boys have three more days of absences than girls. When boys have three more days of absences, the number of male absences minus female absences is three. The associated z-score is
$z = (x - \mu_d)/\sigma_d = (3 - 5)/1.1 = -2/1.1 = -1.818$
Solution
• Find the probability. We find that the probability of a z-score being -1.818 or less is about 0.035.
• Therefore, the probability that the difference between samples will be no more than 3 days is 0.035.
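A short Python check of this solution follows. It is a sketch that assumes both populations are large relative to their samples and uses `statistics.NormalDist` in place of a z table; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 15.0, 7.0, 100    # boys: mean absences, standard deviation, sample size
mu2, sigma2, n2 = 10.0, 6.0, 50     # girls: mean absences, standard deviation, sample size

mean_d = mu1 - mu2                                  # mean of the difference
sd_d = sqrt(sigma1**2 / n1 + sigma2**2 / n2)        # standard deviation of the difference

z = (3 - mean_d) / sd_d             # z-score for a difference of three days
p = NormalDist().cdf(z)             # probability of a difference of at most three days

print(mean_d, round(sd_d, 2), round(z, 3), round(p, 4))
```

The computed values match the worked solution: a standard deviation of 1.1, a z-score of about -1.818 and a probability of roughly 0.035.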
If a population has mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the mean (the distribution of all possible means for samples of size $n$)
1) has a mean equal to the population mean $\mu$;
2) has a standard deviation (also called "standard error" or "standard error of the mean") equal to the population standard deviation $\sigma$ divided by the square root of the sample size $n$: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$;
3) has a shape that approaches normal as $n$ increases.
This last point is especially important: the shape of the sampling distribution approaches normal as the size of the sample increases, whatever the shape of the population distribution.
These properties can be used to assess the likely accuracy of a sample mean, but only if the sampling distribution of the mean is approximately normal.
If the population distribution is normal, the sampling distribution of the mean will be normal for any sample size $n$ (even $n = 1$). If a population distribution is not normal, but it has a bump in the middle, no extreme scores and no strong skew, then a sample of even modest size (e.g., $n = 30$) will have a sampling distribution of the mean that is very close to normal. However, if the population distribution is far from normal (e.g., extreme outliers or strong skew), then to produce a sampling distribution of the mean that is close to normal it may be necessary to draw a very large sample (e.g., $n = 500$ or more).
• Important note: You should not assume that the sampling distribution of the mean is normal without considering the shape of the population distribution and the size of your sample. A sample with $n > 30$ does not guarantee a normal sampling distribution if the population distribution is far from normal.
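A small simulation can make these properties concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), a sample size of 30 and 5,000 simulated samples; all of these choices, and the fixed seed, are illustrative.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)             # fixed seed so the run is repeatable
sample_size = 30
trials = 5000                       # number of simulated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(trials)
]

theoretical_se = 1.0 / sqrt(sample_size)     # population sd divided by sqrt of sample size
observed_se = pstdev(sample_means)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this would be about 0.95.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means) / trials

print(round(mean(sample_means), 3), round(observed_se, 3), round(theoretical_se, 3))
print(round(within, 3))
```

The average of the simulated means should sit close to the population mean and their spread close to the standard error. How near the coverage figure comes to 0.95 gives a rough sense of how normal the sampling distribution is for this population and sample size.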
Sampling Errors
• Sampling errors have their origin in sampling, since a sample is never a perfect miniature of the population.
• Sampling errors are of two types:
i. Biased Errors: These errors arise because of bias in selection.
ii. Unbiased Errors: Unbiased errors arise due to chance differences between members of the population included in the sample and members not included in the sample. This is known as random sampling error. With an increase in the size of the sample, unbiased errors tend to decrease in magnitude.
$\mu_{\bar{x}} = \mu$. The sampling distribution has a standard deviation (standard error) equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
$\sigma_{\bar{x}} = [\sigma/\sqrt{n}] \sqrt{(N - n)/(N - 1)}$
where N = size of the population and n = size of the sample.
Concept Problem
• From a population of 125 items with a mean of 105 and a standard deviation of 17, 64 items were chosen.
i. What is the standard error of the mean?
ii. What is the probability that the sample mean will be between 107.5 and 109?
Solution:
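$N = 125$, $\mu = 105$, $\sigma = 17$, $n = 64$
$\sigma_{\bar{x}} = [\sigma/\sqrt{n}] \sqrt{(N - n)/(N - 1)} = (17/8) \sqrt{61/124} = 1.4904$
To compute the probability, find the area under the normal curve between sample means of 107.5 and 109.
For $\bar{x} = 107.5$: $z = (\bar{x} - \mu)/\sigma_{\bar{x}} = (107.5 - 105)/1.4904 = 2.5/1.4904 = 1.68$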
Solution (Cont'd)
This corresponds to an area of 0.4535 [from Z table].
For $\bar{x} = 109$: $z = (109 - 105)/1.4904 = 2.68$
This corresponds to an area of 0.4963 [from Z table].
So the probability that the sample mean will lie between the two values is 0.4963 - 0.4535 = 0.0428, or 4.28%.
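The calculation can be checked in Python. This sketch applies the finite population multiplier and uses `statistics.NormalDist` rather than a printed table, so the z-scores are not rounded to two decimals and the probability comes out at about 0.043, marginally different from the table-based 0.0428.

```python
from math import sqrt
from statistics import NormalDist

N, n = 125, 64                      # population size and sample size
mu, sigma = 105.0, 17.0             # population mean and standard deviation

# Standard error with the finite population multiplier.
se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

sampling_dist = NormalDist(mu, se)  # sampling distribution of the mean
p = sampling_dist.cdf(109) - sampling_dist.cdf(107.5)

print(round(se, 4), round(p, 3))
```

The standard error agrees with the solution (1.4904); the small gap in the probability comes only from the rounding of z in the table lookup.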