
Chapter 7: Sampling Distribution

Population Probability Distribution


- The population probability distribution is the probability distribution of the population data.
- Example 1: Suppose there are only five students in an advanced statistics class and the midterm scores of these
five students are
o 70, 78, 80, 80, 95
o Let x denote the score of a student.
o Population mean = 403/5 = 80.6

Sampling Distribution
- The probability distribution of the sample mean, x̄, is called its sampling distribution.
- It lists the various values that x̄ can assume and the probability of each value of x̄.
- In general, the probability distribution of a sample statistic is called its sampling distribution.
- Example 1: Reconsider the population of midterm scores of five students: 70 78 80 80 95
o Consider all possible samples of three scores each that can be selected, without replacement, from that population. The total number of possible samples is
\[ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1 \times 2 \times 1} = 10 \]
- Example 2: Suppose we assign the letters A, B, C, D, and E to the scores of the five students so that A = 70, B =
78, C = 80, D = 80, E = 95
o Then, the 10 possible samples of three scores each are ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE,
BDE, CDE

Remember: Population mean = 403/5 = 80.6
Remember: Mean of sample means = 806/10 = 80.6
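
The sampling distribution described above can be checked by brute force. The following is a minimal sketch, not part of the original notes: it enumerates all 10 samples of three scores, computes each sample mean, and tabulates the probability of each distinct value. The variable names are illustrative, and exact fractions are used so that rounding does not hide the equality of the two means.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population of midterm scores, labelled as in Example 2.
scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# All 5C3 = 10 samples of three students, drawn without replacement.
samples = list(combinations(scores, 3))

# Sample mean of each sample, kept as an exact fraction.
sample_means = {
    "".join(s): Fraction(sum(scores[k] for k in s), 3) for s in samples
}

# Sampling distribution: each distinct value of the mean and its probability.
counts = Counter(sample_means.values())
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

population_mean = Fraction(sum(scores.values()), len(scores))
mean_of_sample_means = sum(sample_means.values()) / len(samples)

for mean, prob in sampling_distribution.items():
    print(f"x-bar = {float(mean):6.2f}   P(x-bar) = {float(prob):.2f}")
print("population mean      =", float(population_mean))
print("mean of sample means =", float(mean_of_sample_means))
```

Both printed means equal 80.6, which matches the two "Remember" lines above.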

Sampling Error and Non-sampling Errors


- Sampling error is the difference between the value of a sample statistic and the value of the corresponding population parameter. In the case of the mean,
\[ \text{Sampling error} = \bar{x} - \mu \]
assuming that the sample is random and no non-sampling error has been made.
- The errors that occur in the collection, recording, and tabulation of data are called non-sampling errors.
- Reasons for the occurrence of non-sampling errors:
o 1. If a sample is nonrandom (and, hence, most likely non-representative), the sample results may be too
different from the census results.
o 2. The questions may be phrased in such a way that they are not fully understood by the members of the
sample or population.
o 3. The respondents may intentionally give false information in response to some sensitive questions.
o 4. The poll taker may make a mistake and enter a wrong number in the records or make an error while
entering the data on a computer.
- Example 1: Reconsider the population of five scores: 70 78 80 80 95
o Suppose one sample of three scores is selected from this population, and this sample includes the scores
70, 80, and 95. Find the sampling error.
\[ \mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60 \]
\[ \bar{x} = \frac{70 + 80 + 95}{3} = 81.67 \]
\[ \text{Sampling error} = \bar{x} - \mu = 81.67 - 80.60 = 1.07 \]
o That is, the mean score estimated from the sample is 1.07 higher than the mean score of the population. Note that this difference occurred due to chance, that is, because we used a sample instead of the population.
- Example 2: Now suppose, when we select the sample of three scores, we mistakenly record the second score as 82 instead of 80. As a result, we calculate the sample mean as
\[ \bar{x} = \frac{70 + 82 + 95}{3} = 82.33 \]
o The difference between this sample mean and the population mean is
\[ \bar{x} - \mu = 82.33 - 80.60 = 1.73 \]
▪ This difference does not represent the sampling error. Only 1.07 of this difference is due to the sampling error.
\[ \text{Non-sampling error} = \text{Incorrect } \bar{x} - \text{Correct } \bar{x} = 82.33 - 81.67 = .66 \]
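
The split between sampling and non-sampling error in the two examples above can be reproduced with a few lines of arithmetic. This is a small sketch with illustrative variable names. It works with unrounded means, so the non-sampling error comes out as about .67 rather than the .66 obtained above by subtracting the rounded values 82.33 and 81.67.

```python
population = [70, 78, 80, 80, 95]
correct_sample = [70, 80, 95]
recorded_sample = [70, 82, 95]  # second score mistakenly recorded as 82


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mu = mean(population)
correct_xbar = mean(correct_sample)
incorrect_xbar = mean(recorded_sample)

# Error due to chance: a sample was used instead of the whole population.
sampling_error = correct_xbar - mu
# Error due to the recording mistake.
nonsampling_error = incorrect_xbar - correct_xbar
# Total gap between the (incorrect) sample mean and the population mean.
total_difference = incorrect_xbar - mu

print(f"sampling error     = {sampling_error:.2f}")
print(f"non-sampling error = {nonsampling_error:.2f}")
print(f"total difference   = {total_difference:.2f}")
```

The total difference is the sum of the two components, which is why only part of the 1.73 gap can be attributed to sampling error.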

Mean and Standard Deviation of x̄


- The mean and standard deviation of the sampling distribution of x̄ are called the mean and standard deviation of x̄ and are denoted by \( \mu_{\bar{x}} \) and \( \sigma_{\bar{x}} \), respectively.

- Example 1: The mean wage for all 5000 employees who work at a large company is $27.50 and the standard deviation is $3.70. Let x̄ be the mean wage per hour for a random sample of certain employees selected from this company. Find the mean and standard deviation of x̄ for a sample size of (a) 30 (b) 75 (c) 200
o (a) N = 5000, μ = $27.50, σ = $3.70. In this case, n/N = 30/5000 = .006 < .05.
\[ \mu_{\bar{x}} = \mu = \$27.50, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}} = \$.676 \]
o (b) N = 5000, μ = $27.50, σ = $3.70. In this case, n/N = 75/5000 = .015 < .05.
\[ \mu_{\bar{x}} = \mu = \$27.50, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}} = \$.427 \]
o (c) In this case, n = 200 and n/N = 200/5000 = .04, which is less than .05.
\[ \mu_{\bar{x}} = \mu = \$27.50, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{200}} = \$.262 \]
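
The three parts of the wage example follow the same recipe: check that n/N does not exceed .05, then divide σ by √n. A minimal sketch of that recipe is shown below. The function name `sd_of_sample_mean` is made up for illustration, and the function deliberately refuses to compute a value when n/N > .05, since these notes do not give the formula for that case for the mean.

```python
import math


def sd_of_sample_mean(sigma, n, N=None):
    """Return sigma / sqrt(n), the standard deviation of x-bar.

    If the population size N is supplied, the function first checks the
    condition n/N <= .05 used in the notes.
    """
    if N is not None and n / N > 0.05:
        raise ValueError("n/N > .05: sigma / sqrt(n) is not appropriate here")
    return sigma / math.sqrt(n)


N, mu, sigma = 5000, 27.50, 3.70

# The mean of x-bar equals the population mean for every sample size.
mean_of_xbar = mu

results = {n: round(sd_of_sample_mean(sigma, n, N), 3) for n in (30, 75, 200)}
for n, sd in results.items():
    print(f"n = {n:3d}: mean of x-bar = {mean_of_xbar:.2f}, sd of x-bar = {sd:.3f}")
```

The output shows the standard deviation of x̄ shrinking as the sample size grows, while its mean stays at $27.50.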

Shape of the Sampling Distribution of x̄


- The shape of the sampling distribution of x̄ relates to the following two cases:
- The population from which samples are drawn has a normal distribution.
- The population from which samples are drawn does not have a normal distribution.

Sampling from a Normally Distributed Population


- If the population from which the samples are drawn is normally distributed with mean μ and standard deviation σ, then the sampling distribution of the sample mean, x̄, will also be normally distributed with the following mean and standard deviation, regardless of the sample size:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

- Example 1: According to the 2015 Physician Compensation Report by Medscape (a subsidiary of WebMD), American internal medicine physicians earned an average of $196,000 in 2014. Suppose that the 2014 earnings of all American internal medicine physicians are approximately normally distributed with a mean of $196,000 and a standard deviation of $20,000. Let x̄ be the mean 2014 earnings of a random sample of American internal medicine physicians. Calculate the mean and standard deviation of x̄ and describe the shape of its sampling distribution when the sample size is (a) 16 (b) 50 (c) 1000
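
The notes do not include a solution for this example. The following is a worked sketch using the formulas above. It assumes that each sample is small relative to the population (n/N ≤ .05), which the example does not state explicitly.

```latex
\begin{align*}
\mu_{\bar{x}} &= \mu = \$196{,}000 \quad \text{(for every sample size)} \\
\text{(a) } n = 16:   \quad \sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}} = \frac{20{,}000}{\sqrt{16}}   = \$5000 \\
\text{(b) } n = 50:   \quad \sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}} = \frac{20{,}000}{\sqrt{50}}   \approx \$2828.43 \\
\text{(c) } n = 1000: \quad \sigma_{\bar{x}} &= \frac{\sigma}{\sqrt{n}} = \frac{20{,}000}{\sqrt{1000}} \approx \$632.46
\end{align*}
```

Because the population is taken to be (approximately) normally distributed, the sampling distribution of x̄ is (approximately) normal in all three cases, including the small sample of n = 16.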

Sampling from a Population that is not Normally Distributed


- According to the central limit theorem, for a large sample size, the sampling distribution of x̄ is approximately normal, irrespective of the shape of the population distribution.
- The mean and standard deviation of the sampling distribution of x̄ are
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
- The sample size is usually considered to be large if n ≥ 30.
- Example 1: The mean rent paid by all tenants in a small city is $1550 with a standard deviation of $225. However, the population distribution of rents for all tenants in this city is skewed to the right. Calculate the mean and standard deviation of x̄ and describe the shape of its sampling distribution when the sample size is (a) 30 (b) 100
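
The central limit theorem can be illustrated with a small simulation. The sketch below is not from the notes. It invents a right-skewed population (a gamma distribution chosen only because it can be tuned to a mean of $1550 and a standard deviation of $225; the actual rent distribution is unknown, and this one is only mildly skewed), draws many samples of size 30, and compares the simulated mean and standard deviation of x̄ with μ and σ/√n.

```python
import math
import random
import statistics

MU, SIGMA = 1550.0, 225.0   # population mean and sd from the rent example
SAMPLE_SIZE = 30
REPEATS = 5000              # number of simulated samples (arbitrary choice)

# Assumption: a gamma population with mean = shape * scale = MU and
# variance = shape * scale**2 = SIGMA**2.
scale = SIGMA**2 / MU
shape = MU / scale

rng = random.Random(7)      # fixed seed so the run is reproducible


def one_sample_mean(n):
    """Draw n rents from the assumed population and return their mean."""
    return statistics.fmean([rng.gammavariate(shape, scale) for _ in range(n)])


sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(REPEATS)]

simulated_mean = statistics.fmean(sample_means)
simulated_sd = statistics.stdev(sample_means)
theoretical_sd = SIGMA / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means: {simulated_mean:8.2f}  (mu = {MU:.2f})")
print(f"sd of sample means:   {simulated_sd:8.2f}  (sigma/sqrt(n) = {theoretical_sd:.2f})")
```

The simulated values should land close to μ = 1550 and σ/√30 ≈ 41.08. Changing `SAMPLE_SIZE` to 100 gives the comparison for part (b), where σ/√n = 22.50.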

Applications of the Sampling Distribution of x̄


- (1) If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 68.26% of the sample means will be within one standard deviation of the population mean.
\[ P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}}) \]

- (2) If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 95.44% of the sample means will be within two standard deviations of the population mean.
\[ P(\mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}}) \]

- (3) If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 99.74% of the sample means will be within three standard deviations of the population mean.
\[ P(\mu - 3\sigma_{\bar{x}} \le \bar{x} \le \mu + 3\sigma_{\bar{x}}) \]
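
The three percentages above are areas under the standard normal curve between −k and +k for k = 1, 2, 3. A quick sketch using Python's standard-library `statistics.NormalDist` reproduces them. The figures in the notes (68.26%, 95.44%, 99.74%) come from a normal table rounded to four decimals, so the unrounded values differ slightly in the last digit.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations for k = 1, 2, 3.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```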

- Example 1: Assume that the weights of all packages of a certain brand of cookies are normally distributed with a mean of 32 ounces and a standard deviation of .3 ounce. Find the probability that the mean weight, x̄, of a random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 ounces.
o We need P(31.8 < x̄ < 31.9), where
\[ \mu_{\bar{x}} = \mu = 32 \text{ ounces} \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.3}{\sqrt{20}} = .06708204 \text{ ounce} \]

Z Value for a Value of x̄

- The z value for a value of x̄ is calculated as
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]
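
Applying this formula to the cookie-package example (μ = 32 ounces, σ = .3 ounce, n = 20) gives the probability that the example asks for. The notes stop before the final step, so the following is a worked sketch; the two areas are read from a standard normal table after rounding z to two decimals.

```latex
\begin{align*}
\text{For } \bar{x} = 31.8: \quad z &= \frac{31.8 - 32}{.06708204} = -2.98 \\
\text{For } \bar{x} = 31.9: \quad z &= \frac{31.9 - 32}{.06708204} = -1.49 \\
P(31.8 < \bar{x} < 31.9) &= P(-2.98 < z < -1.49) = .0681 - .0014 = .0667
\end{align*}
```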

- Example 1: According to Moebs Services Inc., an individual checking account at major U.S. banks costs the banks between $350 and $450 per year. Suppose that the current average cost of all checking accounts at major U.S. banks is $400 per year with a standard deviation of $30. Let x̄ be the current average annual cost of a random sample of 225 individual checking accounts at major banks in America.
o (a) What is the probability that the average annual cost of the checking accounts in this sample is within $4 of the population mean? That is, find P($396 ≤ x̄ ≤ $404).
▪ μ = $400 and σ = $30. The shape of the probability distribution of the population is unknown. However, the sampling distribution of x̄ is approximately normal because the sample is large (n > 30).
▪ \( \mu_{\bar{x}} = \mu = \$400 \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 30/\sqrt{225} = \$2.00 \)
▪ For x̄ = 396: \( z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{396 - 400}{2.00} = -2.00 \)
▪ For x̄ = 404: \( z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{404 - 400}{2.00} = 2.00 \)
▪ P($396 ≤ x̄ ≤ $404) = P(−2.00 ≤ z ≤ 2.00) = .9772 − .0228 = .9544
▪ Therefore, the probability that the average annual cost of the 225 checking accounts in this sample is within $4 of the population mean is .9544.
o (b) What is the probability that the average annual cost of the checking accounts in this sample is less than the population mean by $2.70 or more? That is, find P(x̄ ≤ $397.30).
▪ For x̄ = 397.30: \( z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{397.30 - 400}{2.00} = -1.35 \)
▪ P(x̄ ≤ $397.30) = P(z ≤ −1.35) = .0885
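
Both parts of the checking-account example can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a normal table and treats the sampling distribution of x̄ as exactly normal, which is the same approximation the example makes. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 400, 30, 225

# Standard deviation of x-bar: sigma / sqrt(n) = 30 / 15 = 2.00
sigma_xbar = sigma / math.sqrt(n)

# Approximate sampling distribution of x-bar (normal because n is large).
xbar_dist = NormalDist(mu, sigma_xbar)

# (a) P(396 <= x-bar <= 404): within $4 of the population mean.
p_within_4 = xbar_dist.cdf(404) - xbar_dist.cdf(396)

# (b) P(x-bar <= 397.30): below the population mean by $2.70 or more.
p_below_by_270 = xbar_dist.cdf(397.30)

print(f"sigma of x-bar        = {sigma_xbar:.2f}")
print(f"P(396 <= x-bar <= 404) = {p_within_4:.4f}")
print(f"P(x-bar <= 397.30)     = {p_below_by_270:.4f}")
```

The results agree with the table-based answers of .9544 and .0885 to about three decimal places; the small differences come from the four-decimal rounding in the normal table.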

Population and Sample Proportions; and Mean, Standard Deviation, and Shape of the Sampling Distribution of p̂
- The population and sample proportions, denoted by p and p̂ respectively, are calculated as
\[ p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n} \]
- Where:
o N = total number of elements in the population
o n = total number of elements in the sample
o X = number of elements in the population that possess a specific characteristic
o x = number of elements in the sample that possess a specific characteristic
- Example 1: Suppose a total of 789,654 families live in a city and 563,282 of them own homes. A sample of 240 families is selected from this city, and 158 of them own homes. Find the proportion of families who own homes in the population and in the sample.
\[ p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = .71 \]
\[ \hat{p} = \frac{x}{n} = \frac{158}{240} = .66 \]

The Sampling Distribution of the Sample Proportion p̂


- The probability distribution of the sample proportion, p̂ , is called its sampling distribution.
- It gives various values that p̂ can assume and their probabilities.
- Example 1: Boe Consultant Associates has five employees. Table 7.6 gives the names of these five employees
and information concerning their knowledge of statistics. If we define the population proportion, p, as the
proportion of employees who know statistics, then p = 3 / 5 = .60
o Now, suppose we draw all possible samples of three employees each and compute the proportion of
employees, for each sample, who know statistics.
\[ \text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1 \times 2 \times 1} = 10 \]
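
The sampling distribution of p̂ for this small population can be built by enumeration. The sketch below is illustrative only: the employee names from Table 7.6 are not reproduced in these notes, so the labels E1 to E5 are placeholders, and the only fact taken from the example is that three of the five employees know statistics.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Placeholder labels (hypothetical); True means the employee knows statistics.
employees = {"E1": True, "E2": True, "E3": True, "E4": False, "E5": False}

p = Fraction(sum(employees.values()), len(employees))  # population proportion

# All 5C3 = 10 samples of three employees, drawn without replacement.
samples = list(combinations(employees, 3))

# Sample proportion for each sample: (number who know statistics) / 3.
p_hats = [Fraction(sum(employees[name] for name in s), 3) for s in samples]

# Sampling distribution: each distinct value of p-hat and its probability.
counts = Counter(p_hats)
distribution = {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}

mean_p_hat = sum(p_hats) / len(p_hats)

for value, prob in distribution.items():
    print(f"p-hat = {float(value):.2f}   P(p-hat) = {float(prob):.2f}")
print("p =", float(p), "  mean of p-hat =", float(mean_p_hat))
```

The mean of the ten sample proportions equals the population proportion, .60.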
Mean and Standard Deviation of p̂
- The mean of the sample proportion, p̂, is denoted by \( \mu_{\hat{p}} \) and is equal to the population proportion, p. Thus,
\[ \mu_{\hat{p}} = p \]
- The standard deviation of the sample proportion, p̂, is denoted by \( \sigma_{\hat{p}} \) and is given by the formula
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \]
- Where p is the population proportion, q = 1 – p, and n is the sample size. This formula is used when n/N ≤ .05, where N is the population size.
- If n/N > .05, then \( \sigma_{\hat{p}} \) is calculated as:
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}} \]
o where the factor \( \sqrt{\dfrac{N - n}{N - 1}} \) is called the finite population correction factor.
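
The two formulas above differ only by the finite population correction factor, so they fit naturally into one function. The following is a minimal sketch; the function name `sd_of_sample_proportion` is made up for illustration. It is applied to the five-employee population from the earlier example (p = .60, n = 3, N = 5), where n/N = .60 is well above .05.

```python
import math


def sd_of_sample_proportion(p, n, N=None):
    """Standard deviation of p-hat.

    Uses sqrt(p*q/n), multiplied by the finite population correction
    factor sqrt((N - n)/(N - 1)) when N is given and n/N > .05.
    """
    q = 1 - p
    sd = math.sqrt(p * q / n)
    if N is not None and n / N > 0.05:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd


uncorrected = sd_of_sample_proportion(0.60, 3)      # ignores the population size
corrected = sd_of_sample_proportion(0.60, 3, N=5)   # applies the correction

print(f"without correction: {uncorrected:.4f}")
print(f"with correction:    {corrected:.4f}")
```

For this population the corrected value works out to .20, noticeably smaller than the uncorrected value of about .28, which shows why the correction matters when the sample is a large fraction of the population.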

Shape of the Sampling Distribution of p̂



- Central Limit Theorem for Sample Proportion
o According to the central limit theorem, the sampling distribution of p̂ is approximately normal for a
sufficiently large sample size.
- In the case of proportion, the sample size is considered to be sufficiently large if np and nq are both greater than 5
– that is, if np > 5 and nq > 5
- Example 1: According to a New York Times/CBS News poll, 55% of adults polled said that owning a home is a
very important part of the American Dream. Assume that this result is true for the current population of American
adults. Let p̂ be the proportion of American adults in a random sample of 2000 who will say that owning a home
is a very important part of the American Dream. Find the mean and standard deviation of p̂ and describe the
shape of its sampling distribution.
o p = .55, q = 1 − p = 1 − .55 = .45, and n = 2000
o \( \mu_{\hat{p}} = p = .55 \)
o \( \sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(.55)(.45)}{2000}} = .0111 \)
o np = 2000(.55) = 1100 and nq = 2000(.45) = 900

Therefore, because np and nq are both greater than 5, the sampling distribution of p̂ is approximately normal (by the central limit theorem) with a mean of .55 and a standard deviation of .0111, as shown in the figure.

Applications of the Sampling Distribution of p̂



- When we conduct a study, we usually take only one sample and make all decisions or inferences on the basis of
the results of that one sample.
- We use the concepts of the mean, standard deviation, and shape of the sampling distribution of p̂ to determine the
probability that the value of p̂ computed from one sample falls within a given interval.
- Example 1: In a recent Pew Research Center nationwide telephone survey of American adults, 75% of adults said
that college education has become too expensive for most people. Suppose that this result is true for the current
population of American adults. Let p̂ be the proportion in a random sample of 1400 adult Americans who will
hold the said opinion. Find the probability that 76.5% to 78% of adults in this sample will hold this opinion.
o p = .75, q = 1 − p = 1 − .75 = .25, and n = 1400
o \( \mu_{\hat{p}} = p = .75 \)
o \( \sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(.75)(.25)}{1400}} = .01157275 \)
o np = 1400(.75) = 1050 and nq = 1400(.25) = 350
o We can infer from the central limit theorem that the sampling distribution of p̂ is approximately normal.
o The required probability is P(.765 < p̂ < .78).
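
The notes set up P(.765 < p̂ < .78) but do not evaluate it. The sketch below finishes the calculation numerically, using `statistics.NormalDist` from the Python standard library instead of a normal table and treating the sampling distribution of p̂ as exactly normal. Variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.75, 1400
q = 1 - p

# Large-sample check from the notes: np > 5 and nq > 5.
large_enough = n * p > 5 and n * q > 5

# Standard deviation of p-hat, assuming n/N <= .05 (a nationwide population).
sigma_p_hat = math.sqrt(p * q / n)

# z values for the two endpoints of the interval.
z_low = (0.765 - p) / sigma_p_hat
z_high = (0.78 - p) / sigma_p_hat

# Approximate sampling distribution of p-hat and the required probability.
p_hat_dist = NormalDist(p, sigma_p_hat)
probability = p_hat_dist.cdf(0.78) - p_hat_dist.cdf(0.765)

print(f"sigma of p-hat = {sigma_p_hat:.8f}")
print(f"z values       = {z_low:.2f}, {z_high:.2f}")
print(f"P(.765 < p-hat < .78) = {probability:.4f}")
```

The probability comes out at roughly .09. A table lookup with z rounded to two decimals gives a slightly different last digit.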

Z Value for a Value of p̂

- The z value for a value of p̂ is calculated as
\[ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]
- Example 1: Maureen Webster, who is running for mayor in a large city, claims that she is favored by 53% of all
eligible voters of that city. Assume that this claim is true. What is the probability that in a random sample of 400
registered voters taken from this city, less than 49% will favor Maureen Webster?
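
The notes give no solution for this example. The following is a worked sketch with p = .53, q = .47, and n = 400, assuming that the sample is small relative to the city's voter population (n/N ≤ .05) and reading the final area from a standard normal table.

```latex
\begin{align*}
np &= 400(.53) = 212 > 5, \qquad nq = 400(.47) = 188 > 5 \\
\mu_{\hat{p}} &= p = .53 \\
\sigma_{\hat{p}} &= \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.53)(.47)}{400}} \approx .02495 \\
z &= \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{.49 - .53}{.02495} \approx -1.60 \\
P(\hat{p} < .49) &= P(z < -1.60) = .0548
\end{align*}
```

Since np and nq are both greater than 5, the normal approximation is reasonable, and the probability that fewer than 49% of the sampled voters favor the candidate is about .055.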
