Lecture 10.

Chapter 10
Estimation: Describing a single population

10.2 & 10.3 Estimating the population mean when the population variance is known and unknown
10.4 Estimating the population proportion
10.5 Determining the required sample size

Introduction: Where We Have Been…


Chapters 7 and 8: The binomial and normal distributions allow us to make probability statements about $X$ (a random variable or index measuring a characteristic of individuals in a population), i.e. to calculate $P(a \le X \le b)$ from the population parameters: $n$ and $p$ for the binomial distribution, $\mu$ and $\sigma$ for the normal distribution.

Chapter 9: Sampling distributions allow us to make probability statements about the statistics $\bar{X}$ and $\hat{p}$, i.e. to calculate $P(a \le \bar{X} \le b)$ or $P(a \le \hat{p} \le b)$ from population parameters such as $\mu$, $\sigma$ and $p$.

Introduction: Where We Are Going…
However, in almost all realistic situations the parameters are unknown.

In this chapter, we will use sampling distributions to draw statistical inferences about the unknown population parameters.

Statistical inference is the process by which we acquire information about population parameters from samples.

There are two procedures for making inferences: Estimation and Hypothesis Testing.

In this chapter we learn about Estimation: we will develop techniques to estimate two population parameters, the population mean $\mu$ (for numerical data) and the population proportion $p$ (for nominal data).

Examples
• A bank conducts a survey to estimate the number of times customers will actually use ATMs.
• A random sample of processing times is taken to estimate the mean production time.
• A survey of eligible voters is conducted to gauge support for the federal government’s new industrial relations reforms.

10.2 Estimating the Population Mean $\mu$ when the Variance $\sigma^2$ is Known

• To estimate $\mu$ we take $\bar{X}$ as the point estimator (it is an unbiased, consistent and relatively efficient estimator: optional reading, pages 370-371).
• As an interval estimator for $\mu$ with confidence level $1-\alpha$, we can use the interval
$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
since
$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
(Explanation: please refer to the next three slides.)

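Interval estimator for $\mu$

Example: for $\alpha = 0.05$
By the sampling distribution of the mean, $Z = \dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has the standard normal distribution. The symmetry of this distribution leads to
$$P(Z < -z_{0.025}) + P(Z > z_{0.025}) = 0.05$$
or
$$P(-z_{0.025} \le Z \le z_{0.025}) = 0.95$$
or
$$P(-1.96 \le Z \le 1.96) = 0.95, \quad \text{i.e.} \quad P\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$
This can be written as
$$P\left(-1.96\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
which becomes
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
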
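[Figure: the standard normal distribution of $Z$, with area 0.025 in each tail beyond $-1.96$ and $1.96$, and the corresponding normal distribution of $\bar{X}$, with area 0.025 in each tail beyond $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$.]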

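Estimating $\mu$ when $\sigma$ is known

In general,
$$P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
or
$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
The confidence level $1-\alpha$ measures how frequently, over repeated samples, the $(1-\alpha)$ confidence interval $\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ will actually include $\mu$.

[Figure: normal distribution of $\bar{X}$ centred at $\mu$, with central area $1-\alpha$ (the confidence level) and area $\alpha/2$ in each tail. The lower confidence limit is at $\mu - z_{\alpha/2}\,\sigma/\sqrt{n}$ and the upper confidence limit at $\mu + z_{\alpha/2}\,\sigma/\sqrt{n}$.]
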
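The frequency interpretation of the confidence level can be checked by simulation. The following is a minimal Python sketch and not part of the textbook material: the values of `mu`, `sigma`, `n` and the number of repetitions are arbitrary assumptions chosen for illustration, and the function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, reps, seed=0):
    """Fraction of simulated samples whose z-interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n from N(mu, sigma) and take its mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Assumed illustrative values: mu = 50, sigma = 8, n = 100, alpha = 0.05.
print(coverage(mu=50, sigma=8, n=100, alpha=0.05, reps=2000))
```

The printed fraction should be close to 0.95, and it varies slightly with the seed and the number of repetitions.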
Four commonly used confidence levels $(1-\alpha)$

Confidence level $1-\alpha$   $\alpha$   $\alpha/2$   $z_{\alpha/2}$   Width of the $(1-\alpha)$ confidence interval
0.90                          0.10       0.05         1.645            $2(1.645)\,\sigma/\sqrt{n}$
0.95                          0.05       0.025        1.96             $2(1.96)\,\sigma/\sqrt{n}$
0.98                          0.02       0.01         2.326            $2(2.326)\,\sigma/\sqrt{n}$
0.99                          0.01       0.005        2.575            $2(2.575)\,\sigma/\sqrt{n}$

Note: Increasing the confidence level $(1-\alpha)$ also increases the width of the confidence interval.
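The critical values in the table can be reproduced without a printed table. This is a small Python sketch using the standard library's `statistics.NormalDist`; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a given confidence level 1 - alpha."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    z = z_critical(level)
    # The interval width is 2 * z * sigma / sqrt(n); print the multiplier 2z.
    print(f"{level:.2f}  z = {z:.3f}  width = {2 * z:.3f} * sigma / sqrt(n)")
```

The computed values agree with the table to within rounding; for the 99% level the exact value is about 2.5758, which the table rounds to 2.575.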

Example 10.1, page 375: Find the 95% confidence interval estimate of the average number of hours Australian children spend watching television, based on a sample of size 100, given that $\sigma = 8.0$.
Solution: The confidence interval is
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \bar{X} \pm 1.96\,\frac{8.0}{\sqrt{100}} = \bar{X} \pm 1.568$$

Remark: We can use the lower-case letter $\bar{x}$ instead of the upper-case letter $\bar{X}$ to write the confidence interval, where $\bar{x}$ is the observed value of the sample mean $\bar{X}$ for a sample of size $n = 100$.

Interpretation: If we repeatedly draw samples of size 100 from this population (the number of hours children spend watching television), then for 95% of the samples the value of $\bar{x}$ will be such that $\mu$ (the average number of hours) lies between $\bar{x} - 1.568$ and $\bar{x} + 1.568$ (1.568 is the half width of the confidence interval).

For the sample of size 100 in this example, $\bar{x} = 27.191$. Hence, the 95% confidence interval estimate is $27.191 \pm 1.568 = (25.623, 28.759)$. The 95% refers to the procedure: intervals constructed in this way include $\mu$ for 95% of all samples.


Steps in estimating a population mean when the population variance is known (page 378)

1. Determine the sample mean $\bar{x}$: In Example 10.1, $\bar{x} = 27.191$.
2. Determine the desired confidence level $1-\alpha$, which in turn specifies $\alpha$. From Table 3 in Appendix B, find $z_{\alpha/2}$: In Example 10.1, $1-\alpha = 95\%$, hence $\alpha = 0.05$ and $z_{\alpha/2} = z_{0.025} = 1.96$.
3. Calculate $\text{LCL} = \bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ and $\text{UCL} = \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$: In Example 10.1, $\text{LCL} = 27.191 - 1.96\,\dfrac{8.0}{\sqrt{100}} = 25.623$ and $\text{UCL} = 27.191 + 1.96\,\dfrac{8.0}{\sqrt{100}} = 28.759$.
We can use Microsoft Excel (Data > Data Analysis > Descriptive Statistics) or CONFIDENCE.NORM(0.05,8.0,100) to find the half width of the confidence interval, 1.568.

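The three steps can also be sketched in Python as an alternative to the Excel route. This is an illustration only: the function name `z_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` rather than from Table 3.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 10.1: x-bar = 27.191, sigma = 8.0, n = 100, 95% confidence.
lcl, ucl = z_interval(xbar=27.191, sigma=8.0, n=100)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

Rounded to three decimals, the result matches the limits 25.623 and 28.759 found above.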
10.3 Estimating the Population Mean $\mu$ when the Variance $\sigma^2$ is Unknown
• Recall that when $\sigma$ is known, the Z-statistic
$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$
is normally distributed.
• When $\sigma$ is unknown, we use its point estimator $s$, and the Z-statistic is replaced by the T-statistic
$$T = \frac{\bar{X}-\mu}{s/\sqrt{n}}$$
which is Student t distributed.
The Student t distribution is mound-shaped and symmetrical around zero. The ‘degrees of freedom’ determine how spread out the distribution is (compared to the normal distribution).

[Figure: two Student t curves, one with d.f. = $n_1$ and one with d.f. = $n_2$, where $n_2 > n_1$.]


Probability Calculations for the t Distribution
• The probabilities that appear in Table 4, Appendix B (also see Table 10.3, page 389) have the form
$$P(T > t_{A,\,\text{d.f.}}) = A$$
• For a given number of degrees of freedom and a predetermined right-hand tail probability $A$, the entry in the table is the corresponding $t_A$.
• These values are used in computing interval estimates and performing hypothesis tests when $\sigma$ is unknown.

Table 10.3, page 389
Example: $t_{0.05,\,20} = 1.725$, i.e. for $A = 0.05$ and 20 degrees of freedom, $P(T > 1.725) = 0.05$.

Degrees of freedom   $t_{0.100}$   $t_{0.05}$   $t_{0.025}$   $t_{0.01}$   $t_{0.005}$
1                    3.078         6.314        12.706        31.821       63.657
2                    1.886         2.920        4.303         6.965        9.925
...                  ...           ...          ...           ...          ...
20                   1.325         1.725        2.086         2.528        2.845
...                  ...           ...          ...           ...          ...
200                  1.286         1.653        1.972         2.345        2.601
$\infty$             1.282         1.645        1.960         2.326        2.576

We can use Excel functions to find: $t_{0.05,\,20}$ = T.INV.2T(0.1,20) = 1.725.
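For readers without the table or Excel, a critical value $t_{A,\,\text{d.f.}}$ can be approximated numerically. The sketch below uses only the Python standard library: it integrates the Student t density with Simpson's rule and solves $P(T > t) = A$ by bisection. The function names and the step settings are assumptions made for this illustration; a statistics library would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps_per_unit=400):
    """P(T > x) for x >= 0, using Simpson's rule on [0, x]."""
    n = max(2, 2 * math.ceil(x * steps_per_unit / 2))  # even number of panels
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry P(T > 0) = 0.5; subtract the area between 0 and x.
    return 0.5 - total * h / 3

def t_critical(a, df):
    """Solve P(T > t) = a for t, with 0 < a < 0.5, by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > a:      # expand until the root is bracketed
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_critical(0.05, 20):.3f}")   # compare with the table entry for d.f. = 20
```

The result can be compared with the row for 20 degrees of freedom in the $t_{0.05}$ column. Very small degrees of freedom with small $A$ take noticeably longer because the integration range grows.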

Estimating $\mu$ with $\sigma^2$ Unknown

• $(1-\alpha)$ confidence interval estimator of $\mu$ with $\sigma^2$ unknown:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n-1$$
Example 10.2, page 393
• The general manager wants to know the distance customers travel by taxi on an average trip.
• A random sample of 41 customers was taken, with sample mean 7.7 km and sample standard deviation 2.93 km.
• Construct a 95% confidence interval for the mean distance travelled by customers.

Steps in estimating a population mean when the population variance is unknown
1. Determine the sample mean $\bar{x}$ and the sample standard deviation $s$: In Example 10.2, $\bar{x} = 7.7$ and $s = 2.93$.
2. Determine $\alpha$. From Table 4 in Appendix B, find $t_{\alpha/2,\,n-1}$: In Example 10.2, $1-\alpha = 95\%$, hence $\alpha = 0.05$ and $t_{\alpha/2,\,n-1} = t_{0.025,\,40} = 2.021$.
3. Calculate $\text{LCL} = \bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}$ and $\text{UCL} = \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}$: In Example 10.2, $\text{LCL} = 7.7 - 2.021\,\dfrac{2.93}{\sqrt{41}} = 6.775$ and $\text{UCL} = 7.7 + 2.021\,\dfrac{2.93}{\sqrt{41}} = 8.625$.
We can use the Microsoft Excel function CONFIDENCE.T() to find the half width of the confidence interval = 0.924822.

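The arithmetic in step 3 can be checked with a short Python sketch. The function name `t_interval` is hypothetical, and the critical value 2.021 is taken from the table as in step 2 rather than computed.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Return (LCL, UCL) for the mean when sigma is unknown.

    t_crit is the tabulated value t_{alpha/2, n-1}.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 10.2: x-bar = 7.7 km, s = 2.93 km, n = 41, t_{0.025, 40} = 2.021.
lcl, ucl = t_interval(xbar=7.7, s=2.93, n=41, t_crit=2.021)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

The half width here is about 0.9248, which agrees with the Excel value 0.924822 up to the rounding of the tabulated critical value.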

10.4 Estimating the Population Proportion

• When the population consists of nominal data (data that fall into different categories, say A, B, C, …), we count the number of occurrences of each value, say A, and calculate the proportion of times A occurs. If we are interested in occurrences of A (success), all other categories B, C, … may be considered as $\bar{A}$ (failure).

• If $p$, the population proportion of occurrences of A, is not known, we can provide a confidence interval to estimate $p$ based on the statistic $\hat{p} = X/n$, where $X$ is the number of successes (the number of times A occurs) in a sample of size $n$.

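• $\hat{p}$ is an unbiased and consistent estimator for $p$.
• Moreover, if $np \ge 5$ and $n(1-p) \ge 5$, the binomial variable $X$ is approximately normally distributed, with $\mu = np$ and $\sigma^2 = np(1-p)$.
• Therefore, the variable $\hat{p} = X/n$ is approximately normally distributed, with mean and standard deviation
$$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{p(1-p)/n} \approx \sqrt{\hat{p}(1-\hat{p})/n}$$
Hence
$$\frac{\hat{p}-p}{\sqrt{\hat{p}(1-\hat{p})/n}}$$
can be considered as a Z-statistic which has approximately the standard normal distribution.
• So
$$1-\alpha = P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = P\left(-z_{\alpha/2} \le \frac{\hat{p}-p}{\sqrt{\hat{p}(1-\hat{p})/n}} \le z_{\alpha/2}\right)$$
that is,
$$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) = 1-\alpha$$
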
Estimating the population proportion $p$

• $(1-\alpha)$ confidence interval estimator of $p$:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Example 10.4, page 402
• In late December 2010 and January 2011, Queensland suffered the worst floods in recent history.
• A survey conducted among 1500 medium or large businesses with more than 50 employees nationwide found that 10% of businesses had experienced some disruption or had closed.
• Estimate with 99% confidence the proportion of all businesses nationwide that experienced disruption or closure due to the floods.

Steps in estimating a population proportion
1. Determine the sample proportion $\hat{p}$: In Example 10.4, $\hat{p} = 0.10$.
2. From Table 3 in Appendix B, find $z_{\alpha/2}$: In Example 10.4, $1-\alpha = 99\%$, hence $\alpha = 0.01$ and $z_{\alpha/2} = z_{0.005} = 2.575$.
3. Calculate $\text{LCL} = \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ and $\text{UCL} = \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$: In Example 10.4, $\sqrt{0.10 \times 0.90/1500} = 0.007746$, so $\text{LCL} = 0.10 - 2.575 \times 0.007746 = 0.10 - 0.02 = 0.08$ and $\text{UCL} = 0.10 + 2.575 \times 0.007746 = 0.10 + 0.02 = 0.12$.
We can use the Microsoft Excel function CONFIDENCE.NORM(0.01, 0.007746, 1) to find the half width of the confidence interval = 0.019952 ≈ 0.02.

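The same steps can be sketched in Python. This is an illustration only: the function name `proportion_interval` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from Table 3.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Return (LCL, UCL) for a population proportion (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Example 10.4: p-hat = 0.10, n = 1500, 99% confidence.
lcl, ucl = proportion_interval(p_hat=0.10, n=1500, confidence=0.99)
print(f"LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```

Rounded to two decimals, the limits are 0.08 and 0.12, as in step 3.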

10.5 Selecting the Sample Size

Determining the sample size for estimating a population mean
Section 10.2 gives the $(1-\alpha)$ confidence interval estimator of a population mean:
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
An interval estimate can be considered good if:
i) the confidence level $1-\alpha$ is fixed at a high level, say 95%, 98% or 99% ($\alpha$ is 5%, 2% or 1%); and
ii) when we wish to estimate the population mean to within $\varepsilon$, the half width $B$ of the confidence interval does not exceed the small positive number $\varepsilon$:
$$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \varepsilon$$

• $B = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \varepsilon \iff$ the required sample size to estimate the mean is
$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2$$
Example 10.5 revised, page 407
• We want to estimate the average amount of time workers take to assemble an electronic component.
• The shortest time is about 10 minutes and the longest time is 22 minutes.
• How large a sample of workers should be taken to estimate the mean assembly time to within 20 seconds? Assume that the confidence level is to be 99%.
Solution
• Step 1: $1-\alpha = 99\% \Rightarrow \alpha = 1\% \Rightarrow z_{\alpha/2} = z_{0.005} = 2.575$
• Step 2: $\sigma \approx$ range/4 $= (22-10)/4 = 3$ minutes $= 180$ seconds
• Step 3: $\varepsilon = 20$ seconds $\Rightarrow n \ge \left(\dfrac{z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2 = \left(\dfrac{2.575 \times 180}{20}\right)^2 = 537.08$, so we take $n = 538$.
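The sample-size rule amounts to rounding the bound up to the next whole number. A minimal Python sketch follows; the function name `sample_size_mean` is hypothetical, and $\sigma$ is approximated by range/4 as in step 2.

```python
from math import ceil

def sample_size_mean(z, sigma, eps):
    """Smallest integer n with z * sigma / sqrt(n) <= eps."""
    return ceil((z * sigma / eps) ** 2)

# Example 10.5: range/4 converted to seconds, z_{0.005} = 2.575, eps = 20 s.
sigma = (22 - 10) * 60 / 4
print(sample_size_mean(z=2.575, sigma=sigma, eps=20))
```

Rounding up rather than to the nearest integer matters here: with $n = 537$ the half width would slightly exceed 20 seconds.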

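Selecting the Sample Size

Determining the sample size for estimating a population proportion
Section 10.4 gives the $(1-\alpha)$ confidence interval estimator of a population proportion:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
If we wish to estimate the population proportion to within $\varepsilon$, then the half width $B$ of the confidence interval must satisfy
$$B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le \varepsilon \iff n \ge \left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2 \hat{p}(1-\hat{p})$$
which is the required sample size to estimate the proportion.

Note: If $\hat{p}$ is not known, we may take $\hat{p} = 0.5$, which maximises $\hat{p}(1-\hat{p})$ and therefore gives the largest (most conservative) sample size.
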
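Example 10.6, page 411
• We want to estimate the proportion of shoppers who will buy a new type of liquid detergent.
• How large a sample of shoppers should be taken in order to estimate the proportion to within 0.04, with 90% confidence?
Solution
• Step 1: $1-\alpha = 90\% \Rightarrow \alpha = 10\% \Rightarrow z_{\alpha/2} = z_{0.05} = 1.645$
• Step 2: Since $\hat{p}$ is not known, we may take $\hat{p} = 0.5$.
• Step 3: $\varepsilon = 0.04 \Rightarrow n \ge \left(\dfrac{z_{\alpha/2}}{\varepsilon}\right)^2 \hat{p}(1-\hat{p}) = \left(\dfrac{1.645}{0.04}\right)^2 \times 0.5^2 = 422.82$, so we take $n = 423$.

Note: If $\hat{p}$ has been approximately estimated as 0.3 from a preliminary survey, then the required sample size is
$$n \ge \left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.645}{0.04}\right)^2 \times 0.3 \times 0.7 = 355.17$$
so we take $n = 356$.
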
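Both calculations in Example 10.6 follow from one formula, sketched below in Python. The function name `sample_size_proportion` is hypothetical; the default `p_hat=0.5` encodes the conservative choice used when no preliminary estimate is available.

```python
from math import ceil

def sample_size_proportion(z, eps, p_hat=0.5):
    """Smallest integer n with z * sqrt(p_hat * (1 - p_hat) / n) <= eps."""
    return ceil((z / eps) ** 2 * p_hat * (1 - p_hat))

# Example 10.6: z_{0.05} = 1.645, eps = 0.04.
print(sample_size_proportion(z=1.645, eps=0.04))             # p-hat unknown
print(sample_size_proportion(z=1.645, eps=0.04, p_hat=0.3))  # preliminary estimate 0.3
```

A preliminary estimate away from 0.5 reduces the required sample size, as the two results show.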

Summary: page 416

Home assignment:

- Section 10.2 Exercises pages 383-387: 10.4, 10.26, 10.32

- Section 10.3 Exercises pages 396-398: 10.44, 10.54, 10.56

- Section 10.4 Exercises pages 404-405: 10.60, 10.66, 10.69

- Section 10.5 Exercises page 411: 10.75, 10.76
