$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
but if the observed values of X are 1, 2, 3, and 6,
the estimate is 3.
So the estimator is a formula; the estimate is a
number.
Properties of a Good Estimator
1. Unbiasedness
2. Efficiency
3. Sufficiency
4. Consistency
Unbiasedness
An estimator $\hat{\theta}$ (“theta hat”) is unbiased if its
expected value equals the value of the parameter
$\theta$ (theta) being estimated. That is,
$$E(\hat{\theta}) = \theta$$
In other words, on average the estimator is right
on target.
Examples
Since $E(\bar{X}) = \mu$, $\bar{X}$ is an unbiased estimator of $\mu$.
Recall that
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
If we divided by n instead of by n-1, we would not have
an unbiased estimator of $\sigma^2$. That is why $s^2$ is defined the
way it is.
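The effect of the divisor can be checked exactly on a small case. The sketch below uses a hypothetical four-value population (not from the slides), lists every equally likely sample of size 2 drawn with replacement, and averages the two versions of the variance estimate over all of them.

```python
from itertools import product

# Hypothetical tiny population, chosen only for illustration.
population = [1, 2, 3, 6]
n = 2  # sample size

mu = sum(population) / len(population)
# Population variance (divide by the population size).
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average of each estimator over all possible samples = its expected value.
mean_s2_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_s2_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, mean_s2_n_minus_1, mean_s2_n)
```

Under these assumptions the n-1 version averages to the population variance (3.5), while the n version averages to only half of it (1.75), so it is biased downward.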
Bias
$$\text{bias} = E(\hat{\theta}) - \theta$$
The bias of an unbiased estimator is
zero.
Mean Squared Error (MSE)
$$\text{MSE} = E[(\hat{\theta} - \theta)^2]$$
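A standard identity, not shown on the slides, links the MSE to the bias and the variance of the estimator. It is sketched here because it explains why an estimator whose bias and variance both go to zero ends up close to its target.

```latex
\begin{aligned}
\text{MSE}
  &= E\big[(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2\big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,\big(E(\hat{\theta}) - \theta\big)\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + \big(E(\hat{\theta}) - \theta\big)^2 \\
  &= V(\hat{\theta}) + \text{bias}^2
\end{aligned}
```

The middle term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$.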
Example: Sample Mean
We know that the mean of $\bar{X}$ is $\mu$.
So its bias not only goes to zero as n
approaches infinity, its bias is always zero.
The variance of the sample mean is $\sigma^2/n$.
As n approaches infinity, that variance
approaches zero.
So, since both the bias and the variance go
to zero, as n approaches infinity, the
sample mean is a consistent estimator.
A great estimator: the sample mean $\bar{X}$
[Figure: standard normal curve with -1.96, 0, and 1.96 marked on the Z axis]
We know that Pr(Z < 1.96) = 0.9750
Then Pr(-1.96 < Z < 1.96) = 0.95
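We also know that $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is distributed as a standard normal (Z).
So there is a 95% probability that
$$-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96$$
Multiplying through by $\sigma/\sqrt{n}$,
$$-1.96\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < 1.96\frac{\sigma}{\sqrt{n}}$$
Subtracting off $\bar{X}$,
$$-\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
Multiplying by -1 and flipping the inequalities appropriately,
$$\bar{X} + 1.96\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$$
Flipping the entire expression,
$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$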
So we have a 95% Confidence Interval
for the Population Mean
$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
Example: Suppose a sample of 25 students at a
university has a sample mean IQ of 127. If the
population standard deviation is 5.4, calculate the
95% confidence interval for the population mean.
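$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
$$127 - 1.96\frac{5.4}{\sqrt{25}} < \mu < 127 + 1.96\frac{5.4}{\sqrt{25}}$$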
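The arithmetic for the IQ example can be sketched in a few lines of Python. The function name `z_interval` is a hypothetical helper introduced here for illustration; the inputs are the numbers given in the example.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z):
    """Confidence interval for the mean when the population std. dev. is known."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# IQ example: n = 25, sample mean 127, population std. dev. 5.4, 95% confidence.
lower, upper = z_interval(xbar=127, sigma=5.4, n=25, z=1.96)
print(round(lower, 2), round(upper, 2))
```

With these inputs the margin is 1.96 × 5.4 / 5, about 2.12, so the interval runs from roughly 124.88 to 129.12.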
$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$
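Notice: In our confidence interval formula, we used “less than”
symbols:
$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$
Your textbook uses “less than or equal to” symbols:
$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$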
Either of these is acceptable. Recall that the formula is built upon
the concept of the normal probability distribution. The probability
that a continuous variable is exactly equal to any particular number
is zero. So it doesn’t matter whether you include the endpoints of
the interval or not.
Determining Z values
[Figure: standard normal curve with central area 0.9800 between -k = -2.33 and k = 2.33 on the Z axis]
Suppose we want a 98% confidence interval.
We need to find 2 values, call them –k and k, such that
Z is between them 98% of the time.
Then Z will be less than k with probability 0.99.
Look in the body of the Z table for the value closest to
0.99, which is 0.9901 .
The number on the border of the table corresponding
to 0.9901 is 2.33.
So that is your value of k, and the number you use for
Z in your confidence interval.
Sometimes 2 numbers in the Z table are
equally close to the value you want.
For example, suppose you want a 90% confidence
interval. Remember the Z table gives cumulative
values. So to get a value you can look up in the table,
you add the 0.90 from the middle area of your Z graph
plus 0.05 from the left tail for a total of 0.95. So you
look for 0.95 in the body of the Z table.
You find 0.9495 and 0.9505. Both are off by 0.0005.
The number on the border of the table corresponding to
0.9495 is 1.64.
The number corresponding to 0.9505 is 1.65.
Usually in these cases, we use the average of 1.64 and
1.65, which is 1.645.
Similarly for the 99% confidence interval, we usually use
2.575. (Draw your graph & work through the logic of
this number.)
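The table lookups above can be cross-checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_for_confidence` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return k such that Pr(-k < Z < k) = level for a standard normal Z."""
    tail = (1 - level) / 2                 # area in each tail
    return NormalDist().inv_cdf(1 - tail)  # cumulative area to "look up"

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

The computed values are about 1.645, 1.960, 2.326, and 2.576. The last one differs slightly from the 2.575 obtained by averaging two table entries, which is a rounding effect of the table rather than a different method.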
Which interval is wider: One with a higher
confidence level (such as 99%) or one with
a lower confidence level (such as 90%)?
$$\bar{X} - t_{n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}\frac{s}{\sqrt{n}}$$
Example: From a large class of normally
distributed grades, sample 4 grades: 64, 66, 89,
& 77. Calculate the 95% confidence interval for
the class mean grade $\mu$.
$$\bar{X} - t_{n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}\frac{s}{\sqrt{n}}$$
  X
  64
  66
  89
  77
 ---
 296   (sum)

Dividing by 4, we find our sample mean is 74:
$$\bar{X} = \frac{296}{4} = 74$$
4 grades: 64, 66, 89, & 77; 95% confidence interval for $\mu$
Keep in mind that the sample standard deviation is
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
So, next we subtract our sample mean 74 from each of our X values:

  $X$    $X - \bar{X}$
  64     -10
  66      -8
  89      15
  77       3
 ---
 296   (sum)

$\bar{X} = 74$
Squaring each difference and adding up the squares gives 398:

  $X$    $X - \bar{X}$    $(X - \bar{X})^2$
  64     -10              100
  66      -8               64
  89      15              225
  77       3                9
 ---                      ---
 296                      398   (sums)

Dividing by n-1 = 3 gives the sample variance, and taking the square root gives the sample standard deviation:
$$s^2 = \frac{398}{3} = 132.7$$
$$s = 11.5$$
So we have $\bar{X} = 74$ and s = 11.5
Since n = 4, dof = n-1 = 3
Since we want 95% confidence,
we want 0.95 as the middle area
of our graph, and 0.025 in each of
the 2 tails.
[Figure: $t_3$ distribution with central area 0.95, 0.025 in each tail, and 3.182 marked on the axis]
We find the value 3.182 in our t table.
Our formula is
$$\bar{X} - t_{n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}\frac{s}{\sqrt{n}}$$
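To finish the grades example, the pieces can be assembled as in the sketch below. The t value 3.182 is taken from the t table as described above (the Python standard library has no t-distribution quantile function), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

grades = [64, 66, 89, 77]
t_value = 3.182  # from the t table: 3 dof, 0.025 in each tail

n = len(grades)
xbar = mean(grades)   # sample mean
s = stdev(grades)     # sample standard deviation (divides by n - 1)

margin = t_value * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(round(lower, 2), round(upper, 2))
```

Using the unrounded s of about 11.52, the margin is about 18.3, so the interval is roughly 55.7 to 92.3. With only 4 observations, the interval is wide.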
We have our confidence interval for the
binomial proportion $\pi$.
$$p - z\sqrt{\frac{p(1-p)}{n}} < \pi < p + z\sqrt{\frac{p(1-p)}{n}}$$
Example: Consider a random sample of 144
families; 48 have 2 or more cars. Compute the
95% confidence interval for the population
proportion of families with 2 or more cars.
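$$p - z\sqrt{\frac{p(1-p)}{n}} < \pi < p + z\sqrt{\frac{p(1-p)}{n}}$$
n = 144
$$p = \frac{48}{144} = \frac{1}{3}, \qquad 1 - p = \frac{2}{3}$$
[Figure: standard normal curve with central area 0.95, 0.0250 in each tail, and -1.96, 0, 1.96 marked on the Z axis]
Looking up the cumulative area 0.9500 + 0.0250 = 0.9750, we
find our z value is 1.96.
We now have n = 144, z = 1.96, $p = \frac{1}{3}$ and $1 - p = \frac{2}{3}$
$$\frac{1}{3} - 1.96\sqrt{\frac{\frac{1}{3}\cdot\frac{2}{3}}{144}} < \pi < \frac{1}{3} + 1.96\sqrt{\frac{\frac{1}{3}\cdot\frac{2}{3}}{144}}$$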
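The final arithmetic for the car-ownership example can be sketched as follows. The helper name `proportion_interval` is hypothetical; the inputs are the counts from the example.

```python
from math import sqrt

def proportion_interval(successes, n, z):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n                   # sample proportion
    margin = z * sqrt(p * (1 - p) / n)  # z times the estimated std. dev. of p
    return p - margin, p + margin

# 48 of 144 families have 2 or more cars; 95% confidence.
lower, upper = proportion_interval(successes=48, n=144, z=1.96)
print(round(lower, 4), round(upper, 4))
```

The margin works out to about 0.077, so the interval is roughly 0.256 to 0.410.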
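estimate, $\bar{X}_1 - \bar{X}_2$.
Recall:
$$V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,C(X,Y)$$
For independent samples the covariance term is zero, so
$$V(\bar{X}_1 - \bar{X}_2) = (1)^2 V(\bar{X}_1) + (-1)^2 V(\bar{X}_2)$$
or
$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2)$$
Recall that $V(\bar{X}) = \frac{\sigma^2}{n}$.
Applying subscripts for our samples,
$$V(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
so the standard deviation of our point estimate is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Apply our basic format
$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$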
Example: From 2 large classes,
with normally distributed grades, sample
4 grades (64, 66, 89, & 77) & 3 grades (56, 71, &
53). If the population variances for the 2 classes
are both 96, compute the 90% confidence interval
for the difference in means of the class grades.
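$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
We need the 2 sample means & the z value.
[Figure: standard normal curve with 0 and 1.645 marked on the Z axis]
Assembling our formula:
$$(74 - 60) - 1.645\sqrt{\frac{96}{4} + \frac{96}{3}} < \mu_1 - \mu_2 < (74 - 60) + 1.645\sqrt{\frac{96}{4} + \frac{96}{3}}$$
$$14 - 12.31 < \mu_1 - \mu_2 < 14 + 12.31$$
$$1.69 < \mu_1 - \mu_2 < 26.31$$
Interpreting the results
$$1.69 < \mu_1 - \mu_2 < 26.31$$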
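The two-class calculation with known variances can be reproduced with a short sketch. The variable names are illustrative; the data, the variance of 96, and z = 1.645 come from the example.

```python
from math import sqrt
from statistics import mean

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]
sigma2 = 96   # known population variance, same for both classes
z = 1.645     # 90% confidence

diff = mean(class1) - mean(class2)  # point estimate of mu1 - mu2
margin = z * sqrt(sigma2 / len(class1) + sigma2 / len(class2))
lower, upper = diff - margin, diff + margin
print(round(lower, 2), round(upper, 2))
```

The whole interval lies above zero, which is consistent with the first class having the higher mean grade at this confidence level.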
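$$(\bar{X}_1 - \bar{X}_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where the dof for the t value is the integer part of
$$\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$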
Let’s do the same example as before,
but without knowing the population variances.
We calculate the sample means as before. Recall
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Then subtract the sample mean from each value, square the differences, and add them up.
Dividing by n-1, we have our sample variances.

           Class 1                                Class 2
  $X_1$  $X_1-\bar{X}_1$  $(X_1-\bar{X}_1)^2$     $X_2$  $X_2-\bar{X}_2$  $(X_2-\bar{X}_2)^2$
  64     -10              100                     56     -4               16
  66      -8               64                     71     11              121
  89      15              225                     53     -7               49
  77       3                9
 ---                      ---                    ---                     ---
 296                      398                    180                     186   (sums)

$$\bar{X}_1 = \frac{296}{4} = 74, \qquad s_1^2 = \frac{398}{3} = 132.67$$
$$\bar{X}_2 = \frac{180}{3} = 60, \qquad s_2^2 = \frac{186}{2} = 93.0$$
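What are the dof & t value?
dof = the integer part of
$$\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\frac{132.67}{4} + \frac{93.0}{3}\right)^2}{\frac{\left(132.67/4\right)^2}{3} + \frac{\left(93.0/3\right)^2}{2}} = 4.860$$
The integer part is 4, so dof = 4. For 90% confidence (0.05 in each tail), the t table gives 2.132.
$$14 - 17.08 < \mu_1 - \mu_2 < 14 + 17.08$$
$$-3.08 < \mu_1 - \mu_2 < 31.08$$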
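The degrees-of-freedom formula is easy to get wrong by hand, so here is a minimal sketch of the same calculation. The t value 2.132 (4 dof, 0.05 in each tail) is entered from a t table rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]
n1, n2 = len(class1), len(class2)

# Each sample variance (dividing by n - 1) divided by its sample size.
v1 = variance(class1) / n1
v2 = variance(class2) / n2

# Degrees of freedom: integer part of the ratio from the formula above.
dof_exact = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
dof = int(dof_exact)

t_value = 2.132  # from the t table: 4 dof, 0.05 in each tail (90% confidence)
diff = mean(class1) - mean(class2)
margin = t_value * sqrt(v1 + v2)
lower, upper = diff - margin, diff + margin
print(round(dof_exact, 3), dof, round(lower, 2), round(upper, 2))
```

Unlike the known-variance case, this interval includes zero, so at 90% confidence these small samples do not rule out equal class means.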
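If both populations share a common variance $\sigma^2$, the interval is
$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$
Factoring out the variance, we have
$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Next we replace the variance by a pooled sample variance, based
on information from both samples.
$$(\bar{X}_1 - \bar{X}_2) - t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
The dof for the t value is $n_1 + n_2 - 2$.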
The pooled sample variance
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
For our example,
$$s_p^2 = \frac{(3)132.67 + (2)93.0}{4 + 3 - 2} = \frac{584}{5} = 116.8$$
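[Figure: $t_5$ distribution with central area 0.90, 0.05 in each tail, and 2.015 marked on the axis]
We have: $\bar{X}_1 = 74$, $\bar{X}_2 = 60$, $s_p^2 = 116.8$, $t = 2.015$
$$(\bar{X}_1 - \bar{X}_2) - t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$(74 - 60) - 2.015\sqrt{116.8\left(\frac{1}{4} + \frac{1}{3}\right)} < \mu_1 - \mu_2 < (74 - 60) + 2.015\sqrt{116.8\left(\frac{1}{4} + \frac{1}{3}\right)}$$
$$14 - 16.63 < \mu_1 - \mu_2 < 14 + 16.63$$
$$-2.63 < \mu_1 - \mu_2 < 30.63$$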
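The pooled-variance version of the example can be sketched the same way. The t value 2.015 (5 dof, 0.05 in each tail) is taken from the t table as above; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

class1 = [64, 66, 89, 77]
class2 = [56, 71, 53]
n1, n2 = len(class1), len(class2)

# Pooled sample variance: weights each sample variance by its dof.
sp2 = ((n1 - 1) * variance(class1) + (n2 - 1) * variance(class2)) / (n1 + n2 - 2)

t_value = 2.015  # from the t table: n1 + n2 - 2 = 5 dof, 90% confidence
diff = mean(class1) - mean(class2)
margin = t_value * sqrt(sp2 * (1 / n1 + 1 / n2))
lower, upper = diff - margin, diff + margin
print(round(sp2, 1), round(lower, 2), round(upper, 2))
```

The pooled interval is slightly narrower than the unequal-variance one, mainly because it gets 5 degrees of freedom instead of 4 and therefore a smaller t value.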
Our point estimate is $p_1 - p_2$.
Next we need the standard deviation of our point estimate.
$$V(p_1 - p_2) = V(p_1) + V(p_2)$$
Recalling that our previous estimate of $V(p)$ was $\frac{p(1-p)}{n}$,
we have
$$V(p_1 - p_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$
so the estimated standard deviation is
$$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Using our basic format, we find the confidence
interval for the difference in population proportions.

(point estimate) - (z or t) × (std. dev. or estimate of the std. dev. of our pt. estimate)
    < desired parameter <
(point estimate) + (z or t) × (std. dev. or estimate of the std. dev. of our pt. estimate)

$$(p_1 - p_2) - z\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} < \pi_1 - \pi_2 < (p_1 - p_2) + z\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Example: Samples from 2 states show proportions
of Democrats 1/3 & 1/5 with sample sizes 100 & 225.
Calculate the 99% confidence interval for the
difference in population proportions.
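$$(p_1 - p_2) - z\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} < \pi_1 - \pi_2 < (p_1 - p_2) + z\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
[Figure: standard normal curve with 0 and 2.575 marked on the Z axis]
We have:
$p_1 = 0.33$, $1 - p_1 = 0.67$, $p_2 = 0.20$, $1 - p_2 = 0.80$, $n_1 = 100$, $n_2 = 225$, $z = 2.575$
Substituting into the formula yields
$$(0.33 - 0.20) - 2.575\sqrt{\frac{(0.33)(0.67)}{100} + \frac{(0.20)(0.80)}{225}} < \pi_1 - \pi_2 < (0.33 - 0.20) + 2.575\sqrt{\frac{(0.33)(0.67)}{100} + \frac{(0.20)(0.80)}{225}}$$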
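The substitution above stops before the final numbers, so the sketch below carries the arithmetic through. It uses the rounded value 0.33 for 1/3, as the slides do; the variable names are illustrative.

```python
from math import sqrt

p1, n1 = 0.33, 100   # state 1: sample proportion (1/3 rounded) and sample size
p2, n2 = 0.20, 225   # state 2
z = 2.575            # 99% confidence

diff = p1 - p2
margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lower, upper = diff - margin, diff + margin
print(round(lower, 3), round(upper, 3))
```

The margin comes to about 0.139, giving an interval of roughly -0.009 to 0.269. Because it includes zero, the samples do not establish a difference between the two states at the 99% level.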
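You also know that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is distributed as a Z.
So, with 95% probability,
$$-z_0 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_0$$
Multiplying by $\frac{\sigma}{\sqrt{n}}$, we have
$$-z_0\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_0\frac{\sigma}{\sqrt{n}}$$
We see here that the largest value of the difference between $\bar{X}$ and $\mu$,
which we called D, is $z_0\frac{\sigma}{\sqrt{n}}$.
So,
$$D = z_0\frac{\sigma}{\sqrt{n}}$$
and we can solve for the sample size n.
Square both sides of the equation:
$$D^2 = z_0^2\frac{\sigma^2}{n}$$
Multiply through by n:
$$nD^2 = z_0^2\sigma^2$$
Divide through by $D^2$:
$$n = \frac{z_0^2\sigma^2}{D^2}$$
Dropping the subscript on z for convenience, we have the formula:
$$n = \frac{z^2\sigma^2}{D^2}$$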
So we have a formula for determining the
appropriate sample size n when we want to
estimate the population mean.
$$n = \frac{z^2\sigma^2}{D^2}$$
Example: Suppose you’re trying to estimate the mean
monthly rent of 2-bedroom apartments in towns of 100,000
people or less. The population standard deviation is 20.
You want to be 95% sure that your estimate is within $3 of
the true mean. How large a sample should you take?
$$n = \frac{z^2\sigma^2}{D^2} = \frac{(1.96)^2(20)^2}{(3)^2} = 170.7$$
You need to sample 171 observations.
It’s not 170, because sample sizes smaller than 170.7 provide you with less
than the precision you asked for.
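The rent example, including the rounding-up step, can be sketched as below. The function name `sample_size_for_mean` is a hypothetical helper; the inputs are the values from the example.

```python
from math import ceil

def sample_size_for_mean(z, sigma, D):
    """Smallest n for which z * sigma / sqrt(n) is no larger than D."""
    n_exact = (z ** 2) * (sigma ** 2) / (D ** 2)
    return n_exact, ceil(n_exact)   # always round up, never to the nearest integer

n_exact, n = sample_size_for_mean(z=1.96, sigma=20, D=3)
print(round(n_exact, 1), n)
```

Rounding up rather than to the nearest integer is the design choice that matters here: any n below the exact value leaves the margin of error larger than D.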
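So with the desired level of confidence,
$$-z_0 < \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}} < z_0$$
Following the same steps as for the mean, the largest difference D between p and $\pi$ is
$$D = z_0\sqrt{\frac{\pi(1-\pi)}{n}}$$
Squaring both sides and multiplying through by n gives
$$nD^2 = z_0^2\,\pi(1-\pi)$$
Divide through by $D^2$:
$$n = z_0^2\frac{\pi(1-\pi)}{D^2}$$
Dropping the subscript on z for convenience, we have
$$n = z^2\frac{\pi(1-\pi)}{D^2}$$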
Plugging in the maximum value of ¼ for $\pi(1-\pi)$, we
have
$$n = z^2\frac{\frac{1}{4}}{D^2} = z^2\cdot\frac{1}{4}\cdot\frac{1}{D^2} = \frac{z^2}{4D^2}$$
So our formula for n is:
$$n = \frac{z^2}{4D^2}$$
Sometimes you have a rough idea of what $\pi$ is,
but you’re trying to get a more precise value.
In that case, use the rough value of $\pi$ in
$$n = z^2\frac{\pi(1-\pi)}{D^2}$$
So we have 2 formulae for determining the
appropriate sample size for estimating the
population proportion.
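The two formulae can be compared side by side in a short sketch. The function names and the inputs D = 0.05 and a rough proportion of 0.2 are hypothetical values chosen for illustration, not taken from the slides.

```python
from math import ceil

def sample_size_worst_case(z, D):
    """n = z^2 / (4 D^2): uses the maximum value 1/4 for pi(1 - pi)."""
    return ceil(z ** 2 / (4 * D ** 2))

def sample_size_with_guess(z, D, pi_guess):
    """n = z^2 pi(1 - pi) / D^2: uses a rough prior value for pi."""
    return ceil(z ** 2 * pi_guess * (1 - pi_guess) / D ** 2)

# Hypothetical inputs: 95% confidence, estimate within 0.05 of the true proportion.
n_worst = sample_size_worst_case(z=1.96, D=0.05)
n_guess = sample_size_with_guess(z=1.96, D=0.05, pi_guess=0.2)
print(n_worst, n_guess)
```

With these assumed inputs the worst-case formula asks for 385 observations and the rough-guess formula for 246, which shows how much a prior idea of the proportion can reduce the required sample when that proportion is far from ½.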
[Figure: standard normal curve with 0 and 1.96 marked on the Z axis]
Filling in our z: $z^2 = (1.96)^2$