
Chapter 9

Estimation
Using a Single Sample

1 Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.


Point Estimation

A point estimate of a population characteristic is a single number that is
based on sample data and represents a plausible value of the characteristic.



Example
A sample of 200 students at a large university is selected to estimate the
proportion of students who wear contact lenses. In this sample, 47 wore
contact lenses.
Let $\pi$ = the true proportion of all students at this university who wear
contact lenses. Consider a success to be a student who wears contact lenses.
The statistic
$$p = \frac{\text{number of successes in the sample}}{n}$$
is a reasonable choice for a formula to obtain a point estimate for $\pi$.
Such a point estimate is
$$p = \frac{47}{200} = 0.235$$
Example
A sample of weights of 34 male freshman
students was obtained.
185 161 174 175 202 178
202 139 177 170 151 176
197 214 283 184 189 168
188 170 207 180 167 177
166 231 176 184 179 155
148 180 194 176
To estimate the true mean weight of all male freshman students, one might
use the sample mean as a point estimate for the true mean.
sample mean $\bar{x}$ = 182.44
Example
After looking at a histogram and boxplot of the data (below), you might
notice that the data seem reasonably symmetric with one outlier, so you
might use either the sample median or a sample trimmed mean as a point
estimate.

[Figure: histogram and boxplot of the weights; horizontal axis from 140 to 260]

5% trimmed mean = 180.07 (calculated using Minitab)
sample median = (177 + 178)/2 = 177.5
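The three point estimates for the weight data can be reproduced with a short Python sketch. The helper `trimmed_mean` is a hypothetical name, and its trimming rule (drop n times 5%, rounded to the nearest integer, from each end) is an assumption chosen because it reproduces the Minitab figure; other software may trim differently.

```python
import math
import statistics

# Weights of the 34 male freshman students, copied from the slide.
weights = [
    185, 161, 174, 175, 202, 178,
    202, 139, 177, 170, 151, 176,
    197, 214, 283, 184, 189, 168,
    188, 170, 207, 180, 167, 177,
    166, 231, 176, 184, 179, 155,
    148, 180, 194, 176,
]


def trimmed_mean(data, proportion):
    """Mean after dropping a share of the sorted values from each end.

    Assumption: the count dropped per end is n * proportion rounded to
    the nearest integer (here 34 * 0.05 = 1.7, so 2 values per end).
    """
    ordered = sorted(data)
    k = math.floor(len(ordered) * proportion + 0.5)  # round half up
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


sample_mean = statistics.mean(weights)
sample_median = statistics.median(weights)
trimmed = trimmed_mean(weights, 0.05)

print(f"sample mean     = {sample_mean:.2f}")
print(f"sample median   = {sample_median:.1f}")
print(f"5% trimmed mean = {trimmed:.2f}")
```

The trimmed mean and the median sit below the sample mean because both reduce the pull of the single large weight (283).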
Bias
A statistic with mean value equal to the value
of the population characteristic being
estimated is said to be an unbiased
statistic. A statistic that is not unbiased is
said to be biased.

[Figure: the original distribution with the true value marked, together with
the sampling distribution of an unbiased statistic and the sampling
distribution of a biased statistic]
Criteria
Given a choice between several unbiased
statistics that could be used for estimating a
population characteristic, the best statistic to
use is the one with the smallest standard
deviation.
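
The criterion can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is a hypothetical normal distribution (mean 100, standard deviation 15), and the two competing statistics are the sample mean and the sample median, both of which are centered at the true value for a symmetric population.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0  # hypothetical normal population
N, REPS = 25, 2000                # sample size and number of samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Both statistics are centered near the true value of 100 ...
avg_of_means = statistics.mean(means)
avg_of_medians = statistics.mean(medians)
print(f"average of sample means   = {avg_of_means:.2f}")
print(f"average of sample medians = {avg_of_medians:.2f}")

# ... but their sampling distributions have different spreads.
sd_of_means = statistics.stdev(means)
sd_of_medians = statistics.stdev(medians)
print(f"std. dev. of sample means   = {sd_of_means:.2f}")
print(f"std. dev. of sample medians = {sd_of_medians:.2f}")
```

For this population the sample mean shows the smaller standard deviation, so by the criterion it is the better of the two unbiased choices.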

[Figure: sampling distributions of unbiased statistics centered at the true
value; the one with the smallest standard deviation is the best choice]
Large-sample Confidence Interval
for a Population Proportion

A confidence interval for a population characteristic is an interval of
plausible values for the characteristic. It is constructed so that, with a
chosen degree of confidence, the value of the characteristic will be
captured inside the interval.



Confidence Level

The confidence level associated with a confidence interval estimate is the
success rate of the method used to construct the interval.



Recall
For the sampling distribution of p,
$$\mu_p = \pi \quad\text{and}\quad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
and for large* n the sampling distribution of p is approximately normal.
Specifically, when n is large*, the statistic p has a sampling distribution
that is approximately normal with mean $\pi$ and standard deviation
$\sqrt{\pi(1-\pi)/n}$.
* $n\pi \ge 10$ and $n(1-\pi) \ge 10$
Some considerations

Approximately 95% of all large samples will result in a value of p that is
within
$$1.96\,\sigma_p = 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$$
of the true population proportion $\pi$.
Some considerations

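Equivalently, this means that for 95% of all possible samples, $\pi$ will be
in the interval
$$p - 1.96\sqrt{\frac{\pi(1-\pi)}{n}} \quad\text{to}\quad p + 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$$

Since $\pi$ is unknown and n is large, we estimate
$$\sqrt{\frac{\pi(1-\pi)}{n}} \quad\text{with}\quad \sqrt{\frac{p(1-p)}{n}}$$

This interval can be used as long as $np \ge 10$ and $n(1-p) \ge 10$.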
The 95% Confidence Interval
When n is large, a 95% confidence interval for $\pi$ is
$$\left( p - 1.96\sqrt{\frac{p(1-p)}{n}},\; p + 1.96\sqrt{\frac{p(1-p)}{n}} \right)$$

The endpoints of the interval are often abbreviated by
$$p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$$
where - gives the lower endpoint and + the upper endpoint.
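
Reading the confidence level as a success rate, about 95% of the intervals built with this formula should capture the true proportion. The simulation below is a minimal sketch of that idea; the population proportion of 0.30, the sample size of 200, and the number of repetitions are all hypothetical choices, not values from the slides.

```python
import math
import random

random.seed(2)  # fixed seed so the run is repeatable

TRUE_PI = 0.30       # hypothetical population proportion
N, REPS = 200, 2000  # sample size and number of simulated samples
Z = 1.96             # z critical value for 95% confidence

captured = 0
for _ in range(REPS):
    # Draw one sample and count the successes.
    successes = sum(random.random() < TRUE_PI for _ in range(N))
    p = successes / N
    bound = Z * math.sqrt(p * (1 - p) / N)
    # Does this sample's interval contain the true proportion?
    if p - bound <= TRUE_PI <= p + bound:
        captured += 1

capture_rate = captured / REPS
print(f"share of intervals capturing the true proportion: {capture_rate:.3f}")
```

The printed share should land close to 0.95, though it varies a little from run to run because the normal approximation and the simulation are both inexact.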



Example

For a project, a student randomly sampled 182 other students at a large
university to determine if the majority of students were in favor of a
proposal to build a field house. He found that 75 were in favor of the
proposal.

Let $\pi$ = the true proportion of students that favor the proposal.



Example - continued
$$p = \frac{75}{182} = 0.4121$$
Since np = 182(0.4121) = 75 > 10 and n(1-p) = 182(0.5879) = 107 > 10, we can
use the formula given earlier to find a 95% confidence interval for $\pi$.

$$p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 0.4121 \pm 1.96\sqrt{\frac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.07151$$

The 95% confidence interval for $\pi$ is
(0.341, 0.484).
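
The field house calculation can be checked with a few lines of Python. The function name `proportion_ci` is hypothetical; the sketch simply applies the large-sample formula and the sample size check used in this example.

```python
import math


def proportion_ci(successes, n, z):
    """Large-sample interval p +/- z * sqrt(p(1-p)/n) for a proportion."""
    p = successes / n
    # Large-sample check: np >= 10 and n(1-p) >= 10.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("sample too small for the large-sample interval")
    bound = z * math.sqrt(p * (1 - p) / n)
    return p - bound, p + bound


# 75 of 182 sampled students favored the proposal; 1.96 gives 95% confidence.
lower, upper = proportion_ci(75, 182, 1.96)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing a different z critical value gives an interval at another confidence level from the same data.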
The General Confidence Interval
The general formula for a confidence interval for a population proportion
$\pi$ when
1. p is the sample proportion from a random sample, and
2. the sample size n is large ($np \ge 10$ and $n(1-p) \ge 10$)
is given by
$$p \pm (z\ \text{critical value})\sqrt{\frac{p(1-p)}{n}}$$
Finding a z Critical Value

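Finding a z critical value for a 98% confidence interval:

A central area of 0.98 leaves 0.01 in each tail, so the cumulative area to
the left of the critical value is 0.9900.
Looking up the cumulative area of 0.9900 in the body of the table, we find
z = 2.33.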
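
Instead of a table lookup, the same value can be obtained from the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z value that leaves `confidence` of the area in the middle."""
    tail = (1 - confidence) / 2            # area in each tail
    return NormalDist().inv_cdf(1 - tail)  # cumulative area to the left


print(round(z_critical(0.98), 2))  # 98% confidence

# A few other confidence levels, to three decimals.
for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```

The results agree with printed table values once rounded to the precision used in the table.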
Some Common Critical Values

Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29



Terminology

The standard error of a statistic is the estimated standard deviation of
the statistic.

For sample proportions, the standard deviation is
$$\sqrt{\frac{\pi(1-\pi)}{n}}$$

This means that the standard error of the sample proportion is
$$\sqrt{\frac{p(1-p)}{n}}$$
Terminology

The bound on error of estimation, B, associated with a 95% confidence
interval is
(1.96)(standard error of the statistic).

The bound on error of estimation, B, associated with a confidence interval
is
(z critical value)(standard error of the statistic).



Sample Size
The sample size required to estimate a population proportion $\pi$ to within
an amount B with 95% confidence is
$$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2$$
The value of $\pi$ may be estimated by prior information. If no prior
information is available, use $\pi$ = 0.5 in the formula to obtain a
conservatively large value for n.

Generally one rounds the result up to the nearest integer.


Sample Size Calculation
Example

A TV executive would like to find a 95% confidence interval estimate within
0.03 for the proportion of all households that watch NYPD Blue regularly.
How large a sample is needed if a prior estimate for $\pi$ was 0.15?

We have B = 0.03 and the prior estimate of $\pi$ = 0.15.
$$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2 = (0.15)(0.85)\left(\frac{1.96}{0.03}\right)^2 = 544.2$$
A sample of 545 or more would be needed.
Sample Size Calculation Example revisited

Suppose a TV executive would like to find a 95% confidence interval
estimate within 0.03 for the proportion of all households that watch NYPD
Blue regularly. How large a sample is needed if we have no reasonable prior
estimate for $\pi$?
We have B = 0.03 and should use $\pi$ = 0.5 in the formula.
$$n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.1$$
The required sample size is now 1068.
Notice that a reasonable ballpark estimate for $\pi$ can lower the needed
sample size.
Another Example
A college professor wants to estimate the proportion of students at a large
university who favor building a field house with a 99% confidence interval
accurate to within 0.02. If one of his students performed a preliminary
study and estimated $\pi$ to be 0.412, how large a sample should he take?
We have B = 0.02, a prior estimate $\pi$ = 0.412, and we should use the z
critical value 2.58 (for a 99% confidence interval).
$$n = \pi(1-\pi)\left(\frac{2.58}{B}\right)^2 = (0.412)(0.588)\left(\frac{2.58}{0.02}\right)^2 = 4031.4$$
The required sample size is 4032.
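
The three sample size calculations can be reproduced with one small function. This is a sketch of the formula from the Sample Size slide; `sample_size` and its parameter names are hypothetical.

```python
import math


def sample_size(prior_pi, bound, z=1.96):
    """n = pi(1 - pi)(z / B)^2, rounded up to the next whole number."""
    return math.ceil(prior_pi * (1 - prior_pi) * (z / bound) ** 2)


n_tv_prior = sample_size(0.15, 0.03)             # TV example, prior 0.15
n_tv_no_prior = sample_size(0.50, 0.03)          # TV example, no prior
n_field_house = sample_size(0.412, 0.02, z=2.58)  # 99% confidence

print(n_tv_prior, n_tv_no_prior, n_field_house)
```

Using 0.5 maximizes the product of the proportion and its complement, which is why it gives the conservatively large sample size when no prior estimate is available.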
One-Sample z Confidence
Interval for $\mu$
If
1. $\bar{x}$ is the sample mean from a random sample,
2. the sample size n is large (generally $n \ge 30$), and
3. $\sigma$, the population standard deviation, is known,
then the general formula for a confidence interval for a population mean
$\mu$ is given by
$$\bar{x} \pm (z\ \text{critical value})\frac{\sigma}{\sqrt{n}}$$
One-Sample z Confidence
Interval for $\mu$
If n is small (generally n < 30) but it is reasonable to believe that the
distribution of values in the population is normal, a confidence interval
for $\mu$ (when $\sigma$ is known) is
$$\bar{x} \pm (z\ \text{critical value})\frac{\sigma}{\sqrt{n}}$$
Notice that this formula works when $\sigma$ is known and either
1. n is large (generally $n \ge 30$) or
2. the population distribution is normal (any sample size).
Example

A certain filling machine has a true population standard deviation
$\sigma$ = 0.228 ounces when used to fill catsup bottles. A random sample of
36 six-ounce bottles of catsup was selected from the output of this machine,
and the sample mean was 6.018 ounces.

Find a 90% confidence interval estimate for the true mean fill of catsup
from this machine.



Example I (continued)

$\bar{x} = 6.018$, $\sigma = 0.228$, $n = 36$

The z critical value is 1.645.

$$\bar{x} \pm (z\ \text{critical value})\frac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645\,\frac{0.228}{\sqrt{36}} = 6.018 \pm 0.063$$
90% Confidence Interval
(5.955, 6.081)
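
The catsup calculation is short enough to verify directly. A minimal sketch, with `z_interval` as a hypothetical helper name, that applies the one-sample z formula when the population standard deviation is known:

```python
import math


def z_interval(xbar, sigma, n, z):
    """Interval xbar +/- z * sigma / sqrt(n), for known sigma."""
    bound = z * sigma / math.sqrt(n)
    return xbar - bound, xbar + bound


# Sample mean 6.018 oz, sigma 0.228 oz, n = 36, z = 1.645 for 90% confidence.
lower, upper = z_interval(6.018, 0.228, 36, 1.645)
print(f"({lower:.3f}, {upper:.3f})")
```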
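Unknown $\sigma$ - Small Size Samples
[All Size Samples]

W. S. Gosset, an English statistician who worked at the Guinness brewery in
Dublin and published under the name "Student", developed the techniques and
derived the Student's t distributions that describe the behavior of
$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$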



t Distributions

If X is a normally distributed random variable, the statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
follows a t distribution with df = n - 1 (degrees of freedom).



t Distributions

The statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is fairly robust, and the results are reasonable for moderate sample sizes
(15 and up) if x is just reasonably centrally weighted. It is also quite
reasonable for large sample sizes for distributional patterns (of x) that
are not extremely skewed.



t Distributions

[Figure: comparison of normal and t distributions; t density curves for
df = 2, 5, 10, and 25 plotted with the normal curve, horizontal axis from
-4 to 4]



t Distributions

Notice: As df increase, t distributions approach the standard normal
distribution.

Since each t distribution would require a table similar to the standard
normal table, we usually only create a table of critical values for the t
distributions.
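
The approach to the standard normal can be seen numerically by comparing density heights. The sketch below evaluates the standard formula for the t density using only the Python standard library; `t_pdf` is a hypothetical helper name, and the evaluation points 0 and 3 are arbitrary choices for the center and the tail.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma function for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


normal = NormalDist()  # standard normal: mean 0, standard deviation 1

for df in (2, 5, 10, 25, 1000):
    print(f"df = {df:4d}: f(0) = {t_pdf(0, df):.4f}, f(3) = {t_pdf(3, df):.4f}")
print(f"normal   : f(0) = {normal.pdf(0):.4f}, f(3) = {normal.pdf(3):.4f}")
```

For small df the t curve is lower in the center and higher in the tails than the normal curve, and the gap closes as df grows.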



Central area captured:      0.80    0.90    0.95    0.98    0.99   0.998   0.999
Confidence level:            80%     90%     95%     98%     99%   99.8%   99.9%

Degrees of freedom
1                           3.08    6.31   12.71   31.82   63.66  318.29  636.58
2                           1.89    2.92    4.30    6.96    9.92   22.33   31.60
3                           1.64    2.35    3.18    4.54    5.84   10.21   12.92
4                           1.53    2.13    2.78    3.75    4.60    7.17    8.61
5                           1.48    2.02    2.57    3.36    4.03    5.89    6.87
6                           1.44    1.94    2.45    3.14    3.71    5.21    5.96
7                           1.41    1.89    2.36    3.00    3.50    4.79    5.41
8                           1.40    1.86    2.31    2.90    3.36    4.50    5.04
9                           1.38    1.83    2.26    2.82    3.25    4.30    4.78
10                          1.37    1.81    2.23    2.76    3.17    4.14    4.59
11                          1.36    1.80    2.20    2.72    3.11    4.02    4.44
12                          1.36    1.78    2.18    2.68    3.05    3.93    4.32
13                          1.35    1.77    2.16    2.65    3.01    3.85    4.22
14                          1.35    1.76    2.14    2.62    2.98    3.79    4.14
15                          1.34    1.75    2.13    2.60    2.95    3.73    4.07
16                          1.34    1.75    2.12    2.58    2.92    3.69    4.01
17                          1.33    1.74    2.11    2.57    2.90    3.65    3.97
18                          1.33    1.73    2.10    2.55    2.88    3.61    3.92
19                          1.33    1.73    2.09    2.54    2.86    3.58    3.88
20                          1.33    1.72    2.09    2.53    2.85    3.55    3.85
21                          1.32    1.72    2.08    2.52    2.83    3.53    3.82
22                          1.32    1.72    2.07    2.51    2.82    3.50    3.79
23                          1.32    1.71    2.07    2.50    2.81    3.48    3.77
24                          1.32    1.71    2.06    2.49    2.80    3.47    3.75
25                          1.32    1.71    2.06    2.49    2.79    3.45    3.73
26                          1.31    1.71    2.06    2.48    2.78    3.43    3.71
27                          1.31    1.70    2.05    2.47    2.77    3.42    3.69
28                          1.31    1.70    2.05    2.47    2.76    3.41    3.67
29                          1.31    1.70    2.05    2.46    2.76    3.40    3.66
30                          1.31    1.70    2.04    2.46    2.75    3.39    3.65
40                          1.30    1.68    2.02    2.42    2.70    3.31    3.55
60                          1.30    1.67    2.00    2.39    2.66    3.23    3.46
120                         1.29    1.66    1.98    2.36    2.62    3.16    3.37
z critical values           1.28   1.645    1.96    2.33    2.58    3.09    3.29
One-Sample t Procedures
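Suppose that an SRS of size n is drawn from a population having unknown mean
$\mu$. The general confidence limits are
$$\bar{x} \pm (t\ \text{critical value})\frac{s}{\sqrt{n}}$$

and the general confidence interval for $\mu$ is
$$\left( \bar{x} - (t\ \text{critical value})\frac{s}{\sqrt{n}},\; \bar{x} + (t\ \text{critical value})\frac{s}{\sqrt{n}} \right)$$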



Confidence Interval Example

Ten randomly selected shut-ins were each asked to list how many hours of
television they watched per week. The results are
82 66 90 84 75
88 80 94 110 91
Find a 90% confidence interval estimate for the true mean number of hours of
television watched per week by shut-ins.



Confidence Interval Example

Calculating the sample mean and standard deviation, we have n = 10,
$\bar{x}$ = 86, s = 11.842.

We find the critical t value of 1.833 by looking on the t table in the row
corresponding to df = 9, in the column labeled 90%. The confidence interval
for $\mu$ is
$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 86 \pm (1.833)\frac{11.842}{\sqrt{10}} = 86 \pm 6.86$$
(79.14, 92.86)
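
The summary statistics and the interval can be recomputed from the ten observations. In this sketch the t critical value 1.833 is taken from the example as a constant, since the Python standard library has no t quantile function; everything else is computed from the data.

```python
import math
import statistics

# Weekly television hours reported by the ten shut-ins.
hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]

T_CRITICAL = 1.833  # taken from the example: df = 9, 90% confidence

n = len(hours)
xbar = statistics.mean(hours)
s = statistics.stdev(hours)  # sample standard deviation (divisor n - 1)
bound = T_CRITICAL * s / math.sqrt(n)
lower, upper = xbar - bound, xbar + bound

print(f"n = {n}, mean = {xbar}, s = {s:.3f}")
print(f"90% interval: ({lower:.2f}, {upper:.2f})")
```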
Confidence Interval Example

To calculate the confidence interval, we had to assume that weekly viewing
times are normally distributed. Consider the normal plot of the 10 data
points produced with Minitab that is given on the next slide.



Confidence Interval Example
Notice that the normal plot looks reasonably linear, so it is reasonable to
assume that the number of hours of television watched per week by shut-ins
is normally distributed.

[Figure: Normal Probability Plot; vertical axis: Probability (.001 to .999),
horizontal axis: Hours (70 to 110)]

Anderson-Darling Normality Test
Average: 86
StDev: 11.8415
N: 10
A-Squared: 0.226
P-Value: 0.753

Typically, if the p-value is more than 0.05, we assume that the
distribution is normal.
