Confidence Intervals

Contents
Two-sided Confidence Interval
One-sided Confidence Interval
Student's t Distribution
How to use SPSS
Basic concepts
Goal: To describe or estimate some characteristic of a continuous random variable (such as its mean).
Method: Get a representative (random) sample and use the information contained in the sample observations to make statistical inferences about the unknown parameter of the population.
Populations and Parameters
Population
A group of individuals that we would like to know
something about
Parameter
A characteristic of the population in which we have a
particular interest
Often denoted with Greek letters (μ, σ, π, ρ)
Examples:
▪ The mean of the population (μ)
▪ The standard deviation of the population (σ)

Samples and Statistics
Sample
A subset of a population (hopefully
representative)
Statistic
A characteristic of the sample
Often denoted with English letters (x-bar, s, p, r)
Examples:
▪ The mean of the sample (x-bar)
▪ The standard deviation of the sample (s)
Populations and Samples
Studying populations is too expensive and time-consuming, and thus impractical
If a sample is representative of the population, then by observing the sample we can learn something about the population
And thus by looking at the characteristics of the sample
(statistics), we may learn something about the
characteristics of the population (parameters)
Statistical Analyses
Two steps
Descriptive Statistics
Describe the sample
Inference
Make inferences about the population
using what is observed in the sample
Primarily performed in two ways:
Estimation
Hypothesis testing
Basic concepts
Statistical Inference has two parts:

Estimation (of parameters)
▪ Point estimation
▪ Interval estimation
Hypothesis testing
Estimation Methods
[Diagram: Estimation divides into Point Estimation and Interval Estimation; Interval Estimation produces a Confidence Interval.]
Estimation Process
[Diagram: a random sample is drawn from a population whose mean μ is unknown. The sample mean is $\bar{X} = 50$, which leads to the statement "I am 95% confident that μ is between 40 and 60."]
Basic concepts
Point Estimation: use the sample data to calculate a single number to estimate a population parameter
Point Estimation
Provides a Single Value
Based on Observations from 1 Sample
Gives No Information about How Close the Value Is to the Unknown Population Parameter
Example: Sample Mean $\bar{X} = 8.3$ Is a Point Estimate of the Unknown Population Mean
Point Estimates
A point estimate is a single value, called a statistic, used to estimate a population value, called a parameter.
                      Sample         Population
                      (statistic)    (parameter)
Mean                  x-bar          μ
Standard deviation    s              σ
Proportion            p              π
Basic concepts
Interval Estimation: a technique that provides a range of reasonable values intended to contain the population parameter with a certain degree of confidence. This range of values is called a confidence interval.
Confidence Interval Estimate
A confidence interval is a range of values within which the population parameter is expected to lie with a specified probability:
$P(a \le \mu \le b) = 1 - \alpha$
The probability that the interval covers the population parameter is called the level of confidence.
Find an interval (a, b) such that this holds for a given significance level α.

Interval Estimation
Provides a Range of Values
Based on Observations from 1 Sample
Gives Information about Closeness to the Unknown Population Parameter
Stated in Terms of Probability
Knowing the Exact Closeness Requires Knowing the Unknown Population Parameter
e.g., the Unknown Population Mean Lies Between 50 and 70 with 95% Confidence
Key Elements of Interval Estimation
A Probability That the Population Parameter Falls Somewhere Within the Interval
[Diagram: the confidence interval runs from the lower confidence limit to the upper confidence limit, with the sample statistic (point estimate) at its center.]
Level of Confidence
Probability that the Unknown Population Parameter Falls Within the Interval
Denoted by $100(1-\alpha)\%$
α Is the Probability That the Parameter Is Not Within the Interval
Typical Values Are 99%, 95%, 90%
Level of Confidence
A relative frequency interpretation
In the long run, $100(1-\alpha)\%$ of all the confidence intervals that can be constructed will contain the unknown parameter
A specific interval will either contain or not contain the parameter
No probability is involved in a specific interval
Factors Affecting Interval Width (Precision)
Intervals extend from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$
▪ Data variation, measured by σ
▪ Sample size, through $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
▪ Level of confidence, $100(1-\alpha)\%$
Elements of Confidence Interval Estimation
Level of confidence
Confidence that the interval will contain the unknown population parameter
Precision (range)
Closeness to the unknown parameter
Cost
Cost required to obtain a sample of size n
Determining Sample Size (Cost)
Too big: requires too many resources
Too small: won't obtain a precise CI
Confidence Interval Estimates
[Diagram: confidence intervals can be constructed for a mean, a variance, or a proportion; for a mean, the method depends on whether σ is known or unknown.]
Two-sided Confidence Interval
Two-sided Confidence Interval Estimate of the Population Mean μ (with σ known)
Given confidence level = 95% (α = 0.05).
Given a random variable X with mean μ and standard deviation σ, $X \sim N(\mu, \sigma^2)$.
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is itself a random variable; its probability distribution has mean μ and standard deviation $\sigma/\sqrt{n}$:
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Therefore
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
has a standard normal distribution. (If X is not normal, the central limit theorem states that Z is approximately standard normal when n is large.)
Given confidence level = 95% (α = 0.05):
$P(-1.96 \le Z \le 1.96) = 0.95$
$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$
$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
The 95% confidence interval (CI) is therefore
$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$
where the population standard deviation σ is known.
Given confidence level (1 − α):
The confidence interval of μ (σ known) is given by
$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ or $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
where $Z_{\alpha/2}$ is the value corresponding to a cumulative area of 1 − α/2 under the standard normal distribution, z ~ N(0, 1):
α = 0.01: $Z_{0.01/2} = 2.58$
α = 0.05: $Z_{0.05/2} = 1.96$
α = 0.10: $Z_{0.10/2} = 1.645$
[Figure: standard normal curve with area α/2 in each tail beyond −Zα/2 and Zα/2, and area 1 − α between them.]
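The critical values $Z_{\alpha/2}$ listed above can be reproduced numerically rather than read from a table. The sketch below uses Python's standard-library `statistics.NormalDist` (our choice of tool, not something the slides use); `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Return Z_{alpha/2}: the z value with cumulative area 1 - alpha/2 under N(0, 1)."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1 - alpha / 2)


for alpha in (0.01, 0.05, 0.10):
    # Tables usually round these values: 2.576 -> 2.58, 1.960 -> 1.96, 1.645 -> 1.645.
    print(f"alpha = {alpha:.2f}: Z = {z_critical(alpha):.3f}")
```

The printed values are 2.576, 1.960 and 1.645, matching the rounded table entries.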
Interval Estimate of μ
$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \quad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
where: $\bar{X}$ is the sample mean
1 − α is the level of confidence
$z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
σ is the known population standard deviation
n is the sample size
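Confidence Interval for μ (σ Known)
A 95% confidence interval of μ:
$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
A 99% confidence interval of μ:
$\bar{X} - 2.58\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}$
A $100(1-\alpha)\%$ confidence interval of μ:
$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$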
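The general $100(1-\alpha)\%$ interval can be written as a small function. This is a minimal sketch, assuming σ is known and the normal model holds; `z_interval` is a hypothetical name, and the summary values in the usage lines are made up purely to show how the interval widens as the confidence level rises.

```python
import math
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided CI for mu when the population standard deviation sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / math.sqrt(n)         # Z_{alpha/2} * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical summary values (not from the slides): xbar = 50, sigma = 10, n = 25.
for conf in (0.90, 0.95, 0.99):
    lower, upper = z_interval(50.0, 10.0, 25, conf)
    print(f"{conf:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

For the same data, each higher confidence level yields a wider interval, because $Z_{\alpha/2}$ grows as α shrinks.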
The meaning of Confidence Interval
We are 95% confident that the interval will cover μ. This statement does not imply that μ is a random variable that assumes a value within the interval 95% of the time, nor that 95% of the population values lie between these limits.
The meaning of Confidence Interval
It means that if we were to select 100 random samples from the population and use these samples to calculate 100 different confidence intervals for μ, approximately 95 of the intervals would cover the true population mean and 5 would not.
The meaning of Confidence Interval
The estimator $\bar{X}$ is a random variable, whereas the parameter μ is a constant. Therefore, the interval
$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$
is random and has a 95% chance of covering μ before a sample is selected.
The meaning of Confidence Interval
Since μ has a fixed value, once a sample has been drawn and the confidence limits
$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$
have been calculated, either μ is within the interval or it is not. There is no longer any probability involved.
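The long-run interpretation can be checked by simulation: draw many samples from a population whose μ we pretend to know, build the 95% interval from each, and count how often the interval covers μ. This is a minimal sketch using Python's `random` module; the population values (a normal population with μ = 217, σ = 46, samples of n = 12, borrowed from the cholesterol example in this section) and the helper name `coverage_rate` are illustrative assumptions. In real data μ is unknown, which is why this check is only possible in simulation.

```python
import math
import random


def coverage_rate(mu: float, sigma: float, n: int, reps: int, z: float = 1.96, seed: int = 1) -> float:
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so the run is reproducible
    half_width = z * sigma / math.sqrt(n)     # identical for every sample because sigma is known
    covered = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / reps


rate = coverage_rate(mu=217.0, sigma=46.0, n=12, reps=10_000)
print(f"Proportion of intervals covering mu: {rate:.3f}")
```

Each individual simulated interval either contains μ or it does not; only the proportion across many repetitions is close to 0.95.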
Length of Interval
If we wish to make an interval tighter without reducing the level of confidence, we need more information about the population mean; thus, we must select a larger sample.
As the sample size n increases, the standard error $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ decreases; this results in a narrower confidence interval.
Length of Interval
Consider the 95% confidence limits:
$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$
The length of this interval is $2(1.96)\frac{\sigma}{\sqrt{n}} = 3.92\frac{\sigma}{\sqrt{n}}$, so quadrupling n halves the length.
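The dependence of the interval length on n is easy to tabulate. A minimal sketch; σ = 46 is borrowed from the cholesterol example in this section, the n values are illustrative, and `ci_length` is a hypothetical helper.

```python
import math


def ci_length(sigma: float, n: int, z: float = 1.96) -> float:
    """Length 2 * z * sigma / sqrt(n) of a two-sided z-interval."""
    return 2 * z * sigma / math.sqrt(n)


# Each step multiplies n by 4, so each length is half the previous one.
for n in (12, 48, 192):
    print(n, round(ci_length(46.0, n), 2))
```

Because the length is proportional to $1/\sqrt{n}$, halving the width costs four times as many observations.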
Sample Size for an Interval Estimate of a Population Mean
Let E = the desired margin of error.
E is the amount added to and subtracted from the point estimate to obtain an interval estimate.
Sample Size for an Interval Estimate of a Population Mean
Margin of Error
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Necessary Sample Size (rounded up to the next whole number)
$n = \frac{(z_{\alpha/2})^2\sigma^2}{E^2}$
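The sample-size formula can be evaluated directly; because n must be a whole number, the result is rounded up so that the achieved margin does not exceed E. A minimal sketch, assuming σ is known before sampling; the target margin E = 10 mg/100 ml is a hypothetical choice paired with σ = 46 from the cholesterol example, and `required_n` is an illustrative name.

```python
import math


def required_n(sigma: float, margin: float, z: float = 1.96) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. (z * sigma / E)^2 rounded up."""
    return math.ceil((z * sigma / margin) ** 2)


n = required_n(sigma=46.0, margin=10.0)
print(n)                             # 82
print(1.96 * 46.0 / math.sqrt(n))    # achieved margin, just under 10
```

Rounding down instead (to 81) would leave a margin slightly larger than the requested 10.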
Example
Suppose that we draw a sample of size 12 from the population of hypertensive smokers and that these men have a mean serum cholesterol level of $\bar{X}$ = 217 mg/100 ml, with σ = 46 mg/100 ml.
Construct a 95% confidence interval estimate of μ.
Solution: $\bar{X}$ = 217, n = 12, σ = 46
Confidence level = 95%, α = 0.05, $Z_{0.05/2} = 1.96$
Using the formula: $\bar{X} \pm Z_{0.05/2}\frac{\sigma}{\sqrt{n}} = 217 \pm 1.96\frac{46}{\sqrt{12}} = 217 \pm 26.0$
The 95% confidence interval is (191, 243) mg/100 ml.
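The arithmetic in this example can be checked in a few lines of Python (a sketch for verification, not part of the original slides).

```python
import math

xbar, sigma, n = 217.0, 46.0, 12   # values from the example
z = 1.96                           # Z_{0.05/2}

margin = z * sigma / math.sqrt(n)  # 1.96 * 46 / sqrt(12)
lower, upper = xbar - margin, xbar + margin
print(f"margin = {margin:.1f}; 95% CI = ({lower:.0f}, {upper:.0f})")
```

This prints `margin = 26.0; 95% CI = (191, 243)`.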
Exercise
The Dean of the Medical School wants to estimate the mean number of hours studied per week by students. The numbers of hours reported by a random sample of 49 students are:
29 20 26 21 25 24 16 27 20 33 28 16 16 19 27 24 28
24 26 24 29 20 23 27 25 16 25 22 30 24 25 27 23 30
24 19 23 20 30 24 28 30 17 21 22 28 27 32 22
Construct a 95% confidence interval estimate of the mean, assuming σ = 4.5 hours.

One-sided Confidence Interval
In some situations, we are concerned with either an upper limit for the population mean μ or a lower limit for μ, but not both. Consider the distribution of hemoglobin levels (hemoglobin is an oxygen-bearing protein found in red blood cells) for the population of children under the age of 6 who have been exposed to high levels of lead. This distribution has an unknown mean μ and standard deviation σ = 0.85 g/100 ml. We know that children who have lead poisoning tend to have much lower levels of hemoglobin than children who do not. Therefore, we might be interested in finding an upper bound for μ.
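Upper 95% confidence bound for μ:
$P(-1.645 \le Z) = 0.95$
$P\left(-1.645 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right) = 0.95$
$P\left(\mu \le \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.95$
Upper bound: $\mu \le \bar{X} + 1.645\frac{\sigma}{\sqrt{n}}$
[Figure: standard normal curve with area 0.05 in the lower tail below −1.645 and area 1 − 0.05 above it.]
Lower 95% confidence bound for μ:
$P(Z \le 1.645) = 0.95$
$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.645\right) = 0.95$
$P\left(\bar{X} - 1.645\frac{\sigma}{\sqrt{n}} \le \mu\right) = 0.95$
Lower bound: $\bar{X} - 1.645\frac{\sigma}{\sqrt{n}} \le \mu$
[Figure: standard normal curve with area 0.05 in the upper tail above 1.645 and area 1 − 0.05 below it.]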
Example
Suppose that we select a sample of 74 children who have been exposed to high levels of lead; these children have a mean hemoglobin level of $\bar{X}$ = 10.6 g/100 ml, with σ = 0.85 g/100 ml. Based on this sample, construct a one-sided 95% confidence interval for μ (the upper bound).
Solution: $\bar{X}$ = 10.6, n = 74, σ = 0.85
Confidence level 1 − α = 95%, α = 0.05, $z_{0.05} = 1.645$
Using the formula: $\bar{X} + z_{0.05}\frac{\sigma}{\sqrt{n}} = 10.6 + 1.645\frac{0.85}{\sqrt{74}} \approx 10.8$
The upper bound of the one-sided 95% confidence interval is 10.8 g/100 ml; that is, μ ≤ 10.8 g/100 ml.
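The two one-sided bounds differ from the two-sided interval only in using $z_{\alpha}$ (1.645 for 95%) instead of $z_{\alpha/2}$ (1.96), with all of α placed in one tail. A minimal sketch; `upper_bound` and `lower_bound` are hypothetical helper names, and the usage line reproduces the hemoglobin example.

```python
import math


def upper_bound(xbar: float, sigma: float, n: int, z_alpha: float = 1.645) -> float:
    """One-sided upper 100(1 - alpha)% bound: mu <= xbar + z_alpha * sigma / sqrt(n)."""
    return xbar + z_alpha * sigma / math.sqrt(n)


def lower_bound(xbar: float, sigma: float, n: int, z_alpha: float = 1.645) -> float:
    """One-sided lower 100(1 - alpha)% bound: xbar - z_alpha * sigma / sqrt(n) <= mu."""
    return xbar - z_alpha * sigma / math.sqrt(n)


ub = upper_bound(10.6, 0.85, 74)      # hemoglobin example
print(f"mu <= {ub:.2f} g/100 ml")     # 10.76, reported as 10.8 on the slide
```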
Exercise
Suppose that we select a sample of 164 children who have been exposed to high levels of lead; these children have a mean hemoglobin level of $\bar{X}$ = 11.4 g/100 ml, with σ = 1.25 g/100 ml. Based on this sample, construct a one-sided 95% confidence interval for μ (the upper bound).
Interval Estimate of a Population Mean: σ Known
Points to note:
Adequate Sample Size
In most applications, a sample size of n = 30 is adequate.
If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
Interval Estimate of a Population Mean: σ Unknown
If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ.
This is the σ unknown case.
In this case, the interval estimate for μ is based on the t distribution.
We'll assume for now that the population is normally distributed.
Student's t Distribution
In reality, if μ is unknown, σ is probably unknown as well. In this situation, confidence intervals are calculated in much the same way as we have already seen. Instead of using the standard normal distribution, however, the analysis depends on a probability distribution known as Student's t distribution.
Student's t Distribution
When the population standard deviation is not known, it may seem logical to substitute s, the standard deviation of a sample drawn from the population, for σ. The resulting ratio
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
is a random variable whose probability distribution (for a normal population) is Student's t distribution with n − 1 degrees of freedom.
Degrees of Freedom (df)
Number of observations that are free to vary after the sample mean has been calculated
Degrees of freedom = n − 1
Example: the mean of 3 numbers is 2 ($\bar{X}$ = 2), so degrees of freedom = 3 − 1 = 2
$X_1$ = 1 (or any number)
$X_2$ = 2 (or any number)
$X_3$ = 3 (cannot vary)
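The three-number example can be replayed in code: once the mean is fixed, the last value is forced. This sketch uses Python's standard `statistics` module; `statistics.stdev` divides by n − 1 (the sample standard deviation s), whereas `statistics.pstdev` divides by n. The variable names are illustrative.

```python
from statistics import mean, stdev

n, target_mean = 3, 2
free_values = [1, 2]                        # X1 and X2 may be any numbers
x3 = n * target_mean - sum(free_values)     # X3 is forced: the values must sum to n * mean
sample = free_values + [x3]

print(x3)              # 3
print(mean(sample))    # 2
print(stdev(sample))   # 1.0  (sum of squared deviations 2, divided by n - 1 = 2)
```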
t Distribution
The t distribution is a family of similar probability distributions.
A specific t distribution depends on a parameter known as the degrees of freedom.
Degrees of freedom refer to the number of independent pieces of information that go into the computation of s.
t Distribution
A t distribution with more degrees of freedom has less dispersion.
As the number of degrees of freedom increases, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.
t Distribution
[Figure: the standard normal distribution compared with t distributions for df = 10 and df = 20. All are bell-shaped and symmetric about 0, but the t distributions have "fatter" tails.]
t Distribution
For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value.
The standard normal z values can be found in the infinite degrees of freedom (∞) row of the t distribution table.
t Distribution
Degrees of      Area in Upper Tail
Freedom         .20     .10     .05     .025    .01     .005
...
50              .849    1.299   1.676   2.009   2.403   2.678
60              .848    1.296   1.671   2.000   2.390   2.660
80              .846    1.292   1.664   1.990   2.374   2.639
100             .845    1.290   1.660   1.984   2.364   2.626
∞               .842    1.282   1.645   1.960   2.326   2.576   (standard normal z values)
Interval Estimation of a Population Mean:
σ Unknown: Two-sided
Interval Estimate
$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$
where: 1 − α = the confidence level
$t_{\alpha/2,\,n-1}$ = the t value providing an area of α/2 in the upper tail of a t distribution with n − 1 degrees of freedom
s = the sample standard deviation
[Figure: t distribution with area α/2 in each tail beyond −tα/2 and tα/2, and area 1 − α between them.]
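Given raw observations, the two-sided t interval needs $\bar{X}$, s (with the n − 1 divisor) and a critical value from the t table. Python's standard library has no t quantile function, so this sketch takes the table value as an argument; if SciPy is installed, `scipy.stats.t.ppf(1 - alpha/2, n - 1)` returns the same critical value. The three-value data set and the name `t_interval_from_data` are hypothetical, and 4.303 is the tabulated t value for 2 degrees of freedom with upper-tail area 0.025.

```python
import math
from statistics import mean, stdev


def t_interval_from_data(data: list[float], t_crit: float) -> tuple[float, float]:
    """Two-sided CI for mu with sigma unknown: xbar +/- t_crit * s / sqrt(n).

    t_crit must be t_{alpha/2, n-1}, looked up for df = len(data) - 1.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                       # sample standard deviation, divisor n - 1
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical data: n = 3, so df = 2 and t_{0.025, 2} = 4.303.
lower, upper = t_interval_from_data([1.0, 2.0, 3.0], t_crit=4.303)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With only two degrees of freedom the critical value is far above 1.96, so the interval is very wide.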
Mean: σ Unknown Two-sided
Example:
Consider a random sample of 16 children selected
from the population of infants receiving antacids that
contain aluminum. These antacids are often used to
treat peptic or digestive disorders. The distribution of
plasma aluminum levels is known to be approximately
normal; however, its mean μ and standard deviation σ
are not known. The mean aluminum level for the
sample of 16 infants is 37.2 μg/l and the sample
standard deviation is s= 7.13 μg/l.
Mean: σ Unknown Two-sided
Let us provide a 95% confidence interval estimate of the mean of the population. We will assume this population to be normally distributed.
Mean: σ Unknown Two-sided
At 95% confidence, α = 0.05 and α/2 = 0.025.
$t_{0.025,\,df}$ is based on n − 1 = 16 − 1 = 15 degrees of freedom.
In the t distribution table we see that $t_{0.025,\,15} = 2.131$.
Degrees of      Area in Upper Tail
Freedom         .20     .100    .050    .025    .010    .005
15              .866    1.341   1.753   2.131   2.602   2.947
16              .865    1.337   1.746   2.120   2.583   2.921
17              .863    1.333   1.740   2.110   2.567   2.898
18              .862    1.330   1.734   2.101   2.552   2.878
19              .861    1.328   1.729   2.093   2.539   2.861
...
Mean: σ Unknown Two-sided
$\bar{X} \pm t_{0.025,\,15}\frac{s}{\sqrt{n}}$
$37.2 \pm 2.131\frac{7.13}{\sqrt{16}} = 37.2 \pm 3.80$
CI: (33.40, 41.00)
We are 95% confident that the mean plasma aluminum level of the population is between 33.40 μg/l and 41.00 μg/l.
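A quick numerical check of the aluminum example (a sketch; all inputs come from the slide and the t table). The margin works out to about 3.80 μg/l.

```python
import math

xbar, s, n = 37.2, 7.13, 16   # aluminum example
t_crit = 2.131                # t_{0.025, 15} from the table

margin = t_crit * s / math.sqrt(n)   # 2.131 * 7.13 / 4
print(f"margin = {margin:.2f}; 95% CI = ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

This prints `margin = 3.80; 95% CI = (33.40, 41.00)`.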
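Interval Estimation of a Population Mean:
σ Unknown: One-sided
Interval Estimate
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$
Upper: $\mu \le \bar{X} + t_{\alpha,\,n-1}\frac{s}{\sqrt{n}}$
Lower: $\bar{X} - t_{\alpha,\,n-1}\frac{s}{\sqrt{n}} \le \mu$
[Figure: t distributions with n − 1 degrees of freedom showing a single tail of area α beyond −tα, n−1 (upper bound) or beyond tα, n−1 (lower bound), with area 1 − α on the other side.]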
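The one-sided t bounds use $t_{\alpha,\,n-1}$ rather than $t_{\alpha/2,\,n-1}$. As an illustration not worked on the slides, applying them to the aluminum example (n = 16, so df = 15 and $t_{0.05,\,15} = 1.753$ from the table) gives the one-sided 95% bounds below; `one_sided_t_bounds` is a hypothetical helper.

```python
import math


def one_sided_t_bounds(xbar: float, s: float, n: int, t_alpha: float) -> tuple[float, float]:
    """Return (lower, upper) one-sided 100(1 - alpha)% bounds for mu, sigma unknown.

    lower: xbar - t_alpha * s / sqrt(n) <= mu
    upper: mu <= xbar + t_alpha * s / sqrt(n)
    Each bound is a separate one-sided statement; together they form a
    two-sided 100(1 - 2*alpha)% interval, not a 100(1 - alpha)% one.
    """
    margin = t_alpha * s / math.sqrt(n)
    return xbar - margin, xbar + margin


lower, upper = one_sided_t_bounds(37.2, 7.13, 16, t_alpha=1.753)
print(f"upper 95% bound: mu <= {upper:.2f}")
print(f"lower 95% bound: mu >= {lower:.2f}")
```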
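Summary of Interval Estimation Procedures for a Population Mean
Can the population standard deviation σ be assumed known?
Yes (σ known): use the standard normal distribution
  Two-sided: $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
  One-sided: Upper: $\bar{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$; Lower: $\bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$
No (σ unknown): use the sample standard deviation s to estimate σ
  Two-sided: $\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
  One-sided: Upper: $\bar{X} + t_{\alpha,\,n-1}\frac{s}{\sqrt{n}}$; Lower: $\bar{X} - t_{\alpha,\,n-1}\frac{s}{\sqrt{n}}$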
Confidence Interval for μ (σ Unknown)
Assumptions
Population standard deviation is unknown
Population is normally distributed
If the population is not normal, use a large sample
Use Student's t distribution
Confidence interval estimate
$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$
How to use SPSS
Exercise
Consider a random sample of 25 children selected from the population of infants receiving antacids that contain aluminum. These antacids are often used to treat peptic or digestive disorders. The distribution of plasma aluminum levels is known to be approximately normal. The mean aluminum level for the sample of 25 infants is $\bar{X}$ = 36.8 μg/l and the sample standard deviation is s = 7.85 μg/l.
Construct a 95% confidence interval for the mean of the population.

Using SPSS to estimate CI
For σ known (e.g., σ = 9.7):
Set up the dataset in Variable View of the Data Editor
Enter the data into the dataset in Data View
Use Descriptives to calculate the mean (and s)
Substitute the values into the formula to calculate the CI

Using SPSS to estimate CI
For σ unknown:
Set up the dataset in Variable View of the Data Editor
Enter the data into the dataset in Data View
Use Explore (Analyze > Descriptive Statistics > Explore) to calculate the CI
Exercise
The Dean of the Medical School wants to estimate the mean number of hours studied per week by students. The numbers of hours reported by a random sample of 49 students are:
29 20 26 21 25 24 16 27 20 33 28 16 16 19 27 24 28
24 26 24 29 20 23 27 25 16 25 22 30 24 25 27 23 30
24 19 23 20 30 24 28 30 17 21 22 28 27 32 22
Construct a 95% confidence interval estimate of the mean, assuming σ = 4.5 hours.

Exercise
