
Confidence Intervals

Contents

Two-sided Confidence interval


One-sided Confidence interval

Student's t Distribution
How to use SPSS
Basic concepts

Goal: To describe or estimate some characteristic of a continuous random variable (such as its mean).
Method: Get a representative/random sample and use the information contained in the sample of observations to make statistical inference on the unknown parameter of the population.
Populations and Parameters
Population
A group of individuals that we would like to know
something about
Parameter
A characteristic of the population in which we have a
particular interest
Often denoted with Greek letters (μ, σ, π , ρ)
Examples:
The mean of the population (μ)
The standard deviation of the population (σ)


Samples and Statistics

Sample
A subset of a population (hopefully
representative)
Statistic
A characteristic of the sample
Often denoted with English letters (x-bar, s, p, r)
Examples:
The mean of the sample (x-bar)
The standard deviation of the sample (s)
Populations and Samples

Studying populations is too expensive and time-consuming, and thus impractical.

If a sample is representative of the population, then by observing the sample we can learn something about the population.
Thus, by looking at the characteristics of the sample (statistics), we may learn something about the characteristics of the population (parameters).
Statistical Analyses

Two steps
Descriptive Statistics
Describe the sample
Inference
Make inferences about the population
using what is observed in the sample
Primarily performed in two ways:
Estimation
Hypothesis testing
Basic concepts

Statistical Inference has two parts:


Estimation (of parameters)
▪ Point estimation
▪ Interval estimation
Hypothesis testing
Estimation Methods

[Figure: Estimation splits into Point Estimation and Interval Estimation; Interval Estimation leads to the Confidence Interval.]
Estimation Process

[Figure: A random sample with sample mean X̄ = 50 is drawn from a population whose mean μ is unknown. Conclusion: "I am 95% confident that μ is between 40 and 60."]
Basic concepts

Point Estimation: use the sample data to calculate a single number that estimates a parameter of the population.
Point Estimation

Provides Single Value


Based on Observations from 1 Sample
Gives No Information about How Close
Value Is to the Unknown Population
Parameter
Sample Mean X̄ = 8.3 Is Point Estimate of
Unknown Population Mean
Point Estimates

A point estimate is a single value, called a statistic, used to estimate a population value, called a parameter.

                     Sample statistic   Population parameter
Mean                 x̄                  μ
Standard deviation   s                  σ
Proportion           p                  π
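As a small illustration of point estimation, the sketch below computes the sample mean and sample standard deviation with Python's standard library. The six data values are hypothetical, invented only so that the sample mean comes out to the 8.3 used on the earlier slide.

```python
import statistics

# Hypothetical sample values, invented purely for illustration.
sample = [7.9, 8.6, 8.1, 8.8, 8.0, 8.4]

# Point estimate of the population mean (mu).
x_bar = statistics.mean(sample)

# Point estimate of the population standard deviation (sigma);
# statistics.stdev uses the n - 1 divisor.
s = statistics.stdev(sample)

print(f"x-bar = {x_bar:.1f}, s = {s:.3f}")
```

Each result is a single number; neither says how close it is to the unknown parameter.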
Basic concepts

Interval Estimation:
a technique that provides a range of reasonable values intended to contain the parameter of the population, with a certain degree of confidence. This range of values is called a confidence interval.
Confidence Interval Estimate

A confidence interval is a range of values within which the population parameter is expected to occur at a specified probability.
The probability that the population parameter will lie within a confidence interval is called the level of confidence.
Find an interval (a, b) such that, for a given significance level α,
$$P(a \le \mu \le b) = 1 - \alpha$$


Interval Estimation

Provides Range of Values


Based on Observations from 1 Sample
Gives Information about Closeness to
Unknown Population Parameter
Stated in terms of Probability
Knowing Exact Closeness Requires Knowing Unknown
Population Parameter
e.g., Unknown Population Mean Lies Between
50 & 70 with 95% Confidence
Key Elements of Interval Estimation

A Probability That the Population Parameter Falls


Somewhere Within the Interval.
[Figure: A confidence interval centered on the sample statistic (point estimate), bounded by the lower confidence limit and the upper confidence limit.]
Level of Confidence

Probability that the Unknown Population Parameter


Falls Within Interval
Denoted by 100(1 − α)%
α Is Probability That Parameter Is Not Within Interval
Typical Values Are 99%, 95%, 90%
Level of Confidence

A relative frequency interpretation


In the long run, 100(1 − α)% of all the
confidence intervals that can be constructed
will contain the unknown parameter
A specific interval will either contain or not
contain the parameter
No probability involved in a specific interval
Factors Affecting
Interval Width (Precision)

Intervals extend from $\bar X - Z\sigma_{\bar X}$ to $\bar X + Z\sigma_{\bar X}$
Data variation: measured by σ
Sample size: $\sigma_{\bar X} = \sigma/\sqrt{n}$
Level of confidence: 100(1 − α)%
Elements of
Confidence Interval Estimation

Level of confidence
Confidence in which the interval will
contain the unknown population parameter
Precision (range)
Closeness to the unknown parameter
Cost
Cost required to obtain a sample of size n
Determining Sample Size (Cost)

Too Big:
• Requires too many resources
Too Small:
• Won't obtain a precise CI
Confidence Interval Estimates

[Figure: Confidence intervals can be built for the Mean, the Variance, or the Proportion; the Mean case splits into σ Known and σ Unknown.]
Two-sided Confidence Interval

Two-sided Confidence Interval Estimate

of the Population Mean μ (with σ known)


Two-sided Confidence Interval

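Given confidence level = 95% (α = 0.05).

Given a random variable X that has mean μ and standard deviation σ:
$$X \sim N(\mu, \sigma^2)$$

The sample mean $\bar X$ is itself a random variable; its probability distribution has mean μ and standard deviation $\sigma/\sqrt{n}$:
$$\bar X = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad \bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

The central limit theorem states that
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
has a standard normal distribution (exactly when X is normal, approximately for large n otherwise).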
Two-sided Confidence Interval

Given confidence level = 95% (α = 0.05).

$$P(-1.96 \le Z \le 1.96) = 0.95$$
$$P\left(-1.96 < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
Two-sided Confidence Interval

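$$P\left(-1.96 < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$

$$P\left(\bar X - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

$$P\left(\mu \in \left[\bar X - 1.96\frac{\sigma}{\sqrt{n}},\ \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right]\right) = 0.95$$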


Two-sided Confidence Interval

The 95% confidence interval (CI) is as follows:

$$\bar X \pm 1.96\frac{\sigma}{\sqrt{n}}$$
where the population standard deviation σ is known.
Two-sided Confidence Interval

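Given confidence level (1 − α)

The confidence interval of μ is given by (σ is known)
$$\bar X \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $Z_{\alpha/2}$ is the value corresponding to a cumulative area of 1 − α/2 from the standard normal distribution, Z ~ N(0, 1):
α = 0.01: $Z_{0.01/2}$ = 2.58
α = 0.05: $Z_{0.05/2}$ = 1.96
α = 0.10: $Z_{0.10/2}$ = 1.645
[Figure: Standard normal density with central area 1 − α between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, and area α/2 in each tail.]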
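The tabulated critical values can be reproduced numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is made up for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_sided: bool = True) -> float:
    """Return the z value that leaves alpha/2 (two-sided) or alpha
    (one-sided) in the upper tail of the standard normal distribution."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: z = {z_critical(alpha):.3f}")
```

To three decimals the first value is 2.576, which the slide rounds to 2.58.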
Two-sided Confidence Interval

Interval Estimate of μ

$$\bar X \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad \bar X = \frac{\sum_{i=1}^{n} X_i}{n}$$

where: $\bar X$ is the sample mean
1 − α is the level of confidence
$z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
σ is the known population standard deviation
n is the sample size
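Confidence Interval for μ
(σ Known)

A 95% confidence interval of μ
$$\bar X - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt{n}}$$
A 99% confidence interval of μ
$$\bar X - 2.58\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 2.58\frac{\sigma}{\sqrt{n}}$$
A 100(1 − α)% confidence interval of μ
$$\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$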
The meaning of
Confidence Interval

We are 95% confident that the interval will cover μ. This statement does not imply that μ is a random variable that assumes a value within the interval 95% of the time, nor that 95% of the population values lie between these limits.
The meaning of
Confidence Interval

It means that if we were to select 100 random samples from the population and use these samples to calculate 100 different confidence intervals for μ, approximately 95 of the intervals would cover the true population mean and 5 would not.
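The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration only: the population values (μ = 50, σ = 10), the sample size, and the function name `coverage` are assumptions chosen for the demo, not values from the slides.

```python
import random
from math import sqrt

def coverage(mu=50.0, sigma=10.0, n=25, trials=10_000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Count the interval if it covers the true mean.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # a proportion close to 0.95
```

Each individual interval either covers μ or does not; only the long-run proportion is about 95%.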
The meaning of Confidence Interval

The estimator $\bar X$ is a random variable, whereas the parameter μ is a constant. Therefore, the interval

$$\left(\bar X - 1.96\frac{\sigma}{\sqrt{n}},\ \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

is random and has a 95% chance of covering μ before a sample is selected.
The meaning of Confidence Interval

Since μ has a fixed value, once a sample has been drawn and the confidence limits

$$\left(\bar x - 1.96\frac{\sigma}{\sqrt{n}},\ \bar x + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

have been calculated, either μ is within the interval or it is not. There is no longer any probability involved.
Length of Interval

If we wish to make an interval tighter without reducing the level of confidence, we need more information about the population mean; thus, we must select a larger sample.

As the sample size n increases, the standard error $\sigma_{\bar X} = \sigma/\sqrt{n}$ decreases; this results in a narrower confidence interval.


Length of Interval

Consider the 95% confidence limits:

$$\bar X \pm 1.96\frac{\sigma}{\sqrt{n}}$$
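To see how the length of the interval depends on n, the sketch below evaluates the full width 2 × 1.96 × σ/√n for a few sample sizes. The value σ = 46 and the sample sizes are illustrative assumptions, and `ci_width` is a made-up helper name.

```python
from math import sqrt

def ci_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Full width of the two-sided interval x_bar +/- z * sigma / sqrt(n)."""
    return 2 * z * sigma / sqrt(n)

# Illustrative values only: sigma = 46, sample sizes chosen for the demo.
for n in (10, 40, 160):
    print(n, round(ci_width(46, n), 2))
```

Because the width is proportional to 1/√n, quadrupling the sample size halves the interval.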
Sample Size for an Interval Estimate
of a Population Mean

Let E = the desired margin of error.

E is the amount added to and subtracted from the


point estimate to obtain an interval estimate.
Sample Size for an Interval Estimate
of a Population Mean

Margin of Error
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Necessary Sample Size
$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$$
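The sample-size formula translates directly into code. This is a minimal sketch: the function name `required_sample_size` and the inputs (σ = 46, E = 10) are assumptions for illustration. The result is rounded up, since n must be a whole number and rounding down would give a margin of error larger than E.

```python
from math import ceil

def required_sample_size(sigma: float, margin: float, z: float = 1.96) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return ceil((z * sigma / margin) ** 2)

# Illustrative inputs: sigma = 46, desired margin of error E = 10, 95% confidence.
print(required_sample_size(sigma=46, margin=10))  # 82
```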
Example

Suppose that we draw a sample of size 12 from the population of hypertensive smokers and that these men have a mean serum cholesterol level of X̄ = 217 mg/100 ml, with σ = 46 mg/100 ml.
Construct a 95% confidence interval estimate for the population mean μ.
Solution: X̄ = 217, n = 12, σ = 46
Confidence level = 95%, α = 0.05, $Z_{0.05/2}$ = 1.96
Using the formula:
$$\bar X \pm Z_{0.05/2}\frac{\sigma}{\sqrt{n}} = 217 \pm 1.96 \times \frac{46}{\sqrt{12}}$$
The 95% confidence interval is (191, 243).
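The arithmetic of this example can be verified with a few lines of Python. The helper `z_interval` is a hypothetical name introduced here; the inputs are the values given in the example.

```python
from math import sqrt

def z_interval(x_bar: float, sigma: float, n: int, z: float = 1.96):
    """Two-sided confidence interval for the mean when sigma is known."""
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(x_bar=217, sigma=46, n=12)
print(round(lower), round(upper))  # 191 243
```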
Exercise

The Dean of the Medical School wants to estimate the


mean number of hours studied per week by students. The
number of hours in a random sample of 49 students are

29 20 26 21 25 24 16 27 20 33 28 16 16 19 27 24 28
24 26 24 29 20 23 27 25 16 25 22 30 24 25 27 23 30
24 19 23 20 30 24 28 30 17 21 22 28 27 32 22

Construct a 95% confidence interval estimate for the population mean, given σ = 4.5


One-sided Confidence Interval

In some situations, we are concerned with either an upper limit for the population mean μ or a lower limit for μ, but not both. Consider the distribution of hemoglobin levels (hemoglobin is an oxygen-bearing protein found in red blood cells) for the population of children under the age of 6 who have been exposed to high levels of lead. This distribution has an unknown mean μ and standard deviation σ = 0.85 g/100 ml.
We know that children who have lead poisoning tend to have much lower levels of hemoglobin than children who do not.
Therefore, we might be interested in finding an upper bound for μ.
One-sided Confidence Interval

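Upper 95% confidence bound for μ:

$$P(-1.645 \le Z) = 0.95$$
$$P\left(-1.645 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}}\right) = 0.95$$
$$P\left(\bar X + 1.645\frac{\sigma}{\sqrt{n}} \ge \mu\right) = 0.95$$
$$P\left(\mu \le \bar X + 1.645\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Upper bound: $\bar X + 1.645\,\sigma/\sqrt{n}$
[Figure: Standard normal density with area 0.05 to the left of z = −1.645 and area 1 − 0.05 to the right.]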
One-sided Confidence Interval

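Lower 95% confidence bound for μ:

$$P(Z \le 1.645) = 0.95$$
$$P\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.645\right) = 0.95$$
$$P\left(\bar X - 1.645\frac{\sigma}{\sqrt{n}} \le \mu\right) = 0.95$$
Lower bound: $\bar X - 1.645\,\sigma/\sqrt{n} \le \mu$
[Figure: Standard normal density with area 1 − 0.05 to the left of z = 1.645 and area 0.05 to the right.]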
Example

Suppose that we select a sample of 74 children who have been exposed to high levels of lead; these children have a mean hemoglobin level of X̄ = 10.6 g/100 ml, and σ = 0.85 g/100 ml. Based on this sample, construct a one-sided 95% confidence interval for μ (the upper bound).
Solution: X̄ = 10.6, n = 74, σ = 0.85
Confidence level 1 − α = 95%, α = 0.05, $z_{0.05}$ = 1.645
Using the formula:
$$\bar X + z_{0.05}\frac{\sigma}{\sqrt{n}} = 10.6 + 1.645 \times \frac{0.85}{\sqrt{74}} \approx 10.8$$
The upper bound of the one-sided 95% confidence interval is 10.8 g/100 ml.
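The one-sided bound in this example can be checked the same way. The function name `upper_bound` is made up for this sketch; the inputs are the values stated in the example.

```python
from math import sqrt

def upper_bound(x_bar: float, sigma: float, n: int, z: float = 1.645) -> float:
    """Upper one-sided confidence bound for the mean when sigma is known."""
    return x_bar + z * sigma / sqrt(n)

print(round(upper_bound(x_bar=10.6, sigma=0.85, n=74), 1))  # 10.8
```

Note that the one-sided bound uses z for area α in a single tail (1.645), not α/2 (1.96).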
Exercise

Suppose that we select a sample of 164 children who have been exposed to high levels of lead; these children have a mean hemoglobin level of X̄ = 11.4 g/100 ml, and σ = 1.25 g/100 ml.
Based on this sample, construct a one-sided 95% confidence interval for μ (the upper bound).
Interval Estimate of a Population
Mean: σ Known

Points to note:

Adequate Sample Size

In most applications, a sample size of n ≥ 30 is adequate.
If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
Interval Estimate of a Population
Mean: σ Unknown

If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ.

This is the σ unknown case.

In this case, the interval estimate for μ is based on the t distribution.
We'll assume for now that the population is normally distributed.
Student's t Distribution

In reality, if μ is unknown, σ is probably unknown as well. In this situation, confidence intervals are calculated in much the same way as we have already seen. Instead of using the standard normal distribution, however, the analysis depends on a probability distribution known as Student's t distribution.
Student's t Distribution

When the population standard deviation is not known, it may seem logical to substitute s, the standard deviation of a sample drawn from the population, for σ. This is, in fact, what is done. The ratio
$$t = \frac{\bar X - \mu}{s/\sqrt{n}}$$
is a random variable, and its probability distribution is Student's t distribution with n − 1 degrees of freedom.
Degrees of Freedom (df )

Number of observations that are free to vary after the sample mean has been calculated:
degrees of freedom = n − 1
Example: the mean of 3 numbers is 2, so degrees of freedom = 3 − 1 = 2
$X_1 = 1$ (or any number)
$X_2 = 2$ (or any number)
$X_3 = 3$ (cannot vary)
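The "cannot vary" point can be made concrete: once the mean and n − 1 of the values are fixed, the last value is forced. A tiny sketch, with `last_value` as a hypothetical helper name:

```python
def last_value(known_values, mean, n):
    """Return the one remaining observation, which is determined by the
    mean: the n values must sum to n * mean."""
    return n * mean - sum(known_values)

# With mean 2 and n = 3, choosing X1 = 1 and X2 = 2 forces X3.
print(last_value([1, 2], mean=2, n=3))  # 3
```

Any other choice for the first two values still leaves exactly one possibility for the third, which is why only n − 1 observations are free.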
t Distribution

The t distribution is a family of similar probability


distributions.

A specific t distribution depends on a parameter


known as the degrees of freedom.

Degrees of freedom refer to the number of


independent pieces of information that go into the
computation of s.
t Distribution

A t distribution with more degrees of freedom has


less dispersion.

As the number of degrees of freedom increases, the


difference between the t distribution and the
standard normal probability distribution becomes
smaller and smaller.
t Distribution

[Figure: Standard normal density compared with t densities for df = 20 and df = 10. All are bell-shaped and symmetric about 0; the t distributions have "fatter" tails. Horizontal axis: z, t.]
t Distribution

For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value.

The standard normal z values can be found in the infinite degrees of freedom (∞) row of the t distribution table.
t Distribution

Degrees        Area in Upper Tail
of Freedom     .20     .10     .05     .025    .01     .005
.              .       .       .       .       .       .
50             .849    1.299   1.676   2.009   2.403   2.678
60             .848    1.296   1.671   2.000   2.390   2.660
80             .846    1.292   1.664   1.990   2.374   2.639
100            .845    1.290   1.660   1.984   2.364   2.626
∞              .842    1.282   1.645   1.960   2.326   2.576

The ∞ row gives the standard normal z values.
Interval Estimation of a Population Mean:
σ Unknown: Two-sided

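Interval Estimate

$$\bar X \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
$$\bar X = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{n-1}}$$

where: 1 − α = the confidence level
$t_{\alpha/2,\,n-1}$ = the t value providing an area of α/2 in the upper tail of a t distribution with n − 1 degrees of freedom
s = the sample standard deviation
[Figure: t density with central area 1 − α between $-t_{\alpha/2}$ and $t_{\alpha/2}$, and area α/2 in each tail.]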
Interval Estimate of a Population
Mean: σ Unknown Two-sided

Example:
Consider a random sample of 16 children selected from the population of infants receiving antacids that contain aluminum. These antacids are often used to treat peptic or digestive disorders. The distribution of plasma aluminum levels is known to be approximately normal; however, its mean μ and standard deviation σ are not known. The mean aluminum level for the sample of 16 infants is 37.2 μg/l and the sample standard deviation is s = 7.13 μg/l.
Interval Estimate of a Population
Mean: σ Unknown Two-sided

Let us provide a 95% confidence interval estimate


of the mean of the population. We will assume this
population to be normally distributed.
Interval Estimate of a Population
Mean: σ Unknown Two-sided

At 95% confidence, α = 0.05, and α/2 = 0.025.
$t_{0.025,\,df}$ is based on n − 1 = 16 − 1 = 15 degrees of freedom.
In the t distribution table we see that $t_{0.025,\,15}$ = 2.131

Degrees        Area in Upper Tail
of Freedom     .20     .100    .050    .025    .010    .005
15             .866    1.341   1.753   2.131   2.602   2.947
16             .865    1.337   1.746   2.120   2.583   2.921
17             .863    1.333   1.740   2.110   2.567   2.898
18             .862    1.330   1.734   2.101   2.552   2.878
19             .861    1.328   1.729   2.093   2.539   2.861
.              .       .       .       .       .       .
Interval Estimate of a Population
Mean: σ Unknown Two-sided

$$\bar X \pm t_{0.025,\,15}\frac{s}{\sqrt{n}}$$
$$37.2 \pm 2.131 \times \frac{7.13}{\sqrt{16}} = 37.2 \pm 3.799$$

CI (33.401, 40.999)

We are 95% confident that the mean of the population is between 33.401 μg/l and 40.999 μg/l.
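The t-based interval can be evaluated in code once the critical value has been read from the table. In this sketch the helper name `t_interval` is an assumption, and the critical value 2.131 is passed in by hand because the Python standard library has no t distribution. With s = 7.13, n = 16 and t = 2.131, the half-width 2.131 × 7.13/4 evaluates to about 3.80.

```python
from math import sqrt

def t_interval(x_bar: float, s: float, n: int, t_crit: float):
    """Two-sided confidence interval for the mean when sigma is unknown.
    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller."""
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Values from the example; t_{0.025, 15} = 2.131 is read from the t table.
lower, upper = t_interval(x_bar=37.2, s=7.13, n=16, t_crit=2.131)
print(round(lower, 3), round(upper, 3))
```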
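Interval Estimation of a Population Mean:
σ Unknown: One-sided

Interval Estimate
$$\bar X = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{n-1}}$$
Upper: $\bar X + t_{\alpha,\,n-1}\,\dfrac{s}{\sqrt{n}}$
Lower: $\bar X - t_{\alpha,\,n-1}\,\dfrac{s}{\sqrt{n}}$
[Figures: t densities with area 1 − α shaded and the critical values $t_{\alpha,\,n-1}$ and $-t_{\alpha,\,n-1}$ marked.]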
Summary of Interval Estimation
Procedures for a Population Mean

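Can σ be assumed known?

Yes (σ known):
Two-sided: use $\bar X \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
One-sided: Upper: $\bar X + z_{\alpha}\,\dfrac{\sigma}{\sqrt{n}}$; Lower: $\bar X - z_{\alpha}\,\dfrac{\sigma}{\sqrt{n}}$

No (σ unknown): use the sample standard deviation s to estimate σ
Two-sided: use $\bar X \pm t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$
One-sided: Upper: $\bar X + t_{\alpha,\,n-1}\,\dfrac{s}{\sqrt{n}}$; Lower: $\bar X - t_{\alpha,\,n-1}\,\dfrac{s}{\sqrt{n}}$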
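The decision procedure summarized above can be sketched as one function. Everything about its interface (the name `mean_ci`, the `side` argument, returning infinite endpoints for one-sided intervals) is an assumption made for this illustration. The z value is computed with `statistics.NormalDist`; the t value must be supplied from a table, since the standard library has no t distribution.

```python
from math import inf, sqrt
from statistics import NormalDist

def mean_ci(x_bar, n, alpha=0.05, sigma=None, s=None, t_crit=None,
            side="two-sided"):
    """Confidence interval for a population mean.

    sigma known   -> z-based interval (critical value computed here).
    sigma unknown -> t-based interval; pass s and the table value t_crit
                     (t_{alpha/2, n-1} two-sided, t_{alpha, n-1} one-sided).
    side is "two-sided", "upper" (upper bound only) or "lower".
    """
    if sigma is not None:
        tail = alpha / 2 if side == "two-sided" else alpha
        crit, sd = NormalDist().inv_cdf(1 - tail), sigma
    elif s is not None and t_crit is not None:
        crit, sd = t_crit, s
    else:
        raise ValueError("provide sigma, or both s and t_crit")
    half = crit * sd / sqrt(n)
    if side == "upper":
        return -inf, x_bar + half
    if side == "lower":
        return x_bar - half, inf
    return x_bar - half, x_bar + half

print(mean_ci(217, 12, sigma=46))                   # sigma known, two-sided
print(mean_ci(10.6, 74, sigma=0.85, side="upper"))  # sigma known, upper bound
print(mean_ci(37.2, 16, s=7.13, t_crit=2.131))      # sigma unknown, two-sided
```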
Confidence Interval for μ (σ Unknown)

Assumptions
Population standard deviation is unknown
Population is normally distributed
If population is not normal, use large sample
Use Student's t distribution
Confidence interval estimate
$$\bar X - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar X + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
How to use SPSS
Exercise

Consider a random sample of 25 children selected from the population of infants receiving antacids that contain aluminum. These antacids are often used to treat peptic or digestive disorders. The distribution of plasma aluminum levels is known to be approximately normal. The mean aluminum level for the sample of 25 infants is X̄ = 36.8 μg/l and the sample standard deviation is s = 7.85 μg/l.

Construct a 95% confidence interval for the mean of the population.


Using SPSS to estimate CI

For the σ known case (σ = 9.7):

Set up the dataset in Variable View of the Data Editor
Input the data into the dataset in Data View
Use Descriptives to calculate the mean and s
Substitute the values into the formula and calculate the CI, e.g. 70.16 ± 1.96(9.7/√109)


Using SPSS to estimate CI

For the σ unknown case:

Set up the dataset in Variable View of the Data Editor
Input the data into the dataset in Data View
Use the Explore procedure (Analyze > Descriptive Statistics > Explore) to calculate the CI
Exercise

The Dean of the Medical School wants to estimate the


mean number of hours studied per week by students. The
number of hours in a random sample of 49 students are

29 20 26 21 25 24 16 27 20 33 28 16 16 19 27 24 28
24 26 24 29 20 23 27 25 16 25 22 30 24 25 27 23 30
24 19 23 20 30 24 28 30 17 21 22 28 27 32 22

Construct a 95% confidence interval estimate for the population mean, given σ = 4.5


Exercise
