
5: Introduction to estimation

(A) Intro to statistical inference


(B) Sampling distribution of the mean
(C) Confidence intervals (σ known)
(D) Student’s t distributions
(E) Confidence intervals (σ not known)
(F) Sample size requirements

04/15/24 5: Intro to estimation 1


Statistical inference
Statistical inference = generalizing from
a sample to a population with a
calculated degree of certainty
Two forms of statistical inference
• Estimation → introduced this chapter
• Hypothesis testing → next chapter



Parameters and estimates
Parameter = a numerical characteristic of a
population
Statistic = a value calculated from a sample
Estimate = a statistic that “guesstimates” a
parameter
Example: the sample mean “x-bar” is the estimator of
the population mean µ

Parameters and estimates are related but are
not the same



Parameters and statistics
                   Parameters      Statistics
Source             Population      Sample
Notation           Greek (μ, σ)    Roman (x-bar, s)
Random variable?   No              Yes
Calculated?        No              Yes



Sampling distribution of the mean
x-bar takes on different values with
repeated (different) samples
µ remains constant
Even though x-bar is variable, its
“behavior” is predictable
The behavior of x-bar is predicted by its
sampling distribution, the Sampling
Distribution of the Mean (SDM)



Simulation experiment
Distribution of AGE in population.sav
[Figure: histogram of AGE; x-axis AGE from 0 to 65, y-axis frequency from 0 to 200]
• N = 600
• µ = 29.5 (center)
• σ = 13.6 (spread)
• Not Normal (shape)
Conduct three sampling simulations
For each experiment
• Take multiple samples of size n
• Calculate means
• Plot means → simulated SDMs
Experiment A: each sample n = 1
Experiment B: each sample n = 10
Experiment C: each sample n = 30
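
The three experiments can be sketched in Python. The file population.sav is not reproduced here, so this sketch substitutes a synthetic, right-skewed population of 600 values (an assumption; its µ and σ only roughly resemble 29.5 and 13.6). The names `simulate_sdm`, `n_samples`, and `results` are illustrative and do not come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# ASSUMPTION: stand-in for population.sav, which is not available here.
# 600 right-skewed "ages" drawn from a gamma distribution (not Normal).
population = [random.gammavariate(4.7, 6.3) for _ in range(600)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)


def simulate_sdm(pop, n, n_samples=2000):
    """Draw n_samples simple random samples of size n; return their means."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(n_samples)]


print(f"population: mu = {mu:.1f}, sigma = {sigma:.1f}")

results = {}
for label, n in [("A", 1), ("B", 10), ("C", 30)]:
    means = simulate_sdm(population, n)
    results[label] = (statistics.mean(means), statistics.stdev(means))
    print(f"Experiment {label} (n = {n:2d}): "
          f"mean of means = {results[label][0]:.1f}, "
          f"SD of means = {results[label][1]:.1f}")
```

In a run of this sketch, the mean of the sample means should stay near µ for every n, while the standard deviation of the sample means should shrink as n grows.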



Results of simulation experiment
Findings:
(1) SDMs are centered on µ (about 29.5)
(2) SDMs become tighter as n increases
(3) SDMs become more Normal as n increases



95% Confidence Interval for µ
Formula for a 95% confidence interval for μ when σ is known:

$$\bar{x} \pm (1.96)(SEM), \quad \text{where } SEM = \frac{\sigma}{\sqrt{n}}$$



Illustrative example
Example
• Population with σ = 13.586 (known ahead of time)
• SRS → {21, 42, 5, 11, 30, 50, 28, 27, 24, 52}
• n = 10, x-bar = 29.0
SEM = σ / √n = 13.586 / √10 = 4.30
95% CI for µ
= xbar ± (1.96)(SEM)
= 29.0 ± (1.96)(4.30)
= 29.0 ± 8.4 (margin of error)
= (20.6, 37.4)
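
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `z_confidence_interval` is illustrative, and the default z = 1.96 assumes a 95% confidence level.

```python
import math


def z_confidence_interval(xbar, sigma, n, z=1.96):
    """CI for mu when sigma is known: xbar ± z * SEM, with SEM = sigma / sqrt(n)."""
    sem = sigma / math.sqrt(n)
    margin = z * sem
    return sem, margin, (xbar - margin, xbar + margin)


# Values taken from the illustrative example
sem, margin, (lcl, ucl) = z_confidence_interval(xbar=29.0, sigma=13.586, n=10)
print(f"SEM = {sem:.2f}, margin of error = {margin:.1f}, "
      f"95% CI = ({lcl:.1f}, {ucl:.1f})")
# prints: SEM = 4.30, margin of error = 8.4, 95% CI = (20.6, 37.4)
```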



Margin of error
Margin of error = d = half the confidence
interval length
Surround x-bar with the margin of error
95% CI for µ
= xbar ± (1.96)(SEM)
= 29.0 ± (1.96)(4.30)
= 29.0 ± 8.4
where 29.0 is the point estimate and 8.4 is the margin of error



Interpretation of a 95% CI
We are 95% confident the parameter will be captured by the interval.



Other levels of confidence
Let the probability confidence interval will not capture parameter
1 –  the confidence level
Confidence level Alpha level z1–
1– 
.90 .10 1.645
.95 .05 1.96
.99 .01 2.58
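
The z values in this table can be reproduced with the standard library's `statistics.NormalDist`. A minimal sketch; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{1-alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.99):
    print(f"confidence = {confidence:.2f}  alpha = {1 - confidence:.2f}  "
          f"z = {z_critical(confidence):.3f}")
```

The 99% value comes out near 2.576, which the table rounds to 2.58.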



(1 – α)100% confidence for μ
Formula for a (1-α)100% confidence interval for μ when σ is known:

$$\bar{x} \pm z_{1-\alpha/2} \cdot SEM$$



Example: 99% CI, same data
Same data as before
99% confidence interval for µ
= x-bar ± (z1–.01/2)(SEM)
= x-bar ± (z.995)(SEM)
= 29.0 ± (2.58)(4.30)
= 29.0 ± 11.1
= (17.9, 40.1)



Confidence level and CI length
p. 5.9 demonstrates the effect of raising your confidence
level → CI length increases → more likely to capture µ

Confidence level   CI for illustrative data   CI length*
90%                (21.9, 36.1)               14.2
95%                (20.6, 37.4)               16.8
99%                (17.9, 40.1)               22.2

* CI length = UCL – LCL
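
The table can be regenerated from the illustrative data (x-bar = 29.0, σ = 13.586, n = 10). This sketch uses the tabulated z values 1.645, 1.96, and 2.58, and it assumes CI length is taken from the limits after rounding to one decimal, which is what the printed lengths appear to reflect. The names `z_table` and `rows` are illustrative.

```python
import math

xbar, sigma, n = 29.0, 13.586, 10      # illustrative data from the earlier example
sem = sigma / math.sqrt(n)

# z values as tabulated for the 90%, 95%, and 99% confidence levels
z_table = {0.90: 1.645, 0.95: 1.96, 0.99: 2.58}

rows = []
for level, z in z_table.items():
    lcl = round(xbar - z * sem, 1)     # lower confidence limit
    ucl = round(xbar + z * sem, 1)     # upper confidence limit
    length = round(ucl - lcl, 1)       # ASSUMPTION: length from rounded limits
    rows.append((level, lcl, ucl, length))
    print(f"{level:.0%}  ({lcl:.1f}, {ucl:.1f})  length = {length:.1f}")
```

Without the intermediate rounding, the 90% length is about 14.1 rather than 14.2. The other two lengths are unaffected.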



Beware
Prior CI formula applies only to
• SRS
• Normal SDMs
• σ known ahead of time
It does not account for:
• GIGO (garbage in, garbage out)
• Poor quality samples (e.g., due to non-response)



When σ is Not Known
In practice we rarely know σ
Instead, we calculate s and use this as an
estimate of σ
This adds another element of uncertainty to
the inference
A modification of z procedures called
Student’s t distribution is needed to
account for this additional uncertainty



Student’s t distributions
Brilliant!
William Sealy Gosset
(1876-1937) worked for
the Guinness brewing
company and was not
allowed to publish
In 1908, writing under
the pseudonym
“Student”, he described
a distribution that
accounted for the extra
variability introduced by
using s as an estimate
of σ



t Distributions
Student’s t distributions
are like a Standard
Normal distribution but
have broader tails
There is more than one
t distribution (a family)
Each t distribution has its own
degrees of freedom (df)
As df increases, t
becomes increasingly
like z



t table
Each row is for a particular df
Columns contain cumulative
probabilities or tail regions
Table contains t percentiles (like z
scores)
Notation: t_{df,p}    Example: t_{9,.975} = 2.26
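
A t table entry can be cross-checked numerically. Python's standard library has no t distribution, so the sketch below integrates the t density with Simpson's rule and inverts the cumulative probability by bisection. This is only one simple way to approximate a percentile, not how printed tables or statistical packages necessarily compute them, and the function names (`t_pdf`, `t_cdf`, `t_quantile`) are illustrative.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_quantile(p, df, lo=-100.0, hi=100.0):
    """Percentile t_{df,p}, located by bisection on the cumulative probability."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (9, 17, 1000):
    print(f"t_{{{df},.975}} = {t_quantile(0.975, df):.3f}")
print(f"z_.975 = {NormalDist().inv_cdf(0.975):.3f}")
```

The df = 9 value should agree with the table entry (about 2.26), and the df = 1000 value should sit very close to the z percentile, illustrating that t approaches z as df increases.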



95% CI for µ, σ not known
Formula for a (1-α)100% confidence interval for μ when σ is NOT known:

$$\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot sem, \quad \text{where } sem = \frac{s}{\sqrt{n}}$$

Same as the z formula, except replace z_{1-α/2} with t_{n-1,1-α/2} and SEM with sem



Illustrative example: diabetic weight
To what extent are diabetics overweight?
Measure “% of ideal body weight” = (actual body weight) ÷ (ideal body weight) × 100%
Data (n = 18):
{107, 119, 99, 114, 120,
104, 88, 114, 124, 116,
101, 121, 152, 100, 125,
114, 95, 117}

x-bar = 112.778
s = 14.424
sem = s / √n = 14.424 / √18 = 3.400
t(n–1, 1–α/2) = t(18–1, 1–.05/2) = t17,.975 = 2.110 (from t table)
95% CI for µ
= xbar ± (t17,.975)(sem)
= 112.778 ± (2.110)(3.400)
= 112.778 ± 7.17
= (105.6, 120.0)
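
The summary statistics and interval for these 18 observations can be recomputed from the raw data. A minimal sketch: the critical value 2.110 is copied from the t table lookup on this slide, not computed, and the variable names are illustrative.

```python
import math
import statistics

# "% of ideal body weight" for the n = 18 diabetics listed above
data = [107, 119, 99, 114, 120, 104, 88, 114, 124, 116,
        101, 121, 152, 100, 125, 114, 95, 117]

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)          # sample SD (n - 1 in the denominator)
sem = s / math.sqrt(n)

t_crit = 2.110                      # t_{17,.975}, taken from the t table
margin = t_crit * sem
lcl, ucl = xbar - margin, xbar + margin

print(f"n = {n}, x-bar = {xbar:.3f}, s = {s:.3f}, sem = {sem:.3f}")
print(f"95% CI = {xbar:.3f} ± {margin:.2f} = ({lcl:.1f}, {ucl:.1f})")
# prints: n = 18, x-bar = 112.778, s = 14.424, sem = 3.400
# prints: 95% CI = 112.778 ± 7.17 = (105.6, 120.0)
```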



Interpretation of 95% CI for µ
Remember that the CI seeks to capture µ,
NOT x-bar
95% confidence means that 95% of similar
intervals would capture µ (and 5% would not)
For the diabetic body weight illustration, we
can be 95% confident that the population
mean is between 105.6 and 120.0



Sample size requirements
Assume: SRS, Normality, valid data
Let d = the margin of error (half the
confidence interval length)
To get a 95% CI with margin of error ±d,
use:

$$n = \frac{4\sigma^2}{d^2}$$
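
The factor of 4 can be traced back to the margin of error of a 95% confidence interval. A brief derivation, assuming the 4 comes from rounding 1.96 up to 2 (the slide does not state this explicitly):

```latex
d = z_{1-\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\;
n = \left(\frac{z_{1-\alpha/2}\,\sigma}{d}\right)^{2}
\approx \frac{4\sigma^{2}}{d^{2}}
\quad\text{for 95\% confidence, where } z_{.975} = 1.96 \approx 2.
```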

Sample size requirements, illustration

Suppose we have a variable with σ = 15

For d = 5, use n = (4)(15²) / 5² = 36
For d = 2.5, use n = (4)(15²) / 2.5² = 144
For d = 1, use n = (4)(15²) / 1² = 900

Smaller margins of error require larger sample sizes
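
The three cases can be computed with a small helper. A minimal sketch: the name `sample_size` is illustrative, and rounding up to the next whole number is an added assumption for cases where the formula does not give an integer.

```python
import math


def sample_size(sigma, d):
    """n = 4 * sigma^2 / d^2, rounded up to a whole number (assumed convention)."""
    return math.ceil(4 * sigma**2 / d**2)


for d in (5, 2.5, 1):
    print(f"d = {d}: n = {sample_size(15, d)}")
# prints: d = 5: n = 36
# prints: d = 2.5: n = 144
# prints: d = 1: n = 900
```

Halving the margin of error from 5 to 2.5 quadruples the required sample size, because d enters the formula squared.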



Acronyms
SRS = simple random sample
SDM = sampling distribution of the mean
SEM = standard error of the mean
CI = confidence interval
LCL = lower confidence limit
UCL = upper confidence limit

