
Chapter 8.

Sampling distribution and estimation
Nguyen Thi Thu Van - August 12, 2023
[Figure: population distribution, sample distribution, and sampling distribution; histogram of a sample distribution of 20 GMAT scores (n = 20), with scores ranging from about 300 to 800.]

Point estimate. A sample mean 𝑥̅ calculated from a random sample is a point estimate of the unknown population mean 𝜇.

Estimator (say, 𝑋̅) is a statistic, i.e., a function of the sample, used to estimate the value of an unknown population parameter. Random samples vary, so an estimator is a random variable.

Sampling distribution of an estimator 𝑋̅ is the probability distribution of that estimator over a large number of samples of size 𝑛 drawn from a given population.

Sampling error is the difference between an estimate and the corresponding population parameter; for example, for the population mean: 𝑥̅ − 𝜇.

Bias is the difference between the expected value of an estimator and the true parameter; for example, for the mean: 𝐸(𝑋̅) − 𝜇. An estimator is unbiased if its expected value equals the parameter being estimated, i.e., 𝐸(𝑋̅) = 𝜇.
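The following Python sketch is not part of the lecture; it simulates drawing many samples of size 𝑛 = 20 from a made-up GMAT-like population to show that the sample means form a sampling distribution centered at 𝜇, i.e., that 𝑋̅ is unbiased. All numbers (population parameters, seed, number of replications) are illustrative assumptions.

```python
# Simulation sketch: the sampling distribution of the sample mean (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(42)                       # assumed seed, for reproducibility
population = rng.normal(500, 100, 100_000)            # hypothetical GMAT-like population
mu = population.mean()

n, num_samples = 20, 10_000
sample_means = np.array([
    rng.choice(population, size=n, replace=False).mean()   # one x-bar per random sample
    for _ in range(num_samples)
])

print(f"population mean mu      = {mu:.2f}")
print(f"mean of sample means    = {sample_means.mean():.2f}")   # close to mu: X-bar is unbiased
print(f"average sampling error  = {sample_means.mean() - mu:.4f}")
```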
Central Limit Theorem states that, regardless of the shape of the population, the sample mean 𝑋̅ is approximately normally distributed, centered at 𝜇 with standard error 𝜎/√𝑛, when 𝑛 is large (𝑛 ≥ 30). The theorem also applies to a sample proportion.

Interval estimate. Because samples vary, we need to indicate our uncertainty about the true value of a population parameter.
 Based on our knowledge of the sampling distribution of 𝑋̅, we construct a confidence interval (CI) for the unknown parameter 𝜇 by adding and subtracting a margin of error to and from the sample statistic: 𝑋̅ ± 𝑧𝛼/2 × 𝜎/√𝑛 (a numeric sketch follows below).
 To calculate the CI, we first think of the confidence level 1 − 𝛼 as a probability attached to the procedure used to compute the interval:
𝑃(𝑋̅ − 𝑧𝛼/2 × 𝜎/√𝑛 ≤ 𝜇 ≤ 𝑋̅ + 𝑧𝛼/2 × 𝜎/√𝑛) = 1 − 𝛼.
Once a random sample has been taken and 𝑥̅ calculated, the number 1 − 𝛼 is no longer interpreted as a probability but as the level of confidence that the interval contains 𝜇.
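As a quick numeric illustration of the formula above (not from the lecture), the sketch below computes a 95% confidence interval for 𝜇 when 𝜎 is known; the values of 𝑥̅, 𝜎, and 𝑛 are made up.

```python
# Sketch: 95% CI for mu with known sigma, using the normal critical value.
import math
from scipy import stats

x_bar, sigma, n = 510, 100, 36        # illustrative summary statistics
alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)     # z_{alpha/2} ~ 1.96
margin = z * sigma / math.sqrt(n)     # margin of error

print(f"95% CI for mu: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```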
Confidence interval for a mean 𝝁
 With known variance: 𝑥̅ ± 𝑧𝛼/2 × 𝜎/√𝑛.
 With unknown variance (use the Student t distribution, developed by William Sealy Gosset [1876–1937]):
   𝑛 ≥ 30: 𝑥̅ ± 𝑡𝛼/2 × 𝑠/√𝑛, or 𝑥̅ ± 𝑧𝛼/2 × 𝑠/√𝑛, simply because when 𝑑.𝑓. is large, 𝑡 ≈ 𝑧 and 𝑡 is slightly larger than 𝑧.
   𝑛 < 30: 𝑥̅ ± 𝑡𝛼/2 × 𝑠/√𝑛.
 Critical values (in Excel): 𝑧𝛼/2 = NORM.S.INV(1 − 𝛼/2) and 𝑡𝛼/2 = T.INV(1 − 𝛼/2, 𝑑.𝑓.), where 𝑑.𝑓. = 𝑛 − 1 is called the degrees of freedom.

Confidence interval for a proportion 𝝅
 The sample proportion 𝑝 = 𝑥/𝑛 may be assumed normal if 𝑛𝑝 ≥ 10 and 𝑛(1 − 𝑝) ≥ 10. Then the interval is
 𝑝 ± 𝑧𝛼/2 × √(𝑝(1 − 𝑝)/𝑛), with 𝑧𝛼/2 = NORM.S.INV(1 − 𝛼/2).
(A worked sketch of both intervals follows.)
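The sketch below evaluates the two interval formulas with scipy; the critical-value calls play the same role as the Excel functions named above. The summary statistics (𝑥̅, 𝑠, 𝑛, and the counts for the proportion) are made-up values.

```python
# Sketch: t-based CI for a mean and z-based CI for a proportion (illustrative inputs).
import math
from scipy import stats

alpha = 0.05

# Mean, variance unknown: x-bar +/- t_{alpha/2} * s / sqrt(n)
x_bar, s, n = 510, 92.5, 20
t = stats.t.ppf(1 - alpha / 2, df=n - 1)          # analogue of T.INV(1 - alpha/2, d.f.)
half_width = t * s / math.sqrt(n)
print(f"mean CI:       ({x_bar - half_width:.2f}, {x_bar + half_width:.2f})")

# Proportion: p +/- z_{alpha/2} * sqrt(p(1-p)/n), valid when np >= 10 and n(1-p) >= 10
x, n = 72, 200                                    # 72 "successes" out of 200 (made-up data)
p = x / n
z = stats.norm.ppf(1 - alpha / 2)                 # analogue of NORM.S.INV(1 - alpha/2)
half_width = z * math.sqrt(p * (1 - p) / n)
print(f"proportion CI: ({p - half_width:.3f}, {p + half_width:.3f})")
```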

Estimating from a finite population [sampling without replacement, 𝒏 > 𝟓% × 𝑵]

 With known variance: 𝑥̅ ± 𝑧𝛼/2 × (𝜎/√𝑛) × √((𝑁 − 𝑛)/(𝑁 − 1))
 With unknown variance: 𝑥̅ ± 𝑡𝛼/2 × (𝑠/√𝑛) × √((𝑁 − 𝑛)/(𝑁 − 1))
 For a proportion: 𝑝 ± 𝑧𝛼/2 × √(𝑝(1 − 𝑝)/𝑛) × √((𝑁 − 𝑛)/(𝑁 − 1))

√((𝑁 − 𝑛)/(𝑁 − 1)) is called the finite population correction factor (FPCF). This factor reduces the margin of error and gives a more precise interval when 𝑛 > 5% × 𝑁. Recall that if 𝑛 < 5% × 𝑁, the population is effectively infinite.
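A small sketch of the correction in use: here 𝑁 = 500 and 𝑛 = 40 (8% of 𝑁), so the FPCF applies; the summary statistics are made-up values.

```python
# Sketch: t-based CI for a mean with the finite population correction factor (FPCF).
import math
from scipy import stats

N, n = 500, 40                                    # n > 5% of N, so the FPCF applies
x_bar, s, alpha = 510, 92.5, 0.05                 # illustrative summary statistics

fpcf = math.sqrt((N - n) / (N - 1))               # finite population correction factor
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
margin = t * s / math.sqrt(n) * fpcf              # FPCF shrinks the margin of error

print(f"FPCF = {fpcf:.4f}")
print(f"95% CI for mu: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```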

To estimate a population mean 𝜇 with a maximum allowable margin of error of ±𝐸, we need a sample size of 𝑛 = (𝑧𝛼/2 × 𝜎/𝐸)².
To estimate a population proportion 𝜋 with a maximum allowable margin of error of ±𝐸, we need a sample size of 𝑛 = (𝑧𝛼/2/𝐸)² × 𝑝(1 − 𝑝).
Note that we always round 𝑛 up to the next higher integer to be conservative.
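A minimal sketch of both sample-size formulas; the target margins of error, the assumed 𝜎, and the planning value 𝑝 = 0.5 (the most conservative choice) are illustrative assumptions.

```python
# Sketch: minimum sample sizes for a desired margin of error E (illustrative inputs).
import math
from scipy import stats

alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)                     # z_{alpha/2} ~ 1.96

# Mean: n = (z * sigma / E)^2, rounded UP to the next integer to be conservative.
sigma, E_mean = 100, 25                               # assumed sigma and target margin
n_mean = math.ceil((z * sigma / E_mean) ** 2)         # -> 62

# Proportion: n = (z / E)^2 * p(1 - p); p = 0.5 is the most conservative planning value.
p, E_prop = 0.5, 0.04
n_prop = math.ceil((z / E_prop) ** 2 * p * (1 - p))   # -> 601

print(f"n for the mean: {n_mean}, n for the proportion: {n_prop}")
```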
