Introduction to

Inferential Statistics
Glyzel Grace M. Francisco
2nd Semester, 2022-2023

CENTRAL LUZON STATE UNIVERSITY


DEPARTMENT of
STATISTICS Learning Outcomes
At the end of this lesson, the students must be able to:
1. Understand the concept of statistical inference.
2. Find the confidence interval for the mean when 𝜎 is known.
3. Determine the minimum sample size for finding a confidence interval for the
mean.
4. Find the confidence interval for the mean when 𝜎 is unknown.
5. Find the confidence interval for a proportion.
6. Determine the minimum sample size for finding a confidence interval for a
proportion.
7. Find a confidence interval for a variance and a standard deviation.

GGMFRANCISCO INTRODUCTION TO INFERENTIAL STATISTICS | 2


Inferential Statistics
Inferential statistics is the process of using sample results (for example, a sample
of n = 100) to draw conclusions about the characteristics of a population (for
example, N = 1000).
The two most important types (areas) of statistical inference are:
1. Estimation of Parameter
• To obtain a guess or an estimate of the unknown value, along with a
determination of its accuracy.
2. Test of Hypothesis
• To examine whether the sample data support or contradict the investigator's
conjecture about the true value of the parameter.

Inferential Statistics
1. Estimation
• Point Estimation
• Interval Estimation
2. Hypothesis Testing

Areas of Inferential Statistics

Estimation of Parameter
to determine the approximate value of a population
parameter on the basis of a sample statistic

Test of Hypothesis
to make decisions regarding unknown population
parameter values based on sample data

Estimation
Some important terms to be used:
➢ Estimator - sample statistic used to approximate a population parameter
➢ Estimate - specific value or range of values used to approximate some population
parameter
➢ Standard error - standard deviation of an estimator

Estimators are used in two different ways:

1. Point Estimation - computing a single value from sample data to
estimate the population parameter.
2. Interval Estimation - producing an interval of values that is likely to
contain the true value of the parameter.

Estimation
Three Properties of a Good Estimator
1. The estimator should be an unbiased estimator. That is, the expected value
or the mean of the estimates obtained from samples of a given size is equal
to the parameter being estimated.
2. The estimator should be consistent. For a consistent estimator, as sample
size increases, the value of the estimator approaches the value of the
parameter estimated.
3. The estimator should be a relatively efficient estimator. That is, of all the
statistics that can be used to estimate a parameter, the relatively efficient
estimator has the smallest variance.

Point Estimation
• A single number is calculated to estimate the population parameter.
• The rule or formula that describes the calculation of a single number is
called a point estimator.
• The resulting number is called a point estimate.

Characteristics of a good point estimator:

1. Unbiased - the mean of its sampling distribution is equal to the value of the
parameter.
2. The spread (measured by variance) of its sampling distribution should be
as small as possible.

Point Estimation
Point estimators of the population mean, variance, standard deviation, and proportion:
Parameter | Point Estimator | Formula
Population Mean | Sample Mean | $\bar{X} = \frac{\sum X_i}{n}$
Population Variance | Sample Variance | $s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n-1}$
Population Standard Deviation | Sample Standard Deviation | $s = \sqrt{s^2}$
Population Proportion | Sample Proportion | $\hat{p} = \frac{x}{n}$, where x = number of successes in n trials

Point Estimation Example
1. Assume that the monthly salary follows a normal distribution.
Find a point estimate for the population mean monthly salary from the following sample:
22,130 18,465 25,616 22,440 19,869

$\bar{X} = \frac{\sum X_i}{n} = \frac{22{,}130 + 18{,}465 + 25{,}616 + 22{,}440 + 19{,}869}{5} = \frac{108{,}520}{5} = 21{,}704$

The point estimate for the population mean monthly salary is 21,704.

Point Estimation Example
2. To assess the accuracy of a laboratory scale, a standard weight
that is known to weigh 1 gram is repeatedly weighed 4 times. The
resulting measurements (in grams) are: 0.96, 1.01, 1.03, and 0.99. Find a
point estimate for the population mean weight and population variance.

$\bar{X} = \frac{\sum X_i}{n} = \frac{0.96 + 1.01 + 1.03 + 0.99}{4} = 0.9975$

$s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n-1} = \frac{(0.96^2 + 1.01^2 + 1.03^2 + 0.99^2) - \frac{(0.96 + 1.01 + 1.03 + 0.99)^2}{4}}{3}$

$s^2 = \frac{3.9827 - 3.9800}{3} = 0.0009$

The point estimates for the population mean weight and population variance
are 0.9975 and 0.0009, respectively.
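
The point estimates in Example 2 can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names (`weights`, `x_bar`, `s2`, `s2_shortcut`) are chosen here for illustration and are not part of the lesson.

```python
from statistics import mean, variance, stdev

# Measurements from Example 2 (grams)
weights = [0.96, 1.01, 1.03, 0.99]
n = len(weights)

x_bar = mean(weights)      # point estimate of the population mean
s2 = variance(weights)     # sample variance (divisor n - 1)
s = stdev(weights)         # sample standard deviation, sqrt of s2

# Same variance via the shortcut formula shown on the slide:
# (sum of squares - (sum)^2 / n) / (n - 1)
s2_shortcut = (sum(x * x for x in weights) - sum(weights) ** 2 / n) / (n - 1)

print(f"mean = {x_bar:.4f}, variance = {s2:.4f}, std dev = {s:.4f}")
```

Both ways of computing the variance should agree up to floating-point rounding, and both round to the 0.0009 reported above.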

Point Estimation Example
3. Every year, 1st year students may or may not choose
to study Statistics. To estimate the fraction who do study
Statistics, a sample of 1000 students was chosen from the past
10 years, and 637 had chosen Statistics as a 1st year subject. Find
the point estimate for the proportion of 1st year students who
chose Statistics as a 1st year subject.

$\hat{p} = \frac{x}{n} = \frac{637}{1000} = 0.6370$, or 63.70%

The point estimate for the proportion of 1st year students who
chose Statistics as a 1st year subject is 63.70%.

Point Estimation Example
4. The Graduate Schools of Business Law take a random sample
of recent PhD graduates (n = 135) and find that 12 of these are
unemployed. Find the point estimate for the proportion of all
PhD graduates who find a job.

Here x = 135 - 12 = 123 graduates in the sample found a job.

$\hat{p} = \frac{x}{n} = \frac{135 - 12}{135} = 0.9111$, or 91.11%

The point estimate for the proportion of all PhD graduates who find
a job is 91.11%.

Interval Estimation
Two numbers are calculated to form an interval, consisting of a
lower limit and an upper limit, to estimate the true value of an unknown
population parameter.

The resulting pair of numbers is called an interval estimate or
confidence interval.

Interval Estimation
Some important terms:

➢ Confidence Interval - range (interval) of values that is likely to contain the
true value of the population parameter.
➢ Degree of Confidence - measure of how certain we are that our interval
contains the population parameter.

Commonly used degrees of confidence and levels of significance:
Degree of Confidence (1-α) 100%   α
90%                               0.10
95%                               0.05
99%                               0.01

Interval Estimation
Critical Value
number on the borderline separating sample statistics that are likely to occur
from those that are unlikely to occur

(1-α) 100%   α      α/2     $Z_{\alpha/2}$
90%          0.10   0.05    1.645
95%          0.05   0.025   1.96
99%          0.01   0.005   2.575

A good confidence interval has two desirable characteristics:
1. The confidence interval must be as narrow as possible. The narrower the interval,
the more exactly you have located the estimated parameter.
2. The confidence coefficient must be near 1. The larger the confidence coefficient,
the more likely it is that the interval will contain the estimated parameter.

Interval Estimation

Error of Estimation
• difference between the estimate and the true value of the
parameter

Margin of Error
• also called the maximum error of the estimate, is the maximum
likely difference between the point estimate of a parameter and
the actual value of the parameter

Interval Estimation: z Critical Value

(1-α) 100%   α      α/2     $Z_{\alpha/2}$
90%          0.10   0.05    1.645
95%          0.05   0.025   1.96
99%          0.01   0.005   2.575

Note: When reading the z table, if the required probability falls exactly midway
between two tabulated probabilities, use the average of the two corresponding
z-scores. Otherwise, use the z-score of the closer probability.
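
As a cross-check on the table, the critical value $Z_{\alpha/2}$ can also be computed directly instead of read from a printed z table. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `z_critical` is made up for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value Z_{alpha/2} for a given degree of confidence."""
    alpha = 1 - confidence
    # Z_{alpha/2} leaves an area of alpha/2 in the upper tail of the
    # standard normal curve, i.e. an area of 1 - alpha/2 to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: Z = {z_critical(confidence):.3f}")
```

The computed 99% value is about 2.576; the 2.575 used in these slides comes from averaging the two neighboring table entries as described in the note above.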

Student t Distribution
A t distribution is like the Z (standard normal) distribution, except that it has
slightly fatter tails to reflect the uncertainty added by estimating σ.

Student t Distribution

• t-tests (and all linear models, in fact) have a “normality assumption”.

• If the outcome variable is not normally distributed and the sample size is
small, a t-test is inappropriate.

Student t Distribution Properties
1. The Student t distribution is different for different sample sizes.

2. It has the same general symmetric bell shape as the standard normal
distribution, but it reflects the greater variability (with wider
distributions) that is expected with small samples.

3. It has a mean of t = 0.

4. As the sample size gets larger, the Student t distribution gets closer to the
standard normal distribution.

Interval Estimation: t Critical Value
Commonly used degrees of confidence (two-tailed critical values)

                                      $t_{\alpha/2,\, n-1}$
(1-α) 100%   α      $Z_{\alpha/2}$    n=10    n=20    n=100
90%          0.10   1.645             1.833   1.729   1.660
95%          0.05   1.96              2.262   2.093   1.984
99%          0.01   2.575             3.250   2.861   2.626

As the sample size gets larger, the Student t distribution gets closer to the
standard normal distribution.
GGMFRANCISCO INTRODUCTION TO INFERENTIAL STATISTICS | 23
Interval Estimation: z Critical Value
As n increases, t approaches z.

(1-α) 100%   α      $Z_{\alpha/2}$   n=10      n=20      n=100
95%          0.05   1.96             t=2.262   t=2.093   t=1.984

All critical values shown are two-tailed.

Interval Estimation: t Critical Value
(1-α) 100%   α      α/2     n=10, df=9   n=20, df=19   n=100, df=99
95%          0.05   0.025   t=2.262      t=2.093       t=1.984


Confidence Interval of the Mean
(point estimate ± margin of error)

A (1-α) 100% CI for the population mean μ (σ is known):

$\bar{x} - E < \mu < \bar{x} + E$, where $E = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

$\bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Remark:
For a 90% confidence interval, $Z_{0.10/2} = 1.645$; for a 95% confidence interval,
$Z_{0.05/2} = 1.960$; and for a 99% confidence interval, $Z_{0.01/2} = 2.575$.

Confidence Interval of the Mean
(point estimate ± margin of error)
A (1-α) 100% CI for the population mean μ (σ is unknown):

$\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

$\bar{x} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

Remarks:
1. The degrees of freedom are n - 1. The degrees of freedom are the number of
values that are free to vary after a sample statistic has been computed, and
they tell the researcher which specific curve to use when a distribution
consists of a family of curves.
2. The values of 𝑡𝛼/2 with df = n - 1, are taken from the student t distribution,
most often called the t distribution.

Confidence Interval of the Mean
A sample of size 100 produced the sample mean of 16. Assuming the population standard
deviation is equal to 3, compute a 95% confidence interval for the population mean µ.

n = 100, $\bar{x} = 16$, σ = 3, α = 0.05, σ is known
95% CI, α = 0.05, α/2 = 0.025, $Z_{0.05/2} = 1.96$ (see slide 23)

$\bar{x} - E < \mu < \bar{x} + E$, where $E = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

$E = Z_{0.05/2} \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{3}{\sqrt{100}} = 0.5880$

$16 - 0.5880 < \mu < 16 + 0.5880$
$15.4120 < \mu < 16.5880$, or (15.4120, 16.5880)

We are 95% confident that the true value of the population mean falls between 15.4120
and 16.5880.
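
The same calculation can be written as a small function. This is a minimal sketch assuming Python 3.8+ (for `statistics.NormalDist`); the function name `mean_ci_known_sigma` and its parameters are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """(1 - alpha)100% confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_{alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error E
    return x_bar - margin, x_bar + margin

# Values from the worked example: n = 100, sample mean 16, sigma = 3
lower, upper = mean_ci_known_sigma(x_bar=16, sigma=3, n=100, confidence=0.95)
print(f"{lower:.4f} < mu < {upper:.4f}")
```

Because the function uses the unrounded critical value (about 1.95996 rather than 1.96), the limits may differ from the hand computation in the last decimal place.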

Confidence Interval of the Mean

Assume that the entry level monthly salary of random employees
follows a normal distribution. Find a 95% confidence interval for the
population mean entry level monthly salary and interpret it if the
observed salaries of 5 employees are as follows:

22,130 18,465 25,616 22,440 19,869

Note: s = 2733.3707

Confidence Interval of the Mean
22,130 18,465 25,616 22,440 19,869
n = 5, $\bar{x} = 21704$, s = 2733.3707, α = 0.05, σ is unknown
95% CI, α = 0.05, $t_{0.05/2,\, df = n-1 = 4} = 2.776$ (see slide 32)

$\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

$E = t_{0.05/2,\, 4} \frac{s}{\sqrt{n}} = 2.776 \cdot \frac{2733.3707}{\sqrt{5}} = 3393.3839$

$21704 - 3393.3839 < \mu < 21704 + 3393.3839$
$18{,}310.6161 < \mu < 25{,}097.3839$, or (18,310.6161, 25,097.3839)

We are 95% confident that the true value of the population mean entry level
monthly salary of random employees falls between 18,310.6161 and 25,097.3839.
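
A short sketch of the same t-based interval in Python follows. Python's standard library has no t-distribution quantile function, so the critical value is passed in as read from the t table (2.776 for df = 4 at 95%); the function name `mean_ci_unknown_sigma` and the parameter `t_crit` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_unknown_sigma(data, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2, n-1}; it must be looked up
    for the chosen confidence level and df = len(data) - 1.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / sqrt(n)    # margin of error E
    return x_bar - margin, x_bar + margin

salaries = [22130, 18465, 25616, 22440, 19869]
lower, upper = mean_ci_unknown_sigma(salaries, t_crit=2.776)
print(f"{lower:,.4f} < mu < {upper:,.4f}")
```

Passing the table value explicitly keeps the example dependency-free; a statistics package with a t quantile function could supply `t_crit` instead.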

Confidence Interval of the Mean

[Figure: t distribution table, read at alpha = 0.05, two-tailed]

GGMFRANCISCO INTRODUCTION TO INFERENTIAL STATISTICS | 32


Confidence Interval for a Proportion

$\hat{p} - E < p < \hat{p} + E$, where $E = Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$

$\hat{p} - Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$

Remarks:
1. p = population proportion and $\hat{p}$ = sample proportion.
2. For a sample proportion, $\hat{p} = \frac{x}{n}$ and $\hat{q} = 1 - \hat{p}$, where x is the number of
sample units that possess the characteristic of interest and n is the
sample size.
3. $n\hat{p}$ and $n\hat{q}$ must each be greater than or equal to 5.

Confidence Interval for a Proportion
Example: A survey was conducted with 1404 respondents and found that 323
students paid for their education by student loans. Find the 90% confidence interval
of the true proportion of students who paid for their education by student loans.

$\hat{p} = \frac{x}{n} = \frac{323}{1404} = 0.23$, $\hat{q} = 1 - \hat{p} = 1 - 0.23 = 0.77$, $Z_{0.10/2} = 1.645$ (see slide 23)

$\hat{p} - E < p < \hat{p} + E$, where $E = Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$

$E = Z_{0.10/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.645 \sqrt{\frac{(0.23)(0.77)}{1404}} = 0.0185$

$0.23 - 0.0185 < p < 0.23 + 0.0185$
$0.2115 < p < 0.2485$, or (0.2115, 0.2485), or (21.15%, 24.85%)

We are 90% confident that the true proportion of students who paid for their
education by student loans falls between 21.15% and 24.85%.
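
For comparison, here is a minimal Python sketch of the proportion interval, again assuming Python 3.8+ for `statistics.NormalDist`. The function name `proportion_ci` is hypothetical. It also checks the condition that $n\hat{p}$ and $n\hat{q}$ are at least 5.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample (1 - alpha)100% confidence interval for a proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation requires n*p_hat >= 5 and n*q_hat >= 5
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("n*p_hat and n*q_hat must both be at least 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)                 # margin of error E
    return p_hat - margin, p_hat + margin

# Survey example: 323 of 1404 respondents, 90% confidence
lower, upper = proportion_ci(x=323, n=1404, confidence=0.90)
print(f"{lower:.4f} < p < {upper:.4f}")
```

The code keeps $\hat{p}$ unrounded (323/1404 is about 0.2301), so its limits can differ slightly in the fourth decimal place from the hand computation, which rounds $\hat{p}$ to 0.23 first.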

Confidence Interval for Variances and Standard Deviations

Formula for the confidence interval for a variance:

$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

Formula for the confidence interval for a standard deviation:

$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$

Note: degrees of freedom df = n-1

Confidence Interval for Variances and Standard Deviations
Remarks:
1. To calculate these confidence intervals, a new statistical distribution is
needed. It is called the chi-square distribution.
2. The chi-square variable is similar to the t variable in that its distribution is a
family of curves based on the number of degrees of freedom. The symbol
for chi-square is $\chi^2$ (Greek letter chi, pronounced “ki”).
3. A chi-square variable cannot be negative, and the distributions are skewed
to the right.
Assumptions for Finding a CI for a Variance or Standard Deviation
1. The sample is a random sample.
2. The population must be normally distributed.

Confidence Interval for Variances and Standard Deviations

Example: Find the 95% confidence interval for the variance and
standard deviation of the nicotine content of cigarettes
manufactured if a random sample of 20 cigarettes has a standard
deviation of 1.6 milligrams. Assume the variable is normally
distributed.

Confidence Interval for Variances and Standard Deviations

n = 20, s = 1.6, α = 0.05

The critical value for $\chi^2_{\alpha/2}$ with df = 19 is

$\chi^2_{\alpha/2} = \chi^2_{0.05/2} = \chi^2_{0.025} = 32.852$

The critical value for $\chi^2_{1-\alpha/2}$ with df = 19 is

$\chi^2_{1-\alpha/2} = \chi^2_{1-0.05/2} = \chi^2_{0.975} = 8.907$

Confidence Interval for Variances

95% confidence interval for the population variance

n = 20, s = 1.6, α = 0.05, $\chi^2_{0.025,\,19} = 32.852$, $\chi^2_{0.975,\,19} = 8.907$

$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

$\frac{(20-1)(1.6)^2}{\chi^2_{0.05/2}} < \sigma^2 < \frac{(20-1)(1.6)^2}{\chi^2_{1-0.05/2}}$

$\frac{(20-1)(1.6)^2}{\chi^2_{0.025}} < \sigma^2 < \frac{(20-1)(1.6)^2}{\chi^2_{0.975}}$

$\frac{(20-1)(1.6)^2}{32.852} < \sigma^2 < \frac{(20-1)(1.6)^2}{8.907}$

$1.4806 < \sigma^2 < 5.4609$

We are 95% confident that the true value of the population variance for the
nicotine content falls between 1.4806 and 5.4609.

Confidence Interval for Standard Deviations

95% confidence interval for the population standard deviation

n = 20, s = 1.6, α = 0.05, $\chi^2_{0.025,\,19} = 32.852$, $\chi^2_{0.975,\,19} = 8.907$

$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$

$\sqrt{\frac{(20-1)(1.6)^2}{32.852}} < \sigma < \sqrt{\frac{(20-1)(1.6)^2}{8.907}}$

$\sqrt{1.4806} < \sigma < \sqrt{5.4609}$

$1.2168 < \sigma < 2.3369$

We are 95% confident that the true value of the population standard deviation
for the nicotine content falls between 1.2168 and 2.3369.
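
The two intervals can be reproduced with the sketch below. Python's standard library has no chi-square quantile function, so the two critical values are supplied from the chi-square table, exactly as in the worked example; the function and parameter names (`variance_ci`, `chi2_right`, `chi2_left`) are invented for this illustration.

```python
from math import sqrt

def variance_ci(s, n, chi2_right, chi2_left):
    """Confidence interval for a population variance.

    chi2_right is the table value for alpha/2 (the larger critical value),
    chi2_left is the table value for 1 - alpha/2 (the smaller one),
    both with df = n - 1.
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_right, numerator / chi2_left

# Nicotine example: n = 20, s = 1.6, 95% confidence, df = 19
var_lower, var_upper = variance_ci(s=1.6, n=20, chi2_right=32.852, chi2_left=8.907)

# The interval for the standard deviation is the square root of each limit
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"{var_lower:.4f} < sigma^2 < {var_upper:.4f}")
print(f"{sd_lower:.4f} < sigma < {sd_upper:.4f}")
```

Note that dividing by the larger critical value gives the lower limit, which is why the two table values appear in that order.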


Sample Size Determination

• Sample size determination is closely related to statistical estimation.

• Quite often you ask, How large a sample is necessary to make an accurate
estimate?

• The answer is not simple, since it depends on three things: the margin of
error, the population standard deviation, and the degree of confidence.

• For example, how close to the true mean do you want to be (2 units, 5
units, etc.), and how confident do you wish to be (90%, 95%, 99%, etc.)?

Sample Size Determination
Minimum sample size needed for an interval estimate of the population mean:
Remark: Assume that the population standard deviation of the variable is
known or has been estimated from a previous study.

$n = \left( \frac{Z_{\alpha/2} \cdot \sigma}{E} \right)^2$

Where:
𝑍𝛼/2 is the critical value
𝜎 is the population standard deviation
E is the margin of error.

Note: If there is any fraction or decimal portion in the answer, use the next whole
number for sample size n.

Sample Size Determination
Example: A scientist wishes to estimate the average depth of a river. He
wants to be 99% confident that the estimate is accurate within 2 feet. From a
previous study, the standard deviation of the depths measured was 4.33
feet. How large a sample is required?

$Z_{0.01/2} = 2.575$, σ = 4.33, E = 2

$n = \left( \frac{Z_{\alpha/2} \cdot \sigma}{E} \right)^2 = \left( \frac{(2.575)(4.33)}{2} \right)^2 = 31.08$, rounded up to 32

Therefore, to be 99% confident that the estimate is within 2 feet of the true
mean depth, the scientist needs a sample of at least 32 measurements.
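
The rounding rule (always go up to the next whole number) is easy to get wrong by hand, so here is a minimal Python sketch of the formula. The function name `sample_size_for_mean` is an illustrative choice, not something defined in the lesson.

```python
from math import ceil

def sample_size_for_mean(z, sigma, margin):
    """Minimum n so that the margin of error for the mean is at most `margin`."""
    n_exact = (z * sigma / margin) ** 2
    return ceil(n_exact)   # any fractional part rounds up to the next whole number

# River depth example: Z = 2.575, sigma = 4.33 feet, E = 2 feet
n = sample_size_for_mean(z=2.575, sigma=4.33, margin=2)
print(n)
```

Using `ceil` rather than ordinary rounding guarantees the margin of error is not exceeded.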

Sample Size Determination
Minimum sample size needed for an interval estimate of the population proportion:

$n = \hat{p}\hat{q} \left( \frac{Z_{\alpha/2}}{E} \right)^2$

Where:
𝑍𝛼/2 is the critical value
𝑝Ƹ is the sample proportion and 𝑞ො = 1 − 𝑝Ƹ
E is the margin of error.

Note: If there is any fraction or decimal portion in the answer, use the next whole
number for sample size n.

Sample Size Determination
Example: A researcher wishes to estimate, with 95% confidence, the
proportion of people who own a home computer. A previous study shows
that 40% of those interviewed had a computer at home. The researcher
wishes to be accurate within 2% of the true proportion. Find the minimum
sample size necessary.

$Z_{0.05/2} = 1.96$, $\hat{p} = 0.40$, $\hat{q} = 0.60$, E = 0.02

$n = \hat{p}\hat{q} \left( \frac{Z_{\alpha/2}}{E} \right)^2 = (0.40)(0.60) \left( \frac{1.96}{0.02} \right)^2 = 2304.96$, rounded up to 2305

Therefore, to be 95% confident that the estimate is within 2% of the true
proportion, the researcher must interview 2305 people.
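
The proportion version of the sample size formula can be sketched the same way. As before, the function name `sample_size_for_proportion` is hypothetical, and the prior estimate of the proportion is taken from the previous study mentioned in the example.

```python
from math import ceil

def sample_size_for_proportion(z, p_hat, margin):
    """Minimum n so that the margin of error for a proportion is at most `margin`."""
    q_hat = 1 - p_hat
    n_exact = p_hat * q_hat * (z / margin) ** 2
    return ceil(n_exact)   # round any fraction up to the next whole number

# Home computer example: Z = 1.96, prior estimate 0.40, E = 0.02
n = sample_size_for_proportion(z=1.96, p_hat=0.40, margin=0.02)
print(n)
```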

Self Test
1. To assess the accuracy of a laboratory scale, a standard weight that is
known to weigh 1 gram is repeatedly weighed 4 times. The resulting
measurements (in grams) are: 0.95, 1.02, 1.01, and 0.98. Assume that the
weighings by the scale when the true weight is 1 gram are normally
distributed with mean µ. Construct a 99% CI for µ. Is it acceptable to
continue the usage of such laboratory scale?
2. Pulse rates of people are quite important. The pulse rates (in beats per
minute) of randomly selected women in a certain country have the statistics:
n = 42, $\bar{x} = 76.14$. Assume that σ is known to be 12.52. Using a 0.95 confidence
level, find both of the following:
a) The margin of error E
b) The confidence interval for µ