Chapter 8

Chapter Eight
One Sample Inference
Chapter Goals
After completing this chapter, you are expected to:
• Distinguish between a point estimate and an interval estimate.
• Construct and interpret a confidence interval estimate for a single population mean.
• Describe what a statistical hypothesis is.
• Explain what Type I and Type II errors are.
• Test a hypothesis about a single population mean.
• Conduct a hypothesis test of association between variables.
Statistical Inference
• Statistical inference deals with drawing conclusions about a population, under uncertainty, from the information provided by a sample.
• Descriptive statistics (no generalization beyond the data):
– Describe data with summary measures.
– Display data with graphical aids.
• Statistical inference (generalize on the basis of sample information):
– Estimate population parameters.
– Test hypotheses about the values of parameters.
Statistical Inference . . .
• Statistical inference is divided into two major classes: estimation and hypothesis testing.
• Parameter estimation: finding optimal estimates of an unknown parameter on the basis of sample information.
• Hypothesis testing: testing whether a prior assumption about the value of a model parameter is supported by empirical data.
Estimation
• A population parameter is a numerical measure of a summary characteristic of a population (e.g., μ, σ).
• A sample statistic is a numerical measure of a summary characteristic of a sample (e.g., x̄, s).
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate is a particular numerical value of a sample statistic (the estimator is a function of the sample, and the estimate is a value of that function).
• A point estimate is a single value used as an estimate of a population parameter.
• An interval estimate is an interval intended to include the value of a population parameter.
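To make the estimator/estimate distinction concrete, here is a minimal Python sketch: the function plays the role of the estimator (a rule applied to any sample), and the number it returns for one particular sample is the estimate. The five data values are made up purely for illustration.

```python
from statistics import fmean


def sample_mean(sample: list[float]) -> float:
    """Estimator of the population mean: a rule that maps any sample to x-bar."""
    return fmean(sample)


# Hypothetical sample, made up for illustration only.
observed = [4.0, 6.0, 5.0, 7.0, 3.0]

# The estimate is the value the estimator produces for this particular sample.
estimate = sample_mean(observed)
print(f"estimate of mu: {estimate}")
```

A different sample would give a different estimate, but the estimator (the function) stays the same.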
Point and interval estimation of the mean
• We use the sample mean (x̄) as a point estimator of the population mean (μ).
• A good estimator of a population parameter should possess the following properties:
– Unbiasedness
– Efficiency
– Consistency
– Sufficiency
• A point estimator is unbiased if its expected value is equal to the population parameter.
• Examples:
– The sample mean x̄ is an unbiased estimator of μ.
– The sample variance s² (computed with divisor n - 1) is an unbiased estimator of σ².
– The sample proportion p̂ is an unbiased estimator of P.
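Unbiasedness
θ̂1 is an unbiased estimator, θ̂2 is biased:
(Figure: the sampling distribution of θ̂1 is centered at θ; the sampling distribution of θ̂2 is centered away from θ.)
• Let θ̂ be an estimator of θ.
• The bias in θ̂ is defined as the difference between its mean and θ:
$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$
• The bias of an unbiased estimator is 0.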
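The following simulation is a sketch, not a proof, of unbiasedness. It assumes a normal population with σ² = 4 (an arbitrary choice) and draws many samples of size n = 5; averaging each estimator over the samples approximates its expected value, so the average minus σ² approximates its bias. The divisor n - 1 should give a bias near 0, while the divisor n should give a bias near -σ²/n = -0.8.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is reproducible

SIGMA2 = 4.0    # assumed true population variance
N = 5           # small samples make the bias easy to see
REPS = 20_000   # number of simulated samples

with_n_minus_1, with_n = [], []
for _ in range(REPS):
    xs = [random.gauss(10.0, SIGMA2 ** 0.5) for _ in range(N)]
    m = fmean(xs)
    ss = sum((x - m) ** 2 for x in xs)   # sum of squared deviations from x-bar
    with_n_minus_1.append(ss / (N - 1))  # sample variance s^2
    with_n.append(ss / N)                # divides by n instead

# Averaging over many samples approximates E(estimator); subtracting sigma^2 gives the bias.
bias_s2 = fmean(with_n_minus_1) - SIGMA2
bias_n = fmean(with_n) - SIGMA2
print(f"bias with divisor n - 1: {bias_s2:+.3f}")  # should be near 0
print(f"bias with divisor n:     {bias_n:+.3f}")   # should be near -sigma^2 / n = -0.8
```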
Most efficient estimator
• Suppose that there are several unbiased estimators of θ.
• The most efficient estimator, or the minimum variance unbiased estimator of θ, is the unbiased estimator with the smallest variance.
• Let θ̂1 and θ̂2 be two unbiased estimators of θ, based on the same number of sample observations. Then,
– θ̂1 is said to be more efficient than θ̂2 if $\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)$.
– The relative efficiency of θ̂1 with respect to θ̂2 is the ratio of their variances:
$\text{Relative Efficiency} = \dfrac{\text{Var}(\hat{\theta}_2)}{\text{Var}(\hat{\theta}_1)}$
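A simulation can illustrate relative efficiency. This sketch assumes a normal population (μ = 0, σ = 1, n = 25, all chosen for illustration), where both the sample mean and the sample median are unbiased estimators of μ. It estimates each estimator's variance from repeated samples and forms Var(median)/Var(mean), i.e. the relative efficiency of the mean with respect to the median; a value above 1 indicates that the mean is the more efficient of the two.

```python
import random
from statistics import fmean, median, pvariance

random.seed(2)
MU, SIGMA, N, REPS = 0.0, 1.0, 25, 10_000  # illustrative values

means, medians = [], []
for _ in range(REPS):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(fmean(xs))      # theta_hat_1: sample mean
    medians.append(median(xs))   # theta_hat_2: sample median

# Sampling variance of each estimator, estimated from the simulated values.
var_mean = pvariance(means)      # theory: sigma^2 / n = 0.04
var_median = pvariance(medians)

# Relative efficiency of the mean with respect to the median = Var(median) / Var(mean).
# For normal data this ratio approaches pi / 2 (about 1.57) as n grows.
rel_eff = var_median / var_mean
print(f"Var(mean) ~ {var_mean:.4f}, Var(median) ~ {var_median:.4f}")
print(f"relative efficiency ~ {rel_eff:.2f}")
```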
Consistency
• Let θ̂ be an estimator of θ.
• θ̂ is a consistent estimator of θ if its values approach θ (in the probability sense) as the sample size increases: for every ε > 0, $P(|\hat{\theta} - \theta| > \varepsilon) \to 0$ as $n \to \infty$.
• Consistency is especially useful when unbiased estimators cannot be obtained.
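Consistency can be seen empirically: as n grows, the proportion of samples whose mean x̄ lands farther than a fixed ε from μ shrinks toward 0. The sketch below assumes a normal population with μ = 5 and σ = 1 and uses ε = 0.2; these values are arbitrary.

```python
import random
from statistics import fmean

random.seed(3)
MU, SIGMA, EPS, REPS = 5.0, 1.0, 0.2, 1_000  # illustrative values


def miss_rate(n: int) -> float:
    """Fraction of simulated samples of size n whose mean is farther than EPS from MU."""
    misses = 0
    for _ in range(REPS):
        xbar = fmean(random.gauss(MU, SIGMA) for _ in range(n))
        if abs(xbar - MU) > EPS:
            misses += 1
    return misses / REPS


rates = {n: miss_rate(n) for n in (10, 100, 1000)}
for n, rate in rates.items():
    print(f"n = {n:4d}: P(|x_bar - mu| > {EPS}) ~ {rate:.3f}")
```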
Sufficiency
• An estimator is sufficient if it contains all the information in the data about the parameter it estimates.
• For example, for a symmetric population the sample median is also an estimator of the population mean, but it is not sufficient, as it uses only the middle observation(s) of the ordered sample.
Interval estimation of population mean
• An interval estimate is defined by two numbers, between which a population parameter is said to lie.
• A confidence interval is an interval, calculated from sample data, that contains an unknown parameter value with a prescribed level of confidence.
• If Pr(a < θ < b) = 1 - α, where the limits a and b are computed from the sample, then the interval from a to b is called a 100(1 - α)% confidence interval for θ.
• The quantity (1 - α) is called the confidence level of the interval (α is between 0 and 1).
• The confidence intervals in this chapter all follow the general formula:
Point Estimate ± (Reliability Factor)(Standard Error)
• The value of the reliability factor depends on the desired level of confidence (1 - α).
Confidence Interval for μ (σ² Known)
• Assumptions
– Population variance (σ²) is known.
– Population is normally distributed.
– If the population is not normal, use a large sample.
• Confidence interval estimate:
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
(where $z_{\alpha/2}$ is the standard normal value that leaves a probability of α/2 in each tail).
Finding the reliability factor, $z_{\alpha/2}$
• Consider a 95% confidence interval: 1 - α = 0.95, so α/2 = 0.025 in each tail.
• On the Z scale, the interval runs from z = -1.96 to z = 1.96, centered at 0; on the X scale, these three points correspond to the lower confidence limit, the point estimate, and the upper confidence limit.
• $z_{.025} = 1.96$ from the standard normal distribution table.
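Instead of a printed table, the reliability factor $z_{\alpha/2}$ can be computed from the inverse of the standard normal CDF. This sketch uses the standard-library class `statistics.NormalDist`; the helper name `z_critical` is hypothetical. The exact 99% value is about 2.576, which tables often round to 2.57 or 2.58.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Reliability factor z_{alpha/2}: the z value with area alpha/2 above it."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```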
Example
• Suppose that the heights of a sample of 100 male students at a university have a mean of 67.5 in. If the population of heights is normally distributed with standard deviation 2.9 in, find the (a) 95% and (b) 99% confidence intervals for estimating the mean height. Interpret the results.
• Solution
(a) 1 - α = 0.95:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 67.5 \pm 1.96\,(2.9/\sqrt{100}) = 67.5 \pm 0.5684$
$66.9316 < \mu < 68.0684$
(b) 1 - α = 0.99:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 67.5 \pm 2.57\,(2.9/\sqrt{100}) = 67.5 \pm 0.7453$
$66.7547 < \mu < 68.2453$
• Interpretation: We are 95% confident that the true mean height of male students is between 66.9316 and 68.0684 in., and 99% confident that it is between 66.7547 and 68.2453 in. The higher confidence level produces a wider interval.
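The height example can be checked with a few lines of Python. The helper `z_interval` is a hypothetical name; it simply evaluates x̄ ± $z_{\alpha/2}$·σ/√n with the table values used in the solution (1.96 and 2.57).

```python
from math import sqrt


def z_interval(xbar: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """CI for mu when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Heights example: n = 100, x_bar = 67.5 in, sigma = 2.9 in.
ci95 = z_interval(67.5, 2.9, 100, 1.96)
ci99 = z_interval(67.5, 2.9, 100, 2.57)  # table value used in the example
print(f"95% CI: ({ci95[0]:.4f}, {ci95[1]:.4f})")  # (66.9316, 68.0684)
print(f"99% CI: ({ci99[0]:.4f}, {ci99[1]:.4f})")  # (66.7547, 68.2453)
```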
Confidence interval for μ (σ² unknown)
• If the population standard deviation σ is unknown, we can substitute the sample standard deviation, s.
• This introduces extra uncertainty, since s varies from sample to sample.
• Use the t distribution instead of the normal distribution.
• Confidence interval estimate:
$\bar{x} - t_{n-1,\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\dfrac{s}{\sqrt{n}}$
• The variable $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the Student's t distribution with (n - 1) degrees of freedom (assuming the population is normally distributed).
• Degrees of freedom is the number of observations that are free to vary after the sample mean has been calculated.
A portion of Student’s t table
Let n = 3, so df = n - 1 = 2; α = 0.10, so α/2 = 0.05.

        Upper Tail Area
df      .10      .05      .025
1       3.078    6.314    12.706
2       1.886    2.920    4.303
3       1.638    2.353    3.182

• With df = 2 and an upper tail area of α/2 = .05, the table gives t = 2.920.
• The body of the table contains t values, not probabilities.
Example
A random sample of n = 25 has x̄ = 50 and s = 8. Construct a 95% confidence interval for μ.
– d.f. = n – 1 = 24, so $t_{n-1,\alpha/2} = t_{24,.025} = 2.0639$
The confidence interval is
$\bar{x} - t_{n-1,\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\dfrac{s}{\sqrt{n}}$
$50 - (2.0639)\dfrac{8}{\sqrt{25}} < \mu < 50 + (2.0639)\dfrac{8}{\sqrt{25}}$
$46.698 < \mu < 53.302$
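A similar sketch reproduces the t interval. Python's standard library has no t-distribution quantile function, so the critical value is passed in from the t table; if SciPy is available, `scipy.stats.t.ppf(0.975, 24)` should return the same value. The helper name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """CI for mu when sigma is unknown: x_bar +/- t_{n-1, alpha/2} * s / sqrt(n).

    t_crit is read from a t table with n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(xbar=50, s=8, n=25, t_crit=2.0639)  # t_{24, .025} = 2.0639
print(f"95% CI: ({low:.3f}, {high:.3f})")  # (46.698, 53.302)
```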
Basic concepts of hypothesis testing
• What is a hypothesis?
• A hypothesis is a claim (assumption) about a population parameter.
• Example:
– The mean grade point of students at AAU is μ = 2.95.
• The null hypothesis (H0) states the assumption to be tested.
• H0 is always about a population parameter, not about a sample statistic: write H0: μ = 2.95, not H0: x̄ = 2.95.
Hypothesis testing . . .
• Begin with the assumption that the null hypothesis is true.
• H0 always contains the "=", "≤" or "≥" sign.
The Alternative Hypothesis (H1)
• Is the opposite of the null hypothesis
– e.g., the mean grade point of students at AAU is not equal to 2.95 (H1: μ ≠ 2.95).
• Never contains the "=", "≤" or "≥" sign.
• Is generally the hypothesis that the researcher is trying to support.
Example
Formulate appropriate null and alternative hypotheses for
testing the demographer's theory that the mean number of
children born to urban women is less than the mean
number of children born to rural women.
– H0: The mean number of children born to urban
women is greater than or equal to the mean number
of children born to rural women.
– H1: The mean number of children born to urban
women is less than the mean number of children
born to rural women.
Exercise
• For many years, cigarette advertisements have been required to carry the following statement: "Cigarette smoking is dangerous to your health." But this warning is often located in inconspicuous corners of the advertisements and printed in small type. Consequently, a researcher believes that over 85% of those who read cigarette advertisements fail to see the warning. Specify the null and alternative hypotheses that would be used in testing the researcher's theory.
Hypothesis testing . . .
• The decision of whether to reject the null hypothesis is based on the value of a test statistic.
• A test statistic is a standardized value that is calculated from sample data during a hypothesis test.
• The rejection region (also called the critical region) is the range of values of the test statistic that will lead to rejection of the null hypothesis.
• For example, in testing whether the population mean grade point of students at AAU is 2.95, a logical choice of test statistic for μ is based on the sample mean x̄, and the rejection region contains the values of x̄ that would lead us to believe that H1 is true, i.e., μ ≠ 2.95 (μ > 2.95 or μ < 2.95).
Hypothesis Testing Process
• Claim: the population mean grade point is 2.95 (null hypothesis H0: μ = 2.95).
• Select a random sample from the population of students.
• Suppose the sample mean grade point is 2.80 (x̄ = 2.80).
• Is x̄ = 2.80 likely if μ = 2.95? If not likely, reject the null hypothesis.
Level of Significance (α)
• The maximum acceptable probability of rejecting a true null hypothesis.
• It defines the unlikely values of the sample statistic if the null hypothesis is true.
• It defines the rejection region of the sampling distribution.
• Typical values of α are 0.01, 0.05, or 0.10.
• The level of significance is selected by the researcher at the beginning of the study.
• A 5% level of significance means that, if H0 were true and the test were repeated on many independent samples, H0 would be wrongly rejected in about 5 out of every 100 tests.
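Level of Significance and the Rejection Region
• Level of significance = α. The critical value(s) mark the boundary of the rejection region (shaded in the original figure).
– Two-tail test: H0: μ = 3 vs. H1: μ ≠ 3; rejection region of area α/2 in each tail.
– Upper-tail test: H0: μ ≤ 3 vs. H1: μ > 3; rejection region of area α in the upper tail.
– Lower-tail test: H0: μ ≥ 3 vs. H1: μ < 3; rejection region of area α in the lower tail.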
Possible decisions in hypothesis testing
• There are four possible types of decisions:
1) Rejecting H0 when H0 is true.
2) Rejecting H0 when H0 is false.
3) Failing to reject H0 when H0 is true.
4) Failing to reject H0 when H0 is false.
• The 1st and 4th possibilities are erroneous decisions, known as a Type I error and a Type II error, respectively.
• The probability of a Type I error is denoted by α.
• The probability of a Type II error is denoted by β.
• Type I and Type II errors cannot happen at the same time: a Type I error requires H0 to be true, while a Type II error requires H0 to be false.
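The meaning of α as the probability of a Type I error can be checked by simulation. This sketch assumes a two-tailed z test of H0: μ = 3 with σ = 1 known and n = 30 (illustrative values). The data are generated with H0 true, so every rejection is a Type I error and the rejection rate should be close to α = 0.05.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(4)
MU0, SIGMA, N, ALPHA, REPS = 3.0, 1.0, 30, 0.05, 10_000  # illustrative values
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value, about 1.96

rejections = 0
for _ in range(REPS):
    # Data are generated with mu = MU0, so H0 is true and any rejection is a Type I error.
    xs = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(xs) - MU0) / (SIGMA / sqrt(N))
    if abs(z) > z_crit:
        rejections += 1

type1_rate = rejections / REPS
print(f"estimated P(Type I error) ~ {type1_rate:.3f}")  # should be close to ALPHA
```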
Power of the Test
• The power of a test is the probability of rejecting a null hypothesis that is false.
– Power = P(Reject H0 | H1 is true) = 1 - β
– For a fixed α and a fixed true value of the parameter, the power of the test increases as the sample size increases.
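For a z test the power can be computed in closed form. This sketch assumes the upper-tail test H0: μ ≤ μ0 vs. H1: μ > μ0 with σ known. If the true mean is μ1, the statistic z = (x̄ - μ0)/(σ/√n) is normal with mean (μ1 - μ0)/(σ/√n) and standard deviation 1, so Power = 1 - Φ($z_\alpha$ - (μ1 - μ0)/(σ/√n)). The function name and the values μ0 = 3, μ1 = 3.2, σ = 1 are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_z(mu0: float, mu1: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of the upper-tail z test (H0: mu <= mu0, H1: mu > mu0) when the true mean is mu1."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    # Reject when z > z_alpha; under mu = mu1, z is normal with mean `shift` and sd 1.
    return 1 - std.cdf(z_alpha - shift)


for n in (10, 25, 50, 100):
    print(f"n = {n:3d}: power = {power_upper_z(mu0=3.0, mu1=3.2, sigma=1.0, n=n):.3f}")
```

The printed values rise with n, and when μ1 = μ0 the function returns α, since rejecting then is a Type I error.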
Steps for Hypothesis Testing
1. Specify H0, H1, and an acceptable level of α.
2. Define a sample-based test statistic and the rejection
region for the specified H0.
3. Collect the sample data and calculate the test statistic.
4. Make a decision to either reject or fail to reject H0.
5. Interpret the results in the language of the problem.
Hypothesis testing about population mean
• Suppose that we have a sample of n observations from a random variable X ~ N(μ, σ²).
• Convert the sample result (x̄) to a z value.
• Case 1: Consider the test from a normal population or a large sample size.
H0: μ ≤ μ0
H1: μ > μ0
• The decision rule when σ is known:
Reject H0 if $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha}$
• The decision rule when σ is unknown:
Reject H0 if $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$
• Case 2: Consider the test from a normal population or a large sample size.
H0: μ ≥ μ0
H1: μ < μ0
Reject H0 if $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha}$ (σ known)
Reject H0 if $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha}$ (σ unknown)
• Case 3: Consider the test from a normal population or a large sample size.
H0: μ = μ0
H1: μ ≠ μ0
Reject H0 if $z < -z_{\alpha/2}$ or if $z > z_{\alpha/2}$, where $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (σ known)
Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$, where $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ (σ unknown)
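The three decision rules can be collected into one function. This is a sketch for the σ-known (z) case only; the function name `z_test` and the labels "greater", "less" and "two-sided" are hypothetical choices. For σ unknown, s replaces σ and the critical values come from the t table with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar: float, mu0: float, sigma: float, n: int,
           alpha: float = 0.05, alternative: str = "two-sided") -> tuple[float, bool]:
    """Apply the z-test decision rules for H0 about mu when sigma is known.

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0),
    or "two-sided" (H1: mu != mu0). Returns (z statistic, reject H0?).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    if alternative == "greater":        # Case 1: reject if z > z_alpha
        reject = z > std.inv_cdf(1 - alpha)
    elif alternative == "less":         # Case 2: reject if z < -z_alpha
        reject = z < -std.inv_cdf(1 - alpha)
    elif alternative == "two-sided":    # Case 3: reject if |z| > z_{alpha/2}
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject


z_value, decision = z_test(xbar=3.3, mu0=3.0, sigma=1.0, n=36, alternative="greater")  # hypothetical data
print(f"z = {z_value:.2f}, reject H0: {decision}")
```

With these hypothetical values z = 1.8, so the upper-tail test rejects H0 at α = 0.05 (1.8 > 1.645), while the two-tailed test does not (1.8 < 1.96).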
• Example:
1. A researcher designs a study to test the hypotheses H0: μ = 28 versus H1: μ ≠ 28. A random sample of 50 measurements from the population of interest yields x̄ = 25 and s = 5.6. Using α = 0.05, what conclusions can you make about the hypotheses based on the sample information?
• Solution
• Since σ is unknown, the decision rule is
Reject H0 if $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$
$t = \dfrac{25 - 28}{5.6/\sqrt{50}} = -3.79$ and $t_{49,\,0.025} = 2.0096$
• Reject H0, since $t < -t_{49,\,0.025}$; i.e., there is sufficient evidence that the population mean is different from 28.
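The arithmetic of the solution can be reproduced directly. The critical value $t_{49,\,0.025}$ = 2.0096 is taken from the t table, as in the solution.

```python
from math import sqrt

# Example values: H0: mu = 28 vs H1: mu != 28, n = 50, x_bar = 25, s = 5.6, alpha = 0.05.
xbar, mu0, s, n = 25.0, 28.0, 5.6, 50
t_crit = 2.0096  # t_{49, 0.025}, read from a t table

t_stat = (xbar - mu0) / (s / sqrt(n))
reject = abs(t_stat) > t_crit  # two-tailed rule: t < -t_crit or t > t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject}")  # t = -3.79, reject H0: True
```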
Test of association
• When we have counts from categorical/qualitative
variables, we arrange them in cross tabulations or
contingency tables.
• The possible values of one variable determine the rows of
the table, and the possible values of the other determine
the columns of the table.
• Assume r categories for attribute A and c categories
for attribute B
– There are (r x c) possible cross-classifications.
r x c Contingency Table
                      Attribute B
Attribute A     1      2      ...    c      Totals
1               O11    O12    ...    O1c    R1
2               O21    O22    ...    O2c    R2
.               .      .      ...    .      .
.               .      .      ...    .      .
r               Or1    Or2    ...    Orc    Rr
Totals          C1     C2     ...    Cc     n
Test for Association
• Consider n observations tabulated in an r x c contingency table.
• Denote by Oij the number of observations in the cell belonging to the ith row and the jth column.
• The null hypothesis is
H0: No association exists between the two attributes in the population.
• The appropriate test is a chi-squared test with (r - 1)(c - 1) degrees of freedom.
• Let Ri and Cj be the row and column totals, respectively.
Test for Association . . .
• The expected number of observations in the cell in row i and column j, given that H0 is true, is
$E_{ij} = \dfrac{R_i C_j}{n}$
• A test of association at significance level α is based on the chi-square distribution and the following decision rule:
Reject H0 if $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\,\alpha}$
Contingency Table Example
• Gender vs. opinion on abortion
– Opinion on abortion: pro-abortion vs. against abortion
– Gender: male vs. female
H0: There is no association between opinion on abortion and gender.
H1: There is an association between opinion on abortion and gender.
Assumptions for a Chi-squared Test of Association
• Well-defined categorical variables
• Representative sample
• Independent random sampling
• Large number condition: all expected values should be greater than 5.
Example
A survey of clients' satisfaction levels with the facilities and management of three sporting facilities was conducted based on a random sample of 60 clients. The results are summarized in the following contingency table:
Chi-squared Test of Association
              Sporting Facility
Satisfied?    A      B      C      Total
Yes           17     14     11     42
No            5      6      7      18
Total         22     20     18     60
• Is there evidence of different satisfaction levels in the three facilities?
Chi-squared Test of Association . . .
H0: There is no association between satisfaction and sporting facility.
H1: There is an association between satisfaction and sporting facility.
• Expected cell frequencies:
$E_{ij} = \dfrac{R_i C_j}{n} = \dfrac{(i\text{th row total})(j\text{th column total})}{\text{total sample size}}$
$E_{11} = \dfrac{(42)(22)}{60} = 15.4, \qquad E_{12} = \dfrac{(42)(20)}{60} = 14$
Observed vs. Expected Frequencies
Observed counts, with expected counts in parentheses:
              Sporting Facility
Satisfied?    A            B          C            Total
Yes           17 (15.4)    14 (14)    11 (12.6)    42
No            5 (6.6)      6 (6)      7 (5.4)      18
Total         22           20         18           60
$\chi^2 = \dfrac{(17 - 15.4)^2}{15.4} + \dfrac{(14 - 14)^2}{14} + \cdots + \dfrac{(7 - 5.4)^2}{5.4} = 1.231$
Chi-squared Test of Association
• $\chi^2 = 1.231$ with d.f. = (r - 1)(c - 1) = (1)(2) = 2.
• At α = 0.05, the critical value is $\chi^2_{2,\,.05} = 5.99$.
• Decision rule: if χ² > 5.99, reject H0; otherwise, do not reject H0.
• Here, χ² = 1.231 < 5.99, so we do not reject H0: there is not sufficient evidence of an association between satisfaction and sporting facility.
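The whole chi-squared calculation can be sketched in Python. The function below is a hypothetical helper, not a library routine: it computes $E_{ij} = R_i C_j / n$ for each cell and sums $(O_{ij} - E_{ij})^2 / E_{ij}$. For the critical value it uses the fact that, with 2 degrees of freedom only, the upper α point of the χ² distribution equals -2 ln α; other degrees of freedom need a table or a library such as SciPy.

```python
import math


def chi_square_association(observed: list[list[int]]) -> tuple[float, int]:
    """Chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # E_ij = R_i C_j / n
            stat += (o_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Satisfaction (rows: Yes, No) by sporting facility (columns: A, B, C).
table = [[17, 14, 11],
         [5, 6, 7]]
stat, df = chi_square_association(table)

# Valid for df = 2 only: the chi-squared upper alpha point is -2 ln(alpha).
critical = -2 * math.log(0.05)
print(f"chi2 = {stat:.3f}, df = {df}, critical = {critical:.2f}")  # chi2 = 1.231, df = 2, critical = 5.99
print("reject H0" if stat > critical else "do not reject H0")
```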
χ² Table
(Table of χ² critical values not reproduced in this text.)
