
# Statistical Inferences Based on Two Samples

Chapter 10
- 10.1 Comparing Two Population Means by Using Independent Samples: Variances Known
- 10.2 Comparing Two Population Means by Using Independent Samples: Variances Unknown
- 10.3 Paired Difference Experiments
- 10.4 Comparing Two Population Proportions by Using Large, Independent Samples
- 10.5 Comparing Two Population Variances by Using Independent Samples
## 10.1 Comparing Two Population Means by Using Independent Samples: Variances Known
- Suppose a random sample has been taken from each of two different populations
- Suppose that the samples are selected independently of each other, so that the random samples are independent of each other
- Then the sampling distribution of the difference in sample means is normally distributed (exactly, if both populations are normal; approximately, if both sample sizes are large)
LO 1: Compare two population means when the samples are independent and the population variances are known.
### Sampling Distribution of the Difference of Two Sample Means #1
- Suppose population 1 has mean $\mu_1$ and variance $\sigma_1^2$
  - From population 1, a random sample of size $n_1$ is selected, which has mean $\bar{x}_1$ and variance $s_1^2$
- Suppose population 2 has mean $\mu_2$ and variance $\sigma_2^2$
  - From population 2, a random sample of size $n_2$ is selected, which has mean $\bar{x}_2$ and variance $s_2^2$
- Then the sampling distribution of the difference of the two sample means, $\bar{x}_1 - \bar{x}_2$:
LO1
### Sampling Distribution of the Difference of Two Sample Means #2
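- Is normal if each of the sampled populations is normal, and approximately normal if the sample sizes $n_1$ and $n_2$ are large
- Has mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
- Has standard deviation

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$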

LO1
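A quick Monte Carlo check can make these three properties concrete. The sketch below draws many pairs of independent samples from two normal populations and compares the simulated mean and standard deviation of $\bar{x}_1 - \bar{x}_2$ with $\mu_1 - \mu_2$ and the square-root formula. The function name `simulate_diff_of_means` and the parameter values ($\mu_1 = 50$, $\sigma_1 = 6$, $n_1 = 30$; $\mu_2 = 45$, $\sigma_2 = 8$, $n_2 = 40$) are hypothetical, chosen only for illustration.

```python
import random
import statistics

def simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=10_000, seed=1):
    """Return `reps` simulated values of xbar1 - xbar2 from independent normal samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

# Hypothetical population parameters and sample sizes
diffs = simulate_diff_of_means(mu1=50, sigma1=6, n1=30, mu2=45, sigma2=8, n2=40)

theory_mean = 50 - 45                        # mu1 - mu2
theory_sd = (6**2 / 30 + 8**2 / 40) ** 0.5   # sqrt(sigma1^2/n1 + sigma2^2/n2)
print(f"simulated mean {statistics.fmean(diffs):.3f} vs theory {theory_mean}")
print(f"simulated sd   {statistics.stdev(diffs):.3f} vs theory {theory_sd:.3f}")
```

With 10,000 replications the simulated values should land close to the theoretical ones; the agreement is approximate, not exact.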
### Sampling Distribution of the Difference of Two Sample Means #3
LO1
### z-Based Confidence Interval for the Difference in Means (Variances Known)
A $100(1-\alpha)$ percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is

$$\left[(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right]$$
LO1
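The interval can be computed directly from summary statistics. This minimal sketch uses `statistics.NormalDist` from the Python standard library to obtain $z_{\alpha/2}$; the function name `z_interval_diff` and the summary values are hypothetical.

```python
from statistics import NormalDist

def z_interval_diff(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95):
    """z-based interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    alpha = 1 - conf
    z_half = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    se = (sigma1**2 / n1 + sigma2**2 / n2) ** 0.5   # sigma_{xbar1 - xbar2}
    diff = xbar1 - xbar2
    return diff - z_half * se, diff + z_half * se

# Hypothetical summary data
lo, hi = z_interval_diff(xbar1=52.0, xbar2=48.0, sigma1=6.0, sigma2=8.0, n1=30, n2=40)
print(f"95% CI for mu1 - mu2: [{lo:.3f}, {hi:.3f}]")
```

The interval is centered at $\bar{x}_1 - \bar{x}_2 = 4$ and, because it excludes 0, these made-up data would suggest that $\mu_1$ and $\mu_2$ differ.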
### z-Based Test About the Difference in Means (Variances Known)
- $H_0: \mu_1 - \mu_2 = D_0$
- $D_0 = \mu_1 - \mu_2$ is the claimed difference between the population means
  - $D_0$ is a number whose value varies depending on the situation
  - Often $D_0 = 0$, and the null means that there is no difference between the population means
LO1
### z-Based Test About the Difference in Means (Variances Known)
- Use the notation from the confidence interval statement on a prior slide
- Assume that each sampled population is normal or that the sample sizes $n_1$ and $n_2$ are large
LO1
### Test Statistic (Variances Known)
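The test statistic is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

If $H_0$ is true, the samples are independent, and the populations are normal, the sampling distribution of this statistic is a standard normal distribution (approximately so when the populations are not normal but $n_1$ and $n_2$ are large).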
LO1
### z-Based Test About the Difference in Means (Variances Known)
Reject $H_0: \mu_1 - \mu_2 = D_0$ in favor of a particular alternative hypothesis at level of significance $\alpha$ if the appropriate rejection point rule holds or if the corresponding p-value is less than $\alpha$. The rules are on the next slide.
LO1
### z-Based Test About the Difference in Means (Variances Known) Continued
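| Alternative | Reject $H_0$ if | p-value |
|---|---|---|
| $H_a: \mu_1 - \mu_2 > D_0$ | $z > z_\alpha$ | Area under standard normal to the right of $z$ |
| $H_a: \mu_1 - \mu_2 < D_0$ | $z < -z_\alpha$ | Area under standard normal to the left of $z$ |
| $H_a: \mu_1 - \mu_2 \neq D_0$ | $\lvert z \rvert > z_{\alpha/2}$\* | Twice the area under standard normal to the right of $\lvert z \rvert$ |

\* Either $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$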
LO1
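The statistic and the three p-value rules in the table can be sketched together. This is an illustrative Python function (the name `z_test_diff` and the data are hypothetical) that uses `statistics.NormalDist` for the standard normal areas.

```python
from statistics import NormalDist

def z_test_diff(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0, alternative="two-sided"):
    """z statistic and p-value for H0: mu1 - mu2 = d0 (population variances known)."""
    se = (sigma1**2 / n1 + sigma2**2 / n2) ** 0.5
    z = ((xbar1 - xbar2) - d0) / se
    cdf = NormalDist().cdf  # standard normal cumulative area
    if alternative == "greater":        # Ha: mu1 - mu2 > d0, area to the right of z
        p = 1 - cdf(z)
    elif alternative == "less":         # Ha: mu1 - mu2 < d0, area to the left of z
        p = cdf(z)
    elif alternative == "two-sided":    # Ha: mu1 - mu2 != d0, twice the area right of |z|
        p = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p

z, p = z_test_diff(52.0, 48.0, 6.0, 8.0, 30, 40, d0=0.0, alternative="two-sided")
reject = p < 0.05  # same decision as the rule |z| > z_{0.025}
print(f"z = {z:.3f}, p-value = {p:.4f}, reject H0: {reject}")
```

Comparing the p-value with $\alpha$ and comparing $z$ with the rejection point always lead to the same decision; the function reports the p-value because it conveys the strength of evidence.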
## 10.2 Comparing Two Population Means by Using Independent Samples: Variances Unknown
- Generally, the true values of the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known
- They have to be estimated from the sample variances $s_1^2$ and $s_2^2$, respectively
LO 2: Compare two population means when the samples are independent and the population variances are unknown.
### Comparing Two Population Means by Using Independent Samples: Variances Unknown #2
- We also need to estimate the standard deviation of the sampling distribution of the difference between sample means
- Two approaches:
  1. If it can be assumed that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then calculate the pooled estimate of $\sigma^2$
  2. If $\sigma_1^2 \neq \sigma_2^2$, then use approximate methods
LO2
### Pooled Estimate of $\sigma^2$
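- Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$
- The pooled estimate of $\sigma^2$ is the weighted average of the two sample variances, $s_1^2$ and $s_2^2$, with each variance weighted by its degrees of freedom
- The pooled estimate of $\sigma^2$ is denoted by $s_p^2$:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

- The estimate of the standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$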
LO2
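The two formulas translate directly into code. A minimal sketch follows; the function names `pooled_variance` and `pooled_se` and the sample values are hypothetical.

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """s_p^2: the sample variances weighted by their degrees of freedom (n - 1)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_se(s1_sq, n1, s2_sq, n2):
    """Estimated standard deviation of xbar1 - xbar2 under equal population variances."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    return (sp_sq * (1 / n1 + 1 / n2)) ** 0.5

# Hypothetical samples: s1^2 = 4.0 with n1 = 10, s2^2 = 9.0 with n2 = 15
sp_sq = pooled_variance(4.0, 10, 9.0, 15)   # (9*4 + 14*9) / 23 = 162/23
se = pooled_se(4.0, 10, 9.0, 15)
print(f"s_p^2 = {sp_sq:.4f}, estimated sd of xbar1 - xbar2 = {se:.4f}")
```

Because the larger sample gets more weight, $s_p^2 \approx 7.04$ sits closer to $s_2^2 = 9$ than a simple average of the two variances (6.5) would.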
### t-Based Confidence Interval for the Difference in Means (Variances Equal)
- Select independent random samples from two normal populations with equal variances
- A $100(1-\alpha)$ percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is

$$\left[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right]$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

and $t_{\alpha/2}$ is based on $(n_1 + n_2 - 2)$ degrees of freedom (df)
LO2
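A sketch of the equal-variance interval follows. To stay within the Python standard library (which has no t quantile function), it assumes the caller passes in $t_{\alpha/2}$ read from a t table; the function name `pooled_t_interval` and the data are hypothetical.

```python
def pooled_t_interval(xbar1, xbar2, s1_sq, n1, s2_sq, n2, t_crit):
    """Equal-variance t interval for mu1 - mu2.

    t_crit is t_{alpha/2} with n1 + n2 - 2 degrees of freedom, taken from a t table.
    """
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    margin = t_crit * (sp_sq * (1 / n1 + 1 / n2)) ** 0.5
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Hypothetical data: n1 = 10 and n2 = 15 give 23 df; t_0.025 with 23 df is 2.069
lo, hi = pooled_t_interval(25.0, 22.0, 4.0, 10, 9.0, 15, t_crit=2.069)
print(f"95% CI for mu1 - mu2: [{lo:.3f}, {hi:.3f}]")
```

Passing the critical value in keeps the sketch dependency-free; with SciPy available, a t quantile function could supply it instead.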
### Test Statistic (Variances Equal)
- The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $D_0 = \mu_1 - \mu_2$ is the claimed difference between the population means
- If $H_0$ is true, the sampling distribution of this statistic is a t distribution with $(n_1 + n_2 - 2)$ degrees of freedom
LO2
### t-Based Test About the Difference in Means (Variances Unknown)
| Alternative | Reject $H_0$ if | p-value |
|---|---|---|
| $H_a: \mu_1 - \mu_2 > D_0$ | $t > t_\alpha$ | Area under t distribution to the right of $t$ |
| $H_a: \mu_1 - \mu_2 < D_0$ | $t < -t_\alpha$ | Area under t distribution to the left of $t$ |
| $H_a: \mu_1 - \mu_2 \neq D_0$ | $\lvert t \rvert > t_{\alpha/2}$\* | Twice the area under t distribution to the right of $\lvert t \rvert$ |

where $t_\alpha$, $t_{\alpha/2}$, and p-values are based on $(n_1 + n_2 - 2)$ degrees of freedom

\* Either $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$
LO2
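The pooled test statistic and the rejection point rules can be sketched as follows. As in the interval example, the critical values $t_\alpha$ and $t_{\alpha/2}$ are assumed to come from a t table; the function names and data are hypothetical.

```python
def pooled_t_test(xbar1, xbar2, s1_sq, n1, s2_sq, n2, d0=0.0):
    """Return (t, df) for H0: mu1 - mu2 = d0, assuming equal population variances."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = ((xbar1 - xbar2) - d0) / (sp_sq * (1 / n1 + 1 / n2)) ** 0.5
    return t, df

def reject_h0(t, t_alpha, t_alpha_half, alternative):
    """Rejection point rules from the table; critical values come from a t table."""
    if alternative == "greater":     # Ha: mu1 - mu2 > D0
        return t > t_alpha
    if alternative == "less":        # Ha: mu1 - mu2 < D0
        return t < -t_alpha
    if alternative == "two-sided":   # Ha: mu1 - mu2 != D0
        return abs(t) > t_alpha_half
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

t, df = pooled_t_test(25.0, 22.0, 4.0, 10, 9.0, 15)
# alpha = 0.05 with 23 df: t_0.05 = 1.714 and t_0.025 = 2.069 (t table values)
decision = reject_h0(t, 1.714, 2.069, "two-sided")
print(f"t = {t:.3f} with {df} df, reject H0: {decision}")
```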
### t-Based Confidence Intervals and Tests for Differences with Unequal Variances
- If the populations are normal, but the sample sizes and variances differ substantially, small-sample estimation and testing can be based on the following unequal-variance procedures
- Confidence interval:

$$\left[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right]$$

- Test statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
LO2
### t-Based Confidence Intervals and Tests for Differences with Unequal Variances #2
For both the confidence interval and the hypothesis test, the degrees of freedom are equal to

$$\text{df} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
LO2
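The unequal-variance statistic and its degrees of freedom can be computed together. In this sketch (function name and data hypothetical), the fractional df is rounded down before a t table is used. That rounding is a common, conservative convention assumed here, not something stated on these slides.

```python
def unequal_var_t(xbar1, xbar2, s1_sq, n1, s2_sq, n2, d0=0.0):
    """Unequal-variance t statistic and its approximate degrees of freedom."""
    v1, v2 = s1_sq / n1, s2_sq / n2          # s1^2/n1 and s2^2/n2
    t = ((xbar1 - xbar2) - d0) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = unequal_var_t(25.0, 22.0, 4.0, 10, 9.0, 15)
df_table = int(df)  # round down before entering a t table (assumed convention)
print(f"t = {t:.3f}, df = {df:.2f}, table df = {df_table}")
```

Here df is about 22.99, slightly below the 23 df of the pooled procedure on the same data; the formula generally yields fewer degrees of freedom than $n_1 + n_2 - 2$.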
## 10.3 Paired Difference Experiments
- Before, we drew random samples from two different populations
- Now, we have two different processes (or methods)
- Draw one random sample of units and use those units to obtain the results of each process
LO 3: Recognize when data come from independent samples and when they are paired.
### Paired Difference Experiments Continued
- For instance, use the same individuals to obtain the results from one process and the results from the other process
  - E.g., use the same individuals to compare before and after treatments
- Using the same individuals eliminates any differences in the individuals themselves, so the comparison reflects only the results from the two processes
LO3
### Paired Difference Experiments #3
- Let $\mu_d$ be the mean of the population of paired differences
  - $\mu_d = \mu_1 - \mu_2$, where $\mu_1$ is the mean of population 1 and $\mu_2$ is the mean of population 2
- Let $\bar{d}$ and $s_d$ be the mean and standard deviation of a sample of paired differences that has been randomly selected from the population
  - $\bar{d}$ is the mean of the differences between pairs of values from both samples
LO3
### t-Based Confidence Interval for Paired Differences in Means
- If the sampled population of differences is normally distributed with mean $\mu_d$, a $100(1-\alpha)$ percent confidence interval for $\mu_d = \mu_1 - \mu_2$ is

$$\left[\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}\right]$$

where, for a sample of size $n$, $t_{\alpha/2}$ is based on $n - 1$ degrees of freedom
LO 4: Compare two population means when the data are paired.
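The paired interval works on the column of differences, not on the two samples separately. This sketch takes the paired measurements, forms $d_i = x_{1i} - x_{2i}$, and applies the formula; $t_{\alpha/2}$ is assumed to come from a t table. The function name and the before/after values are hypothetical.

```python
import statistics

def paired_t_interval(sample1, sample2, t_crit):
    """t interval for mu_d = mu1 - mu2 from paired data.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from a t table.
    """
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(sample1, sample2)]   # paired differences
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)                       # sample sd with n - 1 divisor
    margin = t_crit * s_d / n ** 0.5
    return d_bar - margin, d_bar + margin

# Hypothetical before/after measurements on the same 5 units
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 11.0, 11.0, 12.0]
lo, hi = paired_t_interval(before, after, t_crit=2.776)  # t_0.025 with 4 df
print(f"95% CI for mu_d: [{lo:.3f}, {hi:.3f}]")
```

For these made-up data $\bar{d} = 1.4$ and $s_d = \sqrt{1.3}$, and the interval just barely includes 0.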
### Test Statistic for Paired Differences
- The test statistic is

$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$$

- $D_0 = \mu_d = \mu_1 - \mu_2$ is the claimed or actual difference between the population means
  - $D_0$ varies depending on the situation
  - Often $D_0 = 0$, meaning that there is no difference between the population means
- The sampling distribution of this statistic is a t distribution with $(n - 1)$ degrees of freedom
LO4
### Paired Differences Testing Rules
| Alternative | Reject $H_0$ if | p-value |
|---|---|---|
| $H_a: \mu_d > D_0$ | $t > t_\alpha$ | Area under t distribution to the right of $t$ |
| $H_a: \mu_d < D_0$ | $t < -t_\alpha$ | Area under t distribution to the left of $t$ |
| $H_a: \mu_d \neq D_0$ | $\lvert t \rvert > t_{\alpha/2}$\* | Twice the area under t distribution to the right of $\lvert t \rvert$ |

where $t_\alpha$, $t_{\alpha/2}$, and p-values are based on $(n - 1)$ degrees of freedom

\* Either $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$
LO4
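Applying the paired test statistic to the same kind of before/after data shows how the choice of alternative matters. The function name and data are hypothetical, and the critical values ($t_{0.05} = 2.132$ and $t_{0.025} = 2.776$ with 4 df) are t table values supplied by hand.

```python
import statistics

def paired_t_statistic(sample1, sample2, d0=0.0):
    """t = (d_bar - D0) / (s_d / sqrt(n)), returned with its n - 1 degrees of freedom."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(sample1, sample2)]
    n = len(d)
    t = (statistics.fmean(d) - d0) / (statistics.stdev(d) / n ** 0.5)
    return t, n - 1

# Hypothetical before/after measurements on the same 5 units
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 11.0, 11.0, 12.0]
t, df = paired_t_statistic(before, after)

reject_greater = t > 2.132        # Ha: mu_d > 0 at alpha = 0.05 (t_0.05, 4 df)
reject_two_sided = abs(t) > 2.776 # Ha: mu_d != 0 at alpha = 0.05 (t_0.025, 4 df)
print(f"t = {t:.3f} with {df} df; one-sided: {reject_greater}, two-sided: {reject_two_sided}")
```

With $t \approx 2.75$, these data reject $H_0$ against $H_a: \mu_d > 0$ but not against $H_a: \mu_d \neq 0$ at $\alpha = 0.05$.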
## 10.4 Comparing Two Population Proportions by Using Large, Independent Samples
- Select a random sample of size $n_1$ from a population, and let $\hat{p}_1$ denote the proportion of units in this sample that fall into the category of interest
- Select a random sample of size $n_2$ from another population, and let $\hat{p}_2$ denote the proportion of units in this sample that fall into the same category of interest
- Suppose that $n_1$ and $n_2$ are large enough:
  - $n_1\hat{p}_1 \geq 5$, $n_1(1 - \hat{p}_1) \geq 5$, $n_2\hat{p}_2 \geq 5$, and $n_2(1 - \hat{p}_2) \geq 5$
LO 5: Compare two population proportions using large independent samples.
### Comparing Two Population Proportions Continued
Then the population of all possible values of $\hat{p}_1 - \hat{p}_2$:

- Has approximately a normal distribution if each of the sample sizes $n_1$ and $n_2$ is large
  - Here, $n_1$ and $n_2$ are large enough so that $n_1\hat{p}_1 \geq 5$, $n_1(1 - \hat{p}_1) \geq 5$, $n_2\hat{p}_2 \geq 5$, and $n_2(1 - \hat{p}_2) \geq 5$
- Has mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
- Has standard deviation

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
LO5
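The large-sample check and the standard deviation formula can be sketched as two small helpers. The names `large_enough` and `se_diff_proportions` are hypothetical, and plugging the sample proportions into the standard deviation formula (since $p_1$ and $p_2$ are unknown) is an assumption of this sketch.

```python
def large_enough(n, p_hat, threshold=5):
    """Rule of thumb from the slides: n*p_hat >= 5 and n*(1 - p_hat) >= 5."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

def se_diff_proportions(p1, n1, p2, n2):
    """sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)."""
    return (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5

# Hypothetical counts: 120 of 400 in sample 1, 90 of 450 in sample 2
p1_hat, p2_hat = 120 / 400, 90 / 450
ok = large_enough(400, p1_hat) and large_enough(450, p2_hat)
se = se_diff_proportions(p1_hat, 400, p2_hat, 450)
print(f"large-sample conditions met: {ok}, estimated sd of p1_hat - p2_hat = {se:.4f}")
```

By contrast, a sample of 20 with $\hat{p} = 0.1$ would fail the check, because $n\hat{p} = 2 < 5$.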
### Confidence Interval for the Difference of Two Population Proportions
If the random samples are independent of each other, then the following is a $100(1-\alpha)$ percent confidence interval for $p_1 - p_2$:

$$\left[(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}\right]$$
LO5
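The interval can be computed from the raw counts. A minimal sketch follows, assuming the counts are available and using `statistics.NormalDist` for $z_{\alpha/2}$; the function name and counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """z-based interval for p1 - p2 from counts x1 of n1 and x2 of n2."""
    p1, p2 = x1 / n1, x2 / n2
    z_half = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = p1 - p2
    return diff - z_half * se, diff + z_half * se

lo, hi = two_proportion_interval(120, 400, 90, 450)  # hypothetical counts
print(f"95% CI for p1 - p2: [{lo:.4f}, {hi:.4f}]")
```

For these counts $\hat{p}_1 - \hat{p}_2 = 0.10$, and the interval runs from roughly 0.04 to 0.16.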
### Test Statistic for the Difference of Two Population Proportions
- The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$

- $D_0 = p_1 - p_2$ is the claimed or actual difference between the population proportions
  - $D_0$ is a number whose value varies depending on the situation
  - Often $D_0 = 0$, and the null means that there is no difference between the population proportions
- The sampling distribution of this statistic is approximately a standard normal distribution
LO5
### A Hypothesis Test about the Difference between Two Population Proportions
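| Alternative | Reject $H_0$ if | p-value |
|---|---|---|
| $H_a: p_1 - p_2 > D_0$ | $z > z_\alpha$ | Area under standard normal to the right of $z$ |
| $H_a: p_1 - p_2 < D_0$ | $z < -z_\alpha$ | Area under standard normal to the left of $z$ |
| $H_a: p_1 - p_2 \neq D_0$ | $\lvert z \rvert > z_{\alpha/2}$\* | Twice the area under standard normal to the right of $\lvert z \rvert$ |

\* Either $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$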
LO5
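The slides leave open how $\sigma_{\hat{p}_1 - \hat{p}_2}$ is estimated in the test statistic, since it involves the unknown $p_1$ and $p_2$. This sketch assumes a common convention: when $D_0 = 0$, it plugs in the pooled proportion $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ for both; otherwise, it uses the individual sample proportions. The function name and counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, d0=0.0, alternative="two-sided"):
    """z statistic and p-value for H0: p1 - p2 = d0 from counts x1 of n1 and x2 of n2."""
    p1, p2 = x1 / n1, x2 / n2
    if d0 == 0:
        # Assumed convention: under H0: p1 = p2, estimate the common p by pooling
        p_pool = (x1 + x2) / (n1 + n2)
        se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
    else:
        se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z = ((p1 - p2) - d0) / se
    cdf = NormalDist().cdf
    if alternative == "greater":        # Ha: p1 - p2 > d0
        p_value = 1 - cdf(z)
    elif alternative == "less":         # Ha: p1 - p2 < d0
        p_value = cdf(z)
    elif alternative == "two-sided":    # Ha: p1 - p2 != d0
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

z, p_value = two_proportion_z_test(120, 400, 90, 450)  # hypothetical counts
print(f"z = {z:.3f}, p-value = {p_value:.5f}")
```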
## 10.5 Comparing Two Population Variances Using Independent Samples
- Population 1 has variance $\sigma_1^2$ and population 2 has variance $\sigma_2^2$
- The null hypothesis $H_0$ is that the variances are the same
  - $H_0: \sigma_1^2 = \sigma_2^2$
- The alternative is that one is smaller than the other
  - That population has less variable measurements
  - Suppose $\sigma_1^2 > \sigma_2^2$
- It is more usual to normalize the hypotheses as a ratio
  - Test $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_a: \sigma_1^2/\sigma_2^2 > 1$
LO 7: Compare two population variances when the samples are independent.
### Comparing Two Population Variances Using Independent Samples Continued
- Reject $H_0$ in favor of $H_a$ if $s_1^2/s_2^2$ is significantly greater than 1
  - $s_1^2$ is the variance of a random sample of size $n_1$ from a population with variance $\sigma_1^2$
  - $s_2^2$ is the variance of a random sample of size $n_2$ from a population with variance $\sigma_2^2$
- To decide how large $s_1^2/s_2^2$ must be to reject $H_0$, describe the sampling distribution of $s_1^2/s_2^2$
- If $H_0$ is true and both populations are normal, the sampling distribution of $s_1^2/s_2^2$ is the F distribution with $n_1 - 1$ numerator and $n_2 - 1$ denominator degrees of freedom
LO7
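The test reduces to computing one ratio and comparing it with an F point. This sketch assumes the usual right-tail rule (reject $H_0$ if $F > F_\alpha$) and takes $F_\alpha$ from an F table such as the $\alpha = 0.05$ table. The function names and summary values are hypothetical.

```python
def f_ratio(s1_sq, n1, s2_sq, n2):
    """Return F = s1^2 / s2^2 with (df1, df2) = (n1 - 1, n2 - 1)."""
    return s1_sq / s2_sq, n1 - 1, n2 - 1

def reject_equal_variances(f_stat, f_alpha):
    """Right-tail rule for Ha: sigma1^2 / sigma2^2 > 1."""
    return f_stat > f_alpha

# Hypothetical summary data; F_0.05 with df1 = 9 and df2 = 9 is about 3.18
f_stat, df1, df2 = f_ratio(s1_sq=20.0, n1=10, s2_sq=5.0, n2=10)
decision = reject_equal_variances(f_stat, f_alpha=3.18)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) df, reject H0: {decision}")
```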
### F Distribution
- The shape depends on two parameters: the numerator number of degrees of freedom ($\text{df}_1$) and the denominator number of degrees of freedom ($\text{df}_2$)
- The F distribution is skewed to the right
LO 6: Describe the properties of the F distribution and use an F table.
### F Distribution
- The F point $F_\alpha$ is the point on the horizontal axis under the curve of the F distribution that gives a right-hand tail area equal to $\alpha$
  - The value of $F_\alpha$ depends on $\alpha$ (the size of the right-hand tail area) and on $\text{df}_1$ and $\text{df}_2$
- Different F tables are used for different values of $\alpha$:
  - Table A.5 for $\alpha = 0.10$
  - Table A.6 for $\alpha = 0.05$
  - Table A.7 for $\alpha = 0.025$
  - Table A.8 for $\alpha = 0.01$
LO6