
ETF2121/ETF5912 Data Analysis in Business

Week 5: Estimation and hypothesis testing for two populations

Dr Wei Wei

Monash University

Dr Wei Wei (Monash University) ETF2121/5912 1 / 55


1 Differences in Population Means
  Paired Samples vs Independent Samples
  Paired Samples
    Estimation and sampling distribution
    Testing for equal population mean
  Independent Samples
    Unequal population variance
    Equal population variance
    Testing for equal population variances
    Testing for equal population mean

2 Differences in Population Proportions
  Independent Samples with Large Sample Size



Differences in Population Means

Differences of Population Means

Let X1 and X2 denote the variables associated with the first and second
populations, respectively.
The population means are µ1 and µ2, and the population standard
deviations are σ1 and σ2.
We are interested in whether µ1 is different from µ2. In other words,
the parameter of interest is µ1 − µ2.
We take random samples of size n1 and n2 from the two populations.


Why two populations?

In general, the two populations differ in a key characteristic and we
want to examine if the difference in this characteristic is associated
with a difference in the mean of one target variable.
Examples:
air quality of a city with high household income versus the air quality of
a city with low household income;
the amount of electricity used by freezer brand A versus brand B;
the proportion of voters for a political party in younger versus older
population;
the sales volume before and after an advertising campaign.



Differences in Population Means Samples

Paired samples

Paired samples: when two samples are selected in such a way that
each item in one sample has a corresponding match or related item in
the other sample.
The sample size is common in both samples: n1 = n2 = n.
For example, we may be interested in the effect of a training program
on employee productivity. If we select the same set of employees and
measure their productivity before and after the training program, we
have paired samples.


Independent samples

Independent samples: samples that are completely independent of
one another, meaning that the sample members selected from one
population are not related to the sample members drawn from the other
population.
The sample sizes of the two samples can be different, i.e., n1 may be
different from n2.
For example, we want to know if the average household income is
higher in Caulfield or Clayton. We can take a sample of residents in
Caulfield and ask them about their income, and take another
independent sample of residents from Clayton and ask them about
their income.


Independent samples vs paired samples

Are the following samples independent or paired?


We want to know if the average daily revenue of restaurant A is the
same as the average daily revenue of restaurant B.
We obtain the daily revenue of both restaurants over the year of 2020.


Samples may be neither paired nor independent.


Suppose that we want to know if the average mark in ETF2121 is
higher than, lower than, or the same as the average mark in ETF2100. Are the
following samples independent, paired, or neither?
We select 25 students who attended both units and obtain their marks
from both units.
We randomly select 20 students from the cohort of ETF2121, obtain
their marks, then randomly select 20 students who are not in the
sample for ETF2121 from the cohort of ETF2100 and obtain their
marks.
We randomly select 200 students from the cohort of ETF2121, obtain
their marks, then randomly select 230 students from the cohort of
ETF2100 and obtain their marks.



Differences in Population Means Paired Samples

Transform paired samples into one sample

The key for analyzing paired samples is to take the difference between
the two variables (of the same unit) then proceed as in the one sample
case.
Let D = X1 − X2 denote the difference between the population
variables. We calculate the sample difference d = x1 − x2 for each
paired unit in the paired samples.
Unit   X1    X2    D = X1 − X2
1      x11   x12   d1 = x11 − x12
2      x21   x22   d2 = x21 − x22
...    ...   ...   ...
n      xn1   xn2   dn = xn1 − xn2


Properties of the estimator: unbiasedness

The parameter of interest is the difference in population means
$\mu_1 - \mu_2 \equiv E(X_1) - E(X_2)$. Since
$E(X_1) - E(X_2) = E(X_1 - X_2) = E(D)$, the difference in
population means is the same as the mean of the population
difference, i.e., $\mu_1 - \mu_2 = \mu_D$ where $\mu_D \equiv E(D)$.
The mean of the sample differences, defined as
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n},$$
is a point estimator of the mean of the population difference, $\mu_D$.
This estimator is unbiased, i.e.,
$$E(\bar{d}) = \mu_D.$$
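A short derivation sketch of the unbiasedness claim, assuming each sample difference $d_i$ is a random draw of $D$, so that $E(d_i) = \mu_D$ (this random-sampling assumption is implied rather than stated on the slide):

```latex
\begin{align*}
E(\bar{d}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} d_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(d_i)
            = \frac{1}{n}\, n\, \mu_D
            = \mu_D .
\end{align*}
```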


Properties of the estimator: consistency


Let $\sigma_D^2$ denote the population variance of $D$. The sample variance of
$d$, defined as
$$s_d^2 = \mathrm{Var}(d) = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1},$$
is an unbiased estimator of the population variance $\sigma_D^2$.
The sample variance of $\bar{d}$, defined as
$$SE(\bar{d})^2 = \mathrm{Var}(\bar{d}) = \frac{\mathrm{Var}(d)}{n} = \frac{s_d^2}{n},$$
is an unbiased estimator of the estimator's variance $\sigma_D^2 / n$.
Since $\mathrm{Var}(\bar{d})$ approaches zero as the sample size $n$ increases, and
$E(\bar{d}) = \mu_D$, $\bar{d}$ is also a consistent estimator of $\mu_D$.
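The quantities on this slide can be computed directly from a sample of differences. The sketch below uses made-up paired measurements (the lists `x1` and `x2` are hypothetical, not course data) and only the Python standard library.

```python
import math
import statistics

# Hypothetical paired measurements for five units (not course data).
x1 = [12.0, 15.0, 11.0, 14.0, 13.0]
x2 = [10.0, 14.0, 12.0, 11.0, 10.0]

# One difference per paired unit: d_i = x_i1 - x_i2.
d = [a - b for a, b in zip(x1, x2)]

n = len(d)
d_bar = statistics.mean(d)        # sample mean of the differences
s_d_sq = statistics.variance(d)   # sample variance of d (divisor n - 1)
se_d_bar = math.sqrt(s_d_sq / n)  # standard error of d_bar

print(d_bar, s_d_sq, se_d_bar)
```

Because the standard error divides by the square root of n, it shrinks as more pairs are collected, which is the consistency argument in numerical form.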

Properties of the estimator: sampling distribution

If we do not know the distribution of $D$ but $n$ is large ($n > 30$), the
sampling distribution of $\bar{d}$ is approximately normal:
$$\bar{d} \sim N\left(\mu_D, \frac{s_d^2}{n}\right),$$
or
$$\frac{\bar{d} - \mu_D}{s_d / \sqrt{n}} \sim N(0, 1).$$
If $D$ can be assumed to be normal, the standardized $\bar{d}$ follows a
t-distribution:
$$\frac{\bar{d} - \mu_D}{s_d / \sqrt{n}} \sim t_{n-1}.$$
Note that if $n$ is large ($n > 30$), $t_{n-1}$ is approximately normal.


Confidence interval

Given the sampling distribution of any estimator, you should be able
to derive the confidence interval of the parameter.
A $100(1 - \alpha)\%$ confidence interval estimator of the population mean
difference $\mu_D$ is
$$\bar{d} \pm z_{\alpha/2} \times SE(\bar{d})$$
or
$$\bar{d} \pm t_{\alpha/2, n-1} \times SE(\bar{d})$$
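A minimal sketch of the interval calculation, assuming the differences are already computed. The helper name `paired_confidence_interval` and the data are hypothetical. The t critical value is passed in as a number read from a t table, because the Python standard library has no t-quantile function; the z critical value comes from `statistics.NormalDist`.

```python
import math
import statistics
from statistics import NormalDist

def paired_confidence_interval(d, critical_value):
    """Return (lower, upper) for mu_D from differences d and a critical value."""
    n = len(d)
    d_bar = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # SE(d_bar) = s_d / sqrt(n)
    margin = critical_value * se
    return d_bar - margin, d_bar + margin

# Hypothetical differences from five paired units.
d = [2.0, 1.0, -1.0, 3.0, 3.0]

# Small sample, D assumed normal: use t_{0.025, 4} = 2.776 from a t table.
lower, upper = paired_confidence_interval(d, 2.776)

# Large-sample alternative: z_{alpha/2} for a 95% interval.
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

print(lower, upper, z_crit)
```

With only five pairs the t version is the appropriate one; the z value is shown only to indicate how it would be obtained for a large sample.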


Hypothesis testing for equal population mean

Step 1: formulate the hypotheses
the null:
$$H_0: \mu_D = 0$$
This is equivalent to $\mu_1 = \mu_2$.
the alternative is either $\mu_1$ is larger than $\mu_2$,
$$H_A: \mu_D > 0$$
or $\mu_1$ is less than $\mu_2$,
$$H_A: \mu_D < 0$$
or $\mu_1$ is different from $\mu_2$,
$$H_A: \mu_D \neq 0$$


Hypothesis testing

Step 2: specify a significance level. As before, 1%, 5% and 10% are the
commonly used significance levels.
Step 3: transform the two paired samples into one sample of
differences, $d = x_1 - x_2$. Determine the sampling distribution of the
test statistic $T = \dfrac{\bar{d}}{s_d / \sqrt{n}}$. Is it $t_{n-1}$ or $N(0, 1)$?
Step 4: compute $\bar{d}$, $s_d^2$, and $SE(\bar{d})$. Obtain the value of the test
statistic.
Step 5: decide whether or not to reject the null hypothesis.


Example 5.1

A company attempts to evaluate the effect of a new bonus plan. HR
selected a random sample of 5 salespersons to use this bonus plan for
a trial period.
The weekly sales volumes before and after implementing the bonus
plan are shown below:

Assume that the population differences in weekly sales are normally
distributed.
Using a 10% level of significance, do the given sample data support the
claim that the bonus plan has a positive effect on the sales volume?


Differences in Population Means Independent Samples

Estimating the mean difference in independent samples

Let $\bar{X}_1$ denote the sample mean from the first sample and $\bar{X}_2$ the
sample mean from the second sample. We know that $\bar{X}_1$ is an unbiased
estimator of $\mu_1$ and $\bar{X}_2$ is an unbiased estimator of $\mu_2$.
$\bar{X}_1 - \bar{X}_2$ is an unbiased point estimator of $\mu_1 - \mu_2$:
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2.$$
Let $s_1^2$ and $s_2^2$ denote the sample variances of the first and second
samples, i.e., $s_1^2 = \mathrm{Var}(x_1)$ and $s_2^2 = \mathrm{Var}(x_2)$. They are unbiased
estimators of $\sigma_1^2$ and $\sigma_2^2$.
The variance and standard error of $\bar{X}_1 - \bar{X}_2$ depend on whether the
two populations have equal or unequal variances.


Unequal population variance: standard error

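If the two population variances are not equal, i.e., $\sigma_1^2 \neq \sigma_2^2$, then
$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2},$$
and
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$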


Unequal population variance: sampling distribution

Now that we know the mean and variance of $\bar{X}_1 - \bar{X}_2$, we can
standardize it to
$$\frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{SE(\bar{X}_1 - \bar{X}_2)} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
If both $n_1$ and $n_2$ are large, the standardized estimator above approximately follows
$N(0, 1)$ regardless of what distributions $X_1$ and $X_2$ follow.
If the population variables $X_1$ and $X_2$ follow normal distributions and
the sample sizes are small, we use a t-distribution with
$$df^* = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}.$$
Fractional values of $df^*$ are rounded down.
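The standard error and the degrees of freedom formula for the unequal-variance case can be sketched as a small helper. The function name `welch_se_and_df` and the summary statistics passed to it are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def welch_se_and_df(s1_sq, n1, s2_sq, n2):
    """Standard error and rounded-down df* when variances are not assumed equal."""
    v1 = s1_sq / n1  # estimated variance of the first sample mean
    v2 = s2_sq / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    df_star = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, math.floor(df_star)  # fractional df* is rounded down

# Hypothetical summary statistics: s1^2 = 4 with n1 = 10, s2^2 = 9 with n2 = 15.
se, df = welch_se_and_df(s1_sq=4.0, n1=10, s2_sq=9.0, n2=15)
print(se, df)
```

Rounding down gives a slightly larger critical value than rounding up would, so the resulting test or interval errs on the conservative side.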



Unequal population variance: confidence interval

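A $100(1 - \alpha)\%$ confidence interval of $\mu_1 - \mu_2$ is
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, df^*} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
or
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$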


Equal population variance

If the population variances are assumed to be equal, the common
variance is estimated by pooling the sample variances of the two
samples:
$$s_p^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
Then the variance and standard error of the estimator $\bar{X}_1 - \bar{X}_2$ are
$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$
$$SE(\bar{X}_1 - \bar{X}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
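A minimal sketch of the pooled-variance calculation. The function names and the summary statistics are hypothetical and serve only to show how the two sample variances are weighted by their degrees of freedom.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of the common variance, weighted by degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_se(s1_sq, n1, s2_sq, n2):
    """Standard error of the difference in sample means under equal variances."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Hypothetical summary statistics: s1^2 = 4 with n1 = 10, s2^2 = 9 with n2 = 15.
sp_sq = pooled_variance(4.0, 10, 9.0, 15)
se = pooled_se(4.0, 10, 9.0, 15)
print(sp_sq, se)
```

The pooled variance always lies between the two sample variances and sits closer to the one from the larger sample.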

Equal population variance: sampling distribution

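The standardized estimator,
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{SE(\bar{X}_1 - \bar{X}_2)} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
follows either a normal distribution $N(0, 1)$ if both $n_1$ and $n_2$ are large,
or a t-distribution with $df = n_1 + n_2 - 2$ if both $X_1$ and $X_2$ are
normally distributed.
A $100(1 - \alpha)\%$ confidence interval estimator of $\mu_1 - \mu_2$ is
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, df} \times SE(\bar{X}_1 - \bar{X}_2)$$
or
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \times SE(\bar{X}_1 - \bar{X}_2)$$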


Example 5.2

An urban planning group is interested in estimating the difference
between mean household incomes for two suburbs.
Independent samples of households in the suburbs provided the
following results.

Assuming that the population variances are equal, obtain a 95%
confidence interval estimate of the difference in mean incomes
between the two suburbs.


Testing for equal population variances

In practice, we do not know whether the population variances are
equal or not. We can first use hypothesis testing to assess this.
Step 1:
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_A: \sigma_1^2 \neq \sigma_2^2$$
Step 2: specify $\alpha$.
Step 3: $s_1^2$ and $s_2^2$ are unbiased estimators for $\sigma_1^2$ and $\sigma_2^2$. If $X_1$ and $X_2$
are both normally distributed, we use the ratio of sample variances as
the test statistic. Under the null that $\sigma_1^2 = \sigma_2^2$, the test statistic has an
F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom,
$$\frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$$


Testing for equal population variances: F distribution


The F distribution has two degrees of freedom parameters, one for the
numerator and one for the denominator. The numerator degrees of
freedom is always quoted first.
The F distribution is a non-symmetric distribution that is skewed to
the right. Its values are all positive.
[Figure: density curves of the F(9,100), F(100,9), F(9,9) and F(100,100) distributions, plotted for values from 0 to 3.]


Step 4: compute the value of the test statistic from the sample,
denoted by $f$.
Step 5: make a decision (whether or not to reject the null) using
the critical value approach (when the null distribution is asymmetric
and the test is two-sided, the p-value approach is more complicated
and hence omitted).
Find the critical values from the lower and upper percentiles of
$F(n_1 - 1, n_2 - 1)$.
In EXCEL: $F_{\alpha/2}$ = F.INV(α/2, n1 − 1, n2 − 1) and $F_{1-\alpha/2}$ = F.INV(1 − α/2, n1 − 1, n2 − 1)
Reject if $f < F_{\alpha/2}$ or $f > F_{1-\alpha/2}$.


Testing for equality of variance before testing for equality of means

The sampling distribution of the difference between sample means
(based on independent samples) depends on whether or not we can
assume equal population variances. Hence, we should first test
for equality of variances before testing for equality of means.
If we cannot reject the null that $\sigma_1^2 = \sigma_2^2$, we assume equal variances.
If we do reject the null, we assume unequal variances.


Hypothesis testing for population mean

Step 1: formulate the hypotheses
the null:
$$H_0: \mu_1 - \mu_2 = 0$$
the alternative is either $\mu_1$ is larger than $\mu_2$,
$$H_A: \mu_1 - \mu_2 > 0$$
or $\mu_1$ is less than $\mu_2$,
$$H_A: \mu_1 - \mu_2 < 0$$
or $\mu_1$ is different from $\mu_2$,
$$H_A: \mu_1 - \mu_2 \neq 0$$
Step 2: specify a significance level. As before, 1%, 5% and 10% are the
commonly used significance levels.


Step 3a: determine the formula for $SE(\bar{X}_1 - \bar{X}_2)$ depending on
whether or not the variances are equal.
if the variances are assumed to be unequal,
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
assuming equal variance,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
$$SE(\bar{X}_1 - \bar{X}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$


Step 3b: obtain the test statistic by standardizing the estimator under
the null:
$$T = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$$
Step 3c: determine the null distribution for the test statistic.
If the population variables have unknown distributions but $n_1$ and $n_2$ are
large, the test statistic has approximately a standard normal distribution.
If the population variables follow normal distributions, the test statistic
has a t distribution with degrees of freedom determined by
Equal variance: $df = n_1 + n_2 - 2$
Unequal variance: $df^* = \dfrac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$


Step 4: compute the value of the test statistic from the sample.
Step 5: make a decision using either the p-value approach or the
critical value approach.


Example 5.3: paired vs independent samples

A tyre manufacturer designed a new tyre and wants to know if the
new design lasts on average longer than the existing tyre. We design
two experiments:
paired samples;
independent samples.


Example 5.3: paired samples

On 20 cars, one of each type of tyre (new and existing) is installed on
the rear wheels. 20 drivers were told to drive in their usual way until
the tyres wore out.
The new-design and existing-design tyres were installed on the same
set of cars and driven by the same drivers, which gives a natural pairing of
observations.
The data are stored in W5_01.xlsx.
Do the new-design tyres last on average longer than the existing tyres?


Let $\mu_1$ = mean distance to wear-out for the new-design tyres
Let $\mu_2$ = mean distance to wear-out for the existing tyres
Step 1: formulate hypotheses. Let $\mu_D = \mu_1 - \mu_2$
$$H_0: \mu_D = 0$$
$$H_A: \mu_D > 0$$
Step 2: set significance level to 5%, i.e., $\alpha = 0.05$.
Step 3: determine test statistic and null distribution:
$$\frac{\bar{d}}{s_d / \sqrt{n}} \sim t_{19}$$


Step 4: calculate the test statistic from sample:
$$t = \frac{4.55}{7.22 / \sqrt{20}} = 2.82$$
Step 5: if we are using the p-value approach, in EXCEL,
p = 1-T.DIST(2.82,19,1) = 0.005
Since p < 0.05, we reject the null hypothesis. There is evidence to
support the conclusion that the new-design tyres last on average
longer than the existing tyres.
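The arithmetic in Step 4 can be checked with a few lines of Python, using the summary values quoted on the slide (mean difference 4.55, standard deviation of differences 7.22, n = 20). The slide uses the p-value approach; as an alternative illustration this sketch applies the critical value approach with $t_{0.05, 19} = 1.729$, a value taken from a standard t table rather than from the slides.

```python
import math

# Summary values quoted in Step 4 of the paired-samples example.
d_bar = 4.55  # mean of the sample differences
s_d = 7.22    # standard deviation of the sample differences
n = 20        # number of pairs

se = s_d / math.sqrt(n)  # standard error of d_bar
t_stat = d_bar / se      # test statistic under H0: mu_D = 0

# Upper-tail critical value t_{0.05, 19} from a t table (assumption: not on the slide).
t_crit = 1.729
reject_null = t_stat > t_crit

print(round(se, 3), round(t_stat, 2), reject_null)
```

Both approaches lead to the same decision here, since the test statistic lies well above the critical value.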


Example 5.3: independent samples

New-design tyres were installed on the rear (driving) wheels of 20 cars,
and existing-design tyres were installed on the rear wheels of another
20 cars.
40 drivers were told to drive in their usual way until the tyres wore out.
The number of kilometers driven by each driver was recorded in
W5_01.xlsx.
Do the new-design tyres last on average longer than the existing tyres?


Example 5.3: independent samples: test for equal variances


Before testing for equal means, we should first test for equal variances.
Let $\sigma_1^2$ = variance in the distance to wear-out for the new-design tyres
Let $\sigma_2^2$ = variance in the distance to wear-out for the existing tyres
Step 1:
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_A: \sigma_1^2 \neq \sigma_2^2$$
Step 2: set significance level to 5%, i.e., $\alpha = 0.05$.
Step 3: determine test statistic and null distribution. The ratio of
sample variances has an F distribution with 19 and 19 degrees of
freedom under the null,
$$\frac{s_1^2}{s_2^2} \sim F(19, 19)$$


Step 4: compute the value of the test statistic from the sample.
$$f = \frac{243.4}{226.8} = 1.07$$
Step 5: In EXCEL:
$F_{0.025}$ = F.INV(0.025,19,19) = 0.396
$F_{0.975}$ = F.INV(0.975,19,19) = 2.526
Since the value from the sample lies within the two percentiles, we do
not reject the null of equal variance. In other words, we will assume
equal variance for testing the differences in mean.
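The two-sided decision rule for the variance ratio can be sketched as follows. The function name `variance_ratio_test` is hypothetical; the sample variances and the two F(19, 19) critical values are the ones quoted on the slide (the critical values come from Excel's F.INV and are passed in as numbers, since the Python standard library has no F-quantile function).

```python
def variance_ratio_test(s1_sq, s2_sq, f_lower, f_upper):
    """Return (f, reject) for H0: equal variances, using two-sided critical values."""
    f = s1_sq / s2_sq
    reject = f < f_lower or f > f_upper
    return f, reject

# Sample variances and critical values quoted in the example.
f_stat, reject_equal_var = variance_ratio_test(
    s1_sq=243.4, s2_sq=226.8, f_lower=0.396, f_upper=2.526
)
print(round(f_stat, 2), reject_equal_var)
```

Because the ratio falls between the two percentiles, the equal-variance formulas are used in the test for equal means that follows.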


Example 5.3: independent samples: test for equal mean

Step 1: formulate hypotheses,
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_A: \mu_1 - \mu_2 > 0$$
Step 2: set significance level to 5%, i.e., $\alpha = 0.05$.
Step 3: since we are assuming equal variances, we will use the
test statistic for equal variances
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t(38),$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Step 4: calculate the test statistic
$$s_p^2 = \frac{(20 - 1)\,243.41 + (20 - 1)\,226.8}{20 + 20 - 2} = 235.1$$
$$SE = \sqrt{s_p^2 \left(\frac{1}{20} + \frac{1}{20}\right)} = 4.849$$
$$t = \frac{4.4}{4.849} = 0.91$$
Step 5: we use the upper percentile since the alternative hypothesis
states that the parameter of interest is greater than a given value.
From Excel, $t_{0.95, 38} = 1.686$. Since $t = 0.91 < 1.686$, we DO NOT
reject the null hypothesis. There is not enough evidence to support the
conclusion that the new-design tyres last on average longer than the
existing tyres.
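Steps 4 and 5 can be reproduced from the summary values quoted on the slide (sample variances 243.41 and 226.8, a difference in sample means of 4.4, 20 tyres per group, critical value 1.686). The variable names are illustrative only.

```python
import math

# Summary values quoted in Steps 4 and 5 of the independent-samples example.
n1, n2 = 20, 20
s1_sq, s2_sq = 243.41, 226.8
mean_diff = 4.4  # x1_bar - x2_bar
t_crit = 1.686   # upper 5% point of t(38), as quoted from Excel on the slide

# Pooled variance and the standard error under equal variances.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))

t_stat = mean_diff / se
reject_null = t_stat > t_crit  # one-sided test, H_A: mu1 - mu2 > 0

print(round(sp_sq, 1), round(se, 3), round(t_stat, 2), reject_null)
```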

Example 5.3: Paired vs independent samples: which is better?

Using paired samples we had enough evidence to conclude that the
new-design tyres last longer, but not under independent samples.
Why?
In this example, there are two sources of variation: (i) car drivers and
(ii) tyre designs.
Independent samples lead to more variability in our outcome as
different drivers (for the new and existing tyres) may drive in different
ways.
In paired samples, the variation in drivers is eliminated: the same drivers
and cars were used in both the new and existing tyre samples.
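The reduction in variability can be made explicit with a standard variance identity. This is a sketch that goes slightly beyond the slide: it assumes equal sample sizes $n$ and treats the true variances and covariance as known.

```latex
\begin{align*}
\mathrm{Var}(D) &= \mathrm{Var}(X_1 - X_2)
  = \sigma_1^2 + \sigma_2^2 - 2\,\mathrm{Cov}(X_1, X_2), \\
\mathrm{Var}(\bar{d}) &= \frac{\sigma_1^2 + \sigma_2^2 - 2\,\mathrm{Cov}(X_1, X_2)}{n}
  \quad \text{(paired samples)}, \\
\mathrm{Var}(\bar{X}_1 - \bar{X}_2) &= \frac{\sigma_1^2 + \sigma_2^2}{n}
  \quad \text{(independent samples, } n_1 = n_2 = n\text{)}.
\end{align*}
```

When the two measurements on the same unit are positively correlated, as with the same driver using both tyre types, the covariance term is positive and the paired estimator has the smaller variance.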


Comparing the test statistic from paired and independent samples:
independent samples: $t = \dfrac{4.4}{4.85} = 0.91$
paired samples: $t = \dfrac{4.55}{1.615} = 2.82$
In this example, the numerators are similar, BUT the denominators are
quite different.
Will the paired samples always produce a more significant test statistic
than the independent samples experiment?
Not necessarily.
It depends on whether the variation due to drivers is large.


Example 5.3: Paired vs independent samples: ignoring the pairs

What if we wrongly assumed that the data in paired samples came
from independent samples?
We have
$\bar{x}_1 = 73.6$, $\bar{x}_2 = 69.05$
$s_1 = 15.58$, $s_2 = 17.79$
$n_1 = 20$, $n_2 = 20$
We do not reject the null of equal variance.


The value of the test statistic for equal mean is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(19)\,15.58^2 + (19)\,17.79^2}{38} = 279.61$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{73.6 - 69.05}{\sqrt{279.61 \left(\frac{1}{20} + \frac{1}{20}\right)}} = 0.86$$
Since $t = 0.86 < 1.686$, we fail to reject $H_0$.
If we wrongly treat the paired samples as independent, we would fail
to reject the null hypothesis.
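A side-by-side sketch of the two standard errors for the same paired data, using only the summary values quoted in the notes (sample means 73.6 and 69.05, sample standard deviations 15.58 and 17.79, standard deviation of the differences 7.22, n = 20). Variable names are illustrative.

```python
import math

n = 20
x1_bar, x2_bar = 73.6, 69.05
s1, s2 = 15.58, 17.79  # sample standard deviations of each tyre type
s_d = 7.22             # standard deviation of the paired differences

mean_diff = x1_bar - x2_bar

# Correct analysis: one sample of differences.
se_paired = s_d / math.sqrt(n)
t_paired = mean_diff / se_paired

# Incorrect analysis: pairs ignored, pooled-variance formula applied.
sp_sq = ((n - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (2 * n - 2)
se_indep = math.sqrt(sp_sq * (1 / n + 1 / n))
t_indep = mean_diff / se_indep

print(round(t_paired, 2), round(t_indep, 2))
```

The numerator is identical in both analyses; only the standard error changes, and it is more than three times larger when the pairing is ignored.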



Differences in Population Proportions Sample Size

Estimation of Differences: Population Proportions

Let $\pi_1$ and $\pi_2$ represent the population proportions, and $p_1 = x_1/n_1$ and
$p_2 = x_2/n_2$ the sample proportions.
We know that $p_1$ and $p_2$ are unbiased estimators of $\pi_1$ and $\pi_2$, i.e.,
$$E(p_1) = \pi_1 \text{ and } E(p_2) = \pi_2,$$
and that
$$\frac{p_1(1 - p_1)}{n_1} \text{ and } \frac{p_2(1 - p_2)}{n_2}$$
are estimators of $\mathrm{Var}(p_1)$ and $\mathrm{Var}(p_2)$ (approximately unbiased when $n_1$ and $n_2$ are large).
The point estimator of the difference between population proportions
$\pi_1 - \pi_2$ is the difference between sample proportions $p_1 - p_2$.


Properties of the estimator

The sample difference in proportion is an unbiased estimator for the
population difference in proportion:
$$E(p_1 - p_2) = \pi_1 - \pi_2$$
If the two samples are independent, then
$$\mathrm{Var}(p_1 - p_2) = \mathrm{Var}(p_1) + \mathrm{Var}(p_2)$$
and $\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$ is an estimator of $\mathrm{Var}(p_1 - p_2)$ (approximately unbiased when $n_1$ and $n_2$ are large).
Standard error
$$SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$


Sampling distribution

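If both $n_1$ and $n_2$ are large, the estimator approximately follows a normal distribution:
$$p_1 - p_2 \sim N\left(\pi_1 - \pi_2, \mathrm{Var}(p_1 - p_2)\right)$$
or
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \sim N(0, 1)$$
A $100(1 - \alpha)\%$ confidence interval estimator of $\pi_1 - \pi_2$ is
$$(p_1 - p_2) \pm z_{\alpha/2} \times \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$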
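A minimal sketch of this interval, using the unpooled standard error. The function name `proportion_diff_ci` and the counts are hypothetical; the z critical value is obtained from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist

def proportion_diff_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample confidence interval for pi1 - pi2 from success counts."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Unpooled standard error: each sample contributes its own variance estimate.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
lower, upper = proportion_diff_ci(60, 100, 45, 100)
print(lower, upper)
```

For these made-up counts the interval excludes zero, which would point to a difference between the two population proportions at the 5% level.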


Test for Difference in Population Proportions

Step 1: formulate the hypotheses
the null:
$$H_0: \pi_1 - \pi_2 = 0$$
the alternative is either $\pi_1$ is larger than $\pi_2$,
$$H_A: \pi_1 - \pi_2 > 0$$
or $\pi_1$ is less than $\pi_2$,
$$H_A: \pi_1 - \pi_2 < 0$$
or $\pi_1$ is different from $\pi_2$,
$$H_A: \pi_1 - \pi_2 \neq 0$$
Step 2: specify a significance level. As before, 1%, 5% and 10% are the
commonly used significance levels.


Step 3: determine the null distribution (sampling distribution under
the null) for the point estimator, $p_1 - p_2$.
Step 3a: Under the null, $\pi_1 = \pi_2 = \pi$. In this case, $E(p_1 - p_2) = 0$,
and the variance of $p_1 - p_2$ becomes
$$\mathrm{Var}(p_1 - p_2) = \pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$
where $\pi$ can be estimated using the pooled proportion
$$p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 p_1}{n_1 + n_2} + \frac{n_2 p_2}{n_1 + n_2}.$$
Hence
$$SE(p_1 - p_2) = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$


Step 3b: obtain the test statistic by standardizing the estimator under
the null:
$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
If $n_1$ and $n_2$ are large, the test statistic has approximately a standard normal
distribution.
Step 4: compute the value of the test statistic from the sample.
Step 5: make a decision using either the p-value approach or the
critical value approach.


Example 5.4

In a public opinion survey, 65 out of a sample of 100 high-income
voters (incomes of at least $100,000) and 35 out of a sample of 75
low-income voters (incomes less than $100,000) supported the
introduction of the flood levy, which will go toward assisting residents
affected by flooding during extreme weather events.
Can we conclude at the 5% level of significance that a higher
proportion of high-income voters support the flood levy?


Step 1: let $\pi_1$ denote the proportion of supporters among high-income
voters, and $\pi_2$ denote the proportion of supporters among low-income
voters,
$$H_0: \pi_1 - \pi_2 = 0$$
$$H_A: \pi_1 - \pi_2 > 0$$
Step 2: $\alpha = 0.05$.
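The remaining steps are not reproduced in these notes. As a sketch only, the pooled-proportion procedure for testing a difference in proportions can be applied to the survey counts given in the example (65 of 100 and 35 of 75). The function name `pooled_proportion_z` is hypothetical, and the normal quantile and tail probability come from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def pooled_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 = pi2, using the pooled proportion in the SE."""
    p1 = x1 / n1
    p2 = x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Survey counts from the example: 65 of 100 high-income, 35 of 75 low-income voters.
z_stat = pooled_proportion_z(65, 100, 35, 75)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
p_value = 1 - NormalDist().cdf(z_stat)    # one-sided p-value
reject_null = z_stat > z_crit

print(round(z_stat, 2), round(z_crit, 3), reject_null)
```

Under these calculations the statistic is roughly 2.4, above the 5% critical value of about 1.645, so the sketch points toward rejecting the null in favour of higher support among high-income voters.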
