
1.017/1.010 Class 17
Testing Hypotheses about Two Populations

Tests of differences between two populations

To test whether two populations $x$ and $y$ differ, we can compare specified distributional properties $a_x$ and $a_y$ (means, variances, 90th percentiles, etc.).

Null hypothesis:

$$H_0: a_x = a_y = a_0 \quad\text{or}\quad a_x - a_y = 0$$

The hypothesis test may be based on "natural" (unbiased and consistent) estimators of $a_x$ and $a_y$, derived from the independent random samples $x_1, x_2, \ldots, x_{N_x}$ and $y_1, y_2, \ldots, y_{N_y}$:

$$\hat a_x = \hat a_x(x_1, x_2, \ldots, x_{N_x})$$

$$\hat a_y = \hat a_y(y_1, y_2, \ldots, y_{N_y})$$

We can derive a two-sided rejection region $R_{z0}$ written in terms of a standardized statistic $z$, following the same basic procedure as in the single-population case (see Class 15):

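$$z(\hat a_x, \hat a_y, a_x, a_y) = \frac{(\hat a_x - \hat a_y) - (a_x - a_y)}{SD[\hat a_x - \hat a_y]}$$

Under $H_0$ ($a_x = a_y = a_0$) this reduces to:

$$z(\hat a_x, \hat a_y, a_0, a_0) = \frac{\hat a_x - \hat a_y}{SD[\hat a_x - \hat a_y]}$$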

$$R_{z0}: \quad z(\hat a_x, \hat a_y, a_0, a_0) \le z_L = F_z^{-1}\left(\frac{\alpha}{2}\right)$$
$$\text{or} \quad z(\hat a_x, \hat a_y, a_0, a_0) \ge z_U = F_z^{-1}\left(1 - \frac{\alpha}{2}\right)$$

For large samples, $z(\hat a_x, \hat a_y, a_0, a_0)$ is approximately unit normal if $H_0$ is true ($a_x = a_y = a_0$). Use norminv to compute $z_L$ and $z_U$ from $\alpha$.
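The notes use MATLAB's norminv here. As a minimal sketch of the same step in Python, assuming the large-sample case in which z is unit normal under H0, the standard library's statistics.NormalDist().inv_cdf can stand in for F_z^{-1}. The helper names critical_values and reject_z are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def critical_values(alpha):
    """Return (z_L, z_U) for a two-sided test at level alpha.

    Assumes z is unit normal under H0 (large-sample case).
    """
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    z_lower = std_normal.inv_cdf(alpha / 2)       # F_z^{-1}(alpha/2)
    z_upper = std_normal.inv_cdf(1 - alpha / 2)   # F_z^{-1}(1 - alpha/2)
    return z_lower, z_upper


def reject_z(z, alpha):
    """True if the standardized statistic z lies in the rejection region R_z0."""
    z_lower, z_upper = critical_values(alpha)
    return z <= z_lower or z >= z_upper


z_L, z_U = critical_values(0.05)
print(round(z_L, 3), round(z_U, 3))  # -1.96 1.96
```

With α = 0.05 this gives the familiar ±1.96 bounds; a smaller α widens them.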

We can also define a rejection region $R_{a0}$ written in terms of the nonstandardized estimates:

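$$R_{a0}: \quad \hat a_x - \hat a_y \le \Delta a_L = F_z^{-1}\left(\frac{\alpha}{2}\right) SD[\hat a_x - \hat a_y]$$
$$\text{or} \quad \hat a_x - \hat a_y \ge \Delta a_U = F_z^{-1}\left(1 - \frac{\alpha}{2}\right) SD[\hat a_x - \hat a_y]$$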
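The nonstandardized region is the standardized one rescaled by SD[â_x − â_y]. A sketch under the assumption that this SD is already known or estimated (difference_thresholds and reject_difference are illustrative names, not from the notes):

```python
from statistics import NormalDist


def difference_thresholds(alpha, sd_diff):
    """Return (delta_L, delta_U) for the region R_a0.

    sd_diff is SD[a_x_hat - a_y_hat], assumed known or estimated beforehand.
    """
    std_normal = NormalDist()
    delta_lower = std_normal.inv_cdf(alpha / 2) * sd_diff
    delta_upper = std_normal.inv_cdf(1 - alpha / 2) * sd_diff
    return delta_lower, delta_upper


def reject_difference(diff, alpha, sd_diff):
    """Two-sided decision written in terms of diff = a_x_hat - a_y_hat."""
    delta_lower, delta_upper = difference_thresholds(alpha, sd_diff)
    return diff <= delta_lower or diff >= delta_upper


print(difference_thresholds(0.05, 2.0))
print(reject_difference(4.5, 0.05, 2.0))
```

For example, with SD[â_x − â_y] = 2 and α = 0.05 the thresholds are about ±3.92, so a difference of 4.5 falls in R_a0. Dividing the difference by the SD and comparing it with z_L and z_U gives the same decision.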

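The two-sided p-value is obtained from:

$$1 - \frac{p}{2} = F_z[z] = F_z\left[\frac{\hat a_x - \hat a_y}{SD[\hat a_x - \hat a_y]}\right], \quad \hat a_x - \hat a_y \ge 0$$
$$\frac{p}{2} = F_z[z] = F_z\left[\frac{\hat a_x - \hat a_y}{SD[\hat a_x - \hat a_y]}\right], \quad \hat a_x - \hat a_y \le 0$$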

For large samples, use normcdf to compute $p$ from $(\hat a_x - \hat a_y)/SD[\hat a_x - \hat a_y]$.
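normcdf is also MATLAB. The sketch below follows the piecewise p-value rule literally, using statistics.NormalDist().cdf as F_z under the large-sample assumption; two_sided_p_value is a hypothetical name. By symmetry of the unit normal, both branches equal 2 F_z(−|z|).

```python
from statistics import NormalDist


def two_sided_p_value(diff, sd_diff):
    """Two-sided p-value for H0: a_x = a_y in the large-sample (unit normal) case.

    diff is a_x_hat - a_y_hat; sd_diff is SD[a_x_hat - a_y_hat].
    """
    z = diff / sd_diff
    F = NormalDist().cdf
    if diff >= 0:
        # Branch a_x_hat - a_y_hat >= 0:  1 - p/2 = F_z(z)
        return 2 * (1 - F(z))
    # Branch a_x_hat - a_y_hat < 0:  p/2 = F_z(z)
    return 2 * F(z)


print(two_sided_p_value(1.96, 1.0))
```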

Special Case: Large sample test of the difference between two means

If the property of interest is the mean then:

$$H_0: a_x = E[x] = a_y = E[y] \quad\text{or}\quad E[x] - E[y] = 0$$

The natural estimator of $E[x] - E[y]$ is $m_x - m_y$, the difference of the sample means.

In the large-sample case, $m_x - m_y$ is approximately normal with mean and variance:

$$E[m_x - m_y] = E[x] - E[y] \quad \text{(unbiased)}$$

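$$Var[m_x - m_y] = Var[m_x] + Var[m_y] = \frac{\sigma_x^2}{N_x} + \frac{\sigma_y^2}{N_y} \quad \text{(consistent)}$$

The variances add because the two samples are independent.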
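A Monte Carlo sketch (an illustration added here, not part of the original procedure) can check the variance formula: draw many independent pairs of normal samples, compute m_x − m_y for each pair, and compare the empirical variance with σ_x²/N_x + σ_y²/N_y. The normal populations, σ values, sample sizes, replication count, and seed are all arbitrary assumptions.

```python
import random
from statistics import fmean, pvariance


def simulated_var_of_mean_difference(sigma_x, sigma_y, n_x, n_y, reps=20000, seed=1):
    """Empirical Var[m_x - m_y] over `reps` pairs of independent normal samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(reps):
        m_x = fmean(rng.gauss(0.0, sigma_x) for _ in range(n_x))
        m_y = fmean(rng.gauss(0.0, sigma_y) for _ in range(n_y))
        diffs.append(m_x - m_y)
    return pvariance(diffs)


# Arbitrary illustrative population parameters and sample sizes
sigma_x, sigma_y, n_x, n_y = 2.0, 3.0, 10, 8
theory = sigma_x**2 / n_x + sigma_y**2 / n_y  # sigma_x^2/N_x + sigma_y^2/N_y
empirical = simulated_var_of_mean_difference(sigma_x, sigma_y, n_x, n_y)
print(theory, round(empirical, 3))
```

The two numbers should agree to within a few percent; the gap shrinks as reps grows.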

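Construct a large-sample test statistic $z \sim N(0,1)$, replacing the unknown population variances by the sample variances $s_x^2$ and $s_y^2$:

$$z = \frac{m_x - m_y}{\sqrt{\dfrac{\sigma_x^2}{N_x} + \dfrac{\sigma_y^2}{N_y}}} \approx \frac{m_x - m_y}{\sqrt{\dfrac{s_x^2}{N_x} + \dfrac{s_y^2}{N_y}}}$$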
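A sketch of the large-sample statistic computed from raw samples. The notes do not state which denominator s_x² and s_y² use; this assumes the usual N − 1 form returned by statistics.variance. large_sample_z is a hypothetical name.

```python
from math import sqrt
from statistics import fmean, variance


def large_sample_z(x, y):
    """Large-sample z for H0: E[x] = E[y].

    The unknown population variances are replaced by the sample
    variances s_x^2 and s_y^2 (N - 1 denominator, an assumption).
    """
    m_x, m_y = fmean(x), fmean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (m_x - m_y) / se


print(large_sample_z([1, 2, 3, 4, 5], [2, 4, 6]))
```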

Two-sided rejection region written in terms of $m_x$ and $m_y$:

$$R_{a0}: \quad m_x - m_y \le \Delta a_L = F_z^{-1}\left(\frac{\alpha}{2}\right) SD[m_x - m_y]$$
$$\text{or} \quad m_x - m_y \ge \Delta a_U = F_z^{-1}\left(1 - \frac{\alpha}{2}\right) SD[m_x - m_y]$$

The two-sided p-value is obtained from:

$$1 - \frac{p}{2} = F_z(z) = F_z\left[(m_x - m_y)\left(\frac{s_x^2}{N_x} + \frac{s_y^2}{N_y}\right)^{-1/2}\right], \quad m_x \ge m_y$$
$$\frac{p}{2} = F_z(z) = F_z\left[(m_x - m_y)\left(\frac{s_x^2}{N_x} + \frac{s_y^2}{N_y}\right)^{-1/2}\right], \quad m_x \le m_y$$

Example: Comparing crop yields with and without fertilizer application

Consider two agricultural fields, one that is fertilized and one that is not. Yield
samples (kg/ha) from the two fields are as follows:

Fertilized (x): 66 41 77 80 52 98 99 74 81 78

Not fertilized (y): 65 88 55 124 66 72 96 71

Test the hypothesis $H_0$: mean yields are the same with and without fertilizer.

$m_x$ =          $s_x$ =          $N_x$ =
$m_y$ =          $s_y$ =          $N_y$ =

$z$ =          $p$ =
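One way to fill in the blanks is to apply the large-sample formulas directly to the yield data. The sketch below assumes α = 0.05 (the example does not specify a level), uses N − 1 sample standard deviations, and applies the large-sample normal approximation even though N_x = 10 and N_y = 8 are modest, so the result should be read with that caveat. two_sample_z_test is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

# Yield samples from the example (kg/ha)
fertilized = [66, 41, 77, 80, 52, 98, 99, 74, 81, 78]  # x
not_fertilized = [65, 88, 55, 124, 66, 72, 96, 71]     # y


def two_sample_z_test(x, y, alpha=0.05):
    """Large-sample two-sided test of H0: E[x] = E[y] (alpha is an assumed level)."""
    m_x, s_x, n_x = fmean(x), stdev(x), len(x)
    m_y, s_y, n_y = fmean(y), stdev(y), len(y)
    z = (m_x - m_y) / sqrt(s_x**2 / n_x + s_y**2 / n_y)
    # Equivalent to the piecewise rule: p = 2 F_z(-|z|) by symmetry
    p = 2 * NormalDist().cdf(-abs(z))
    return {"m_x": m_x, "s_x": s_x, "N_x": n_x,
            "m_y": m_y, "s_y": s_y, "N_y": n_y,
            "z": z, "p": p, "reject": p < alpha}


result = two_sample_z_test(fertilized, not_fertilized)
for name, value in result.items():
    print(name, value)
```

Comparing the printed p with α (or z with z_L and z_U) shows whether these data reject H0.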

Copyright 2003 Massachusetts Institute of Technology


Last modified Oct. 8, 2003