
LECTURE NOTES NO. 8 M235
Chapter 8
Hypothesis Testing

Researchers from all areas, such as engineering, agriculture and medicine, formulate hypotheses about a population being studied.

The ultimate goal of the research is to determine the validity of these hypotheses.

A statistical hypothesis is an assumption about one or more population parameters. This assumption may or may not be true.

The best way to determine whether a
statistical hypothesis is true would be to
examine the entire population. Since that
is often impractical, researchers examine
a random sample to test the statistical
hypothesis.
Hypothesis testing refers to the formal
procedures to accept or reject statistical
hypotheses.
Example: A medical researcher would like
to determine whether a new drug is
effective or not. A random sample of
patients is taken. Half the sample is given
the new drug and the other half is given a
placebo (no medication is given).
Hypothesis testing can be used to conclude
whether the new drug is effective.
There are two types of statistical hypotheses.
Null hypothesis: The null hypothesis, denoted by H0, is usually the hypothesis that differences result purely from chance (sampling error).
Alternative hypothesis: The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that differences are influenced by a treatment effect in addition to chance variation.

Example: Compare two population means µ1 and µ2
H0: µ1 = µ2
H1: µ1 ≠ µ2
Take a sample from the first population, with sample mean $\bar{X}$.
Take a sample from the second population, with sample mean $\bar{Y}$.
H0: µ1 = µ2 means the difference between the sample means $\bar{X}$ and $\bar{Y}$ is due to chance or sampling error.
H1: µ1 ≠ µ2 means the difference between the sample means $\bar{X}$ and $\bar{Y}$ is due to a treatment effect in addition to the sampling error.

Guidelines for Establishing H0 and H1
1. Testing research hypothesis
Research hypothesis expressed as H1
Example: It is claimed that a new treatment is
more effective than the standard treatment for
prolonging the lives of terminal cancer patients.
The standard treatment has been in use for a
long time, and from records in medical journals,
the mean survival period is known to be 4.2
years. The new treatment is administered to 100 patients and their duration of survival is recorded; the sample mean is 4.5 years and the standard deviation is 1.1 years. Is the claim supported by these results? Test at α = 0.05.
µ: mean survival period of cancer patients using the new drug.
4.2: mean survival period of patients using the standard drug.
Research hypothesis: The new drug is more effective than the standard drug.
This is equivalent to saying that the mean survival period using the new drug is higher than the mean survival period using the standard drug (i.e. µ > 4.2). As a general guideline, a research hypothesis such as this should be formulated as the alternative hypothesis. Thus,
H0: µ ≤ 4.2
H1: µ > 4.2
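The notes pose the question for this example but do not carry out the test here. A minimal sketch of one way to do it is shown below, assuming that with n = 100 the sample standard deviation 1.1 can stand in for σ in a large-sample Z-test (an assumption of this sketch, not stated in the example). The function name `right_tailed_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_z_test(xbar, mu0, sd, n, alpha):
    """Return (z, critical value, reject?) for H0: mu <= mu0 vs H1: mu > mu0."""
    z = (xbar - mu0) / (sd / sqrt(n))          # standardized sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value z_alpha
    return z, z_crit, z > z_crit

# Survival example: xbar = 4.5, mu0 = 4.2, s = 1.1 (used in place of sigma), n = 100
z, z_crit, reject = right_tailed_z_test(4.5, 4.2, 1.1, 100, 0.05)
print(round(z, 2), round(z_crit, 3), reject)   # 2.73 1.645 True
```

Under these assumptions the statistic exceeds the critical value, so the sample would support the claim µ > 4.2 at α = 0.05.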
2. Testing the Validity of a Claim
When testing the validity of claims that companies make about their products, the null hypothesis is generally based on the assumption that the claim is true.
Example: A manufacturer of soft drinks states that 2-liter containers of its products hold an average of at least 67.6 fluid ounces. A sample of 2-liter containers will be selected, and the contents will be measured to test the manufacturer’s claim. In this type of hypothesis testing situation, we generally follow the rationale suggested by the criminal trial analogy. That is, the manufacturer’s claim is assumed true (innocent) unless the sample evidence indicates otherwise (guilty). Using this approach:
H0: µ ≥ 67.6
H1: µ < 67.6

3. Testing in Decision-Making Situations
Example: On the basis of a sample of parts
from a shipment that has been just received,
a quality-control inspector must decide
whether to accept or return the shipment to
the supplier because it does not meet
specifications. Assume that the specifications for a particular part require a mean length of 2 inches per part.
If the average length of the parts is greater or less than the 2-inch standard, the parts will cause quality problems in the assembly operation, so
H0: µ = 2
H1: µ ≠ 2
How to express the null hypothesis and the alternative hypothesis in symbolic form?
Identify H1 first.
If the original claim of the question uses words such as “greater, larger, increased, improved and so on”, use “>” for H1.
If it uses words such as “less, decreased, smaller and so on”, use “<” for H1.
If words such as “the same, change, different/difference and so on” appear in the claim, use “≠” for H1.
(The opposite symbol, which contains “=”, is then used for H0.)

Types of Errors in Hypothesis Testing

Decision               Truth (for population studied)
(based on sample)      H0 True             H0 False
Reject H0              Type I Error        Correct Decision
Fail to reject H0      Correct Decision    Type II Error

Two types of errors can result from a
hypothesis test.

Type I error. A Type I error occurs
when the researcher rejects a null
hypothesis when it is true.

The probability of committing a Type I error is called the significance level and is often denoted by α:

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})$$

Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. Its probability is denoted by β:

$$\beta = P(\text{Do not reject } H_0 \mid H_0 \text{ is false})$$

or

$$\beta = P(\text{Accept } H_0 \mid H_1 \text{ is true})$$

The probability of not committing a Type II error is called the power of the test:

$$\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_0 \text{ is false}) = P(\text{Reject } H_0 \mid H_1 \text{ is true})$$

Relation between Type I Error (α) and Type II Error (β)

Example: Let X1, X2, …, Xn denote a random sample of size n = 25 from a normal population with variance σ² = 4, whose mean is known to be either 0 or 1.
Test
H0: µ = 0
H1: µ = 1
Reject H0 if $\bar{X} > k$, where $k$ is some constant.

Take k = 0.4

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P(\bar{X} > k \mid \mu = 0) = 1 - P(\bar{X} \le k \mid \mu = 0)$$

$$\alpha = 1 - P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{k - \mu}{\sigma/\sqrt{n}} \,\Big|\, \mu = 0\right) = 1 - P\left(\frac{\bar{X} - 0}{2/\sqrt{25}} \le \frac{k - 0}{2/\sqrt{25}}\right)$$

$$\alpha = 1 - P\left(Z \le \frac{5k}{2}\right) = 1 - P\left(Z \le \frac{5(0.4)}{2}\right) = 1 - P(Z \le 1) = 1 - 0.8413 = 0.1587$$

$$\beta = P(\text{Do not reject } H_0 \mid H_0 \text{ is false}) = P(\bar{X} \le k \mid \mu = 1)$$

$$\beta = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{k - \mu}{\sigma/\sqrt{n}} \,\Big|\, \mu = 1\right) = P\left(Z \le \frac{k - 1}{2/\sqrt{25}}\right) = P\left(Z \le \frac{0.4 - 1}{2/\sqrt{25}}\right)$$

$$\beta = P(Z \le -1.5) = 0.0668$$

Take k = 0.5

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P(\bar{X} > k \mid \mu = 0) = 1 - P(\bar{X} \le k \mid \mu = 0)$$

$$\alpha = 1 - P\left(Z \le \frac{5k}{2}\right) = 1 - P\left(Z \le \frac{5(0.5)}{2}\right) = 1 - P(Z \le 1.25) = 1 - 0.8944 = 0.1056$$

$$\beta = P(\text{Do not reject } H_0 \mid H_0 \text{ is false}) = P(\bar{X} \le k \mid \mu = 1)$$

$$\beta = P\left(Z \le \frac{k - 1}{2/\sqrt{25}}\right) = P\left(Z \le \frac{0.5 - 1}{2/\sqrt{25}}\right) = P(Z \le -1.25) = 0.1056$$


Note:
k = 0.4 : α = 0.1587, β= 0.0668
k= 0.5 : α = 0.1056, β= 0.1056
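
The α and β values in the note can be checked numerically. The sketch below uses the fact that, for this example, the sample mean is normal with standard deviation σ/√n = 2/√25 = 0.4; the function name `error_probabilities` is hypothetical and only Python's standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(k, mu0=0.0, mu1=1.0, sigma=2.0, n=25):
    """alpha and beta for the rule 'reject H0 if the sample mean exceeds k'."""
    se = sigma / sqrt(n)                      # standard deviation of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(k)    # P(Xbar > k | mu = mu0)
    beta = NormalDist(mu1, se).cdf(k)         # P(Xbar <= k | mu = mu1)
    return alpha, beta

for k in (0.4, 0.5):
    alpha, beta = error_probabilities(k)
    print(f"k = {k}: alpha = {alpha:.4f}, beta = {beta:.4f}")
# k = 0.4: alpha = 0.1587, beta = 0.0668
# k = 0.5: alpha = 0.1056, beta = 0.1056
```

Raising the cutoff k lowers α and raises β, which is the trade-off the example illustrates.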

As α ↓, β ↑
As β ↓, α ↑

• For a fixed sample size, we cannot minimize both types of error simultaneously.
• Type I error can be controlled through the value of α, the level of significance.
• α is the maximum probability of a Type I error that we are willing to accept.
• Usually α = 0.01, 0.05 or 0.1.

Hypothesis Testing
of a Population Mean (µ)
Let X1, X2, …, Xn be a random sample from a normal population with mean μ and variance σ², that is, $N(\mu, \sigma^2)$.

Set    Null Hypothesis (H0)    Alternative Hypothesis (H1)
1      µ = µ0                  µ ≠ µ0
2      µ ≥ µ0                  µ < µ0
3      µ ≤ µ0                  µ > µ0

Each makes a statement about how the population mean μ is related to a specified value µ0 (called the test value).

Z-Test for the Mean (µ)
(σ² is known, sampling from a normal population)
Step 1. Decide on the significance level, α.
Step 2. Compute the value of the test statistic with the one-sample Z-test:

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

Step 3. Find the critical values using the Z-table. Write down clearly the rejection region. Check whether the value of the test statistic falls in the rejection region. If it does, reject H0. If it does not, do not reject H0.
Two-tailed
H0: µ = µ0
H1: µ ≠ µ0
Reject H0 if Z > Zα/2 or Z < −Zα/2, that is, if |Z| > Zα/2

Right-tailed
H0: µ = µ0
H1: µ > µ0
Reject H0 if Z > Zα

Left-tailed
H0: µ = µ0
H1: µ < µ0
Reject H0 if Z < −Zα

Example: A company claims to fill its bags with 50 kg of cement (σ = 0.25 kg). A sample of 10 bags is taken, and the average weight of a bag is found to be 49.75 kg. Use a 1% significance level to test whether the data are consistent with this claim.
Solution:
Given data:
Population standard deviation σ = 0.25 kg
Sample size n = 10 bags
Sample mean = 49.75 kg
Significance level = 1%
Hypotheses formulation:
H0: μ = 50 kg or μ ≥ 50 kg
H1: μ < 50 kg

The Z value corresponding to a 1% significance level (for one-tailed testing) is −2.33.
Acceptance and rejection regions: reject H0 if Z < −2.33; otherwise do not reject H0.
The test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{49.75 - 50}{0.25/\sqrt{10}} \approx -3.16$$

Since the test statistic falls in the critical region, the conclusion is “reject the null hypothesis”. The data are not consistent with the company’s claim.
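
The three-step Z-test procedure and the cement example can be sketched in code as follows. This is an illustrative sketch only: the function name `z_test` and its `tail` argument are hypothetical, and the critical value comes from the standard library's normal distribution rather than a printed Z-table.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, tail):
    """One-sample Z-test with known sigma; tail is 'two', 'right' or 'left'.

    Returns (z, positive critical value, reject?).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    if tail == "two":
        crit = std.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        reject = abs(z) > crit
    elif tail == "right":
        crit = std.inv_cdf(1 - alpha)       # z_alpha
        reject = z > crit
    elif tail == "left":
        crit = std.inv_cdf(1 - alpha)       # z_alpha, compared as z < -z_alpha
        reject = z < -crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, crit, reject

# Cement example: H0: mu = 50 vs H1: mu < 50 at the 1% level
z, crit, reject = z_test(49.75, 50, 0.25, 10, 0.01, "left")
print(round(z, 2), round(-crit, 2), reject)   # -3.16 -2.33 True
```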
One-Sample t-Test

Conditions for the one-sample t-test to be valid for testing one population mean: the data follow a normal distribution and σ² is unknown.

Step 1. Set up the hypotheses as one of:

Two-tailed
H0: µ = µ0
H1: µ ≠ µ0
Right-tailed
H0: µ = µ0
H1: µ > µ0

Left-tailed
H0: µ = µ0
H1: µ < µ0
Step 2. Decide on the significance level, α.
Step 3. Compute the value of the test statistic with the one-sample t-test:

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

Step 4. Find the critical values using the t-table. Write down clearly the rejection region. Check whether the value of the test statistic falls in the rejection region. If it does, reject H0. If it does not, do not reject H0.

Two-tailed
H0: µ = µ0
H1: µ ≠ µ0
Reject H0 if t > tα/2,n−1 or t < −tα/2,n−1, that is, if |t| > tα/2,n−1

Right-tailed
H0: µ = µ0
H1: µ > µ0
Reject H0 if t > tα,n−1

Left-tailed
H0: µ = µ0
H1: µ < µ0
Reject H0 if t < −tα,n−1
Step 5. State the conclusion in words.
Example: The foundation for a building is designed to rest on 100 piles, and the capacity of each pile is 80 tons. Nine test piles were driven at random and the failure loads were recorded. The results of the 9 tests are given below.

Based on these results, at the 5% significance level, determine whether the foundation based on the piles should be accepted.
Answer
Given data:
Sample size = 9
Sample mean = 85 tons
Sample standard deviation = 6.76 tons
Significance level = 5%

Hypotheses formulation:
H0: μ ≤ 80 tons or μ = 80 tons
H1: μ > 80 tons

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{85 - 80}{6.76/\sqrt{9}} = 2.22$$

The t-value corresponding to a 5% significance level (for one-tailed testing) with n − 1 = 8 degrees of freedom is t0.05,8 = 1.86.
Since t = 2.22 > 1.86, the conclusion is “reject the null hypothesis”.
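
The pile example can be reproduced with a few lines of code. In this sketch the critical value is typed in from the t-table (1.860 for a one-tailed 5% test with 8 degrees of freedom), because the Python standard library has no t distribution; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """t statistic for a one-sample t-test (sigma unknown, s estimated from the sample)."""
    return (xbar - mu0) / (s / sqrt(n))

t = one_sample_t(85, 80, 6.76, 9)
t_crit = 1.860                       # t_{0.05, 8}, read from a t-table (assumed input)
reject = t > t_crit                  # right-tailed rule: reject H0 if t > t_{alpha, n-1}
print(round(t, 2), reject)           # 2.22 True
```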

Testing Hypothesis
Traditional Method
(The Rejection Region Method)
Example: A random sample of size n = 100 yields $\bar{x} = 1.1$; assume σ = 0.5.

Need to test
H0: µ = 1
H1: µ ≠ 1
The test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1.1 - 1}{0.5/\sqrt{100}} = 2$$

Testing at α = 0.05: the critical values are −Z0.025 = −1.96 and Z0.025 = 1.96.
Since Z = 2 > Zα/2 = 1.96, we reject H0.

Testing at α = 0.01: the critical values are −Z0.005 = −2.576 and Z0.005 = 2.576.
Since Z = 2 < Zα/2 = 2.576, we do not reject H0.
At α = 0.05, we reject H0.
At α = 0.01, we do not reject H0.

The P-Value Method

The P-value is the smallest significance level at which H0 is rejected (the observed level of significance).

With the rejection region method, we make a yes or no decision without discussing borderline cases.
Using the p-value approach, we reject H0 if
p-value < α
and do not reject H0 if
p-value ≥ α

A researcher wishes to reject the null
hypothesis
Using P-values for a z-test for a Mean µ
Calculate the test statistic

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

Right-tailed
H0: µ = µ0
H1: µ > µ0
P-Value = Area in right tail = P(Z > Z0)

Left-tailed
H0: µ = µ0
H1: µ < µ0
P-Value = Area in left tail = P(Z < Z0)

Two-tailed
H0: µ = µ0
H1: µ ≠ µ0
P-Value = 2 [P(Z > |Z0|)]


Remark:

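Example: A random sample of size n = 100 yields $\bar{x} = 1.1$, with σ = 0.5.
Need to test
H0: µ = 1
H1: µ ≠ 1
The test statistic

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1.1 - 1}{0.5/\sqrt{100}} = 2$$

P-Value = 2[P(Z > 2)] = 2[1 − P(Z < 2)] = 2(1 − 0.9772) = 0.0456
Since 0.01 < 0.0456 < 0.05, H0 is rejected at α = 0.05 but not at α = 0.01, in agreement with the rejection region method.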
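
The three P-value formulas for the Z-test can be collected into one small function. This is a sketch using the standard library's `NormalDist`; the function name `z_p_value` and its `tail` argument are hypothetical. The exact normal CDF gives about 0.0455 for the example, while the four-decimal Z-table gives 0.0456.

```python
from math import sqrt
from statistics import NormalDist

def z_p_value(xbar, mu0, sigma, n, tail="two"):
    """Return (Z0, P-value) for a Z-test; tail is 'two', 'right' or 'left'."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    if tail == "right":
        p = 1 - std.cdf(z0)              # area in the right tail
    elif tail == "left":
        p = std.cdf(z0)                  # area in the left tail
    elif tail == "two":
        p = 2 * (1 - std.cdf(abs(z0)))   # twice the tail area beyond |Z0|
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z0, p

z0, p = z_p_value(1.1, 1.0, 0.5, 100)
print(round(z0, 2), round(p, 4))
print("reject at 0.05:", p < 0.05, "| reject at 0.01:", p < 0.01)
```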
Example: Suppose that we want to test the
hypothesis with a significance level of .05 that
the climate has changed since industrialization.
Suppose that the mean temperature throughout
history is 50 degrees. During the last 40 years,
the mean temperature has been 51 degrees and
suppose the population standard deviation is
σ=2 degrees. Compute the p-value, what can we
conclude?
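H0: µ = 50
H1: µ ≠ 50
We compute the z statistic with n = 40:

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{51 - 50}{2/\sqrt{40}} \approx 3.16$$

P(Z < 3.16) = 0.9992
so that
P-Value = 2[P(Z > 3.16)] = 2[1 − P(Z < 3.16)] = 2[1 − 0.9992] = 0.0016 ≈ 0.002
Since the P-value is less than α = 0.05, we reject H0 and conclude that the mean temperature has changed.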
Hypothesis testing on the variance
Test
H0: σ² = σ0²
H1: σ² ≠ σ0²
The test statistic in this case follows a chi-square distribution with n − 1 degrees of freedom and is given below.

$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

Reject H0 if

$$\chi_0^2 < \chi^2_{1-\alpha/2,\,n-1} \quad \text{or} \quad \chi_0^2 > \chi^2_{\alpha/2,\,n-1}$$

Otherwise, the analyst fails to reject the null hypothesis.
Example: Consider a random sample of size n = 100 with sample standard deviation s = 0.0063.
Test (at α = 0.05)
H0: σ² = 0.01
Ha: σ² ≠ 0.01

$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(100-1)(0.0063)^2}{0.01} \approx 0.39$$

$$\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.025,\,99} = 128.42$$

$$\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.975,\,99} = 73.36$$

Critical region: Reject H0 if χ0² < 73.36 or χ0² > 128.42.
The test statistic value of 0.39 is much smaller than the lower critical value, so we reject the null hypothesis and conclude that the variance is not equal to 0.01.
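
The variance test above amounts to one formula and two comparisons, sketched below. The critical values are passed in as arguments because the standard library has no chi-square distribution; the values used here (73.36 and 128.42 for 99 degrees of freedom) are table look-ups assumed for this sketch, and the function names are hypothetical.

```python
def chi_square_statistic(s, sigma0_sq, n):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2 for a test on the variance."""
    return (n - 1) * s**2 / sigma0_sq

def two_sided_reject(chi2, lower_crit, upper_crit):
    """Two-tailed rule: reject H0 if the statistic falls outside [lower, upper]."""
    return chi2 < lower_crit or chi2 > upper_crit

chi2 = chi_square_statistic(s=0.0063, sigma0_sq=0.01, n=100)
# Critical values for alpha = 0.05 and 99 d.f., taken from a chi-square table (assumed inputs)
reject = two_sided_reject(chi2, lower_crit=73.36, upper_crit=128.42)
print(round(chi2, 2), reject)   # 0.39 True
```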

Comparison of Two Groups or Means
Two independent samples t-test
A common form of scientific
experimentation is the comparison of two
groups (two different treatments, or a
treatment to a control).

Testing Hypothesis
Population 1

Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample from a normal population with mean μ1 and variance σ1², that is, $N(\mu_1, \sigma_1^2)$.

Population 2

Let $Y_1, Y_2, \ldots, Y_{n_2}$ be a random sample from a normal population with mean μ2 and variance σ2², that is, $N(\mu_2, \sigma_2^2)$.

The results used to compare two groups are usually summarized by the sample means:

$$\bar{X} = \frac{\sum_{i=1}^{n_1} X_i}{n_1}, \qquad \bar{Y} = \frac{\sum_{i=1}^{n_2} Y_i}{n_2}$$

How do you decide if the observed difference between the sample means $\bar{X}$ and $\bar{Y}$ of the two groups is real or just a chance difference caused by natural variation (sampling error)?
A common way to approach that question is
by performing a statistical analysis.
The Independent Groups t-test is designed to compare means between two groups where there are different subjects in each group.
If you have two samples from different populations and you wish to test whether their means (µ1 and µ2) are different, you need a number that measures their difference. $\bar{X} - \bar{Y}$ is an estimator of (µ1 − µ2). It is possible to combine all the information into one statistic by working with the distribution of $\bar{X} - \bar{Y}$.
Assuming equal variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
The estimate of the common variance can be obtained by

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where

$$s_1^2 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x})^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum_{i=1}^{n_2} (y_i - \bar{y})^2}{n_2 - 1}$$

This estimate is called the pooled estimate. It represents the weighted average of $s_1^2$ and $s_2^2$. Note that n1 + n2 − 2 is the degrees of freedom for the two samples taken together. Therefore the test statistic is

$$t = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

which has a t distribution with n1 + n2 − 2 d.f.
Two-tailed
H0: µ1 = µ2
H1: µ1 ≠ µ2
Reject H0 if t > tα/2,n1+n2−2 or t < −tα/2,n1+n2−2, that is, if |t| > tα/2,n1+n2−2

Right-tailed
H0: µ1 = µ2
H1: µ1 > µ2
Reject H0 if t > tα,n1+n2−2

Left-tailed
H0: µ1 = µ2
H1: µ1 < µ2
Reject H0 if t < −tα,n1+n2−2

Example – Consider the following two independent samples

Sample 1    Sample 2
51.0        54.0
53.3        56.1
55.6        52.1
51.0        56.4
55.5        54.0
53.0        52.9
52.1

Test the equality of the two population means (α = 0.05)

H0: µ1 = µ2
H1: µ1 ≠ µ2

$$\bar{x} = 53.07, \quad \bar{y} = 54.25, \quad s_1 = 1.91, \quad s_2 = 1.71$$

$$s_p = \sqrt{\frac{(7-1)(1.91)^2 + (6-1)(1.71)^2}{7 + 6 - 2}} = 1.82$$

The test statistic

$$t = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{53.07 - 54.25}{1.82\sqrt{\dfrac{1}{7} + \dfrac{1}{6}}} \approx -1.16$$

The critical values are −t0.025,11 = −2.201 and t0.025,11 = 2.201.

Since |t| = |−1.16| = 1.16 is not greater than 2.201, we do not reject H0. We conclude that there is no significant difference between the two means.
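
The pooled two-sample computation can be checked directly from the raw data. The sketch below uses the standard library's `mean` and `variance` (sample variance with n − 1 in the denominator); the function name `pooled_t` is hypothetical, and the critical value 2.201 is the table value quoted in the example.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Equal-variance two-sample t statistic; returns (t, pooled sd, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df   # pooled variance
    t = (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, sqrt(sp2), df

sample1 = [51.0, 53.3, 55.6, 51.0, 55.5, 53.0, 52.1]
sample2 = [54.0, 56.1, 52.1, 56.4, 54.0, 52.9]

t, sp, df = pooled_t(sample1, sample2)
t_crit = 2.201                                    # t_{0.025, 11} from the t-table
print(round(t, 2), round(sp, 2), df, abs(t) > t_crit)   # -1.16 1.82 11 False
```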

Test concerning two means assuming unequal variances ($\sigma_1^2 \ne \sigma_2^2$)

Test
H0: μ1 = μ2
H1: μ1 ≠ μ2
For the above hypotheses, the test statistic is t-distributed, with the degrees of freedom as given below.
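
The formulas for this case did not survive in the text. A commonly used form, assumed here rather than taken from these notes, is Welch's statistic t = (X̄ − Ȳ) / sqrt(s1²/n1 + s2²/n2) with the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below implements that assumed form; the function name `welch_t` is hypothetical, and the summary statistics of the equal-variance example are reused purely for illustration.

```python
from math import sqrt

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's unequal-variance t statistic and approximate degrees of freedom.

    Assumed form (not confirmed by the notes):
      t  = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)
      df = (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)),  v_i = s_i^2 / n_i
    """
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Illustration with the summary statistics of the previous two-sample example
t_w, df_w = welch_t(53.07, 1.91, 7, 54.25, 1.71, 6)
print(round(t_w, 2), round(df_w, 1))
```

The degrees of freedom are generally not an integer; a common convention is to round down before consulting the t-table.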

Hypothesis testing on two variances
F Distribution
Properties of the F distribution
1. The F distribution is a continuous distribution.
2. The F distribution ranges from 0 to ∞.
3. The F distribution depends on two parameters, ν1 and ν2 (ν1, ν2 = 1, 2, 3, …).
4. The F distribution is positively skewed, and its skewness decreases as the degrees of freedom ν1 and ν2 increase.

Test
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
The test statistic,

$$F_0 = \frac{s_1^2}{s_2^2}$$

follows an F-distribution with n1 − 1 and n2 − 1 degrees of freedom.

Reject H0 if F0 > Fα/2,n1−1,n2−1 or F0 < F1−α/2,n1−1,n2−1

Remark:

$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}}$$


Example: Given the following data and summary statistics for two samples

            Data                Mean     ni    si
Sample 1    3.6 4.5 3.9 4.3     4.075    4     0.403
Sample 2    4.6 5.2 4.9         4.900    3     0.300

Test at α = 0.05
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
Calculation of the test statistic

$$F_0 = \frac{s_1^2}{s_2^2} = \frac{(0.403)^2}{(0.3)^2} = 1.805$$

$$F_{\alpha/2,\,n_1-1,\,n_2-1} = F_{0.025,\,4-1,\,3-1} = F_{0.025,\,3,\,2} = 39.2$$

$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}} = \frac{1}{F_{0.025,\,2,\,3}} = \frac{1}{16} = 0.0625$$

Since F0 = 1.805 is neither less than 0.0625 nor greater than 39.2, we do not reject H0. The data are consistent with the assumption that the two variances (standard deviations) are equal.
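
The F-test example can be recomputed from the raw observations. In this sketch the two critical values are the table look-ups quoted in the example (39.2 and 1/16 = 0.0625), passed in as plain numbers because the standard library has no F distribution; the function name `f_statistic` is hypothetical.

```python
from statistics import variance

def f_statistic(x, y):
    """Ratio of sample variances s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    return variance(x) / variance(y), (len(x) - 1, len(y) - 1)

sample1 = [3.6, 4.5, 3.9, 4.3]
sample2 = [4.6, 5.2, 4.9]

f0, dfs = f_statistic(sample1, sample2)
upper_crit = 39.2        # F_{0.025, 3, 2} from the F-table
lower_crit = 1 / 16      # 1 / F_{0.025, 2, 3}, using the reciprocal relation in the remark
reject = f0 > upper_crit or f0 < lower_crit
print(round(f0, 3), dfs, reject)
```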

Paired Samples t-test

Compare the two treatments under the same, similar or identical conditions.

Instead of using two independent groups of patients, each patient is administered both drugs at different times, with an intervening washout period to minimize carry-over effects. We randomly choose which treatment is administered first for each patient. This is called a paired design. The idea is that the drugs are compared under similar conditions.
Data Diagram for a Paired Design
Patient    Drug 1    Drug 2
1          X1        Y1
2          X2        Y2
.          .         .
.          .         .
.          .         .
n          Xn        Yn

Example of Paired Design
Testing whether a diet program is effective for weight loss:
the weight of a person before the program (X) and the weight of the same person after the program (Y). This gives paired data (dependent samples).
The usual analysis is based on the differences between the X and Y values.

The structure of paired data is:
Subject No.    Treatment 1    Treatment 2    Difference
1              X1             Y1             d1 = x1 − y1
2              X2             Y2             d2 = x2 − y2
.              .              .              .
.              .              .              .
.              .              .              .
n              Xn             Yn             dn = xn − yn

The differences d1, d2, …, dn represent a random sample with mean $\bar{d}$ and standard deviation $s_d$, where

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$

and

$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2 - n(\bar{d})^2}{n - 1}}$$

By taking the differences d1, d2, …, dn we minimize the variation between the subjects and focus on the effects of the treatments.
Let d1, d2, …, dn be a random sample from $N(\mu_d, \sigma_d^2)$ with unknown $\sigma_d^2$. Then the sampling distribution of $\bar{d}$ is approximately normal with mean and standard deviation:

$$\mu_{\bar{d}} = \mu_d \quad \text{and} \quad \sigma_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}}$$

where
µd = mean of the paired differences for the population (µd = µ1 − µ2)
σd = standard deviation of the paired differences for the population

Result

$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t\text{-dist. with d.f.} = n - 1$$

Two-tailed
H0: µd = 0 (corresponds to µ1 = µ2)
H1: µd ≠ 0 (corresponds to µ1 ≠ µ2)
Reject H0 if t > tα/2,n−1 or t < −tα/2,n−1, that is, if |t| > tα/2,n−1

Right-tailed
H0: µd = 0 (corresponds to µ1 = µ2)
H1: µd > 0 (corresponds to µ1 > µ2)
Reject H0 if t > tα,n−1

Left-tailed
H0: µd = 0 (corresponds to µ1 = µ2)
H1: µd < 0 (corresponds to µ1 < µ2)
Reject H0 if t < −tα,n−1
Example 1: The sleep hours of 5 patients before and after taking a medication are given in the following table.

Subject No.    Before    After
1              6         9
2              5         4
3              7         9
4              4         7
5              5         6

Can you conclude that the medication is effective in increasing the sleep hours? Use α = 0.01.
Solution: Need to test
H0: µd = 0
H1: µd < 0 (µ1 < µ2, increase)

First, we compute $\bar{d}$ and $s_d$, with d = X − Y.

Subject No.    Before (X)    After (Y)    d     d²
1              6             9            −3    9
2              5             4            1     1
3              7             9            −2    4
4              4             7            −3    9
5              5             6            −1    1
Total                                     −8    24

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{-8}{5} = -1.6$$

$$s_d = \sqrt{\frac{\sum_{i=1}^{n} d_i^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{24 - 5(-1.6)^2}{5 - 1}} = 1.67$$

The value of the test statistic

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-1.6 - 0}{1.67/\sqrt{5}} = -2.14$$

The cut-off value = −tα,n−1 = −t0.01,4 = −3.747

Since −2.14 is not less than −3.747, we do not reject H0. There is not sufficient evidence that the medication is effective in increasing the sleep hours.
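
The paired computation reduces to a one-sample t-test on the differences, which the sketch below makes explicit for the sleep data. The function name `paired_t` is hypothetical, and the cut-off 3.747 is the table value t0.01,4 quoted in the example.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for H0: mu_d = 0, with differences d = x - y.

    Returns (mean difference, sd of differences, t, degrees of freedom).
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)        # stdev uses n - 1 in the denominator
    t = d_bar / (s_d / sqrt(n))
    return d_bar, s_d, t, n - 1

before = [6, 5, 7, 4, 5]
after = [9, 4, 9, 7, 6]

d_bar, s_d, t, df = paired_t(before, after)
t_crit = 3.747                            # t_{0.01, 4} from the t-table
reject = t < -t_crit                      # left-tailed rule: reject H0 if t < -t_{alpha, n-1}
print(round(d_bar, 2), round(s_d, 2), round(t, 2), df, reject)   # -1.6 1.67 -2.14 4 False
```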

Example 2: Drinking Water

Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water. Do the data suggest that the true average concentration in the bottom water exceeds that of surface water?
Perform a paired t-test for the above trace metal example.
Need to test
H0: µd = 0
H1: µd > 0 (mean bottom > mean surface)

Location    Zinc concentration    Zinc concentration    d        d²
            in bottom water       in surface water
1           0.430                 0.415                 0.015    0.000225
2           0.266                 0.238                 0.028    0.000784
3           0.567                 0.390                 0.177    0.031329
4           0.531                 0.410                 0.121    0.014641
5           0.707                 0.605                 0.102    0.010404
6           0.716                 0.609                 0.107    0.011449
7           0.651                 0.632                 0.019    0.000361
8           0.589                 0.523                 0.066    0.004356
9           0.469                 0.411                 0.058    0.003364
10          0.723                 0.612                 0.111    0.012321
Total                                                   0.8040   0.08923

$$\bar{d} = \frac{0.8040}{10} = 0.0804$$

$$s_d = \sqrt{\frac{\sum_{i=1}^{n} d_i^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{0.08923 - 10(0.0804)^2}{10 - 1}} = 0.052$$

The value of the test statistic

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{0.0804}{0.052/\sqrt{10}} = 4.89$$

The cut-off value = tα,n−1 = t0.05,9 = 1.833

Since t = 4.89 > 1.833, we reject H0 and conclude that, on average, the bottom zinc concentration is higher than the surface zinc concentration.

Comparing More Than Two Means:
One-Way ANOVA
Completely Randomized Design

• Suppose the only predictors in an analysis are categorical (i.e. all predictors are factors).
- Predicting height from sex (M/F)
- Predicting white blood cell count from treatment group (A, B, C)
- Predicting house cost from region (N, E, S, W)
The independent variable (categorical predictor) is called the factor.
The r different values or levels of the factor are called the treatments.

• If there is only 1 categorical predictor of a continuous response → One-Way ANOVA
[Figure: μ1, μ2, μ3]

Summary test for means
