HYPOTHESIS TESTING

COMPSTAT Group
www.compstatgroup.com
Contents
Key Concepts
t Test
Chi-square Test
Cochran-Mantel-Haenszel Test
Hypothesis Testing
Null Hypothesis
There is no (statistically) significant difference between male and female
students with respect to their math achievement
Alternative Hypothesis
There is a (statistically) significant difference between male and female
students with respect to their math achievement
Type I error
Reject the null hypothesis when it is true. The probability of a type I error is
denoted by α.
Type II error
Fail to reject the null hypothesis when it is false. The probability of a type II
error is denoted by β.
Power of the test
The probability of rejecting the null hypothesis when it is false, denoted by 1 - β.
Variation
Chance variation
Effect variation
The difference that we might find between the boys' and girls' exam
achievement in our sample might have occurred by chance, or it might exist in
the population.
P Value
The probability of getting a result as extreme as or more extreme than the one observed
when the null hypothesis is true.
When our study results in a p-value of 0.01, we say that, if the null hypothesis were true, a
difference at least as large as the one we found would occur by chance about 1 time in 100.
The smaller the p-value, the greater the evidence we have against the null hypothesis.
What we usually do is compare the p-value with some pre-specified (a priori) value,
which we call α (the significance level). If p is less than α, we reject the null hypothesis
in favor of the alternative hypothesis.
Steps in testing of hypothesis
State null hypothesis and the alternative hypothesis
Select the level of significance
Select a sample and calculate its mean
Calculate the test statistic
Compare the calculated test statistic with the tabulated (critical) value
If calculated value > tabulated value
then reject the null hypothesis at that level of significance and for
those degrees of freedom. Otherwise do not reject it.
We can also compute the corresponding p-value:
If p-value < α (the level of significance) then reject H0
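The two decision rules above can be sketched in a short DATA step. This is a minimal illustration, not part of the original slides: the values of t, df, and alpha are assumed for the example, and TINV and PROBT are the SAS functions for t quantiles and t cumulative probabilities.

```sas
data _null_;
   t     = 2.30;   /* calculated test statistic (illustrative value) */
   df    = 19;     /* degrees of freedom, n - 1 (illustrative value) */
   alpha = 0.10;   /* chosen level of significance */

   t_crit  = tinv(1 - alpha/2, df);        /* tabulated value, two tailed */
   p_value = 2 * (1 - probt(abs(t), df));  /* two tailed p-value */
   reject  = (abs(t) > t_crit);            /* 1 = reject H0, 0 = do not reject */

   put t_crit= 8.4 p_value= 8.4 reject=;   /* results are written to the log */
run;
```

Both rules always lead to the same decision: the calculated value exceeds the tabulated value exactly when the p-value is below alpha.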
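Let $x_i$ ($i = 1, 2, \dots, n$) be a random sample of size $n$
from a normal population with mean $\mu$ and variance $\sigma^2$;
then Student's t is defined by the statistic
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
with $(n-1)$ d.f.,
where $\bar{x}$ is the sample mean and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is an unbiased estimate
of the population variance.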
t Test
Assumptions
The parent population from which the sample is
drawn is normal
The sample observations are independent, i.e. the
sample is random
The population standard deviation is unknown
Types of t Test
One sample t Test
t Test for two independent (uncorrelated) samples
Equal variance
Unequal variance
t Test for paired (correlated) samples
One Sample t Test
Compare the mean of a single group of
observations with a specified (hypothetical) value.
Assumptions:
Subjects are randomly drawn from a population and the
distribution of the mean being tested is normal.
Standard deviation is unknown.
How to test
Let $X_i$ ($i = 1, 2, \dots, n$) be a sample of size n from a population with specified
mean.
H0: There is no significant difference between the sample mean and the
population mean.
H1: Sample mean is not equal to population mean (Two tailed test)
Or
H1: Sample mean > Population mean (Right tailed test)
Or
H1: Sample mean < Population mean (Left tailed test)
Example: One sample t test
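/* Read the 20 observations; @@ allows several values per input line */
data time;
   input time @@;
   datalines;
43 90 84 87 116 95 86 99 93 92
121 71 66 98 79 102 60 112 105 98
;
run;
/* One sample t test of H0: mean = 80, with 90% confidence limits */
proc ttest data=time h0=80 alpha=0.1;
   var time;
run;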
The VAR statement indicates that the time variable is being studied, while
the H0= option specifies that the mean of the time variable should be
compared to the value 80 rather than the default null hypothesis of 0.
The ALPHA=0.1 option requests 90% confidence intervals rather than the
default 95% confidence intervals.
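Test Statistic
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
with $(n-1)$ d.f., where $\bar{X}$ is the sample mean, $S$ the sample standard deviation and $\mu_0$ the specified mean.
Then compare this calculated value with the tabulated value (for a two tailed test, use the absolute value of t).
If t (calculated) > t (tabulated) then reject H0
If t (calculated) <= t (tabulated) then do not reject H0
Example:
Comparison of mean dietary intake of a particular group of individuals
with the recommended daily diet.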
Output
Summary statistics appear at the top of the output. The sample size (N), the mean and its
confidence bounds (Lower CL Mean and Upper CL Mean), the standard deviation and its
confidence bounds (Lower CL Std Dev and Upper CL Std Dev), and the standard error are
displayed with the minimum and maximum values of the time variable. The test statistic, the
degrees of freedom, and the p-value for the t test are displayed next; at the 10% level, this
test indicates that the mean length of the court cases is significantly different from 80 days
(t=2.30, p=0.0329).
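As a hand check on the reported t value, the arithmetic can be reproduced from the 20 observations in the DATA step of the example. The sums below ($\sum x_i = 1797$, $\sum x_i^2 = 168425$) were computed from those data; the notation is an assumption chosen to match the usual one sample t statistic.

$$\bar{x} = \frac{1797}{20} = 89.85, \qquad S^2 = \frac{168425 - 20\,(89.85)^2}{19} = \frac{6964.55}{19} \approx 366.56, \qquad S \approx 19.15$$

$$t = \frac{\bar{x} - 80}{S/\sqrt{20}} \approx \frac{9.85}{4.28} \approx 2.30 \quad \text{with } 19 \text{ d.f.}$$

This agrees with the t=2.30 shown in the output description.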
Comparison of two independent means
A t test is used when we wish to compare two means
Assumptions
The samples are random and independent of each other.
The parent population from which the samples come is normally distributed.
The variances are equal in both groups; if the variances are not equal, a modified
t test is used.
Normality of data is tested by using a normality test such as the Shapiro-Wilk
test or the Kolmogorov-Smirnov test.
Equality of variance is tested by the F test, Levene's test or Bartlett's
test.
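One possible way to run these checks in SAS is sketched below. It assumes the Hospital data set (variables gender and days) from the two sample example elsewhere in this deck; the choice of PROC UNIVARIATE and PROC GLM for the checks is an illustration, not something prescribed by the slides.

```sas
/* Normality tests (Shapiro-Wilk, Kolmogorov-Smirnov) within each group */
proc univariate data=Hospital normal;
   class gender;
   var days;
run;

/* Levene's test for equality of variances between the two groups */
proc glm data=Hospital;
   class gender;
   model days = gender;
   means gender / hovtest=levene;   /* hovtest=bartlett requests Bartlett's test */
run;
quit;
```

PROC TTEST itself also prints the folded F test for equality of variances whenever a CLASS statement is used.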
Type of data required
Independent variable: one nominal variable with two
levels, e.g. boy/girl student, non-smoking/heavy-smoking
mothers
Dependent variable: continuous variable, e.g. marks
obtained by the students in the annual exam, birth weight of
baby
How to test
Let $X_i$ ($i = 1, 2, \dots, n_1$) and $Y_j$ ($j = 1, 2, \dots, n_2$) be two samples of size $n_1$ and $n_2$
H0: There is no significant difference in the means of the two groups
H1: There is a significant difference in the means of the two groups (Two tailed
test)
Or
H1: mean of first group > mean of second group (Right tailed test)
Or
H1: mean of first group < mean of second group (Left tailed test)
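Test Statistic
With equal variance
$$t = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
with $(n_1 + n_2 - 2)$ d.f.,
where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
is the pooled variance of the two groups and $S_1^2$, $S_2^2$ are the two sample variances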
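With unequal variance
$$t = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
where $S_1^2$ and $S_2^2$ are the sample variances of the two groups; the degrees of freedom are approximated (Satterthwaite's approximation).
Then compare this calculated value with the tabulated value.
If t (calculated) > t (tabulated) then reject H0
If t (calculated) <= t (tabulated) then do not reject H0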
In SAS, PROC TTEST is used to calculate the t statistic for two independent
samples.
In SAS output
If Prob F > 0.05 consider Prob T corresponding to Equal Variances.
If Prob F <= 0.05 consider Prob T corresponding to Unequal Variances.
With equal variances, the test statistic follows Student's t distribution with N-2 degrees of freedom, where N
is the total number of subjects.
A low p-value indicates evidence to reject the null hypothesis in favor of the
alternative. In other words, there is evidence that the means are not equal.
The syntax for the TTEST procedure is
PROC TTEST <options>;
CLASS variable; <statements>;
The CLASS statement specifies the grouping variable for the analysis.
The data for this grouping variable must contain two and only two
values.
Example:
PROC TTEST;
CLASS GROUP;
VAR SCORE;
RUN;
Example
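/* Length of hospital stay (days) by gender (1 = male, 2 = female) */
data Hospital;
   input gender days;
   datalines;
1 13
1 15
1 9
1 18
1 11
1 20
1 24
1 22
1 25
2 11
2 14
2 10
2 8
2 16
2 9
2 17
2 21
;
run;
/* Value labels for the grouping variable */
proc format;
   value gender 1='male'
                2='female';
run;
/* Two sample t test of days by gender */
proc ttest data=Hospital;
   class gender;
   var days;
   format gender gender.;   /* apply the labels defined above */
   title 'two sample t-test for the difference between the two population means';
run;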
Output 1
Output 2
Output 3
Paired t test
Used to compare means on the same or related subjects over time or
in differing circumstances; subjects are often tested in a before-after
situation.
The same individuals are studied more than once in different
circumstances,
e.g. measurements made on the same people before and after an
intervention.
The outcome variable should be continuous.
The difference between pre and post measurements should be
normally distributed.
Test Statistic
Suppose we want to test the efficacy of a particular drug for inducing sleep.
Let $X_i$ and $Y_i$ ($i = 1, 2, \dots, n$) be the readings in hours of sleep on the ith individual before and
after the drug is given respectively.
H0: There is no effect of the drug on sleeping hours
H1: The new drug increases the sleeping hours.
Calculate the test statistic
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}$$
with $(n-1)$ d.f., where $d_i = Y_i - X_i$, and $\bar{d}$ and $S_d$ are the mean and standard deviation of the differences.
If t (calculated) > t (tabulated)
Then reject H0 and conclude that the new drug increases sleeping hours.
If t (calculated) <= t (tabulated)
Then do not reject H0 and conclude that there is no evidence of an effect of the drug
on the sleeping hours.
Note: We can also compute the corresponding p-values and can
conclude in the following manner
If p-value > 0.05 then do not reject H0 and conclude that there is no evidence of an
effect of the new drug on sleeping hours.
If p-value < 0.05 then reject H0 and conclude that the new drug
increases the sleeping hours.
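Because the alternative here is one sided (the drug increases sleeping hours), the p-value should be one sided as well. A minimal sketch, assuming a SAS release in which PROC TTEST supports the SIDES= option (SAS 9.2 or later) and using the sleeping_pill data set from the paired example in this deck:

```sas
/* Upper one sided paired t test: H1 is mean(pill - placebo) > 0 */
proc ttest data=sleeping_pill sides=U;
   paired pill*placebo;
run;
```

Without SIDES=, PROC TTEST reports the two sided p-value.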
If one wants to do a paired t test with PROC MEANS in SAS, create a new
variable (say diff) in the DATA step, subtracting one tested variable from
the other, and request the T and PRT statistics.
Example
PROC MEANS N MEAN STDERR T PRT;
BY GROUP;
VAR DIFF;
RUN;
If one wants to do a paired t test with PROC TTEST in SAS, use the PAIRED
statement. The CLASS and VAR statements cannot be used with the
PAIRED statement.
Example
PROC TTEST;
PAIRED BEFORE*AFTER;
RUN;
Example:
data sleeping_pill;
   input subject pill placebo;
   datalines;
1 7.3 6.8
2 8.5 7.9
3 6.4 6.0
4 9.0 8.4
5 6.9 6.5
;
run;
Using the PAIRED statement of PROC TTEST:
proc ttest data=sleeping_pill;
   paired pill*placebo;
   title 'paired t-test for the difference between the two population means';
run;
Getting the p-value by calculating the difference
data compare;
   set sleeping_pill;
   difference=pill-placebo;
run;
proc ttest data=compare;
   var difference;
   title 'An alternative paired t-test: t-test on the differences';
run;
Using PROC UNIVARIATE
proc univariate data=compare;
   var difference;
   title 'Univariate test to get signed rank test and sign test';
run;
Output 1:
Paired t-test for the difference between the two
population means
The TTEST Procedure
Output 2:
Syntax :
PROC TTEST < options > ;
CLASS variable ;
PAIRED variables ;
BY variables ;
VAR variables ;
FREQ variable ;
WEIGHT variable ;
PROC TTEST < options > ;
The following options can appear in the PROC TTEST statement.
ALPHA=p
specifies that confidence intervals are to be 100(1-p)% confidence intervals, where 0<p<1.
By default, PROC TTEST uses ALPHA=0.05. If p is 0 or less, or 1 or more, an error
message is printed.
CI=EQUAL
CI=UMPU
CI=NONE
specifies whether a confidence interval is displayed for σ and, if so, what kind. The
CI=EQUAL option specifies an equal tailed confidence interval, and it is the default. The
CI=UMPU option specifies an interval based on the uniformly most powerful unbiased
test of H0: σ = σ0. The CI=NONE option requests that no confidence interval be displayed for σ. The
values EQUAL and UMPU together request that both types of confidence intervals be
displayed. If the value NONE is specified with one or both of the values EQUAL and
UMPU, NONE takes precedence.
COCHRAN
requests the Cochran and Cox (1950) approximation of the probability level of the
approximate t statistic for the unequal variances situation.
DATA=SAS-data-set
names the SAS data set for the procedure to use. By default, PROC TTEST uses the most
recently created SAS data set. The input data set can contain summary statistics of the
observations instead of the observations themselves. The number, mean, and standard
deviation of the observations are required for each BY group (one sample and paired
differences) or for each class within each BY group (two samples).
H0=m requests tests against m instead of 0 in all three situations (one-sample, two-sample,
and paired observation t tests). By default, PROC TTEST uses H0=0.
CLASS variable - A CLASS statement giving the name of the classification (or grouping)
variable must accompany the PROC TTEST statement in the two independent sample cases.
It should be omitted for the one sample or paired comparison situations. If it is used without
the VAR statement, all numeric variables in the input data set (except those appearing in the
CLASS, BY, FREQ, or WEIGHT statement) are included in the analysis. The class variable
must have two, and only two, levels. PROC TTEST divides the observations into the two
groups for the t test using the levels of this variable. You can use either a numeric or a
character variable in the CLASS statement.
BY variables - You can specify a BY statement with PROC TTEST to obtain separate analyses
on observations in groups defined by the BY variables. When a BY statement appears, the
procedure expects the input data set to be sorted in order of the BY variables.
FREQ variable - The variable in the FREQ statement identifies a variable that contains the
frequency of occurrence of each observation. PROC TTEST treats each observation as if it
appears n times, where n is the value of the FREQ variable for the observation. If the value is
not an integer, only the integer portion is used. If the frequency value is less than 1 or is missing,
the observation is not used in the analysis. When the FREQ statement is not specified, each
observation is assigned a frequency of 1. The FREQ statement cannot be used if the DATA=
data set contains statistics instead of the original observations.
PAIRED PairLists - The PairLists in the PAIRED statement identify the variables to be compared in
paired comparisons. You can use one or more PairLists. Variables or lists of variables are
separated by an asterisk (*) or a colon (:). The asterisk requests comparisons between each
variable on the left with each variable on the right. The colon requests comparisons between the
first variable on the left and the first on the right, the second on the left and the second on the
right, and so forth. The number of variables on the left must equal the number on the right when
the colon is used. The differences are calculated by taking the variable on the left minus the
variable on the right for both the asterisk and colon. A pair formed by a variable with itself is
ignored. Use the PAIRED statement only for paired comparisons. The CLASS and VAR
statements cannot be used with the PAIRED statement.
Examples of the use of the asterisk and the colon are shown in the following table.
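The original table is not reproduced here, so the following sketch illustrates the two separators instead. The data set name scores and the variables A, B, C and D are hypothetical; the comparisons in the comments follow the rules stated above (left variable minus right variable).

```sas
/* Hypothetical data set "scores" with numeric variables A, B, C, D */
proc ttest data=scores;
   paired A*B;            /* one comparison: A-B */
run;

proc ttest data=scores;
   paired (A B)*(C D);    /* asterisk: A-C, A-D, B-C, B-D */
run;

proc ttest data=scores;
   paired (A B):(C D);    /* colon: A-C and B-D only */
run;
```

The asterisk crosses every variable on the left with every variable on the right, while the colon matches them position by position.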
VAR variables - The VAR statement names the variables to be used in the
analyses. One-sample comparisons are conducted when the VAR
statement is used without the CLASS statement, while group
comparisons are conducted when the VAR statement is used with a
CLASS statement. If the VAR statement is omitted, all numeric variables
in the input data set (except a numeric variable appearing in the BY,
CLASS, FREQ, or WEIGHT statement) are included in the analysis. The
VAR statement can be used with one- and two-sample t tests and cannot
be used with the PAIRED statement.
WEIGHT variable - The WEIGHT statement weights each observation in
the input data set by the value of the WEIGHT variable. The values of the
WEIGHT variable can be non-integral, and they are not truncated.
Observations with negative, zero, or missing values for the WEIGHT
variable are not used in the analyses. Each observation is assigned a
weight of 1 when the WEIGHT statement is not used. The WEIGHT
statement cannot be used with an input data set of summary statistics.
CHI-SQUARE TEST
The chi-square test is used to test the
independence of two categorical variables.
The chi-square test assumes that the
expected frequency for each cell is five or higher for
a 2 x 2 contingency table.
However, if this assumption is not met
then we have to go for Fisher's exact test.
Chi-square Test
Expected = (row total * column total) / N
Tabulated results of the Observed and Expected frequency

        Got into trouble  Not in trouble  Total  Got into trouble  Not in trouble
        (Observed)        (Observed)             (Expected)        (Expected)
Boys    46                71              117    (40.97)           (76.02)
Girls   37                83              120    (42.03)           (77.97)
Total   83                154             237
Expected values are the results we would
expect if the null hypothesis were true.
We calculate the difference between the
observed (O) and expected (E) frequency in
each cell, square that difference, and then
divide that by the expected value.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Degrees of freedom = (r-1)(c-1), where r and c are the numbers of rows and columns
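For the boys/girls table shown earlier, the calculation can be sketched in SAS as follows. The data set name trouble and its variable names are assumptions made for this illustration; the cell counts are the observed frequencies from the table.

```sas
/* Observed cell counts from the boys/girls table */
data trouble;
   input gender $ trouble $ count;
   datalines;
Boys  Yes 46
Boys  No  71
Girls Yes 37
Girls No  83
;
run;

proc freq data=trouble;
   weight count;                                /* each row is a cell count */
   tables gender*trouble / chisq expected;      /* chi-square test plus expected frequencies */
run;
```

Worked by hand with the formula above, these counts give a chi-square of about 1.87 on (2-1)(2-1) = 1 degree of freedom, which is below the tabulated 5% value of 3.84, so independence would not be rejected at that level.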
What is the difference between a Chi-square test and Fisher's exact
test?
Both test the association between two categorical variables.
The difference is that the Chi-square test requires the
expected cell counts in the crosstabulation of these two
categorical variables to be at least 5. When this
assumption fails, Fisher's exact test is recommended.
We can use Fisher's Exact Test when one of the cells in the
table has a zero in it.
Fisher's Exact Test is also very useful for highly
imbalanced tables. If one or two of the cells in a two by two
table have numbers in the thousands and one or two of the
other cells have numbers less than 5, we can still use
Fisher's Exact Test.
Fisher's Exact Test can be calculated with PROC FREQ by
specifying the EXACT option in the TABLES statement.
How do you interpret the p-value of a Chi-square test or
Fisher's exact test?
We begin by assuming there is no association between the
two (categorical) variables. In technical terms this is called
the null hypothesis. The alternative hypothesis would state
the two variables are associated in some way.
The p-value of a Chi-square test or Fisher's exact test tells
us how likely it is, under that assumption, to get results at least as
extreme as what we got. If our assumption is correct then a p-value of 0.01
would suggest the chance of getting results this extreme
is very small. In this case we have
evidence to suggest our assumption of no association is not
correct. Hence it would be reasonable to claim there is an
association between the two variables.
WHEN & WHY
The Cochran-Mantel-Haenszel (CMH) test compares two groups
on a binary response (yes, no), adjusting for control variables.
The initial data are represented as a series of K 2x2 contingency
tables, where K is the number of strata. Traditionally, in each
table the rows correspond to the "Treatment group" values (e.g.
"Placebo", "Drug A") and the columns to the "Response" values (e.g.
"No change", "Improvement").
The null hypothesis is that the response is conditionally
independent of the treatment in any given stratum.
The stratification of the subjects into K groups (according to the
values of controlled variables, e.g. "Age group") increases the
power of the test to detect association. This increase in power
comes from comparing like subjects to like subjects.
Layout
Assume there are k strata (k >= 2). Within
stratum j, there are $N_j$ patients (j = 1, 2, ..., k),
randomly allocated to one of the two
groups. In group 1, there are $n_{j1}$ patients,
$X_{j1}$ of whom are considered responders.
Similarly, group 2 has $n_{j2}$ patients with
$X_{j2}$ responders.
Cochran-Mantel-Haenszel Test
Let $p_1$ and $p_2$ denote the overall response rates for Group 1 and
Group 2 respectively. For stratum j, compute the quantities
$$\mathrm{num}_j = \frac{X_{j1}\, n_{j2} - X_{j2}\, n_{j1}}{N_j}$$
and
$$\mathrm{den}_j = \frac{n_{j1}\, n_{j2}\, (X_{j1} + X_{j2})\, (N_j - X_{j1} - X_{j2})}{N_j^2\, (N_j - 1)}$$
and calculate the
$$\text{Test Statistic} = \frac{\left(\sum_{j=1}^{k} \mathrm{num}_j\right)^2}{\sum_{j=1}^{k} \mathrm{den}_j}$$
Compare the calculated value of the test statistic with the tabulated chi-square value (1 degree of freedom) and
accordingly reject or do not reject the null hypothesis.
In SAS, CMH can be calculated with PROC FREQ by specifying
the CMH option in the TABLES statement.
Example : Computing Cochran-Mantel-Haenszel Statistics for a Stratified
Table
The data set Migraine contains hypothetical data for a clinical
trial of migraine treatment. Subjects of both genders receive
either a new drug therapy or a placebo. Their response to
treatment is coded as 'Better' or 'Same'. The data are recorded as
cell counts, and the number of subjects for each treatment and
response combination is recorded in the variable Count.
data Migraine;
   input Gender $ Treatment $ Response $ Count @@;
   datalines;
female Active  Better 16   female Active  Same 11
female Placebo Better  5   female Placebo Same 20
male   Active  Better 12   male   Active  Same 16
male   Placebo Better  7   male   Placebo Same 19
;
The following statements create a three-way table
stratified by Gender, where Treatment forms the rows and
Response forms the columns. The CMH option produces the
Cochran-Mantel-Haenszel statistics. For this stratified 2×2
table, estimates of the common relative risk and the
Breslow-Day test for homogeneity of the odds ratios are
also displayed. The NOPRINT option suppresses the
display of the contingency tables. These statements
produce Output 1 through Output 3.
proc freq data=Migraine;
weight Count;
tables Gender*Treatment*Response / cmh noprint;
title1 'Clinical Trial for Treatment of Migraine Headaches';
run;
For a stratified 2×2 table, the three CMH
statistics displayed in Output 1 test the
same hypothesis.
The significant p-value (0.004) indicates
that the association between treatment
and response remains strong after
adjusting for gender.
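As a hand check, the statistic can be recomputed from the Migraine cell counts with the per-stratum num and den quantities defined earlier, summed over the two strata before the ratio is taken (the standard form of the CMH statistic). Treating Active as group 1 and "Better" as the response is a labeling assumption made for this sketch.

$$\text{female: } \mathrm{num} = \frac{16 \cdot 25 - 5 \cdot 27}{52} \approx 5.096, \qquad \mathrm{den} = \frac{27 \cdot 25 \cdot 21 \cdot 31}{52^2 \cdot 51} \approx 3.186$$

$$\text{male: } \mathrm{num} = \frac{12 \cdot 26 - 7 \cdot 28}{54} \approx 2.148, \qquad \mathrm{den} = \frac{28 \cdot 26 \cdot 19 \cdot 35}{54^2 \cdot 53} \approx 3.132$$

$$\text{CMH} = \frac{(5.096 + 2.148)^2}{3.186 + 3.132} \approx \frac{52.48}{6.319} \approx 8.31$$

Referred to a chi-square distribution with 1 degree of freedom, a value of about 8.31 corresponds to a p-value of roughly 0.004, in line with the p-value quoted above.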
The large p-value for the Breslow-Day test
(0.2218) in Output 3
indicates no significant gender difference
in the odds ratios.
THANKS