
COMP-STAT

Group
Analysis of Variance and
Covariance

Level : E2

Contents
Key concepts
Analysis of Variance
Analysis of Covariance

GLM Procedure

Analysis of Variance
ANOVA is used to uncover the main and interaction effects of categorical
independent variables (called "factors") on one or more interval-scaled
dependent variables.

Example:

An experiment may measure weight change (the dependent variable) for men
and women who participated in two different weight-loss programs. The 4
cells of the design are formed by the 4 combinations of sex (men, women) and
program (A, B).

What goes wrong if we use the t-test to compare three or more means?

Example:

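Suppose we have three means A, B and C, and we want to test
H0 : A = B = C
against H1 : at least one of them is different from the others.

If we apply the t-test repeatedly (A vs B, A vs C, B vs C), each test carries its own
chance of a Type I error, so the chance of at least one false rejection across the
whole analysis rises above the nominal significance level.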
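The size of this inflation can be illustrated with a short calculation. The sketch below is written in Python rather than SAS, and it assumes, purely for illustration, that the pairwise tests are independent and each run at the 5% level; the function name familywise_error is hypothetical.

```python
from math import comb

def familywise_error(n_means: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection across all pairwise tests,
    assuming (for illustration only) that the tests are independent."""
    n_tests = comb(n_means, 2)           # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests

fwer_three = familywise_error(3)         # 3 means -> 3 pairwise t-tests
print(round(fwer_three, 4))
```

Under these assumptions, three means already push the chance of a false rejection to roughly 14% instead of 5%.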

Assumptions
The scale on which the dependent variable is measured has the properties of an equal interval
scale.
The k samples are independently and randomly drawn from the source population(s)

The source population(s) can be reasonably supposed to have a normal distribution.


The k samples have approximately equal variances.

Main Effect

The effect of a particular factor, averaged over the levels of the other factors.

Interaction Effect
The effect of one factor differs according to the level of another factor.

The key statistic in ANOVA is the F-test of difference of group means, testing if the
means of the groups formed by values of the independent variable (or combinations of
values for multiple independent variables) are different enough not to have occurred by
chance.

ANOVA focuses on F-tests of significance of differences in group means. If one has a
complete enumeration rather than a sample, then any difference of means is "real."

However, when ANOVA is used for comparing two or more different samples, the real
means are unknown. The researcher wants to know if the difference in sample means is
enough to conclude the real means do in fact differ among two or more groups.
If the group means do not differ significantly, then there is no evidence that the
independent variable(s) had an effect on the dependent variable.

If the F test shows that overall the independent variable(s) is (are) related to the
dependent variable, then multiple comparison tests of significance are used to explore
just which values of the independent(s) have the most to do with the relationship.

Post-hoc Comparisons

If the null hypothesis in ANOVA is rejected, proceed to a multiple comparison (post-hoc
comparison) test.

The most common tests are

Least significant difference (LSD)

Duncan

Dunnett

Tukey's honestly significant difference (HSD)

Bonferroni, Scheffé

Suppose we test the null hypothesis that four population means are equal,

H0 : μ1 = μ2 = μ3 = μ4
H1 : not all of μ1, μ2, μ3, μ4 are equal

and this hypothesis is rejected.
The F test in ANOVA tells us that at least one mean differs from the others, but it
does not specify which particular mean it is.
One possible way to detect which particular mean is different is to
conduct the following six pairwise tests:

H0 : μ1 = μ2     H0 : μ1 = μ3     H0 : μ1 = μ4
H0 : μ2 = μ3     H0 : μ2 = μ4     H0 : μ3 = μ4
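The six comparisons for four means can be enumerated mechanically. The Python sketch below lists them and also shows the per-test significance level a Bonferroni correction would assign; the labels mu1 to mu4 and the overall level of 0.05 are assumptions chosen for illustration.

```python
from itertools import combinations

means = ["mu1", "mu2", "mu3", "mu4"]
pairs = list(combinations(means, 2))     # every unordered pair of means
alpha = 0.05                             # assumed overall significance level
bonferroni_alpha = alpha / len(pairs)    # per-test level under Bonferroni

for first, second in pairs:
    print(f"H0: {first} = {second}")
print(f"per-test level: {bonferroni_alpha:.4f}")
```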

Unbalanced Designs

A design is unbalanced if the sample sizes for the treatment combinations are not all equal.

Unbalanced designs can cause confounding.

Confounding is the condition in which the effects of two (or more) explanatory variables cannot be
distinguished from each other.

Types of Sum of Squares

There are four types of sums of squares: Type I, Type II, Type III and Type IV.

Type I SS are sequential: each effect is adjusted only for the effects listed before it in the model.

Type II SS are the reduction in the error sum of squares (SSE) due to adding the effect to a model that contains all
other effects except those that contain the effect being tested.

Type III SS are each adjusted for all other effects in the model.

If the model does not contain any interaction term, Type II and Type III lead to the same output.

For the highest-order interaction term, the two methods always give the same result.

If interactions can safely be ignored, Type II provides a more powerful test of the main effects than Type III.

If there are not sufficient reasons to ignore interactions, we should use Type III. This is the default type in
most statistical software.

SAS Implementation

proc anova data = hhh;
   class treat;
   model weight = treat;
run;

PROC ANOVA takes into account the special structure of a balanced design, so it is faster
and uses less storage than PROC GLM for balanced data, whereas the GLM
procedure can analyze both balanced and unbalanced data.
The classification variable is specified in the CLASS statement.
The MODEL statement names the dependent variables and independent effects.

Example
title1 'Nitrogen Content of Red Clover Plants';
data Clover;
   input Strain $ Nitrogen @@;
   datalines;
1  19.4   1  32.6   1  27.0   1  32.1   1  33.0
5  17.7   5  24.8   5  27.9   5  25.2   5  24.3
4  17.0   4  19.4   4   9.1   4  11.9   4  15.8
7  20.7   7  21.0   7  20.5   7  18.8   7  18.6
13 14.3   13 14.4   13 11.8   13 11.6   13 14.2
15 17.3   15 19.4   15 19.1   15 16.9   15 20.8
;

proc anova data = Clover;
   class Strain;
   model Nitrogen = Strain;
run;

Results and interpretation

Dependent Variable: Nitrogen

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5       847.046667    169.409333     14.37   <.0001
Error             24       282.928000     11.788667
Corrected Total   29      1129.974667

R-Square   Coeff Var   Root MSE   Nitrogen Mean
0.749616    17.26515   3.433463        19.88667

Source   DF      Anova SS   Mean Square   F Value   Pr > F
Strain    5   847.0466667   169.4093333     14.37   <.0001

The degrees of freedom (DF) column should be used to check the analysis
results. The model degrees of freedom for a one-way analysis of variance
are the number of levels minus 1; in this case, 6-1=5. The Corrected Total
degrees of freedom are always the total number of observations minus one;
in this case 30-1=29. The sum of the Model and Error degrees of freedom
equals the Corrected Total degrees of freedom.
The overall F test is significant (F=14.37, p<0.0001), indicating that the
model as a whole accounts for a significant portion of the variability in the
dependent variable. The F test for Strain is significant, indicating that some
contrast between the means for the different strains is different from zero.
Notice that the Model and Strain F tests are identical, since Strain is the
only term in the model.
The F test for Strain (F=14.37, p<0.0001) suggests that there are
differences among the bacterial strains, but it does not reveal any
information about the nature of the differences. Mean comparison methods
can be used to gather further information.
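The sums of squares and the F value in the output can be checked by hand. The following Python sketch (not part of the SAS program) recomputes them from the clover data, using the strain labels 1, 5, 4, 7, 13 and 15 as read from the DATA step; the function name one_way_anova is hypothetical.

```python
clover = {
    "1":  [19.4, 32.6, 27.0, 32.1, 33.0],
    "5":  [17.7, 24.8, 27.9, 25.2, 24.3],
    "4":  [17.0, 19.4, 9.1, 11.9, 15.8],
    "7":  [20.7, 21.0, 20.5, 18.8, 18.6],
    "13": [14.3, 14.4, 11.8, 11.6, 14.2],
    "15": [17.3, 19.4, 19.1, 16.9, 20.8],
}

def one_way_anova(groups):
    """Return the one-way ANOVA decomposition for a dict of samples."""
    values = [y for sample in groups.values() for y in sample]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group (model) sum of squares: group means around the grand mean.
    ss_model = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2
                   for s in groups.values())
    # Within-group (error) sum of squares: observations around their group mean.
    ss_error = sum((y - sum(s) / len(s)) ** 2
                   for s in groups.values() for y in s)
    df_model, df_error = k - 1, n - k
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return {"df_model": df_model, "df_error": df_error,
            "ss_model": ss_model, "ss_error": ss_error, "f_value": f_value}

result = one_way_anova(clover)
print(result)
```

The degrees of freedom (5 and 24), the two sums of squares and the F value of about 14.37 agree with the PROC ANOVA output above.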

Analysis of Covariance
ANCOVA is a combination of linear regression and ANOVA.

If a continuous variable can have an impact on the
dependent variable and we want to control for that variable as well, then
we use ANCOVA in place of ANOVA. That is, in experimental
designs ANCOVA is used to control for factors which cannot be randomized but
which can be measured on an interval scale.
Example: in some studies the baseline value is a variable which we
need to control for in order to examine the significance of categorical
predictors.
When covariate scores are available, we have information about
differences between treatment groups that existed before the
experiment was performed, and we want to control for those differences.

As a general rule, a very small number of covariates is best. Covariates should be:

Correlated with the dependent variable.

Not correlated with each other (to avoid multicollinearity).

Data on covariates should be gathered before the treatment is administered.

Failure to do this often means that some portion of the effect of the predictor is
removed from the dependent variable when the covariate adjustment is calculated.

The rules, such as those for sums of squares, remain as they were in the case of
ANOVA.

GLM Procedures
The general linear model (GLM) is a statistical
linear model. It may be written as
Y = XB + U

where Y is a matrix containing a series of multivariate measurements, X is a design matrix,
B is a matrix containing parameters that are usually to be estimated, and U is a matrix containing residuals
(i.e., errors or noise). The residuals are usually assumed to follow a multivariate normal distribution. If the
residuals do not follow a multivariate normal distribution, generalized linear models may be used to relax
assumptions about Y and U.

The GLM procedure uses the method of least squares to fit general linear models.
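For the simplest case, a single continuous predictor, the least-squares estimates have a closed form. The Python sketch below illustrates that idea only and is not how PROC GLM is implemented; the data values and the function name fit_line are made up for the example.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the model y = b0 + b1 * x + error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                      # minimizes the residual sum of squares
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, chosen only to illustrate the calculation.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(intercept, slope)
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.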

GLM handles models relating one or several continuous dependent variables to one or several
independent variables. The independent variables may be either classification variables, which divide the
observations into discrete groups, or continuous variables.

Thus, the GLM procedure can be used for many different analyses, including

simple regression

multiple regression

analysis of variance (ANOVA), especially for unbalanced data

analysis of covariance (ANCOVA)

response-surface models

weighted regression

polynomial regression

partial correlation

multivariate analysis of variance (MANOVA)

repeated measures analysis of variance

SAS GLM procedure

PROC GLM DATA = SAS-data-set;
   CLASS variables;
   MODEL dependents = independents < / options >;
   MEANS effects < / options >;
   LSMEANS effects < / options >;
   OUTPUT OUT = SAS-data-set keyword = variable ... ;
RUN;
QUIT;

PROC GLM handles models relating one or several continuous dependent variables
to one or several independent variables.

CLASS specifies classification variables for the analysis.

MODEL specifies dependent and independent variables for the analysis

MEANS computes means of the dependent variable for each value of the specified
effect

LSMEANS produces means for the outcome variable, broken out by the variable
specified and adjusting for any other explanatory variables included on the MODEL
statement.

LSMEANS can also be used for multiple comparisons tests.

OUTPUT specifies an output data set that contains all variables from the input data
set and variables representing statistics from the analysis.

Example

title 'Analysis of Unbalanced 2-by-2 Factorial';


data exp;
input A $ B $ Y @@;
datalines;
A1 B1 12 A1 B1 14
A1 B2 11 A1 B2 9
A2 B1 20 A2 B1 18
A2 B2 17
;
proc glm;
class A B;
model Y=A B A*B;
run;

Result
Analysis of Unbalanced 2-by-2 Factorial
The GLM Procedure
Dependent Variable: Y

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      91.71428571   30.57142857     15.29   0.0253
Error              3       6.00000000    2.00000000
Corrected Total    6      97.71428571

R-Square   Coeff Var   Root MSE     Y Mean
0.938596    9.801480   1.414214   14.42857

Source   DF     Type I SS   Mean Square   F Value   Pr > F
A         1   80.04761905   80.04761905     40.02   0.0080
B         1   11.26666667   11.26666667      5.63   0.0982
A*B       1    0.40000000    0.40000000      0.20   0.6850

Source   DF   Type III SS   Mean Square   F Value   Pr > F
A         1   67.60000000   67.60000000     33.80   0.0101
B         1   10.00000000   10.00000000      5.00   0.1114
A*B       1    0.40000000    0.40000000      0.20   0.6850

Interpretation

The degrees of freedom may be used to check your data. The Model degrees of freedom for a 2×2
factorial design with interaction are (ab-1), where a is the number of levels of A and b is the
number of levels of B; in this case, (2×2-1) = 3. The Corrected Total degrees of freedom are always
one less than the number of observations used in the analysis; in this case, 7-1=6.

The overall F test is significant (F=15.29, p=0.0253), indicating strong evidence that the means for
the four different AB cells are different. You can further analyze this difference by examining the
individual tests for each effect.

Four types of estimable functions of parameters are available for testing hypotheses in PROC GLM.
For data with no missing cells, the Type III and Type IV estimable functions are the same and test
the same hypotheses that would be tested if the data were balanced. Type I and Type III sums of
squares are typically not equal when the data are unbalanced; Type III sums of squares are
preferred in testing effects in unbalanced cases because they test a function of the underlying
parameters that is independent of the number of observations per treatment combination.

At the 5% significance level, the A*B interaction is not significant (F=0.20, p=0.6850).
This gives no evidence that the effect of A depends on the level of B or vice versa. Therefore, the
tests for the individual effects are valid, showing a significant A effect (F=33.80, p=0.0101) but no
significant B effect (F=5.00, p=0.1114).
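For a 2×2 design with no empty cells, each Type III sum of squares can be reproduced as a one-degree-of-freedom contrast among the unweighted cell means. The Python sketch below recomputes the Type III column from the example data under that approach; the helper contrast_ss and the dictionary names are hypothetical, and this is a hand check rather than PROC GLM's general algorithm.

```python
cells = {
    ("A1", "B1"): [12, 14],
    ("A1", "B2"): [11, 9],
    ("A2", "B1"): [20, 18],
    ("A2", "B2"): [17],
}

def contrast_ss(coeffs):
    """Sum of squares for a 1-df contrast of cell means: L^2 / sum(c^2 / n)."""
    estimate = sum(c * sum(cells[key]) / len(cells[key])
                   for key, c in coeffs.items())
    scale = sum(c ** 2 / len(cells[key]) for key, c in coeffs.items())
    return estimate ** 2 / scale

# Contrast coefficients for the two main effects and the interaction.
contrasts = {
    "A":   {("A1", "B1"): 1, ("A1", "B2"): 1, ("A2", "B1"): -1, ("A2", "B2"): -1},
    "B":   {("A1", "B1"): 1, ("A1", "B2"): -1, ("A2", "B1"): 1, ("A2", "B2"): -1},
    "A*B": {("A1", "B1"): 1, ("A1", "B2"): -1, ("A2", "B1"): -1, ("A2", "B2"): 1},
}

# Error sum of squares: observations around their own cell mean.
ss_error = sum((y - sum(obs) / len(obs)) ** 2
               for obs in cells.values() for y in obs)
mse = ss_error / (7 - 4)                 # 7 observations minus 4 cell means

type3_ss = {name: contrast_ss(c) for name, c in contrasts.items()}
f_values = {name: ss / mse for name, ss in type3_ss.items()}
print(type3_ss, f_values)
```

Because every cell mean gets the same weight regardless of how many observations it contains, these tests do not depend on the cell sizes, which is the property the text attributes to Type III sums of squares.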


NON-PARAMETRIC TESTS

Key Concepts

Non-Parametric Tests

Mann - Whitney Test


Kruskal - Wallis Test
Friedman Test
McNemar Test
Log - Rank Test

Parametric Vs. Non-Parametric Tests

Parametric
These methods require an assumption about the distribution of the population
from which the samples are drawn.
They require a sufficiently large sample size.

Non-Parametric
These methods require no assumption about the distribution of the population from which
the samples are drawn; that is, they are distribution-free tests.
They should be used when the sample size is small.

Mann-Whitney Wilcoxon Test


Introduction

Test for comparing two populations.

Used to test the null hypothesis that two independent samples have identical
distribution functions against the alternative hypothesis that the two
distribution functions differ only in location (mean or median). In other words, it is
used to make inferences about the population mean or median without requiring
the assumption of normality.
Used as an alternative to the two-sample t-test when the normality
assumption is not satisfied.
Applied when the observations in a sample are ranks, that is, ordinal data
rather than direct measurements.

Assumptions

Two samples are randomly and independently drawn.

Dependent variable is continuous, capable of producing measures carried out to the nth decimal
place.

Measures within the two samples have the properties of at least an ordinal scale of measurement, so
that it is meaningful to speak of "greater than," "less than," and "equal to."

Data can be ranked, including tied rank values wherever appropriate. Ranks help to focus only on the
ordinal relationships among the raw measures: "greater than," "less than," and "equal to."

The two population distributions differ only by a shift in location.
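The ranking step behind the test can be sketched in a few lines. The Python example below pools two samples, assigns midranks to ties, and computes the rank sum W and the Mann-Whitney U statistic for the first sample; the data values and function names are hypothetical, and no p-value is computed.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1     # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_and_u(sample_a, sample_b):
    """Rank sum W of sample_a in the pooled data and the matching U statistic."""
    ranks = midranks(sample_a + sample_b)
    w_a = sum(ranks[:len(sample_a)])
    u_a = w_a - len(sample_a) * (len(sample_a) + 1) / 2
    return w_a, u_a

# Hypothetical measurements for two independent groups.
group_a = [1.1, 2.3, 3.5]
group_b = [2.0, 4.2, 5.1, 6.3]
w_a, u_a = rank_sum_and_u(group_a, group_b)
print(w_a, u_a)
```

A small U for one sample means its values tend to sit below those of the other sample; the test then asks whether U is more extreme than chance alone would explain.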

Proc npar1way wilcoxon

In general, PROC NPAR1WAY performs an analysis of variance (option
ANOVA), tests for location differences (options WILCOXON, MEDIAN,
SAVAGE, and VW), and performs empirical distribution function tests (option
EDF). The general syntax is:

PROC NPAR1WAY < options > ;
   BY variables ;
   CLASS variable ;
   EXACT <WILCOXON> < / computation-options > ;
   FREQ variable ;
   OUTPUT < OUT=SAS-data-set > < WILCOXON > ;
   VAR variables ;
RUN;

Options are:

DATA=
Task: specify the input data set.
Names the SAS data set to be analyzed by PROC NPAR1WAY. If you omit the DATA= option, the
procedure uses the most recently created SAS data set.

MISSING
Task: include missing CLASS values.
Treats missing values of the CLASS variable as a valid class level.

NOPRINT
Task: suppress all displayed output.
Suppresses the display of all output.

WILCOXON
Task: request analyses.
Requests an analysis using Wilcoxon scores. When there are two classification levels, or two
samples, this option produces the Wilcoxon rank-sum test. For any number of classification levels,
this option produces the Kruskal-Wallis test.

CORRECT=NO
Task: suppress the continuity correction.
Suppresses the continuity correction for the Wilcoxon two-sample test and the Siegel-Tukey
two-sample test.

The BY statement requests separate analyses on observations in groups defined by the
BY variables. When a BY statement appears, the procedure expects the input
data set to be sorted in order of the BY variables.

The CLASS variable identifies groups (or samples) in the data. The variable
can be character or numeric.

The FREQ statement names a numeric variable that provides a frequency for
each observation in the DATA= data set.

The VAR statement names the response or dependent variables to be analyzed. These
variables must be numeric. If the VAR statement is omitted, the procedure analyzes all
numeric variables in the data set except for the CLASS variable, the FREQ variable,
and the BY variables.

OUT=SAS-data-set names the output data set.

Computation-Options are:
Options

Description

ALPHA= value

specifies the level of the confidence limits for Monte Carlo p-value
estimates. The value of the ALPHA= option must be between 0 and 1,
and the default is 0.01, which produces 99% confidence limits
for the Monte Carlo estimates.

MAXTIME=value

specifies the maximum clock time (in seconds) that PROC NPAR1WAY
can use to compute an exact p-value. If the procedure does not complete
the computation within the specified time, the computation terminates.

MC

requests Monte Carlo estimation of exact p-values, instead of direct exact
p-value computation. Monte Carlo estimation can be useful for large
problems that require a great amount of time and memory for exact
computations.

N=n

specifies the number of samples for Monte Carlo estimation. The value of
the N= option must be a positive integer, and the default is 10,000
samples. Larger values of n produce more precise estimates of exact p-values.

POINT

requests exact point probabilities for the test statistics.

SEED=number

specifies the initial seed for random number generation for Monte Carlo
estimation. The value of the SEED= option must be an integer.

Examples

Global evaluations of drug A and drug B in back pain: during treatment it was found that patients
with low back pain experienced a decrease in pain after 6 to 8 weeks of daily treatment. A
study was therefore conducted to determine whether this phenomenon is a drug-related response or
coincidental. Patients were asked to provide a global rating of their pain, relative to
baseline, on the following scale

To test this phenomenon we use the Mann-Whitney test.

Kruskal - Wallis Test


Introduction

Analogue of one-way ANOVA without the assumption of normality.

Extension of the Wilcoxon test to more than two groups.

Used to compare population location parameters among two or more groups based on independent
samples.

Used to test the null hypothesis that all populations have identical distribution functions against the
alternative hypothesis that at least two of the populations differ, and differ only with respect to location.

Assumptions

Same as Wilcoxon test.
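The Kruskal-Wallis statistic is computed from the rank sums of the pooled data: H = 12 / (N(N+1)) × Σ(R_i² / n_i) − 3(N+1). The Python sketch below implements this formula under the simplifying assumption that there are no tied values (so no tie correction is applied); the three samples are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of samples, assuming no tied values."""
    pooled = sorted(y for sample in groups for y in sample)
    rank = {y: i + 1 for i, y in enumerate(pooled)}   # valid only without ties
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    term = sum(sum(rank[y] for y in sample) ** 2 / len(sample)
               for sample in groups)
    return 12 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

# Hypothetical measurements for three independent groups.
group_one = [1.2, 2.5, 3.1]
group_two = [4.0, 5.3, 6.8]
group_three = [7.4, 8.9, 9.6]
h_statistic = kruskal_wallis_h([group_one, group_two, group_three])
print(round(h_statistic, 4))
```

H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom when the samples are not too small; for small samples an exact test is the safer choice.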

Proc npar1way wilcoxon


Example:
Comparison of three groups of people on the basis of percent error: a study
comparing three groups of people (senior staff, junior staff and residents) was
conducted. They were studied on a daily basis, and data were collected on the
percent error they made in their work.

Friedman Test
Introduction
Models the ratings of n judges (rows) on k treatments (columns).
Generalization of the sign test and the Spearman rank correlation test: it reduces to the
sign test if there are two columns and to the Spearman rank correlation
test if there are two rows.
Also called two-way analysis of variance by ranks, as it is used for two-way repeated
measures analysis of variance by ranks.

Used to test the null hypothesis that all treatments have identical effects
against the alternative hypothesis that at least one treatment is different from
at least one other treatment.

Assumptions

There are k experimental treatments, with k ≥ 2.

The n rows are mutually independent (i.e. results within one row do not affect the results within other
rows).

Data can be meaningfully ranked.
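The Friedman statistic ranks the k treatments within each row and compares the column rank sums: χ² = 12 / (n k (k+1)) × Σ R_j² − 3 n (k+1). The Python sketch below implements this formula assuming no ties within a row; the ratings table is hypothetical.

```python
def friedman_statistic(table):
    """Friedman chi-square for n rows (judges) by k columns (treatments),
    assuming no tied values within a row."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        # Rank the treatments within this row, smallest value gets rank 1.
        order = sorted(range(k), key=lambda j: row[j])
        for position, j in enumerate(order, start=1):
            rank_sums[j] += position
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical ratings: 3 judges (rows) scoring 3 treatments (columns).
ratings = [
    [7.0, 8.5, 9.1],
    [6.2, 7.7, 8.0],
    [5.9, 6.4, 9.3],
]
chi_square = friedman_statistic(ratings)
print(chi_square)
```

Here every judge orders the treatments the same way, so the statistic takes its largest possible value for this table size, n(k − 1) = 6.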

SAS Implementation
PROC FREQ with the CMH2 option in the TABLES statement.

Friedman Test

Syntax

PROC FREQ < options > ;
   BY variables ;
   EXACT statistic-options < / computation-options > ;
   OUTPUT < OUT=SAS-data-set > options ;
   TABLES requests < / options > ;
   TEST options ;
   WEIGHT variable ;
RUN;

where

BY       calculates separate frequency or crosstabulation tables for each BY group.
EXACT    requests exact tests for specified statistics.
OUTPUT   creates an output data set that contains specified statistics.
TABLES   specifies frequency or crosstabulation tables and requests tests and measures of
         association.
TEST     requests asymptotic tests for measures of association and agreement.
WEIGHT   identifies a variable with values that weight each observation.

Friedman Test
Options

AGREE      McNemar's test for 2×2 tables, simple kappa coefficient, and weighted kappa
           coefficient
BINOMIAL   binomial proportion test for one-way tables
CHISQ      chi-square goodness-of-fit test for one-way tables; Pearson chi-square,
           likelihood-ratio chi-square, and Mantel-Haenszel chi-square tests for two-way tables
COMOR      confidence limits for the common odds ratio for h×2×2 tables; common odds ratio
           test
FISHER     Fisher's exact test
JT         Jonckheere-Terpstra test
KAPPA      test for the simple kappa coefficient
LRCHI      likelihood-ratio chi-square test
MCNEM      McNemar's test
MEASURES   tests for the Pearson correlation and the Spearman correlation, and the odds ratio
           confidence limits for 2×2 tables
MHCHI      Mantel-Haenszel chi-square test
OR         confidence limits for the odds ratio for 2×2 tables
PCHI       Pearson chi-square test
PCORR      test for the Pearson correlation coefficient
SCORR      test for the Spearman correlation coefficient
TREND      Cochran-Armitage test for trend
WTKAP      test for the weighted kappa coefficient


McNemar Test
Introduction

Determine whether the row and column marginal frequencies are equal or not.

Uses matched pairs labels say, (A,B).

Tests whether pair (A,B) is as likely as (B,A).

Used when dichotomous outcomes are recorded twice for each patient under different conditions
(Eg different treatments or different measurement times).

Assumptions
Data consists of paired observations of labels (A,B).

Applied to 2x2 contingency tables with a dichotomous trait with matched pairs of subjects.

Used only when the conditions for the normal approximation apply.

SAS Implementation

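PROC FREQ with the AGREE option in the TABLES statement.

The output gives the chi-square p-value (two-tailed). A one-tailed p-value can be obtained by
halving it, provided the observed difference is in the direction stated by the alternative hypothesis.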
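The chi-square statistic reported for McNemar's test depends only on the two discordant cells of the 2×2 table: χ² = (b − c)² / (b + c), with 1 degree of freedom. The Python sketch below computes the statistic without a continuity correction, together with its two-tailed p-value; the counts b = 15 and c = 5 are hypothetical.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and two-tailed p-value.

    b and c are the discordant counts: pairs labelled (A, B) and (B, A).
    """
    chi_square = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi_square)).
    p_two_tailed = erfc(sqrt(chi_square / 2))
    return chi_square, p_two_tailed

# Hypothetical discordant counts from a pre/post comparison.
chi_square, p_two_tailed = mcnemar(15, 5)
# Halving is appropriate only when the observed direction matches the alternative.
p_one_tailed = p_two_tailed / 2
print(chi_square, round(p_two_tailed, 4), round(p_one_tailed, 4))
```

The concordant pairs do not enter the calculation at all, which is why the test is sensitive only to changes between the two conditions.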

Example

Comparing response rates (e.g. normal and abnormal for a group of patients where data
are collected for pre- and post-study laboratory results) when patients are treated
with a particular drug, say A. (Here, we need to test whether there is a change in
the pre- to post-treatment rates of abnormalities.)

Suppose the following program has been run, where the aim is to compare the response
rates (yes/no) of cases and controls.

Log-Rank Test
Introduction

Used for comparing distributions of the time until an event of interest occurs (e.g. death, cure, failure,
relapse) among independent groups.

Used to test the null hypothesis that there is no difference between the populations in the
probability of an event at any time point.

Used when the Wilcoxon rank-sum test cannot be applied because some event times are censored.

Most likely to detect a difference between groups when the risk of an event is consistently greater
for one group than another.

Equivalent to applying the CMH test with each event time as a stratum.

Assumptions

Censoring is unrelated to prognosis.

Survival probabilities are the same for subjects recruited early and late in the study, and the events
happened at the times specified.

Requires no assumption regarding the distribution of event times.
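The two-group log-rank statistic compares, at each event time, the number of events observed in one group with the number expected if both groups shared the same risk. The Python sketch below is a minimal illustration of that calculation and not PROC LIFETEST's implementation; the data and the function name are hypothetical, and each observation is coded as a (time, event) pair with event = 0 for right censoring.

```python
def log_rank_chi_square(group1, group2):
    """Two-group log-rank chi-square (1 df).

    Each group is a list of (time, event) pairs; event is 1 for an observed
    event and 0 for a right-censored observation.
    """
    event_times = sorted({t for t, e in group1 + group2 if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for time, _ in group1 if time >= t)   # at risk in group 1
        n2 = sum(1 for time, _ in group2 if time >= t)   # at risk in group 2
        d1 = sum(1 for time, e in group1 if time == t and e == 1)
        d2 = sum(1 for time, e in group2 if time == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n               # expected events in group 1
        if n > 1:                            # variance term is zero when n == 1
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance

# Hypothetical data: the subject at time 2 in the second group is censored.
treated = [(1, 1), (3, 1)]
control = [(2, 0), (4, 1)]
chi_square = log_rank_chi_square(treated, control)
print(chi_square)
```

Censored subjects contribute to the risk sets up to their censoring time but never count as events, which is how the test uses incomplete follow-up without assuming a distribution for the event times.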

SAS Implementation

Proc lifetest
Output shows Chi-Square p-value.
PROC LIFETEST < options > ;
TIME variable < *censor(list) > ;
BY variables ;
FREQ variable ;
ID variables ;
STRATA variable < (list) > < ... variable < (list) > > ;
SURVIVAL options ;
TEST variables ;
Run;

The TIME statement indicates the failure time variable, where
variable is the name of the failure time variable, which can optionally be followed
by an asterisk, the name of the censoring variable, and a parenthetical list of
values that correspond to right censoring. The censoring values should be
numeric, nonmissing values.

The BY statement is used with PROC LIFETEST to obtain separate analyses on observations
in groups defined by the BY variables.

The variable in the FREQ statement identifies a variable containing the frequency
of occurrence of each observation.

The ID variable values are used to label the observations of the product-limit
survival function estimates.

The STRATA statement indicates which variables determine strata levels for
the computations. The strata are formed according to the nonmissing values of
the designated strata variables.
Options available with the STRATA statement:

MISSING          allows missing values as a valid stratum level.
GROUP=variable   specifies the variable whose formatted values identify the various
                 samples whose underlying survival curves are to be compared.
NODETAIL         suppresses the display of the rank statistics and the corresponding
                 covariance matrices for various strata.
NOTEST           suppresses the k-sample tests, stratified tests, and trend tests.
TREND            computes the trend tests for testing the null hypothesis that the k
                 population hazard rates are the same versus an ordered alternative.
TEST=(list)      enables you to select the weight functions for the k-sample tests,
                 stratified tests, or trend tests. You can specify a list containing one
                 or more of the following keywords