You are on page 1of 15

Analysis of Variance and Design of

Experiments
General Linear Hypothesis and Analysis of Variance
:::
Lecture 3
Regression and Analysis of Variance Models

Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Slides can be downloaded from http://home.iitk.ac.in/~shalab/sp1


Regression model for the general linear hypothesis:
Let Y1 , Y2 ,..., Yn be a sequence of n independent random variables
associated with responses. Then we can write it as
p
E (Yi )    j xij , i  1, 2,..., n, j  1, 2,..., p
j 1

Var (Yi )   2 .

This is the linear model in the expectation form where 1 ,  2 ,...,  p


are the unknown parameters and xij’s are the known values of
independent covariates X 1 , X 2 ,..., X p .

2
Regression model for the general linear hypothesis:

Alternatively, the linear model can be expressed as


p
Yi    j xij ,   i , i  1, 2,..., n; j  1, 2,..., p
j 1

where i ’s are identically and independently distributed random

error component with mean 0 and variance  2 , i.e.,

E ( i )  0 V ar (  i )   2
and Cov ( i ,  j )  0(i  j ).

3
Regression model for the general linear hypothesis:

In matrix notations, the linear model can be expressed as


Y  X 
where
• Y  (Y1 , Y2 ,..., Yn ) ' is a n x 1 vector of observations on the
response variable,
 X 11 X 12 ... X 1 p 
 
• the matrix  X 21 X 22 ... X 2 p 
X  is a n x p matrix of n observations
   
 
 X X ... X 
 n1 n 2 np 

on p independent covariates X 1 , X 2 ,..., X p ,

4
Regression model for the general linear hypothesis:

•   ( 1 ,  2 ,...,  p ) is a p x 1 vector of unknown regression


parameters (or regression coefficients) 1 ,  2 ,...,  p associated
with X 1 , X 2 ,..., X p , respectively and
•   (1 ,  2 ,...,  n ) is a n x 1 vector of random errors or
disturbances.
• We assume that E ( )  0, the covariance matrix

V ( )  E ( ')   2 I p , rank ( X )  p

5
Regression model for the general linear hypothesis:
In the context of analysis of variance and design of experiments,
 the matrix X is termed as the design matrix;
 unknown  1 ,  2 ,...,  p are termed as effects,
 the covariates X 1 , X 2 ,..., X p , are counter variables or indicator
variables where xij counts the number of times the effect j
occurs in the ith observation xi .
 xij mostly takes the values 1 or 0 but not always.
 The value xij = 1 indicates the presence of effect j in xi and xij
= 0 indicates the absence of effect j in xi.

6
Regression model for the general linear hypothesis:

Note that in the linear regression model, the covariates are usually
continuous variables.

When some of the covariates are counter variables, and rest are
continuous variables, then the model is called a mixed model and
is used in the analysis of covariance.

7
Relationship between the regression and ANOVA models:
The same linear model is used in the linear regression analysis as
well as in the analysis of variance.

So it is important to understand the role of a linear model in the


context of linear regression analysis and analysis of variance.

Consider the multiple linear model

Y   0  X 1  1  X 2  2  ...  X p  p  

8
Relationship between the regression and ANOVA models:
In the case of analysis of variance model,
• the one‐way classification considers only one covariate,
• two‐way classification model considers two covariates,
• three‐way classification model considers three covariates and
so on.

9
Relationship between the regression and ANOVA models:
If    and  denote the effects associated with the covariates
X, Z and W which are the counter variables, then in
One‐way model: Y    X   

Two‐way model: Y    X   Z   

Three‐way model: Y    X   Z   W    and so on.

10
Relationship between the regression and ANOVA models:
Consider an example of agricultural yield. The study variable
denotes the yield which depends on various covariates X1,X2,…,Xp .

In the case of regression analysis, the covariates X1,X2,…,Xp are the


different variables like temperature, the quantity of fertilizer,
amount of irrigation etc.

11
Relationship between the regression and ANOVA models:
Now consider the case of one‐way model and try to understand
its interpretation in terms of the multiple regression model.

The covariate X is now measured at different levels, e.g., if X is


the quantity of fertilizer then suppose there are p possible values,
say 1 Kg., 2 Kg.,..., p Kg. then X 1 , X 2 ,..., X p denotes these p values
in the following way.

12
Relationship between the regression and ANOVA models:
The linear model now can be expressed as
Y   0   1 X 1   2 X 2  ...   p X p  

by defining

1 if effect of 1 Kg. fertilizer is present


X1  
0 if effect of 1 Kg. fertilizer is absent

1 if effect of 2 Kg. fertilizer is present


X2  
0 if effect of 2 Kg. fertilizer is absent

1 if effect of p Kg. fertilizer is present
Xp  
0 if effect of p Kg. fertilizer is absent.
13
Relationship between the regression and ANOVA models:
If the effect of 1 Kg. of fertilizer is present, then other effects will
obviously be absent and the linear model is expressible as
Y   0   1 ( X 1  1)   2 ( X 2  0)  ...   p ( X p  0)  
  0  1  

If the effect of 2 Kg. of fertilizer is present then


Y   0  1 ( X 1  0)   2 ( X 2  1)  ...   p ( X p  0)  
 0  2  

If the effect of p Kg. of fertilizer is present then


Y   0  1 ( X 1  0)   2 ( X 2  0)  ...   p ( X p  1)  
 0   p  

and so on.
14
Relationship between the regression and ANOVA models:

If the effect of p Kg. of fertilizer is present then


Y   0  1 ( X 1  0)   2 ( X 2  0)  ...   p ( X p  1)  
 0   p  

and so on.

15

You might also like