
Analysis of Variance and Design of Experiments
Results from Matrix Theory and Random Variables

Lecture 2
Random Vectors and Linear Estimation

Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Slides can be downloaded from http://home.iitk.ac.in/~shalab/sp1
Random vectors:
Let $Y_1, Y_2, \ldots, Y_n$ be $n$ random variables; then $Y = (Y_1, Y_2, \ldots, Y_n)'$ is
called a random vector.

• The mean vector of $Y$ is
  $E(Y) = (E(Y_1), E(Y_2), \ldots, E(Y_n))'.$

• The covariance matrix or dispersion matrix of $Y$ is

  $Var(Y) = \begin{pmatrix} Var(Y_1) & Cov(Y_1, Y_2) & \cdots & Cov(Y_1, Y_n) \\ Cov(Y_2, Y_1) & Var(Y_2) & \cdots & Cov(Y_2, Y_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(Y_n, Y_1) & Cov(Y_n, Y_2) & \cdots & Var(Y_n) \end{pmatrix}$

  which is a symmetric matrix.


Random vectors:
• If $Y_1, Y_2, \ldots, Y_n$ are independently distributed, then the
  covariance matrix is a diagonal matrix.

• If, in addition, $Var(Y_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$, then $Var(Y) = \sigma^2 I_n$.

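As a numerical illustration (a minimal NumPy sketch; the dimensions and standard deviations are arbitrary choices), the sample covariance matrix of a vector with independent components is close to diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 10,000 draws of a 3-dimensional random vector with
# independent components; the true covariance matrix is diagonal.
n_draws, n = 10_000, 3
sigma = np.array([1.0, 2.0, 0.5])          # component standard deviations
Y = rng.standard_normal((n_draws, n)) * sigma

print(Y.mean(axis=0))                      # estimate of E(Y), near 0
print(np.cov(Y, rowvar=False).round(2))    # near diag(1, 4, 0.25)
```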
Linear function of random variables:
If $Y_1, Y_2, \ldots, Y_n$ are $n$ random variables and $k_1, k_2, \ldots, k_n$ are scalars,
then $\sum_{i=1}^{n} k_i Y_i$ is called a linear function of the random variables
$Y_1, Y_2, \ldots, Y_n$.

If $Y = (Y_1, Y_2, \ldots, Y_n)'$ and $K = (k_1, k_2, \ldots, k_n)'$, then $K'Y = \sum_{i=1}^{n} k_i Y_i$,

• the mean of $K'Y$ is $E(K'Y) = K'E(Y) = \sum_{i=1}^{n} k_i E(Y_i)$, and

• the variance of $K'Y$ is $Var(K'Y) = K' Var(Y) K$.

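A quick simulation check of these two identities (a sketch; the chosen $\mu$, $Var(Y)$ and $K$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# A correlated 2-dimensional random vector Y and a coefficient vector K.
mu = np.array([1.0, 2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])                 # Var(Y)
K = np.array([3.0, -1.0])

Y = rng.multivariate_normal(mu, V, size=100_000)
KY = Y @ K                                 # realisations of K'Y

print(KY.mean(), K @ mu)                   # E(K'Y) vs K'E(Y)
print(KY.var(), K @ V @ K)                 # Var(K'Y) vs K'Var(Y)K
```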
Multivariate normal distribution:
A random vector $Y = (Y_1, Y_2, \ldots, Y_n)'$ has a multivariate normal
distribution with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_n)'$ and dispersion
matrix $\Sigma$ if its probability density function is

$f(Y \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (Y - \mu)' \Sigma^{-1} (Y - \mu) \right]$

assuming $\Sigma$ is a nonsingular matrix.

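The density formula can be checked against SciPy's multivariate normal (a sketch; the chosen $\mu$, $\Sigma$ and evaluation point are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
y = np.array([0.5, 0.5])

# Density from the formula on this slide...
n = len(mu)
d = y - mu
manual = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / \
         ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

# ...and from SciPy; the two should agree.
print(manual, multivariate_normal.pdf(y, mean=mu, cov=Sigma))
```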
Chi‐square distribution:
• If $Y_1, Y_2, \ldots, Y_k$ are independently distributed random variables
  following the normal distribution with common mean 0 and
  common variance 1, then the distribution of $\sum_{i=1}^{k} Y_i^2$ is called
  the $\chi^2$-distribution with $k$ degrees of freedom.

• The probability density function of the $\chi^2$-distribution with $k$
  degrees of freedom is given as

  $f_{\chi^2}(x) = \frac{1}{\Gamma(k/2)\, 2^{k/2}}\, x^{\frac{k}{2} - 1} \exp\left( -\frac{x}{2} \right); \quad 0 \le x < \infty.$

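A simulation sketch (the choice $k = 5$ is arbitrary): the sum of $k$ squared standard normals matches the $\chi^2_k$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k = 5

# Sum of squares of k independent N(0, 1) variables.
U = (rng.standard_normal((200_000, k)) ** 2).sum(axis=1)

# The empirical cdf should match the chi-square(k) cdf.
print(np.mean(U <= 4.0), stats.chi2.cdf(4.0, df=k))
```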
Chi‐square distribution:
• If $Y_1, Y_2, \ldots, Y_k$ are independently distributed following the
  normal distribution with common mean 0 and common
  variance $\sigma^2$, then $\frac{1}{\sigma^2} \sum_{i=1}^{k} Y_i^2$ has a $\chi^2$-distribution with $k$ degrees
  of freedom.

• If the random variables $Y_1, Y_2, \ldots, Y_k$ are normally distributed
  with non-null means $\mu_1, \mu_2, \ldots, \mu_k$ but common variance 1,
  then the distribution of $\sum_{i=1}^{k} Y_i^2$ is the noncentral $\chi^2$-distribution
  with $k$ degrees of freedom and noncentrality parameter
  $\lambda = \sum_{i=1}^{k} \mu_i^2.$
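A sketch of the noncentral case (the means are arbitrary): squaring shifted normals and summing reproduces the noncentral $\chi^2$ cdf:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu = np.array([1.0, -0.5, 2.0])           # non-null means
k, lam = len(mu), np.sum(mu ** 2)         # noncentrality = sum of mu_i^2

U = ((rng.standard_normal((200_000, k)) + mu) ** 2).sum(axis=1)

# Empirical cdf vs the noncentral chi-square cdf with k degrees of
# freedom and noncentrality parameter lam.
print(np.mean(U <= 6.0), stats.ncx2.cdf(6.0, df=k, nc=lam))
```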
Chi‐square distribution:
• If $Y_1, Y_2, \ldots, Y_k$ are independently distributed following the
  normal distribution with means $\mu_1, \mu_2, \ldots, \mu_k$ but common
  variance $\sigma^2$, then $\frac{1}{\sigma^2} \sum_{i=1}^{k} Y_i^2$ has a noncentral $\chi^2$-distribution
  with $k$ degrees of freedom and noncentrality parameter
  $\lambda = \frac{1}{\sigma^2} \sum_{i=1}^{k} \mu_i^2.$

• If $U$ has a Chi-square distribution with $k$ degrees of freedom,
  then $E(U) = k$ and $Var(U) = 2k.$

Chi‐square distribution:
• If $U$ has a noncentral Chi-square distribution with $k$ degrees
  of freedom and noncentrality parameter $\lambda$, then
  $E(U) = k + \lambda$ and $Var(U) = 2k + 4\lambda.$

• If $U_1, U_2, \ldots, U_k$ are independently distributed random
  variables with each $U_i$ having a noncentral Chi-square
  distribution with $n_i$ degrees of freedom and noncentrality
  parameter $\lambda_i$, $i = 1, 2, \ldots, k$, then $\sum_{i=1}^{k} U_i$ has a noncentral
  Chi-square distribution with $\sum_{i=1}^{k} n_i$ degrees of freedom and
  noncentrality parameter $\sum_{i=1}^{k} \lambda_i.$
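These moment formulas can be verified directly (a sketch with arbitrary $k$ and $\lambda$, using SciPy's noncentral chi-square):

```python
from scipy import stats

k, lam = 4, 2.5
mean, var = stats.ncx2.stats(df=k, nc=lam, moments='mv')
print(mean, k + lam)         # E(U) = k + lambda
print(var, 2 * k + 4 * lam)  # Var(U) = 2k + 4*lambda
```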
Chi‐square distribution:
• Let $X = (X_1, X_2, \ldots, X_n)'$ have a multivariate normal distribution
  with mean vector $\mu$ and positive definite covariance matrix $\Sigma$.
  Then $X'AX$ is distributed as noncentral $\chi^2$ with $k$ degrees
  of freedom and noncentrality parameter $\mu'A\mu$ if and only if
  $A\Sigma$ is an idempotent matrix of rank $k$.

Chi‐square distribution:
Suppose the two quadratic forms
• $X'A_1X$ is distributed as $\chi^2$ with $n_1$ degrees of freedom
  and noncentrality parameter $\mu'A_1\mu$, and
• $X'A_2X$ is distributed as $\chi^2$ with $n_2$ degrees of freedom
  and noncentrality parameter $\mu'A_2\mu$.
Then $X'A_1X$ and $X'A_2X$ are independently distributed if
$A_1 \Sigma A_2 = 0.$

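A simulation sketch of both results for the special case $\Sigma = I$ and a constant mean vector (arbitrary choices): the projection matrices $A_1 = J/n$ (onto the ones vector) and $A_2 = I - J/n$ are idempotent with $A_1 A_2 = 0$, so $X'A_1X$ is noncentral $\chi^2$ and the two forms are independent:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 4
mu = np.full(n, 1.5)

# Projection matrices: onto the span of the ones vector, and onto its
# orthogonal complement.  Both are idempotent and A1 @ A2 = 0.
J = np.ones((n, n)) / n
A1, A2 = J, np.eye(n) - J

X = rng.standard_normal((200_000, n)) + mu
Q1 = np.einsum('ij,jk,ik->i', X, A1, X)    # X'A1X, rank(A1) = 1
Q2 = np.einsum('ij,jk,ik->i', X, A2, X)    # X'A2X, rank(A2) = n - 1

print(np.mean(Q1 <= 10.0), stats.ncx2.cdf(10.0, df=1, nc=mu @ A1 @ mu))
print(np.corrcoef(Q1, Q2)[0, 1])           # near 0: independent forms
```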
t-distribution:
• If
  $X$ has a normal distribution with mean 0 and variance 1,
  $Y$ has a $\chi^2$-distribution with $n$ degrees of freedom, and
  $X$ and $Y$ are independent random variables,

  then the distribution of the statistic $T = \frac{X}{\sqrt{Y/n}}$ is called the
  t-distribution with $n$ degrees of freedom.

The probability density function of $T$ is

$f_T(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\, \Gamma\left(\frac{n}{2}\right)} \left( 1 + \frac{t^2}{n} \right)^{-\frac{n+1}{2}}; \quad -\infty < t < \infty.$
t-distribution:
• If the mean of $X$ is $\delta \neq 0$, then the distribution of $\frac{X}{\sqrt{Y/n}}$
  is called the noncentral t-distribution with $n$ degrees of
  freedom and noncentrality parameter $\delta$.
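A sketch constructing $T$ from its definition and comparing with the $t_n$ cdf (the choice of $n$ is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 8

Z = rng.standard_normal(200_000)                              # N(0, 1)
Y = stats.chi2.rvs(df=n, size=200_000, random_state=rng)      # chi^2_n
T = Z / np.sqrt(Y / n)                    # the statistic on this slide

print(np.mean(T <= 1.0), stats.t.cdf(1.0, df=n))
```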
F-distribution:

• If $X$ and $Y$ are independent random variables with
  $\chi^2$-distributions with $m$ and $n$ degrees of freedom respectively,
  then the distribution of the statistic $F = \frac{X/m}{Y/n}$ is called
  the F-distribution with $m$ and $n$ degrees of freedom. The
  probability density function of $F$ is

$f_F(f) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right) \Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} f^{\frac{m}{2} - 1} \left( 1 + \frac{m}{n} f \right)^{-\frac{m+n}{2}}; \quad 0 < f < \infty.$
F-distribution:
• If $X$ has a noncentral Chi-square distribution with $m$ degrees
  of freedom and noncentrality parameter $\lambda$, $Y$ has a
  $\chi^2$-distribution with $n$ degrees of freedom, and $X$ and $Y$ are
  independent random variables, then the distribution of
  $F = \frac{X/m}{Y/n}$ is the noncentral F-distribution with $m$ and $n$
  degrees of freedom and noncentrality parameter $\lambda$.

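A sketch of the noncentral case built from its definition (arbitrary $m$, $n$ and $\lambda$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
m, n, lam = 3, 10, 2.0

# Build F from a noncentral chi-square and an independent chi-square.
X = stats.ncx2.rvs(df=m, nc=lam, size=200_000, random_state=rng)
Y = stats.chi2.rvs(df=n, size=200_000, random_state=rng)
F = (X / m) / (Y / n)

# Compare with SciPy's noncentral F distribution.
print(np.mean(F <= 2.0), stats.ncf.cdf(2.0, dfn=m, dfd=n, nc=lam))
```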
Linear model:
Suppose there are $n$ observations. In the linear model, we assume
that these observations are the values taken by $n$ random
variables $Y_1, Y_2, \ldots, Y_n$ satisfying the following conditions:

1. $E(Y_i)$ is a linear combination of $p$ unknown parameters $\beta_1, \beta_2, \ldots, \beta_p$:
   $E(Y_i) = x_{i1}\beta_1 + x_{i2}\beta_2 + \ldots + x_{ip}\beta_p, \quad i = 1, 2, \ldots, n$
   where the $x_{ij}$'s are known constants.

2. $Y_1, Y_2, \ldots, Y_n$ are uncorrelated and normally distributed with
   variance $Var(Y_i) = \sigma^2$.

Linear model:
The linear model can be rewritten by introducing independent
normal random variables following $N(0, \sigma^2)$, as
$Y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \ldots + x_{ip}\beta_p + \varepsilon_i, \quad i = 1, 2, \ldots, n.$

These equations can be written using matrix notation as
$Y = X\beta + \varepsilon$
where
• $Y$ is an $n \times 1$ vector of observations,
• $X$ is an $n \times p$ matrix of $n$ observations on each of the
  variables $X_1, X_2, \ldots, X_p$,
• $\beta$ is a $p \times 1$ vector of parameters, and
• $\varepsilon$ is an $n \times 1$ vector of random error components with
  $\varepsilon \sim N(0, \sigma^2 I_n).$
Linear model:
Here
• $Y$ is called the study or dependent variable,
• $X_1, X_2, \ldots, X_p$ are called explanatory or independent variables, and
• $\beta_1, \beta_2, \ldots, \beta_p$ are called the regression coefficients.

Alternatively, since $Y \sim N(X\beta, \sigma^2 I)$, the linear model can also be
expressed in expectation form as a normal random variable $Y$ with
$E(Y) = X\beta,$
$Var(Y) = \sigma^2 I.$

Note that $\beta$ and $\sigma^2$ are unknown but $X$ is known.
Estimable functions:
A linear parametric function $\lambda'\beta$ of the parameters is said to be
an estimable parametric function, or estimable, if there exists a
linear function $\ell'Y$ of $Y = (Y_1, Y_2, \ldots, Y_n)'$
such that
$E(\ell'Y) = \lambda'\beta$
with $\ell = (\ell_1, \ell_2, \ldots, \ell_n)'$ and $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)'$ being vectors of
known scalars.

Best linear unbiased estimates (BLUE):
The minimum variance unbiased linear estimate $\ell'Y$ of an
estimable function $\lambda'\beta$ is called the best linear unbiased
estimate (BLUE) of $\lambda'\beta$.

• Suppose $\ell_1'Y$ and $\ell_2'Y$ are the BLUEs of $\lambda_1'\beta$ and $\lambda_2'\beta$
  respectively.
  Then $(a_1\ell_1 + a_2\ell_2)'Y$ is the BLUE of $(a_1\lambda_1 + a_2\lambda_2)'\beta$.

• If $\lambda'\beta$ is estimable, its best estimate is $\lambda'\hat{\beta}$ where $\hat{\beta}$
  is any solution of the normal equations $X'X\beta = X'Y$.

Least squares estimation:
The least-squares estimate of $\beta$ in $Y = X\beta + \varepsilon$ is the value
of $\beta$ which minimizes the error sum of squares $\varepsilon'\varepsilon$.

Let $S = \varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta)$
      $= Y'Y - 2\beta'X'Y + \beta'X'X\beta.$

Minimizing $S$ with respect to $\beta$ involves
$\frac{\partial S}{\partial \beta} = 0$
$\Rightarrow X'X\beta = X'Y$
which is termed the normal equation.

This normal equation has a unique solution given by
$\hat{\beta} = (X'X)^{-1} X'Y$
assuming rank$(X) = p$.
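A minimal NumPy sketch (simulated data with an arbitrary true $\beta$) solving the normal equation:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 3
beta = np.array([2.0, -1.0, 0.5])

X = rng.standard_normal((n, p))            # full column rank (rank p)
Y = X @ beta + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X beta = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                            # close to the true beta

# np.linalg.lstsq gives the same answer and is numerically safer.
print(np.linalg.lstsq(X, Y, rcond=None)[0])
```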
Least squares estimation:
Note that $\frac{\partial^2 S}{\partial \beta \, \partial \beta'} = 2X'X$ is a positive definite matrix.

So $\hat{\beta} = (X'X)^{-1} X'Y$ is the value of $\beta$ which minimizes $S$ and
is termed the ordinary least squares estimator of $\beta$.

• In this case, $\beta_1, \beta_2, \ldots, \beta_p$ are estimable and consequently all
  linear parametric functions are estimable.

• $E(\hat{\beta}) = (X'X)^{-1} X' E(Y) = (X'X)^{-1} X'X\beta = \beta$

• $Var(\hat{\beta}) = (X'X)^{-1} X' Var(Y) X (X'X)^{-1} = \sigma^2 (X'X)^{-1}$


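The unbiasedness and covariance formulas can be checked by redrawing the errors many times for a fixed design (a simulation sketch with arbitrary settings):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, sigma = 50, 2, 0.5
beta = np.array([1.0, -2.0])
X = rng.standard_normal((n, p))
XtX_inv = np.linalg.inv(X.T @ X)

# Repeatedly redraw the errors for the fixed design matrix X and
# recompute the OLS estimate each time.
B = np.array([np.linalg.solve(X.T @ X, X.T @ (X @ beta +
              rng.normal(scale=sigma, size=n))) for _ in range(20_000)])

print(B.mean(axis=0))                      # near beta (unbiasedness)
print(np.cov(B, rowvar=False))             # near sigma^2 (X'X)^{-1}
print(sigma ** 2 * XtX_inv)
```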
Least squares estimation:
• If $\lambda'\hat{\beta}$ and $\mu'\hat{\beta}$ are the estimates of $\lambda'\beta$ and $\mu'\beta$
  respectively, then

  – $Var(\lambda'\hat{\beta}) = \lambda' Var(\hat{\beta}) \lambda = \sigma^2 [\lambda'(X'X)^{-1}\lambda]$

  – $Cov(\lambda'\hat{\beta}, \mu'\hat{\beta}) = \sigma^2 [\lambda'(X'X)^{-1}\mu].$

• $Y - X\hat{\beta}$ is called the residual vector.

• $E(Y - X\hat{\beta}) = 0.$

Linear model with correlated observations:
In the linear model $Y = X\beta + \varepsilon$ with $E(\varepsilon) = 0$, $Var(\varepsilon) = \sigma^2 \Sigma$ and
$\varepsilon$ normally distributed, we find
$E(Y) = X\beta, \quad Var(Y) = \sigma^2 \Sigma.$

Assuming $\Sigma$ to be positive definite, we can write
$\Sigma^{-1} = P'P$
where $P$ is a nonsingular matrix. Premultiplying $Y = X\beta + \varepsilon$
by $P$, we get $PY = PX\beta + P\varepsilon$,
or $Y^* = X^*\beta + \varepsilon^*$
where $Y^* = PY$, $X^* = PX$ and $\varepsilon^* = P\varepsilon$.

Note that in this model $E(\varepsilon^*) = 0$ and $Var(\varepsilon^*) = \sigma^2 I$.
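A sketch of the transformation (with $\sigma^2 = 1$ and an arbitrary positive definite $\Sigma$), using a Cholesky factor of $\Sigma^{-1}$ to obtain $P$:

```python
import numpy as np

rng = np.random.default_rng(9)

# An arbitrary positive definite Sigma for 3 correlated errors.
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])

# Factor Sigma^{-1} = P'P: take L lower triangular with L L' = Sigma^{-1},
# then P = L' satisfies P'P = Sigma^{-1}.
L = np.linalg.cholesky(np.linalg.inv(Sigma))
P = L.T

# The transformed errors P @ eps have identity covariance.
eps = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
print(np.cov(eps @ P.T, rowvar=False).round(3))   # near the identity
```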
Distribution of $\ell'Y$:
In the linear model $Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$, consider a linear
function $\ell'Y$, which is normally distributed with
$E(\ell'Y) = \ell'X\beta,$
$Var(\ell'Y) = \sigma^2 (\ell'\ell).$

Then
$\frac{\ell'Y}{\sigma \sqrt{\ell'\ell}} \sim N\left( \frac{\ell'X\beta}{\sigma \sqrt{\ell'\ell}},\, 1 \right).$

Further, $\frac{(\ell'Y)^2}{\sigma^2\, \ell'\ell}$ has a noncentral Chi-square distribution with
one degree of freedom and noncentrality parameter $\frac{(\ell'X\beta)^2}{\sigma^2\, \ell'\ell}.$
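A simulation sketch of the noncentral $\chi^2$ claim (arbitrary $\ell$, $X\beta$ and $\sigma$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, sigma = 5, 2.0
Xb = np.array([1.0, 0.5, -0.2, 0.0, 1.5])      # plays the role of X @ beta
l = np.array([1.0, -1.0, 2.0, 0.5, 0.0])

Y = Xb + rng.normal(scale=sigma, size=(200_000, n))
U = (Y @ l) ** 2 / (sigma ** 2 * (l @ l))      # (l'Y)^2 / (sigma^2 l'l)

lam = (l @ Xb) ** 2 / (sigma ** 2 * (l @ l))   # noncentrality parameter
print(np.mean(U <= 1.0), stats.ncx2.cdf(1.0, df=1, nc=lam))
```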
Degrees of freedom:
A linear function $\ell'Y$ of the observations ($\ell \neq 0$) is said to carry
one degree of freedom.

A set of linear functions $L'Y$, where $L$ is an $r \times n$ matrix, is said to
have $M$ degrees of freedom if there exist $M$ linearly independent
functions in the set and no more.

Alternatively, the degrees of freedom carried by the set $L'Y$
equals rank$(L)$.

When the set $L'Y$ are the estimates of $\Lambda'\beta$, the degrees of
freedom of the set will also be called the degrees of freedom
for the estimates of $\Lambda'\beta$.
Sum of squares:
If $\ell'Y$ is a linear function of the observations, then the projection
of $Y$ on $\ell$ is the vector $\frac{\ell'Y}{\ell'\ell}\, \ell$.

The squared length of this projection, $\frac{(\ell'Y)^2}{\ell'\ell}$, is called the
sum of squares (SS) due to $\ell'Y$.

Since $\ell'Y$ has one degree of freedom, the SS due to $\ell'Y$
has one degree of freedom.

Sum of squares:
The sums of squares and the degrees of freedom arising out of
mutually orthogonal sets of functions can be added together
to give the sum of squares and degrees of freedom for the set
of all the functions together, and vice versa.

Sum of squares:
Let $X = (X_1, X_2, \ldots, X_n)'$ have a multivariate normal distribution with
mean vector $\mu$ and positive definite covariance matrix $\Sigma$.
Consider the two quadratic forms:
• $X'A_1X$ is distributed as $\chi^2$ with $n_1$ degrees of freedom and
  noncentrality parameter $\mu'A_1\mu$, and
• $X'A_2X$ is distributed as $\chi^2$ with $n_2$ degrees of freedom and
  noncentrality parameter $\mu'A_2\mu$.

Then $X'A_1X$ and $X'A_2X$ are independently distributed if $A_1 \Sigma A_2 = 0$.

Fisher‐Cochran theorem:
If $X = (X_1, X_2, \ldots, X_n)'$ has a multivariate normal distribution with
mean vector $\mu$ and positive definite covariance matrix $\Sigma$, and
$X'\Sigma^{-1}X = Q_1 + Q_2 + \ldots + Q_k$

where $Q_i = X'A_iX$ with rank$(A_i) = N_i$, $i = 1, 2, \ldots, k$, then the $Q_i$'s
are independently distributed as noncentral Chi-square with $N_i$
degrees of freedom and noncentrality parameters $\mu'A_i\mu$

if and only if $\sum_{i=1}^{k} N_i = n$, in which case

$\mu'\Sigma^{-1}\mu = \sum_{i=1}^{k} \mu'A_i\mu.$
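A numerical sketch of the rank condition for the special case $\Sigma = I$ (the design matrix is an arbitrary illustration): the hat matrix $H$ and $I - H$ are idempotent, their ranks add to $n$, and the corresponding quadratic forms decompose $y'y$:

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 20, 3
X = rng.standard_normal((n, p))

# Hat matrix H projects onto the column space of X; H and I - H are
# idempotent and orthogonal to each other (here Sigma = I).
H = X @ np.linalg.solve(X.T @ X, X.T)
A1, A2 = H, np.eye(n) - H

print(np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2))  # p and n - p
print(np.linalg.matrix_rank(A1) + np.linalg.matrix_rank(A2) == n)

y = rng.standard_normal(n)
print(np.allclose(y @ y, y @ A1 @ y + y @ A2 @ y))           # Q1 + Q2
```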
Derivatives of quadratic and linear forms:
Let $X = (x_1, x_2, \ldots, x_n)'$ and let $f(X)$ be any function of the $n$
independent variables $x_1, x_2, \ldots, x_n$. Then

$\frac{\partial f(X)}{\partial X} = \left( \frac{\partial f(X)}{\partial x_1}, \frac{\partial f(X)}{\partial x_2}, \ldots, \frac{\partial f(X)}{\partial x_n} \right)'.$

If $K = (k_1, k_2, \ldots, k_n)'$ is a vector of constants, then $\frac{\partial K'X}{\partial X} = K.$

If $A$ is an $n \times n$ matrix, then $\frac{\partial X'AX}{\partial X} = (A + A')X,$
which reduces to $2AX$ when $A$ is symmetric.
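Both derivative formulas can be verified by central finite differences (a sketch with arbitrary $K$, $A$ and evaluation point):

```python
import numpy as np

rng = np.random.default_rng(12)
n = 4
x = rng.standard_normal(n)
K = rng.standard_normal(n)
A = rng.standard_normal((n, n))            # a general (non-symmetric) A

def num_grad(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda v: K @ v, x), K))              # = K
print(np.allclose(num_grad(lambda v: v @ A @ v, x), (A + A.T) @ x))
```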
Independence of linear and quadratic forms:
• Let $Y$ be an $n \times 1$ vector having a multivariate normal
  distribution $N(\mu, I)$ and let $B$ be an $m \times n$ matrix.
  Then the $m \times 1$ linear form $BY$ is independent of the
  quadratic form $Y'AY$ if $BA = 0$, where $A$ is a symmetric
  matrix of known elements.

• Let $Y$ be an $n \times 1$ vector having a multivariate normal
  distribution $N(\mu, \Sigma)$ with rank$(\Sigma) = n$.
  If $B\Sigma A = 0$, then the quadratic form $Y'AY$ is independent
  of the linear form $BY$, where $B$ is an $m \times n$ matrix.

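A closing simulation sketch of the first result: with $B$ extracting the sample mean and $A$ giving the sum of squares about the mean, $BA = 0$, and the two forms show no correlation in simulation:

```python
import numpy as np

rng = np.random.default_rng(13)
n = 5
mu = np.full(n, 2.0)

# B extracts the sample mean; A gives the sum of squares about the mean.
B = np.ones((1, n)) / n
A = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(B @ A, 0))               # BA = 0 (here Sigma = I)

Y = rng.standard_normal((200_000, n)) + mu
lin = (Y @ B.T).ravel()                    # BY: the sample means
quad = np.einsum('ij,jk,ik->i', Y, A, Y)   # Y'AY: SS about the mean

print(np.corrcoef(lin, quad)[0, 1])        # near 0, as independence implies
```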
