
Wishart Distribution

If $x_1, x_2, \ldots, x_n$ is a random sample from $N(\mu, \sigma^2)$, and if $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, then it is well known that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{(n-1)}$. The multivariate analogue of $(n-1)s^2$ is the matrix $A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'$, called the Wishart matrix. The generalization of the $\chi^2$ distribution to the multivariate case is called the Wishart distribution.

Definition: In random sampling from a $p$-variate normal population $N_p(\mu, \Sigma)$, the $p \times p$ symmetric matrix of sums of squares and cross products of deviations of the sample observations about the mean, given by

$$A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' = \sum_{\alpha=1}^{N} x_\alpha x_\alpha' - N\bar{x}\,\bar{x}' = XX' - N\bar{x}\,\bar{x}',$$

is called the Wishart matrix, and the distribution of the matrix $A$ is called the Wishart distribution.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}_{(p \times p)}$$

By the definition of the Wishart matrix, it is clear that the Wishart distribution is the joint distribution of the $\frac{p(p+1)}{2}$ distinct elements of the symmetric matrix $A$.
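As a numerical illustration of the definition (a minimal sketch, assuming NumPy and an arbitrary illustrative $\Sigma$), the following computes $A$ both as a sum of outer products of deviations and via the equivalent form $XX' - N\bar{x}\,\bar{x}'$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 3, 50

# Illustrative population parameters (not from the text)
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=N).T   # p x N data matrix

xbar = X.mean(axis=1, keepdims=True)               # sample mean vector (p x 1)

# A as the sum of outer products of deviations about the mean
A_dev = (X - xbar) @ (X - xbar).T

# Equivalent form A = XX' - N xbar xbar'
A_alt = X @ X.T - N * (xbar @ xbar.T)

assert np.allclose(A_dev, A_alt)
```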

Probability Density Function of Wishart Distribution

In Lemma 2 we also demonstrated that $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, and we proved in the Theorem that the distribution of $A = N\hat{\Sigma}$ is the same as the distribution of $\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$.

The probability density function of the Wishart matrix $A$ is given by

$$f(A) = \frac{|A|^{\frac{1}{2}(n-p-1)}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}A)}}{2^{np/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{n/2}\, \prod_{i=1}^{p}\Gamma\!\left(\frac{n+1-i}{2}\right)}; \qquad n = N-1,$$

and we write $A \sim W(\Sigma, N-1)$.
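This density can be checked numerically. The sketch below (assuming SciPy is available: scipy.stats.wishart implements this distribution, and scipy.special.multigammaln supplies the logarithm of $\pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\!\left(\frac{n+1-i}{2}\right)$) evaluates the formula above directly and compares it with SciPy's log-density:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

def wishart_logpdf(A, Sigma, n):
    """Log-density of W(Sigma, n) at A, written directly from the formula above."""
    p = A.shape[0]
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_S = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(Sigma, A))        # tr(Sigma^{-1} A)
    log_const = (n * p / 2) * np.log(2) + (n / 2) * logdet_S + multigammaln(n / 2, p)
    return 0.5 * (n - p - 1) * logdet_A - 0.5 * trace_term - log_const

p, n = 3, 10
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])          # illustrative scale matrix
A = wishart.rvs(df=n, scale=Sigma, random_state=1)

# Agrees with SciPy's implementation of the same density
assert np.isclose(wishart_logpdf(A, Sigma, n), wishart.logpdf(A, df=n, scale=Sigma))
```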

Corollary: Given $p$-variate random vectors $x_1, x_2, \ldots, x_N$ from a $p$-variate normal distribution $N_p(\mu, \Sigma)$, the distribution of the sample variance-covariance matrix $S$ is $W\!\left(\frac{1}{N-1}\Sigma,\ N-1\right)$.
The Characteristic Function

The characteristic function of a $p \times p$ random matrix $A$ is given by

$$\Phi_A(\Theta) = E\!\left(e^{i\,\mathrm{tr}(A\Theta)}\right),$$

where $\Theta$ is a $p \times p$ real-valued positive definite matrix.
Theorem 1: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$, be a $p \times p$ matrix following $W(\Sigma, N-1)$. Then the characteristic function of $A$ is given by $\Phi_A(\Theta) = |I - 2i\,\Sigma\Theta|^{-n/2}$, where $n = N-1$.
Proof: Consider the characteristic function of the matrix $A$:

$$\Phi_A(\Theta) = E\!\left(e^{i\,\mathrm{tr}(A\Theta)}\right) = E\!\left(e^{i\,\mathrm{tr}\left(\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'\right)\Theta}\right) = E\!\left(e^{i\sum_{\alpha=1}^{N-1} y_\alpha' \Theta y_\alpha}\right),$$

since $\mathrm{tr}(y_\alpha y_\alpha' \Theta) = y_\alpha' \Theta y_\alpha$. Hence

$$\Phi_A(\Theta) = E\!\left(\prod_{\alpha=1}^{N-1} e^{i\,y_\alpha' \Theta y_\alpha}\right) = \prod_{\alpha=1}^{N-1} E\!\left(e^{i\,y_\alpha' \Theta y_\alpha}\right),$$

as the $y_\alpha$ are independently distributed. Since $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are also identically distributed, each following $N_p(0, \Sigma)$,

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\,y'\Theta y}\right)\right]^n; \qquad n = N-1.$$
Now, since $\Theta$ is a $p \times p$ real, non-singular, positive definite matrix and $\Sigma$ is positive definite and symmetric, there exists a non-singular matrix $P$ that simultaneously reduces $\Theta$ to a diagonal matrix and $\Sigma^{-1}$ to the identity:

$$P'\Theta P = D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p), \qquad P'\Sigma^{-1}P = I.$$

Use this matrix $P$ to transform the vector $y$ via $y = Pz$, so that $E(z) = P^{-1}E(y) = 0$ and

$$V(z) = V(P^{-1}y) = P^{-1}\Sigma(P^{-1})' = (P'\Sigma^{-1}P)^{-1} = I.$$

Therefore $z \sim N_p(0, I)$. Using $y = Pz$,

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\,z'P'\Theta Pz}\right)\right]^n = \left[E\!\left(e^{i\,z'Dz}\right)\right]^n.$$

Consider

$$z'Dz = \begin{bmatrix} z_1 & z_2 & \cdots & z_p \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_p \end{bmatrix} = \sum_{j=1}^{p} \lambda_j z_j^2.$$

Hence

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\sum_{j=1}^{p}\lambda_j z_j^2}\right)\right]^n = \left[E\!\left(\prod_{j=1}^{p} e^{i\lambda_j z_j^2}\right)\right]^n = \left[\prod_{j=1}^{p} E\!\left(e^{i\lambda_j z_j^2}\right)\right]^n = \left[\prod_{j=1}^{p} \varphi_{z_j^2}(\lambda_j)\right]^n.$$

As each $z_j \sim N(0,1),\ j = 1, 2, \ldots, p$, its square follows a $\chi^2$ distribution with one degree of freedom. The characteristic function of $z_j^2$ is therefore $\varphi_{z_j^2}(\lambda_j) = (1 - 2i\lambda_j)^{-1/2}$.

Thus,

$$\Phi_A(\Theta) = \left[\prod_{j=1}^{p}(1 - 2i\lambda_j)^{-1/2}\right]^n = \left(|I - 2i\,D|^{-1/2}\right)^n = |I - 2i\,D|^{-n/2}.$$

Using $P'\Theta P = D$ and $P'\Sigma^{-1}P = I$ in the above, and noting that $P'\Sigma^{-1}P = I$ implies $\Sigma = PP'$, we can write

$$\Phi_A(\Theta) = |P'\Sigma^{-1}P - 2i\,P'\Theta P|^{-n/2} = \left(|\Sigma^{-1} - 2i\,\Theta|\,|PP'|\right)^{-n/2} = \left(|\Sigma^{-1} - 2i\,\Theta|\,|\Sigma|\right)^{-n/2} = |I - 2i\,\Sigma\Theta|^{-n/2}; \qquad n = N-1.$$
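The closed form can be verified by simulation. A minimal sketch (assuming NumPy, with an arbitrary small positive definite $\Theta$ chosen for illustration) estimates $E\!\left(e^{i\,\mathrm{tr}(A\Theta)}\right)$ by Monte Carlo and compares it with $|I - 2i\,\Sigma\Theta|^{-n/2}$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 2, 8
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
Theta = np.array([[0.3, 0.1],
                  [0.1, 0.2]])   # symmetric positive definite argument

# Monte Carlo estimate of E[exp(i tr(A Theta))], with A a sum of n outer products
reps = 200_000
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))  # (reps, n, p)
A = np.einsum('rai,raj->rij', Y, Y)                              # (reps, p, p)
phi_mc = np.mean(np.exp(1j * np.trace(A @ Theta, axis1=1, axis2=2)))

# Closed form |I - 2i Sigma Theta|^{-n/2}
M = np.eye(p) - 2j * Sigma @ Theta
phi_exact = np.linalg.det(M) ** (-n / 2)

print(phi_mc, phi_exact)   # the two should agree up to Monte Carlo error
```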

Sum of Wishart Matrices


Theorem 2: Let $A_1, A_2, \ldots, A_k$ be $p \times p$ matrices following independent Wishart distributions, i.e. $A_j \sim W(\Sigma, n_j)$, where each $A_j = \sum_{\alpha=1}^{n_j} y_\alpha y_\alpha';\ j = 1, 2, \ldots, k$ and $y_\alpha \sim N_p(0, \Sigma)$. Then $\sum_{j=1}^{k} A_j \sim W\!\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.

Proof: As $A_j \sim W(\Sigma, n_j)$, its characteristic function is given by

$$\Phi_{A_j}(\Theta) = |I - 2i\,\Sigma\Theta|^{-n_j/2}; \qquad j = 1, 2, \ldots, k.$$

Therefore,

$$\Phi_{\sum_{j=1}^{k} A_j}(\Theta) = E\!\left(e^{i\,\mathrm{tr}\,\Theta\left(\sum_{j=1}^{k} A_j\right)}\right) = E\!\left(\prod_{j=1}^{k} e^{i\,\mathrm{tr}(A_j\Theta)}\right) = \prod_{j=1}^{k} E\!\left(e^{i\,\mathrm{tr}(A_j\Theta)}\right),$$

as the $A_j$ are independent, and so

$$\Phi_{\sum_{j=1}^{k} A_j}(\Theta) = \prod_{j=1}^{k} |I - 2i\,\Sigma\Theta|^{-n_j/2} = |I - 2i\,\Sigma\Theta|^{-\frac{1}{2}\sum_{j=1}^{k} n_j}; \qquad n_j = N_j - 1,$$

which is the characteristic function of $W\!\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.
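A quick first-moment sanity check of the theorem (a sketch assuming SciPy, with illustrative degrees of freedom): since $E[W(\Sigma, n)] = n\Sigma$, the sum of independent draws should average to $\left(\sum_j n_j\right)\Sigma$:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
dfs = [4, 6, 5]                      # n_1, n_2, n_3

# Draw independent A_j ~ W(Sigma, n_j) and sum them, many times
reps = 100_000
total = sum(wishart.rvs(df=n, scale=Sigma, size=reps, random_state=rng) for n in dfs)

# Under the theorem, total ~ W(Sigma, sum(dfs)); compare the first moment
print(total.mean(axis=0))            # approximately sum(dfs) * Sigma
print(sum(dfs) * Sigma)
```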

Transformation of Wishart Matrix

Theorem 3: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$. If $Y \to Z$ using a $p \times p$ non-singular matrix $H$, so that $z_\alpha = Hy_\alpha;\ \alpha = 1, 2, \ldots, n;\ n = N-1$, then $HAH' \sim W(H\Sigma H', n)$.

Proof: $E(z_\alpha) = HE(y_\alpha) = 0$ and $V(z_\alpha) = V(Hy_\alpha) = H\Sigma H'$, so $z_\alpha \sim N_p(0, H\Sigma H')$.

We also notice that

$$HAH' = H\!\left(\sum_{\alpha=1}^{n} y_\alpha y_\alpha'\right)\!H' = \sum_{\alpha=1}^{n}(Hy_\alpha)(Hy_\alpha)' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha'.$$

Hence $HAH' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha' \sim W(H\Sigma H', n)$.
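The algebraic identity at the heart of the proof is easy to confirm numerically (a minimal sketch assuming NumPy; $H$ is an arbitrary random, almost surely non-singular, matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 12
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
H = rng.normal(size=(p, p))          # random non-singular transform

Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # rows are y_alpha'
A = Y.T @ Y                                              # sum of y y' outer products
Z = Y @ H.T                                              # rows are z_alpha' = (H y_alpha)'

# The identity H A H' = sum of z_alpha z_alpha' from the proof
assert np.allclose(H @ A @ H.T, Z.T @ Z)
```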

Marginal Distribution

Theorem 4: Let $A \sim W(\Sigma, n)$, and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as follows:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{matrix} q \\ (p-q) \end{matrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$

Then the marginal distribution of $A_{11}$ is a Wishart distribution with variance-covariance matrix $\Sigma_{11}$ and $n$ degrees of freedom.

Proof: Let us choose the $q \times p$ matrix $H = \begin{bmatrix} I & 0 \end{bmatrix}$, where $I$ is the $q \times q$ identity; the argument of Theorem 3 carries over unchanged to this rectangular $H$ of full row rank. We notice that

$$HAH' = \begin{bmatrix} I & 0 \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} I \\ 0 \end{bmatrix} = A_{11}.$$

Similarly,

$$H\Sigma H' = \begin{bmatrix} I & 0 \end{bmatrix}\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\begin{bmatrix} I \\ 0 \end{bmatrix} = \Sigma_{11}.$$

Therefore $A_{11} \sim W(\Sigma_{11}, n);\ n = N-1$.

Corollary 1: All the diagonal submatrices of the matrix $A$ have Wishart distributions.

Corollary 2: $\Sigma^{-1/2} A\, \Sigma^{-1/2} \sim W(I, n)$.
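For $q = 1$, Theorem 4 reduces to $a_{11}/\sigma_{11} \sim \chi^2_n$, which can be tested directly. A sketch assuming SciPy (the Kolmogorov-Smirnov test below should report a large p-value):

```python
import numpy as np
from scipy.stats import wishart, chi2, kstest

rng = np.random.default_rng(5)
p, n = 2, 10
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw many A ~ W(Sigma, n) and keep the top-left 1 x 1 block (a scalar)
A = wishart.rvs(df=n, scale=Sigma, size=50_000, random_state=rng)
a11 = A[:, 0, 0]

# Theorem 4 with q = 1: a11 ~ W(sigma11, n), i.e. a11 / sigma11 ~ chi^2_n
print(kstest(a11 / Sigma[0, 0], chi2(df=n).cdf))
```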

Theorem 5: Let $A \sim W(\Sigma, n)$, and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as before. Then the distribution of $A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ is a Wishart distribution with variance-covariance matrix $\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $n - (p - q)$ degrees of freedom.

Generalized Variance

The multivariate analogue of the variance $\sigma^2$ of a univariate distribution, apart from the covariance matrix $\Sigma$ itself, is the determinant $|\Sigma|$ of the variance-covariance matrix, termed the generalized variance of the multivariate distribution. Similarly, the generalized variance of the sample vectors $x_1, x_2, \ldots, x_N$ is defined as

$$|S| = \left|\frac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'\right| = \frac{|A|}{(N-1)^p}.$$

The generalized variance is also a measure of spread. A geometric interpretation of the sample generalized variance comes from considering the $p$ rows of the matrix $X$ as $p$ vectors in $N$-dimensional space. Equivalently, the value of $|A|$ may be interpreted in terms of $N$ points in $p$-dimensional space. Let the columns of the matrix $X$ be written as points in $p$-dimensional space as follows:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pN} \end{bmatrix}_{(p \times N)} = \begin{bmatrix} \boldsymbol{x_1} & \boldsymbol{x_2} & \cdots & \boldsymbol{x_N} \end{bmatrix}$$

Clearly, when $p = 1$, the value of $|A| = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})^2$, which is the sum of squares of the distances of the deviation points from the origin. In general, the value of $|A|$ is the sum of squares of the volumes of all parallelotopes formed by taking as principal edges $p$ vectors from the set of deviation vectors $x_1 - \bar{x}, \ldots, x_N - \bar{x}$.
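A minimal sketch (assuming NumPy, with an illustrative $\Sigma$) computing the sample generalized variance both as $|A|/(N-1)^p$ and as $|S|$ via np.cov:

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 3, 40
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=N).T   # p x N data matrix

xbar = X.mean(axis=1, keepdims=True)
A = (X - xbar) @ (X - xbar).T

gen_var = np.linalg.det(A) / (N - 1) ** p    # |S| = |A| / (N-1)^p
S = np.cov(X)                                # np.cov divides by N-1 by default
assert np.isclose(gen_var, np.linalg.det(S))
print(gen_var, np.linalg.det(Sigma))         # sample vs population generalized variance
```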

Distribution of Generalized Variance

In order to derive the distribution of the generalized variance, the following lemma is used:

Lemma: Let $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ be independently distributed, each following $N_p(0, I)$, and let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha' = TT'$, where $T = (t_{ij})$ is a lower triangular matrix with $t_{ii} > 0,\ i = 1, \ldots, p$, and $t_{ij} = 0,\ i < j$. Then the elements of the matrix $T$ are independently distributed, with $t_{ij} \sim N(0,1)$ for $i > j$, and $t_{ii}^2$ follows a $\chi^2$ distribution with $N - i$ degrees of freedom.
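The lemma (the Bartlett decomposition) also gives a direct way to simulate $W(I, n)$ with $n = N-1$. A minimal sketch assuming SciPy, with a first-moment sanity check using $E[W(I, n)] = nI$:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
p, N = 3, 12
n = N - 1

def bartlett_sample(p, n, rng):
    """Draw A ~ W(I, n) via the lemma: A = T T' with independent triangular entries."""
    T = np.zeros((p, p))
    for i in range(p):                                       # row index i is 0-based here
        T[i, i] = np.sqrt(chi2(df=n - i).rvs(random_state=rng))  # t_ii^2 ~ chi^2_{N-(i+1)}
        T[i, :i] = rng.normal(size=i)                            # t_ij ~ N(0,1) for i > j
    return T @ T.T

A = bartlett_sample(p, n, rng)

# Averaging many draws should be close to n * I
mean_A = np.mean([bartlett_sample(p, n, rng) for _ in range(20_000)], axis=0)
print(np.round(mean_A, 2))   # approximately n * I
```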
