
Wishart Distribution

If $x_1, x_2, \ldots, x_n$ is a random sample from $N(\mu, \sigma^2)$, and if $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, then it is well known that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{(n-1)}$. The multivariate analogue of $(n-1)s^2$ is the matrix $A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'$, called the Wishart matrix. The generalization of the $\chi^2$ distribution to the multivariate case is called the Wishart distribution.

Definition: In random sampling from a $p$-variate normal population $N_p(\mu, \Sigma)$, the $p \times p$ symmetric matrix of sums of squares and cross products of deviations of the sample observations about the mean, given by

$$A = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' = \sum_{\alpha=1}^{N} x_\alpha x_\alpha' - N\bar{x}\,\bar{x}' = XX' - N\bar{x}\,\bar{x}',$$

is called the Wishart matrix, and the distribution of the matrix $A$ is called the Wishart distribution.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}_{(p \times p)}$$

By the definition of the Wishart matrix, it is clear that the Wishart distribution is the joint distribution of the $\frac{p(p+1)}{2}$ distinct elements of the symmetric matrix $A$.
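As a numerical illustration of the definition (a minimal sketch, assuming NumPy and an arbitrary illustrative $\Sigma$), the following computes $A$ both as a sum of outer products of deviations and via the equivalent form $XX' - N\bar{x}\,\bar{x}'$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 3, 50

# Illustrative population parameters (not from the text)
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=N).T   # p x N data matrix

xbar = X.mean(axis=1, keepdims=True)               # sample mean vector (p x 1)

# A as the sum of outer products of deviations about the mean
A_dev = (X - xbar) @ (X - xbar).T

# Equivalent form A = XX' - N xbar xbar'
A_alt = X @ X.T - N * (xbar @ xbar.T)

assert np.allclose(A_dev, A_alt)
```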

Probability Density Function of Wishart Distribution

In Lemma 2 we also demonstrated that $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, and we proved in the Theorem that the distribution of $A = N\hat{\Sigma}$ is the same as the distribution of $\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$.

The probability density function of the Wishart matrix $A$ is given by

$$f(A) = \frac{|A|^{\frac{1}{2}(n-p-1)}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}A)}}{2^{np/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{n/2}\, \prod_{i=1}^{p}\Gamma\!\left(\frac{n+1-i}{2}\right)}; \qquad n = N-1,$$

and we write $A \sim W(\Sigma, N-1)$.
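This density can be checked numerically. The sketch below (assuming SciPy is available: scipy.stats.wishart implements this distribution, and scipy.special.multigammaln supplies the logarithm of $\pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\!\left(\frac{n+1-i}{2}\right)$) evaluates the formula above directly and compares it with SciPy's log-density:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

def wishart_logpdf(A, Sigma, n):
    """Log-density of W(Sigma, n) at A, written directly from the formula above."""
    p = A.shape[0]
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_S = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(Sigma, A))        # tr(Sigma^{-1} A)
    log_const = (n * p / 2) * np.log(2) + (n / 2) * logdet_S + multigammaln(n / 2, p)
    return 0.5 * (n - p - 1) * logdet_A - 0.5 * trace_term - log_const

p, n = 3, 10
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])          # illustrative scale matrix
A = wishart.rvs(df=n, scale=Sigma, random_state=1)

# Agrees with SciPy's implementation of the same density
assert np.isclose(wishart_logpdf(A, Sigma, n), wishart.logpdf(A, df=n, scale=Sigma))
```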

Corollary: Given $p$-variate random vectors $x_1, x_2, \ldots, x_N$ from a $p$-variate normal distribution $N_p(\mu, \Sigma)$, the distribution of the sample variance-covariance matrix $S$ is $W\!\left(\frac{1}{N-1}\Sigma,\ N-1\right)$.
The Characteristic Function

The characteristic function of a $p \times p$ random matrix $A$ is given by

$$\Phi_A(\Theta) = E\!\left(e^{i\,\mathrm{tr}(A\Theta)}\right),$$

where $\Theta$ is a $p \times p$ real-valued positive definite matrix.
Theorem 1: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$, be a $p \times p$ matrix following $W(\Sigma, N-1)$. Then the characteristic function of $A$ is given by $\Phi_A(\Theta) = |I - 2i\,\Sigma\Theta|^{-n/2}$, where $n = N-1$.
Proof: Consider the characteristic function of the matrix $A$:

$$\Phi_A(\Theta) = E\!\left(e^{i\,\mathrm{tr}(A\Theta)}\right) = E\!\left(e^{i\,\mathrm{tr}\left(\sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'\right)\Theta}\right) = E\!\left(e^{i\sum_{\alpha=1}^{N-1} y_\alpha' \Theta y_\alpha}\right),$$

since $\mathrm{tr}(y_\alpha y_\alpha' \Theta) = y_\alpha' \Theta y_\alpha$. Hence

$$\Phi_A(\Theta) = E\!\left(\prod_{\alpha=1}^{N-1} e^{i\,y_\alpha' \Theta y_\alpha}\right) = \prod_{\alpha=1}^{N-1} E\!\left(e^{i\,y_\alpha' \Theta y_\alpha}\right),$$

as the $y_\alpha$ are independently distributed. Since $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are also identically distributed, each following $N_p(0, \Sigma)$,

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\,y'\Theta y}\right)\right]^n; \qquad n = N-1.$$
Now, since $\Theta$ is a $p \times p$ real, non-singular, positive definite matrix and $\Sigma$ is positive definite and symmetric, there exists a non-singular matrix $P$ that simultaneously reduces $\Theta$ to a diagonal matrix and $\Sigma^{-1}$ to the identity:

$$P'\Theta P = D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p), \qquad P'\Sigma^{-1}P = I.$$

Use this matrix $P$ to transform the vector $y$ via $y = Pz$, so that $E(z) = P^{-1}E(y) = 0$ and

$$V(z) = V(P^{-1}y) = P^{-1}\Sigma(P^{-1})' = (P'\Sigma^{-1}P)^{-1} = I.$$

Therefore $z \sim N_p(0, I)$. Using $y = Pz$,

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\,z'P'\Theta Pz}\right)\right]^n = \left[E\!\left(e^{i\,z'Dz}\right)\right]^n.$$

Consider

$$z'Dz = \begin{bmatrix} z_1 & z_2 & \cdots & z_p \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_p \end{bmatrix} = \sum_{j=1}^{p} \lambda_j z_j^2.$$

Hence

$$\Phi_A(\Theta) = \left[E\!\left(e^{i\sum_{j=1}^{p}\lambda_j z_j^2}\right)\right]^n = \left[E\!\left(\prod_{j=1}^{p} e^{i\lambda_j z_j^2}\right)\right]^n = \left[\prod_{j=1}^{p} E\!\left(e^{i\lambda_j z_j^2}\right)\right]^n = \left[\prod_{j=1}^{p} \varphi_{z_j^2}(\lambda_j)\right]^n.$$

As each $z_j \sim N(0,1),\ j = 1, 2, \ldots, p$, its square follows a $\chi^2$ distribution with one degree of freedom. The characteristic function of $z_j^2$ is therefore $\varphi_{z_j^2}(\lambda_j) = (1 - 2i\lambda_j)^{-1/2}$.

Thus,

$$\Phi_A(\Theta) = \left[\prod_{j=1}^{p}(1 - 2i\lambda_j)^{-1/2}\right]^n = \left(|I - 2i\,D|^{-1/2}\right)^n = |I - 2i\,D|^{-n/2}.$$

Using $P'\Theta P = D$ and $P'\Sigma^{-1}P = I$ in the above, and noting that $P'\Sigma^{-1}P = I$ implies $\Sigma = PP'$, we can write

$$\Phi_A(\Theta) = |P'\Sigma^{-1}P - 2i\,P'\Theta P|^{-n/2} = \left(|\Sigma^{-1} - 2i\,\Theta|\,|PP'|\right)^{-n/2} = \left(|\Sigma^{-1} - 2i\,\Theta|\,|\Sigma|\right)^{-n/2} = |I - 2i\,\Sigma\Theta|^{-n/2}; \qquad n = N-1.$$
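The closed form can be verified by simulation. A minimal sketch (assuming NumPy, with an arbitrary small positive definite $\Theta$ chosen for illustration) estimates $E\!\left(e^{i\,\mathrm{tr}(A\Theta)}\right)$ by Monte Carlo and compares it with $|I - 2i\,\Sigma\Theta|^{-n/2}$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 2, 8
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
Theta = np.array([[0.3, 0.1],
                  [0.1, 0.2]])   # symmetric positive definite argument

# Monte Carlo estimate of E[exp(i tr(A Theta))], with A a sum of n outer products
reps = 200_000
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))  # (reps, n, p)
A = np.einsum('rai,raj->rij', Y, Y)                              # (reps, p, p)
phi_mc = np.mean(np.exp(1j * np.trace(A @ Theta, axis1=1, axis2=2)))

# Closed form |I - 2i Sigma Theta|^{-n/2}
M = np.eye(p) - 2j * Sigma @ Theta
phi_exact = np.linalg.det(M) ** (-n / 2)

print(phi_mc, phi_exact)   # the two should agree up to Monte Carlo error
```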

Sum of Wishart Matrices


Theorem 2: Let $A_1, A_2, \ldots, A_k$ be $p \times p$ matrices following independent Wishart distributions, i.e. $A_j \sim W(\Sigma, n_j)$, where each $A_j = \sum_{\alpha=1}^{n_j} y_\alpha y_\alpha';\ j = 1, 2, \ldots, k$ and $y_\alpha \sim N_p(0, \Sigma)$. Then $\sum_{j=1}^{k} A_j \sim W\!\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.

Proof: As $A_j \sim W(\Sigma, n_j)$, its characteristic function is given by

$$\Phi_{A_j}(\Theta) = |I - 2i\,\Sigma\Theta|^{-n_j/2}; \qquad j = 1, 2, \ldots, k.$$

Therefore,

$$\Phi_{\sum_{j=1}^{k} A_j}(\Theta) = E\!\left(e^{i\,\mathrm{tr}\,\Theta\left(\sum_{j=1}^{k} A_j\right)}\right) = E\!\left(\prod_{j=1}^{k} e^{i\,\mathrm{tr}(A_j\Theta)}\right) = \prod_{j=1}^{k} E\!\left(e^{i\,\mathrm{tr}(A_j\Theta)}\right),$$

as the $A_j$ are independent, and so

$$\Phi_{\sum_{j=1}^{k} A_j}(\Theta) = \prod_{j=1}^{k} |I - 2i\,\Sigma\Theta|^{-n_j/2} = |I - 2i\,\Sigma\Theta|^{-\frac{1}{2}\sum_{j=1}^{k} n_j}; \qquad n_j = N_j - 1,$$

which is the characteristic function of $W\!\left(\Sigma, \sum_{j=1}^{k} n_j\right)$.
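A quick first-moment sanity check of the theorem (a sketch assuming SciPy, with illustrative degrees of freedom): since $E[W(\Sigma, n)] = n\Sigma$, the sum of independent draws should average to $\left(\sum_j n_j\right)\Sigma$:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
dfs = [4, 6, 5]                      # n_1, n_2, n_3

# Draw independent A_j ~ W(Sigma, n_j) and sum them, many times
reps = 100_000
total = sum(wishart.rvs(df=n, scale=Sigma, size=reps, random_state=rng) for n in dfs)

# Under the theorem, total ~ W(Sigma, sum(dfs)); compare the first moment
print(total.mean(axis=0))            # approximately sum(dfs) * Sigma
print(sum(dfs) * Sigma)
```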

Transformation of Wishart Matrix

Theorem 3: Let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha'$, where $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ are independently distributed, each following $N_p(0, \Sigma)$. If $Y \to Z$ using a $p \times p$ non-singular matrix $H$, so that $z_\alpha = Hy_\alpha;\ \alpha = 1, 2, \ldots, n;\ n = N-1$, then $HAH' \sim W(H\Sigma H', n)$.

Proof: $E(z_\alpha) = HE(y_\alpha) = 0$ and $V(z_\alpha) = V(Hy_\alpha) = H\Sigma H'$, so $z_\alpha \sim N_p(0, H\Sigma H')$.

We also notice that

$$HAH' = H\!\left(\sum_{\alpha=1}^{n} y_\alpha y_\alpha'\right)\!H' = \sum_{\alpha=1}^{n}(Hy_\alpha)(Hy_\alpha)' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha'.$$

Hence $HAH' = \sum_{\alpha=1}^{n} z_\alpha z_\alpha' \sim W(H\Sigma H', n)$.
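The algebraic identity at the heart of the proof is easy to confirm numerically (a minimal sketch assuming NumPy; $H$ is an arbitrary random, almost surely non-singular, matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 12
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
H = rng.normal(size=(p, p))          # random non-singular transform

Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # rows are y_alpha'
A = Y.T @ Y                                              # sum of y y' outer products
Z = Y @ H.T                                              # rows are z_alpha' = (H y_alpha)'

# The identity H A H' = sum of z_alpha z_alpha' from the proof
assert np.allclose(H @ A @ H.T, Z.T @ Z)
```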

Marginal Distribution

Theorem 4: Let $A \sim W(\Sigma, n)$, and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as follows:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{matrix} q \\ (p-q) \end{matrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$

Then the marginal distribution of $A_{11}$ is a Wishart distribution with variance-covariance matrix $\Sigma_{11}$ and $n$ degrees of freedom.

Proof: Let us choose the $q \times p$ matrix $H = \begin{bmatrix} I & 0 \end{bmatrix}$, where $I$ is the $q \times q$ identity; the argument of Theorem 3 carries over unchanged to this rectangular $H$ of full row rank. We notice that

$$HAH' = \begin{bmatrix} I & 0 \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} I \\ 0 \end{bmatrix} = A_{11}.$$

Similarly,

$$H\Sigma H' = \begin{bmatrix} I & 0 \end{bmatrix}\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\begin{bmatrix} I \\ 0 \end{bmatrix} = \Sigma_{11}.$$

Therefore $A_{11} \sim W(\Sigma_{11}, n);\ n = N-1$.

Corollary 1: All the diagonal submatrices of the matrix $A$ have Wishart distributions.

Corollary 2: $\Sigma^{-1/2} A\, \Sigma^{-1/2} \sim W(I, n)$.
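For $q = 1$, Theorem 4 reduces to $a_{11}/\sigma_{11} \sim \chi^2_n$, which can be tested directly. A sketch assuming SciPy (the Kolmogorov-Smirnov test below should report a large p-value):

```python
import numpy as np
from scipy.stats import wishart, chi2, kstest

rng = np.random.default_rng(5)
p, n = 2, 10
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw many A ~ W(Sigma, n) and keep the top-left 1 x 1 block (a scalar)
A = wishart.rvs(df=n, scale=Sigma, size=50_000, random_state=rng)
a11 = A[:, 0, 0]

# Theorem 4 with q = 1: a11 ~ W(sigma11, n), i.e. a11 / sigma11 ~ chi^2_n
print(kstest(a11 / Sigma[0, 0], chi2(df=n).cdf))
```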

Theorem 5: Let $A \sim W(\Sigma, n)$, and let both $A$ and $\Sigma$ be partitioned into $q$ and $p - q$ rows and columns as before. Then the distribution of $A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ is a Wishart distribution with variance-covariance matrix $\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $n - (p - q)$ degrees of freedom.

Generalized Variance

The multivariate analogue of the variance $\sigma^2$ of a univariate distribution, apart from the covariance matrix $\Sigma$ itself, is the determinant $|\Sigma|$ of the variance-covariance matrix, termed the generalized variance of the multivariate distribution. Similarly, the generalized variance of the sample vectors $x_1, x_2, \ldots, x_N$ is defined as

$$|S| = \left|\frac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'\right| = \frac{|A|}{(N-1)^p}.$$

The generalized variance is also a measure of spread. A geometric interpretation of the sample generalized variance comes from considering the $p$ rows of the matrix $X$ as $p$ vectors in $N$-dimensional space. Equivalently, the value of $|A|$ may be interpreted in terms of $N$ points in $p$-dimensional space. Let the columns of the matrix $X$ be written as points in $p$-dimensional space as follows:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pN} \end{bmatrix}_{(p \times N)} = \begin{bmatrix} \boldsymbol{x_1} & \boldsymbol{x_2} & \cdots & \boldsymbol{x_N} \end{bmatrix}$$

Clearly, when $p = 1$, the value of $|A| = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})^2$, which is the sum of squares of the distances of the deviation points from the origin. In general, the value of $|A|$ is the sum of squares of the volumes of all parallelotopes formed by taking as principal edges $p$ vectors from the set of deviation vectors $x_1 - \bar{x}, \ldots, x_N - \bar{x}$.
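A minimal sketch (assuming NumPy, with an illustrative $\Sigma$) computing the sample generalized variance both as $|A|/(N-1)^p$ and as $|S|$ via np.cov:

```python
import numpy as np

rng = np.random.default_rng(6)
p, N = 3, 40
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=N).T   # p x N data matrix

xbar = X.mean(axis=1, keepdims=True)
A = (X - xbar) @ (X - xbar).T

gen_var = np.linalg.det(A) / (N - 1) ** p    # |S| = |A| / (N-1)^p
S = np.cov(X)                                # np.cov divides by N-1 by default
assert np.isclose(gen_var, np.linalg.det(S))
print(gen_var, np.linalg.det(Sigma))         # sample vs population generalized variance
```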

Distribution of Generalized Variance

In order to derive the distribution of the generalized variance, the following lemma is used:

Lemma: Let $y_\alpha;\ \alpha = 1, 2, \ldots, (N-1)$ be independently distributed, each following $N_p(0, I)$, and let $A = \sum_{\alpha=1}^{N-1} y_\alpha y_\alpha' = TT'$, where $T = (t_{ij})$ is a lower triangular matrix with $t_{ii} > 0,\ i = 1, \ldots, p$, and $t_{ij} = 0,\ i < j$. Then the elements of the matrix $T$ are independently distributed, with $t_{ij} \sim N(0,1)$ for $i > j$, and $t_{ii}^2$ follows a $\chi^2$ distribution with $N - i$ degrees of freedom.
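The lemma (the Bartlett decomposition) also gives a direct way to simulate $W(I, n)$ with $n = N-1$. A minimal sketch assuming SciPy, with a first-moment sanity check using $E[W(I, n)] = nI$:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
p, N = 3, 12
n = N - 1

def bartlett_sample(p, n, rng):
    """Draw A ~ W(I, n) via the lemma: A = T T' with independent triangular entries."""
    T = np.zeros((p, p))
    for i in range(p):                                       # row index i is 0-based here
        T[i, i] = np.sqrt(chi2(df=n - i).rvs(random_state=rng))  # t_ii^2 ~ chi^2_{N-(i+1)}
        T[i, :i] = rng.normal(size=i)                            # t_ij ~ N(0,1) for i > j
    return T @ T.T

A = bartlett_sample(p, n, rng)

# Averaging many draws should be close to n * I
mean_A = np.mean([bartlett_sample(p, n, rng) for _ in range(20_000)], axis=0)
print(np.round(mean_A, 2))   # approximately n * I
```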
