
Advanced Statistics for Analysts

1) The generalized variance is a single numerical value that summarizes the variation described by the sample covariance matrix S, which contains variances and covariances for p variables. 2) It is calculated as the determinant (|S|) of the sample covariance matrix S. 3) Geometrically, the generalized variance is proportional to the square of the volume generated by the p deviation vectors in the data. Thus, a larger generalized variance corresponds to greater spread or variation in the data.

Uploaded by

Iklil Annaufal
Copyright
© All Rights Reserved

Generalized Variance

With a single variable, the sample variance is often used to describe the amount of
variation in the measurements on that variable. When p variables are observed on each unit, the
variation is described by the sample variance-covariance matrix

$$
S = \begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{12} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix}
= \left\{ s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k) \right\}
$$

The sample covariance matrix contains $p$ variances and $\frac{1}{2} p(p-1)$ potentially different
covariances. Sometimes it is desirable to assign a single numerical value for the variation
expressed by $S$. One choice for a value is the determinant of $S$, which reduces to the usual
sample variance of a single characteristic when $p = 1$. This determinant is called the generalized
sample variance:

$$
\text{Generalized sample variance} = |S|
$$
Example 3.7 (Calculating a generalized variance)
Employees ($x_1$) and profits per employee ($x_2$) for the 16 largest publishing firms in the United
States are shown in Figure 1.3. The sample covariance matrix, obtained from the data in the
April 30, 1990, Forbes magazine article, is

$$
S = \begin{bmatrix} 252.04 & -68.43 \\ -68.43 & 123.67 \end{bmatrix}
$$

Evaluate the generalized variance.
In this case, we compute

$$
|S| = (252.04)(123.67) - (-68.43)(-68.43) = 26{,}487
$$
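The arithmetic in Example 3.7 can be checked with a few lines of Python. This is only a sketch: the helper name `det2` is an illustrative choice, not something from the text, and the matrix entries are the ones quoted in the example.

```python
# Sample covariance matrix from Example 3.7 (entries as quoted in the text).
S = [[252.04, -68.43],
     [-68.43, 123.67]]


def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists (hypothetical helper)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Generalized sample variance = |S|
generalized_variance = det2(S)
print(round(generalized_variance))
```

Rounded to the nearest integer, the result agrees with the value 26,487 reported in the example.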
The generalized sample variance provides one way of writing the information on all
variances and covariances as a single number. Of course, when p>1, some information about the
sample is lost in the process. A geometrical interpretation of |S| will help us appreciate its
strengths and weaknesses as a descriptive summary.
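Consider the area generated within the plane by two deviation vectors $d_1 = y_1 - \bar{x}_1 \mathbf{1}$ and
$d_2 = y_2 - \bar{x}_2 \mathbf{1}$. Let $L_{d_1}$ be the length of $d_1$ and $L_{d_2}$ the length of $d_2$. By elementary geometry, we
have the diagram

[Diagram: the deviation vectors $d_1$ and $d_2$ separated by the angle $\theta$.]

and the area of the parallelogram is $|L_{d_1} \sin(\theta)| \, L_{d_2}$. Since $\cos^2(\theta) + \sin^2(\theta) = 1$, we can express this
area as

$$
\text{Area} = L_{d_1} L_{d_2} \sqrt{1 - \cos^2(\theta)}
$$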

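From (3-5) and (3-7),

$$
L_{d_1} = \sqrt{\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2} = \sqrt{(n-1)\, s_{11}}
$$

$$
L_{d_2} = \sqrt{\sum_{j=1}^{n} (x_{j2} - \bar{x}_2)^2} = \sqrt{(n-1)\, s_{22}}
$$

and

$$
\cos(\theta) = r_{12}
$$

Therefore,

$$
\text{Area} = (n-1) \sqrt{s_{11}} \sqrt{s_{22}} \sqrt{1 - r_{12}^2} = (n-1) \sqrt{s_{11} s_{22} (1 - r_{12}^2)} \qquad \text{(3-13)}
$$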

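Also,

$$
|S| = \left| \begin{bmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{bmatrix} \right|
= \left| \begin{bmatrix} s_{11} & \sqrt{s_{11}} \sqrt{s_{22}}\, r_{12} \\ \sqrt{s_{11}} \sqrt{s_{22}}\, r_{12} & s_{22} \end{bmatrix} \right|
= s_{11} s_{22} - s_{11} s_{22} r_{12}^2 = s_{11} s_{22} (1 - r_{12}^2) \qquad \text{(3-14)}
$$

If we compare (3-14) with (3-13), we see that

$$
|S| = \frac{(\text{area})^2}{(n-1)^2}
$$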
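The relation between |S| and the squared area can be checked numerically for p = 2. The sketch below uses a small made-up data set (the values of `x1` and `x2` are hypothetical, not from the text) and only the standard library. It builds the two deviation vectors, computes the area from their lengths and the correlation, and compares area²/(n−1)² with the determinant of the 2×2 sample covariance matrix.

```python
import math

# Hypothetical data: n = 5 observations on p = 2 variables (not from the text).
x1 = [2.0, 4.0, 5.0, 7.0, 9.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0]
n = len(x1)


def deviations(x):
    """Deviation vector: each observation minus the sample mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


d1, d2 = deviations(x1), deviations(x2)

# Sample variances and covariance, divisor n - 1.
s11 = dot(d1, d1) / (n - 1)
s22 = dot(d2, d2) / (n - 1)
s12 = dot(d1, d2) / (n - 1)
r12 = s12 / math.sqrt(s11 * s22)      # cos(theta) = r12

# Lengths of the deviation vectors and the area they generate.
len_d1 = math.sqrt(dot(d1, d1))
len_d2 = math.sqrt(dot(d2, d2))
area = len_d1 * len_d2 * math.sqrt(1.0 - r12 ** 2)

det_S = s11 * s22 - s12 ** 2          # |S| for a 2x2 matrix
print(det_S, area ** 2 / (n - 1) ** 2)
```

The two printed numbers should agree up to floating-point rounding.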

Assuming now that $|S| = (n-1)^{-(p-1)} (\text{volume})^2$ holds for the volume generated in $n$-space by the
$p-1$ deviation vectors $d_1, d_2, \ldots, d_{p-1}$, we can establish the following general result for $p$
deviation vectors by induction (see [1], p. 266):

$$
\text{Generalized sample variance} = |S| = (n-1)^{-p} (\text{volume})^2 \qquad \text{(3-15)}
$$
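Result (3-15) can be illustrated numerically for p = 3. The sketch below assumes NumPy is available and uses randomly generated data; the seed, sample size and variable names are arbitrary choices, not from the text. The volume is computed as a product of successive heights (base times height, repeated), which is what the diagonal of the R factor of a QR decomposition of the deviation vectors provides.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, for reproducibility only
n, p = 10, 3
X = rng.normal(size=(n, p))           # hypothetical n x p data matrix

# Columns of D are the deviation vectors d_1, ..., d_p.
D = X - X.mean(axis=0)

# Sample covariance matrix with divisor n - 1 (NumPy's default).
S = np.cov(X, rowvar=False)

# |R_ii| is the height of d_i above the span of d_1, ..., d_{i-1},
# so the product of the |R_ii| is the volume generated by the p vectors.
R = np.linalg.qr(D, mode="r")
volume = np.abs(np.prod(np.diag(R)))

lhs = np.linalg.det(S)                # generalized sample variance |S|
rhs = volume ** 2 / (n - 1) ** p      # (n - 1)^(-p) * volume^2
print(lhs, rhs)
```

Changing `p` (with `n > p`) gives the same agreement, which is the content of the induction step.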

Equation (3-15) says that the generalized sample variance, for a fixed set of data, is proportional
to the square of the volume generated by the $p$ deviation vectors $d_1 = y_1 - \bar{x}_1 \mathbf{1}$, $d_2 = y_2 - \bar{x}_2 \mathbf{1}$, ...,
$d_p = y_p - \bar{x}_p \mathbf{1}$. Figures 3.6(a) and (b) show the parallelepipeds generated by $p = 3$ deviation
vectors, corresponding to "large" and "small" generalized variances.
For a fixed sample size, it is clear from the geometry that volume, or $|S|$, will increase when the
length of any $d_i = y_i - \bar{x}_i \mathbf{1}$ (or $\sqrt{s_{ii}}$) is increased. In addition, volume will increase if the deviation
vectors of fixed length are moved until they are at right angles to one another, as in Figure 3.6(a).
On the other hand, the volume, or $|S|$, will be small if just one of the $s_{ii}$ is small or one of the
deviation vectors lies nearly in the (hyper)plane formed by the others, or both. In the second
case, the parallelepiped has very little height above the plane. This is the situation in Figure 3.6(b),
where $d_3$ lies nearly in the plane formed by $d_1$ and $d_2$.
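For p = 2 the same effect can be seen directly from the relation |S| = s11 s22 (1 − r12²) derived earlier. The sketch below holds the two variances fixed at made-up values and lets only the correlation change. The function name and the numbers are illustrative assumptions.

```python
def generalized_variance_2d(s11, s22, r12):
    """|S| for p = 2, computed as s11 * s22 * (1 - r12**2)."""
    return s11 * s22 * (1.0 - r12 ** 2)


# Hypothetical variances; only the correlation differs between rows.
s11, s22 = 4.0, 9.0
correlations = (0.0, 0.5, 0.9, 0.99)
values = [generalized_variance_2d(s11, s22, r) for r in correlations]

for r, v in zip(correlations, values):
    print(f"r12 = {r:4.2f}   |S| = {v:.4f}")
```

With r12 = 0 the deviation vectors are at right angles and |S| takes its largest value, s11 s22. As r12 approaches 1 the vectors become nearly collinear and |S| shrinks toward zero even though neither variance has changed.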

If generalized variance is defined in terms of the sample covariance matrix $S_n = [(n-1)/n]\, S$, then,
using Result 2A.11,

$$
|S_n| = \left| \frac{n-1}{n} I_p S \right| = \left| \frac{n-1}{n} I_p \right| |S| = \left( \frac{n-1}{n} \right)^p |S|
$$

Consequently, using (3-15), we can also write the following:

$$
\text{Generalized sample variance} = |S_n| = n^{-p} (\text{volume})^2
$$
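The scaling between the two definitions is easy to confirm numerically. This sketch assumes NumPy and uses random data with an arbitrary seed. `np.cov` with `bias=True` uses the divisor n, which matches S_n, while the default uses n − 1, which matches S.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed
n, p = 8, 3
X = rng.normal(size=(n, p))                # hypothetical data matrix

S = np.cov(X, rowvar=False)                # divisor n - 1
S_n = np.cov(X, rowvar=False, bias=True)   # divisor n

# The ratio of determinants should equal ((n - 1) / n) ** p.
ratio = np.linalg.det(S_n) / np.linalg.det(S)
print(ratio, ((n - 1) / n) ** p)
```

Because the factor depends on p, the two definitions differ noticeably when p is large relative to n.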

Generalized variance also has interpretations in the p-space scatter plot representation of the
data. The most intuitive interpretation concerns the spread of the scatter about the sample mean
point $\bar{x}' = [\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p]$. Consider the measure of distance given in the comment below (2-19),
with $\bar{x}$ playing the role of the fixed point $\mu$ and $S^{-1}$ playing the role of $A$. With these choices, the
coordinates $x' = [x_1, x_2, \ldots, x_p]$ of the points a constant distance $c$ from $\bar{x}$ satisfy

$$
(x - \bar{x})' S^{-1} (x - \bar{x}) = c^2 \qquad \text{(3-16)}
$$


[When $p = 1$, $(x - \bar{x})' S^{-1} (x - \bar{x}) = (x_1 - \bar{x}_1)^2 / s_{11}$ is the squared distance from $x_1$ to $\bar{x}_1$ in standard
deviation units.]
Equation (3-16) defines a hyperellipsoid (an ellipse if $p = 2$) centered at $\bar{x}$. It can be shown using
integral calculus that the volume of this hyperellipsoid is related to $|S|$. In particular,

$$
\text{Volume of } \{ x : (x - \bar{x})' S^{-1} (x - \bar{x}) \le c^2 \} = k_p |S|^{1/2} c^p \qquad \text{(3-17)}
$$

or

$$
(\text{Volume of ellipsoid})^2 = (\text{constant}) (\text{generalized sample variance})
$$

where the constant $k_p$ is rather formidable. A large volume corresponds to a large generalized
variance.
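Relation (3-17) can be checked against the semi-axes of the ellipsoid. The text does not give k_p. The sketch below assumes it is the volume of the unit ball in p dimensions, π^(p/2) / Γ(p/2 + 1), which is the standard constant for ellipsoid volumes. The semi-axes of the ellipsoid in (3-16) are c√λ_i, where the λ_i are the eigenvalues of S. The matrix is the one from Example 3.7. The value of `c` and the function name are arbitrary choices, and NumPy is assumed.

```python
import math

import numpy as np


def unit_ball_volume(p):
    """Volume of the unit ball in p dimensions (assumed form of k_p)."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)


S = np.array([[252.04, -68.43],
              [-68.43, 123.67]])      # covariance matrix from Example 3.7
c = 1.5                               # arbitrary constant distance
p = S.shape[0]

# Volume from the semi-axes c * sqrt(lambda_i) of the ellipsoid.
eigenvalues = np.linalg.eigvalsh(S)
volume_from_axes = unit_ball_volume(p) * np.prod(c * np.sqrt(eigenvalues))

# Volume from (3-17): k_p * |S|^(1/2) * c^p.
volume_from_det = unit_ball_volume(p) * math.sqrt(np.linalg.det(S)) * c ** p

print(volume_from_axes, volume_from_det)
```

The two values agree because the product of the eigenvalues of S equals |S|. For p = 2 the constant reduces to π, so the quantity is the familiar area of an ellipse.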
Although the generalized variance has some intuitively pleasing geometrical
interpretations, it suffers from a basic weakness as a descriptive summary of the sample
covariance matrix S, as the following example shows.
