
LESSON 1 (3 hrs)

MULTIVARIATE DATA AND DATA MATRIX


1.1 Introduction
In this Lesson we introduce the concept of a data matrix and give some examples of
multivariate data from real-life situations.

1.2 Lesson Learning Outcomes


By the end of this Lesson you shall be able to:
1.2.1 Evaluate a sample mean vector and dispersion matrix from a data matrix
1.2.2 Evaluate the sample correlation matrix

1.2.1 Sample mean vector and dispersion matrix from a data matrix
In this section you shall learn how to compute the sample statistics of multivariate data
through some examples.

Example 1.1: Measurements were taken on ten flea beetles (Halticus). These are the thorax
length $X_1$ (in microns) and the elytra length $X_2$ (in 0.01 mm).

Characteristic \ Subject     1    2    3    4    5    6    7    8    9   10
Thorax length (X_1)        180  192  217  221  171  192  213  192  170  201
Elytra length (X_2)        245  260  276  299  239  262  278  255  244  276

Compute the sample mean vector and the sample dispersion matrix.
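Solution: We have

\[
\bar{X} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \end{pmatrix}
        = \begin{pmatrix} 194.9 \\ 263.4 \end{pmatrix}, \quad
S_u = \begin{pmatrix} 330.32 & 325.27 \\ 325.27 & 354.71 \end{pmatrix}
\quad \text{or} \quad
S = \begin{pmatrix} 297.29 & 292.74 \\ 292.74 & 319.24 \end{pmatrix}
\]

where $S_u$ is the unbiased estimate (divisor $n - 1$) while $S$ is the biased estimate (divisor $n$).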
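
The figures in Example 1.1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`mean_vector`, `S_u`, `S`) are illustrative and not part of the lesson. The data are stored with one row per variable and one column per subject.

```python
import numpy as np

# Data from Example 1.1: row 0 is thorax length X1, row 1 is elytra length X2.
X = np.array([
    [180, 192, 217, 221, 171, 192, 213, 192, 170, 201],
    [245, 260, 276, 299, 239, 262, 278, 255, 244, 276],
], dtype=float)

mean_vector = X.mean(axis=1)   # sample mean of each variable (row)
S_u = np.cov(X, ddof=1)        # unbiased estimate, divisor n - 1
S = np.cov(X, ddof=0)          # biased estimate, divisor n

print("mean vector:", mean_vector)
print("S_u:\n", np.round(S_u, 2))
print("S:\n", np.round(S, 2))
```

The printed matrices can be compared entry by entry with the values quoted in the solution.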

Example 1.2: Eight men each received a certain drug. The recorded changes in blood
sugar and blood pressure (systolic and diastolic) are listed in the table below:

Characteristic \ Subject             1    2    3    4    5    6    7    8
Blood sugar (X_1)                   30   90  -10   10   30   60    0   40
Blood pressure, systolic (X_2)      -8    7   -2    0   -2    0   -2    1
Blood pressure, diastolic (X_3)     -1    6    4    2    5    3    4    2

Compute the sample mean vector and the sample dispersion matrix.

Solution: We have

\[
\bar{X} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \end{pmatrix}
        = \begin{pmatrix} 31.25 \\ -0.75 \\ 3.125 \end{pmatrix}, \quad
S_u = \begin{pmatrix}
1069.64 & 82.5 & 16.964 \\
82.5 & 17.357 & 6.393 \\
16.964 & 6.393 & 4.696
\end{pmatrix}
\]

or

\[
S = \begin{pmatrix}
935.94 & 72.19 & 14.84 \\
72.19 & 15.19 & 5.59 \\
14.84 & 5.59 & 4.11
\end{pmatrix}
\]

where $S_u$ is the unbiased estimate while $S$ is the biased estimate.
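
As a check on one entry, the covariance between $X_1$ and $X_2$ in Example 1.2 can be worked by hand from the raw sums: subtract the product of the two sums divided by $n$ from the sum of cross-products, then divide by $n - 1$ or by $n$. The arithmetic below uses only the tabulated data, with $n = 8$.

```latex
\[
\sum_{k=1}^{8} x_{1k} = 250, \qquad
\sum_{k=1}^{8} x_{2k} = -6, \qquad
\sum_{k=1}^{8} x_{1k} x_{2k} = 390
\]
\[
\sum_{k=1}^{8} x_{1k} x_{2k} - \frac{(250)(-6)}{8} = 390 + 187.5 = 577.5
\]
\[
\frac{577.5}{7} = 82.5 \quad (\text{entry of } S_u), \qquad
\frac{577.5}{8} \approx 72.19 \quad (\text{entry of } S)
\]
```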

Data Matrix

In general we can cast multivariate data in a data matrix as:

\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{p1} & x_{p2} & \cdots & x_{pn}
\end{pmatrix}
\]

where the rows correspond to the $p$ variables of the multivariate vector and the
columns correspond to the $n$ observations. Thus the above matrix represents a
$p$-variate vector observed $n$ times.

In the above Examples the components of $\bar{X}$ are computed by

\[
\bar{X}_i = \frac{1}{n} \sum_{k=1}^{n} X_{ik}, \quad i = 1, 2, \ldots, p,
\]

the components of $S$ are computed by

\[
s_{ij} = \frac{1}{n} \left( \sum_{k=1}^{n} X_{ik} X_{jk}
       - \frac{\sum_{k=1}^{n} X_{ik} \sum_{k=1}^{n} X_{jk}}{n} \right),
\quad i, j = 1, 2, \ldots, p,
\]

while the components of $S_u$ are computed by

\[
s_{ij} = \frac{1}{n - 1} \left( \sum_{k=1}^{n} X_{ik} X_{jk}
       - \frac{\sum_{k=1}^{n} X_{ik} \sum_{k=1}^{n} X_{jk}}{n} \right),
\quad i, j = 1, 2, \ldots, p.
\]

The vector $\bar{X}$ is the sample mean vector while $S$ and $S_u$ are called sample
variance-covariance matrices or dispersion matrices. In Example 1.1, $p = 2$ and $n = 10$,
while in Example 1.2, $p = 3$ and $n = 8$.
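
The raw-sum formulas for the mean vector and the two dispersion matrices translate directly into matrix operations. The sketch below assumes NumPy and a $p \times n$ data matrix with variables in rows; the function name `mean_and_dispersion` and its return order are hypothetical choices made for illustration. It is applied to the data of Example 1.2 and cross-checked against NumPy's own covariance routine.

```python
import numpy as np

def mean_and_dispersion(X):
    """Return (mean vector, S, S_u) for a p x n data matrix X.

    Rows are variables and columns are observations. The name and the
    return order are illustrative, not taken from the lesson.
    """
    X = np.asarray(X, dtype=float)
    _, n = X.shape
    row_sums = X.sum(axis=1)          # sum over k of X_ik, for each i
    cross_products = X @ X.T          # (i, j) entry is sum over k of X_ik * X_jk
    corrected = cross_products - np.outer(row_sums, row_sums) / n
    return row_sums / n, corrected / n, corrected / (n - 1)

# Data from Example 1.2 (p = 3 variables, n = 8 subjects).
X = [
    [30, 90, -10, 10, 30, 60, 0, 40],   # blood sugar
    [-8, 7, -2, 0, -2, 0, -2, 1],       # systolic blood pressure
    [-1, 6, 4, 2, 5, 3, 4, 2],          # diastolic blood pressure
]

mean_vector, S, S_u = mean_and_dispersion(X)
print("mean vector:", mean_vector)
print("S:\n", np.round(S, 2))
print("S_u:\n", np.round(S_u, 3))

# Cross-check against NumPy's built-in covariance (rows are variables by default).
assert np.allclose(S_u, np.cov(X, ddof=1))
assert np.allclose(S, np.cov(X, ddof=0))
```

Computing `X @ X.T` once gives every sum of cross-products at the same time, so the whole matrix is obtained without looping over the index pairs $(i, j)$.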

1.2.2 The correlation matrix


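If we define $s_i = \sqrt{s_{ii}}$ then we can define the correlation matrix as

\[
R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{pmatrix}
= D^{-1/2} \, S \, D^{-1/2},
\]

where

\[
S = \begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{pmatrix},
\qquad
D^{-1/2} = \begin{pmatrix}
\frac{1}{s_1} & 0 & \cdots & 0 \\
0 & \frac{1}{s_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{s_p}
\end{pmatrix}
= \operatorname{diag}\left( \frac{1}{s_1}, \frac{1}{s_2}, \ldots, \frac{1}{s_p} \right),
\]

so that $r_{ij} = s_{ij} / (s_i s_j)$.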

Example 1.3: Compute the sample correlation matrix for the data in Example 1.1.

Solution:

\[
R = \begin{pmatrix} 1 & 0.95 \\ 0.95 & 1 \end{pmatrix}
\]
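
The result of Example 1.3 can be reproduced by scaling the dispersion matrix on both sides by the reciprocal standard deviations, taking $s_i$ as the square root of $s_{ii}$. This is a minimal sketch assuming NumPy; the name `D_inv_sqrt` is illustrative. The biased matrix $S$ is used here, but the divisor cancels in the scaling, so $S_u$ would give the same $R$.

```python
import numpy as np

# Data from Example 1.1: row 0 is thorax length X1, row 1 is elytra length X2.
X = np.array([
    [180, 192, 217, 221, 171, 192, 213, 192, 170, 201],
    [245, 260, 276, 299, 239, 262, 278, 255, 244, 276],
], dtype=float)

S = np.cov(X, ddof=0)                              # biased dispersion matrix
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))    # diag(1/s_1, ..., 1/s_p)
R = D_inv_sqrt @ S @ D_inv_sqrt                    # correlation matrix

print(np.round(R, 2))

# NumPy's correlation routine should agree with the matrix product.
assert np.allclose(R, np.corrcoef(X))
```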

1.5 Exercise

1.5.1. Given the seven pairs of measurements $(X_1, X_2)$:

X_1    3    4     2    6    8     2    5
X_2    5    5.5   4    7    10    5    7.5

(a) Calculate the sample means $\bar{X}_1$, $\bar{X}_2$, the sample variances $S_{11}$, $S_{22}$ and the
sample covariance $S_{12}$.
(b) Use the results of part (a) to form the mean vector $\bar{X}$ and the sample correlation
matrix $R$.
(c) Plot the graph of $X_1$ against $X_2$.

1.5.2. A morning newspaper lists the following used-car prices for a particular model of
car, with age ($X_1$) measured in years and selling price ($X_2$) measured in millions of Ksh:

X_1    3     5     5     7      7      7      8      9      10     11
X_2    2.3   1.9   1.0   0.70   0.30   1.00   1.05   0.45   0.70   0.30

(a) Construct a scatter plot of the data.
(b) Infer the sign of the sample covariance $S_{12}$ from the scatter plot.
(c) Compute the sample means $\bar{X}_1$ and $\bar{X}_2$, the sample variances $S_{11}$, $S_{22}$, the sample
covariance $S_{12}$ and the sample correlation coefficient $r_{12}$. Interpret these quantities.
(d) Use the quantities computed in part (c) to form the sample mean vector $\bar{X}$, the
sample variance-covariance matrix $S$ and the correlation matrix $R$.

1.5.3. Three measurements $X_1$, $X_2$ and $X_3$ were made on five subjects in an experiment
and the results are shown in the table below:

X_1     9    2    6    5    8
X_2    12    8    6    4   10
X_3     3    4    0    2    1

Compute $\bar{X}$, $S$ and $R$.

1.5.4. The 10 largest industrial corporations in KENAI yield the following data in
millions of dollars:

Company    X_1 = sales    X_2 = profits    X_3 = assets
A          126            4.2              173
B           97            3.8              160
C           86            3.5               83
D           63            3.7               78
E           55            3.9              128
F           50            1.8               39
G           39            2.9               38
H           36            0.4               51
I           35            2.4               34
J           32            2.4               26

(a) Find $\bar{X}$, $S$ and $R$.
(b) Plot the scatter graphs of $(X_2, X_3)$ and $(X_1, X_3)$ and comment on the patterns.

Summary
In this Lesson we have considered the data matrix and given some examples of multivariate
data from real-life situations. In particular we have:

1. Evaluated a sample mean vector and dispersion matrix from a data matrix.
2. Evaluated the sample correlation matrix.

