
University of The Armed Forces ESPE.

Telecommunications Engineering 1

CORRELATION AND COVARIANCE MATRIX

Erick Chicaiza Cuasapaz and Brenda Iza Flores (Escuela Politécnica del Ejército - ESPE, eschicaiza@espe.edu.ec, byizaa@espe.edu.ec)

Abstract: This work presents the solution of three exercises on variance and correlation of bivariate random variables, together with the definitions of each of these topics. Each exercise is solved manually through formulas and also in Matlab; the Matlab solution consists of the code and the output of the simulations.

I. INTRODUCTION
To discuss variance and correlation of bivariate random variables, this document takes correlation matrices and covariance matrices as the basis for solving the exercises, since they lead to the solution quickly. With the correlation and covariance matrices we can measure the linear relationship between each pair of variables and also the degree of that relationship. The information used to solve the exercises and present the theory comes mainly from the course on Stochastic Processes and from the web pages listed in the references.

II. DEVELOPMENT
Theoretical basis.

Correlation matrix
The correlation matrix displays Pearson's correlation values, which measure the degree of linear relationship between each pair of items or variables. Correlation values lie between -1 and +1. In practice, however, the items generally have positive correlations. If two items tend to increase or decrease at the same time, the correlation value is positive [6].

Covariance matrix
The covariance matrix displays the covariance values, which measure the linear relationship between each pair of elements or variables. Positive covariance values indicate that values above the average of one variable are associated with values above the average of the other variable, and that values below the average of one variable are associated with values below the average of the other. Negative covariance values indicate that values above the average of one variable are associated with values below the average of the other variable [6].

Unlike the correlation coefficient, the covariance is not standardized. Covariance values can therefore lie anywhere between negative infinity and positive infinity and can be difficult to interpret. To interpret the linear relationship between each pair of items or variables more easily, use the correlation matrix [6].

Exercises:

A. Exercise 33.

Statement:

Reaven and Miller measured five variables in a comparison of normal and diabetic patients. Table I provides partial data for normal patients only. The three variables of greatest interest were:

x1 = impaired glucose tolerance,
x2 = insulin response to oral glucose,
x3 = insulin resistance.

The two additional variables of lesser interest were:

y1 = relative weight,
y2 = fasting plasma glucose.

a) Find the sample covariance matrix S and the correlation matrix R.
b) Try to interpret the results, especially the correlation matrix, even if we are not experts in medicine.

TABLE I
RELATIVE WEIGHT, BLOOD GLUCOSE, AND INSULIN LEVELS.

Patient   y1     y2    x1    x2    x3
 1        0.81    80   356   124    55
 2        0.95    97   289   117    76
 3        0.94   105   319   143   105
 4        1.04    90   356   199   108
 5        1.00    90   323   240   143
 6        0.76    86   381   157   165
 7        0.91   100   350   221   119
 8        1.10    85   301   186   105
 9        0.99    97   379   142    98
10        0.78    97   296   131    94
11        0.90    91   353   221    53
12        0.73    87   306   178    66
13        0.96    78   290   136   142
14        0.84    90   371   200    93
15        0.74    86   312   208    68
16        0.98    80   393   202   102
17        1.10    90   364   152    76
18        0.85    99   359   185    37
19        0.83    85   296   116    60
20        0.93    90   345   123    50
21        0.95    90   378   136    47
22        0.74    88   304   134    50
23        0.95    95   347   184    91
24        0.97    90   327   192   124
TABLE I (continued)

Patient   y1     y2    x1    x2    x3
25        0.72    92   386   279    74
26        1.11    74   365   228   235
27        1.20    98   365   145   158
28        1.13   100   352   172   140
29        1.00    86   325   179   145
30        0.78    98   321   222    99
31        1.00    70   360   134    90
32        1.00    99   336   143   105
33        0.71    75   352   169    32
34        0.76    90   353   263   165
35        0.89    85   373   174    78
36        0.88    99   376   134    80
37        1.17   100   367   182    54
38        0.85    78   335   241   175
39        0.97   106   396   128    80
40        1.00    98   277   222   186
41        1.00   102   378   165   117
42        0.89    90   360   282   160
43        0.98    94   291    94    71
44        0.78    80   269   121    29
45        0.74    93   318    73    42
46        0.91    86   328   106    56

Solution:

a)
For the sample covariance matrix S, a 5x5 matrix has to be found for the 5 variables, so its entries are computed with the following equation:

\[ s_{jk} = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_{ij}\,y_{ik} - n\,\bar{y}_j\,\bar{y}_k\right) \]

With n = 46, the mean vectors of the columns are determined first:

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \]

\[ \bar{y}' = (0.91783,\ 90.413), \qquad \bar{x}' = (340.826,\ 171.37,\ 97.7826) \]

The terms of the summation $\sum_{i=1}^{n} y_{ij}\,y_{ik}$ are then determined. For $s_{11}$:

\[ \sum_{i=1}^{46} y_{i1}\,y_{i1} = (0.81)(0.81) + (0.95)(0.95) + \dots + (0.91)(0.91) = 39.4788 \]

\[ s_{11} = \frac{1}{46-1}\bigl(39.4788 - 46(0.91783)(0.91783)\bigr) = 0.01618 \]

The entries of S correspond to the following pairs of variables:

\[ S = \begin{pmatrix}
y_1y_1 & y_1y_2 & y_1x_1 & y_1x_2 & y_1x_3 \\
y_2y_1 & y_2y_2 & y_2x_1 & y_2x_2 & y_2x_3 \\
x_1y_1 & x_1y_2 & x_1x_1 & x_1x_2 & x_1x_3 \\
x_2y_1 & x_2y_2 & x_2x_1 & x_2x_2 & x_2x_3 \\
x_3y_1 & x_3y_2 & x_3x_1 & x_3x_2 & x_3x_3
\end{pmatrix} \]

\[ S = \begin{pmatrix}
0.01618 & 0.21603 & 0.78717 & -0.21385 & 2.18907 \\
0.21603 & 70.5589 & 26.229 & -23.956 & -20.8415 \\
0.78717 & 26.229 & 1106.41 & 396.732 & 108.384 \\
-0.21385 & -23.956 & 396.732 & 2381.88 & 1142.64 \\
2.18907 & -20.8415 & 108.384 & 1142.64 & 2136.4
\end{pmatrix} \]

For the correlation matrix R we have the following equation:

\[ R = (r_{jk}), \qquad r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}} = \frac{s_{jk}}{s_j\,s_k} \]

The correlation matrix can be determined from the covariance matrix with the following relationship:

\[ R = D_s^{-1}\,S\,D_s^{-1} \]

where $D_s$ is the diagonal matrix $D_s = \mathrm{diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \dots, \sqrt{s_{pp}})$:

\[ D_s = \begin{pmatrix}
0.1272 & 0 & 0 & 0 & 0 \\
0 & 8.3999 & 0 & 0 & 0 \\
0 & 0 & 33.2627 & 0 & 0 \\
0 & 0 & 0 & 48.8045 & 0 \\
0 & 0 & 0 & 0 & 46.2212
\end{pmatrix} \]

\[ D_s^{-1} = \begin{pmatrix}
7.8616 & 0 & 0 & 0 & 0 \\
0 & 0.1190 & 0 & 0 & 0 \\
0 & 0 & 0.03006 & 0 & 0 \\
0 & 0 & 0 & 0.02049 & 0 \\
0 & 0 & 0 & 0 & 0.02163
\end{pmatrix} \]

Each row of S is divided by the corresponding standard deviation:

\[ D_s^{-1} S = \begin{pmatrix}
0.1272 & 1.6983 & 6.1884 & -1.6812 & 17.2096 \\
2.5718 \times 10^{-2} & 8.3999 & 3.1225 & -2.8519 & -2.4812 \\
2.3665 \times 10^{-2} & 0.7885 & 33.2627 & 11.9272 & 3.2584 \\
-4.381 \times 10^{-3} & -0.4909 & 8.1290 & 48.8045 & 23.4126 \\
4.736 \times 10^{-2} & -0.4509 & 2.3449 & 24.7211 & 46.2212
\end{pmatrix} \]

\[ R = \begin{pmatrix}
1 & 0.2022 & 0.1860 & -0.0344 & 0.3723 \\
0.2022 & 1 & 0.0938 & -0.0584 & -0.0537 \\
0.1860 & 0.0938 & 1 & 0.2444 & 0.0705 \\
-0.0344 & -0.0584 & 0.2444 & 1 & 0.5065 \\
0.3723 & -0.0537 & 0.0705 & 0.5065 & 1
\end{pmatrix} \]

Expressed as percentages:

\[ R\,\% = \begin{pmatrix}
100 & 20.22 & 18.60 & -3.44 & 37.23 \\
20.22 & 100 & 9.38 & -5.84 & -5.37 \\
18.60 & 9.38 & 100 & 24.44 & 7.05 \\
-3.44 & -5.84 & 24.44 & 100 & 50.65 \\
37.23 & -5.37 & 7.05 & 50.65 & 100
\end{pmatrix} \]
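As a quick numerical cross-check of part a), the relationship $R = D_s^{-1} S D_s^{-1}$ can be applied directly to the covariance matrix of this exercise. The following is only a minimal Matlab sketch: the variable names are illustrative, and the entries of S are the rounded values from the manual solution, so the result should agree with R only to about four decimals.

```matlab
% Covariance matrix S of Exercise 33 (rounded values from the manual solution).
% Variable order: y1, y2, x1, x2, x3.
S = [ 0.01618   0.21603    0.78717   -0.21385    2.18907
      0.21603  70.5589    26.229    -23.956    -20.8415
      0.78717  26.229   1106.41     396.732    108.384
     -0.21385 -23.956    396.732   2381.88    1142.64
      2.18907 -20.8415   108.384   1142.64    2136.4 ];

% D_s holds the standard deviations sqrt(s_jj) on its diagonal.
D_s = diag(sqrt(diag(S)));

% R = D_s^-1 * S * D_s^-1, written with the backslash and slash operators
% so that no explicit inverse is formed.
R = D_s \ S / D_s;

disp(R)
```

Off-diagonal entries such as R(4,5) should come out close to 0.5065, the correlation between x2 and x3.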
b)
Given the correlation matrix, there is a 50.65% correlation between insulin response to oral glucose and insulin resistance. These variables are of great interest when assessing whether a patient is diabetic, since an inadequate insulin response or resistance could have consequences for the patient.

Although the correlation between insulin resistance and relative weight is below 50% (it is 37.23%), it is important, since patients of different weights (normal, overweight and obese) may have different insulin resistance, which appears mostly in people who are overweight or obese.

The least correlated variables are those involving fasting plasma glucose, a test that measures the level of glucose in the blood. It has a small correlation of 9.38% with glucose intolerance, a value so low that it may not be of great importance.

B. Exercise 34.

Statement:

The data in Table II consist of measurements y1, y2, y3 and y4 of the ramus bone at four different ages on each of 20 children.

a) Find $\bar{y}$, S and R.
b) Find |S| and tr(S).

TABLE II
RAMUS BONE LENGTH AT FOUR AGES FOR 20 CHILDREN

                          Age
Individual   8 yr    8 1/2 yr   9 yr    9 1/2 yr
             (y1)    (y2)       (y3)    (y4)
 1           47.8    48.8       49.0    49.7
 2           46.4    47.3       47.7    48.4
 3           46.3    46.8       47.8    48.5
 4           45.1    45.3       46.1    47.2
 5           47.6    48.5       48.9    49.3
 6           52.5    53.2       53.3    53.7
 7           51.2    53.0       54.3    54.5
 8           49.8    50.0       50.3    52.7
 9           48.1    50.8       52.3    54.4
10           45.0    47.0       47.3    48.3
11           51.2    51.4       51.6    51.9
12           48.5    49.2       53.0    55.5
13           52.1    52.8       53.7    55.0
14           48.2    48.9       49.3    49.8
15           49.6    50.4       51.2    51.8
16           50.7    51.7       52.7    53.3
17           47.2    47.7       48.4    49.5
18           53.3    54.6       55.1    55.3
19           46.2    47.5       48.1    48.4
20           46.3    47.6       51.3    51.8

Solution:

a)
For the sample covariance matrix S, a 4x4 matrix has to be found for the 4 variables, so its entries are computed with the following equation:

\[ s_{jk} = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_{ij}\,y_{ik} - n\,\bar{y}_j\,\bar{y}_k\right) \]

With n = 20, the mean vector of the columns is determined first:

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \]

\[ \bar{y}' = (48.655,\ 49.625,\ 50.57,\ 51.45) \]

The terms of the summation $\sum_{i=1}^{n} y_{ij}\,y_{ik}$ are then determined. For $s_{11}$:

\[ \sum_{i=1}^{20} y_{i1}\,y_{i1} = (47.8)(47.8) + (46.4)(46.4) + \dots + (46.3)(46.3) = 47466.45 \]

\[ s_{11} = \frac{1}{20-1}\bigl(47466.45 - 20(48.655)(48.655)\bigr) = 6.3299 \]

The entries of S correspond to the following pairs of variables:

\[ S = \begin{pmatrix}
y_1y_1 & y_1y_2 & y_1y_3 & y_1y_4 \\
y_2y_1 & y_2y_2 & y_2y_3 & y_2y_4 \\
y_3y_1 & y_3y_2 & y_3y_3 & y_3y_4 \\
y_4y_1 & y_4y_2 & y_4y_3 & y_4y_4
\end{pmatrix} \]

\[ S = \begin{pmatrix}
6.3299 & 6.1890 & 5.777 & 5.5481 \\
6.1890 & 6.4493 & 6.1534 & 5.9234 \\
5.777 & 6.1534 & 6.918 & 6.9463 \\
5.5481 & 5.9234 & 6.9463 & 7.4647
\end{pmatrix} \]

For the correlation matrix R we have the following equation:

\[ R = (r_{jk}), \qquad r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}} = \frac{s_{jk}}{s_j\,s_k} \]

The correlation matrix can be determined from the covariance matrix with the following relationship:

\[ R = D_s^{-1}\,S\,D_s^{-1} \]

where $D_s$ is the diagonal matrix $D_s = \mathrm{diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \dots, \sqrt{s_{pp}})$.
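The computational formula for $s_{jk}$ can also be evaluated for all pairs at once, since $\sum_i y_{ij} y_{ik}$ is the (j,k) entry of Y'Y. The sketch below assumes the Table II data are typed in as a 20x4 matrix Y (rows are individuals); the names Y, ybar, S_manual and S_builtin are illustrative and are not part of the original program.

```matlab
% Table II data: rows = individuals, columns = y1..y4 (ages 8, 8.5, 9, 9.5 yr).
Y = [47.8 48.8 49.0 49.7
     46.4 47.3 47.7 48.4
     46.3 46.8 47.8 48.5
     45.1 45.3 46.1 47.2
     47.6 48.5 48.9 49.3
     52.5 53.2 53.3 53.7
     51.2 53.0 54.3 54.5
     49.8 50.0 50.3 52.7
     48.1 50.8 52.3 54.4
     45.0 47.0 47.3 48.3
     51.2 51.4 51.6 51.9
     48.5 49.2 53.0 55.5
     52.1 52.8 53.7 55.0
     48.2 48.9 49.3 49.8
     49.6 50.4 51.2 51.8
     50.7 51.7 52.7 53.3
     47.2 47.7 48.4 49.5
     53.3 54.6 55.1 55.3
     46.2 47.5 48.1 48.4
     46.3 47.6 51.3 51.8];

n    = size(Y, 1);          % number of individuals (20)
ybar = mean(Y, 1);          % 1x4 row vector of column means

% s_jk = (sum_i y_ij*y_ik - n*ybar_j*ybar_k)/(n-1), for all j,k at once.
S_manual = (Y' * Y - n * (ybar' * ybar)) / (n - 1);

% Built-in estimate for comparison (cov also divides by n-1).
S_builtin = cov(Y);

fprintf('mean vector: %s\n', mat2str(ybar, 5));
fprintf('largest |S_manual - S_builtin| = %g\n', ...
        max(abs(S_manual(:) - S_builtin(:))));
```

The two covariance estimates should differ only by floating-point rounding, which is one way to confirm that the manual formula and cov compute the same quantity.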
Numerically:

\[ D_s = \begin{pmatrix}
2.5159 & 0 & 0 & 0 \\
0 & 2.5395 & 0 & 0 \\
0 & 0 & 2.6302 & 0 \\
0 & 0 & 0 & 2.7321
\end{pmatrix} \]

\[ D_s^{-1} = \begin{pmatrix}
0.3974 & 0 & 0 & 0 \\
0 & 0.3937 & 0 & 0 \\
0 & 0 & 0.3802 & 0 \\
0 & 0 & 0 & 0.3660
\end{pmatrix} \]

\[ D_s^{-1} S = \begin{pmatrix}
2.5159 & 2.4599 & 2.2961 & 2.2052 \\
2.4370 & 2.5395 & 2.4230 & 2.3325 \\
2.1964 & 2.3395 & 2.6302 & 2.6409 \\
2.0307 & 2.1680 & 2.5424 & 2.7322
\end{pmatrix} \]

\[ R = \begin{pmatrix}
1 & 0.9687 & 0.8730 & 0.8072 \\
0.9687 & 1 & 0.9212 & 0.8537 \\
0.8730 & 0.9212 & 1 & 0.9666 \\
0.8072 & 0.8537 & 0.9666 & 1
\end{pmatrix} \]

\[ R\,\% = \begin{pmatrix}
100 & 96.87 & 87.30 & 80.72 \\
96.87 & 100 & 92.12 & 85.37 \\
87.30 & 92.12 & 100 & 96.66 \\
80.72 & 85.37 & 96.66 & 100
\end{pmatrix} \]

Correlation conclusions:

Ramus bone length gives an approximate indication of a child's age. The highest correlations are between measurements at adjacent ages: 96.87% between 8 and 8 1/2 years, because they are close ages, and 96.66% between 9 and 9 1/2 years.

Because the measurements are taken half a year apart, the next highest correlation is between 8 1/2 and 9 years, at 92.12%.

As said previously, the ramus bone has different measurements according to the age of the children. Since the ages do not differ greatly, the correlation remains high, between 80% and 87%, for the pairs 8 and 9 1/2 years, 8 1/2 and 9 1/2 years, and 8 and 9 years.

b)
The determinant of the covariance matrix, |S|, is shown below.

\[ |S| = \begin{vmatrix}
6.3299 & 6.1890 & 5.777 & 5.5481 \\
6.1890 & 6.4493 & 6.1534 & 5.9234 \\
5.777 & 6.1534 & 6.918 & 6.9463 \\
5.5481 & 5.9234 & 6.9463 & 7.4647
\end{vmatrix} = 1.0683 \]

The trace of S is obtained by adding the components on the diagonal of S.

\[ \mathrm{tr}(S) = 6.3299 + 6.4493 + 6.918 + 7.4647 = 27.1619 \]

C. Exercise 35.

Statement:

The data in Table 3.7 (reproduced here as Table III) consist of head measurements of the first and second sons (Frets 1921). Define y1 and y2 as the measurements of the first son and x1 and x2 as those of the second son.

a) Find S, $D_s$ and R.
b) Find the mean vector for the four variables and partition it into $\begin{pmatrix} \bar{y} \\ \bar{x} \end{pmatrix}$.

TABLE III
X AND Y VALUES TAKEN FROM THE MEASUREMENTS OF THE HEAD OF THE FIRST AND SECOND CHILD.

     First Son                 Second Son
Head length  Head breadth  Head length  Head breadth
    y1           y2            x1           x2
   191          155           179          145
   195          149           201          152
   181          148           185          149
   183          153           188          149
   176          144           171          142
   208          157           192          152
   189          150           190          149
   197          159           189          152
   188          152           197          159
   192          150           187          151
   179          158           186          148
   183          147           174          147
   174          150           185          152
   190          159           195          157
   188          151           187          158
   163          137           161          130
   195          155           183          158
   186          153           173          148
   181          145           182          146
   175          140           165          137
   192          154           185          152
   174          143           178          147
   176          139           176          143
   197          167           200          158
   190          163           187          150

Solution:

a)
Sample covariance matrix S

For S we have to find a 4x4 matrix for the 4 variables, so its entries are computed with the following equation:
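\[ s_{jk} = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_{ij}\,y_{ik} - n\,\bar{y}_j\,\bar{y}_k\right) \]

With n = 25, the mean vectors of the columns are determined first:

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \]

\[ \bar{y}' = (185.72,\ 151.12), \qquad \bar{x}' = (183.84,\ 149.24) \]

The terms of the summation $\sum_{i=1}^{n} y_{ij}\,y_{ik}$ are then determined. For $s_{11}$:

\[ \sum_{i=1}^{25} y_{i1}\,y_{i1} = (191)(191) + (195)(195) + \dots + (190)(190) = 864585 \]

\[ s_{11} = \frac{1}{25-1}\bigl(864585 - 25(185.72)(185.72)\bigr) = 95.29333 \]

The entries of S correspond to the following pairs of variables:

\[ S = \begin{pmatrix}
y_1y_1 & y_1y_2 & y_1x_1 & y_1x_2 \\
y_2y_1 & y_2y_2 & y_2x_1 & y_2x_2 \\
x_1y_1 & x_1y_2 & x_1x_1 & x_1x_2 \\
x_2y_1 & x_2y_2 & x_2x_1 & x_2x_2
\end{pmatrix} \]

\[ S = \begin{pmatrix}
95.29333 & 52.86833 & 69.66166 & 46.11167 \\
52.86833 & 54.36 & 51.31167 & 35.05333 \\
69.66166 & 51.31167 & 100.8067 & 56.54 \\
46.11167 & 35.05333 & 56.54 & 45.02333
\end{pmatrix} \]

Matrix $D_s$

\[ R = (r_{jk}), \qquad r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}} = \frac{s_{jk}}{s_j\,s_k} \]

The correlation matrix can be determined from the covariance matrix with the following relationship:

\[ R = D_s^{-1}\,S\,D_s^{-1} \]

where $D_s$ is the diagonal matrix $D_s = \mathrm{diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \dots, \sqrt{s_{pp}})$:

\[ D_s = \begin{pmatrix}
9.7618 & 0 & 0 & 0 \\
0 & 7.3729 & 0 & 0 \\
0 & 0 & 10.0403 & 0 \\
0 & 0 & 0 & 6.7099
\end{pmatrix} \]

\[ D_s^{-1} = \begin{pmatrix}
0.1024 & 0 & 0 & 0 \\
0 & 0.1356 & 0 & 0 \\
0 & 0 & 0.0996 & 0 \\
0 & 0 & 0 & 0.1490
\end{pmatrix} \]

\[ D_s^{-1} S = \begin{pmatrix}
9.7618 & 5.4158 & 7.1361 & 4.7237 \\
7.1706 & 7.3729 & 6.9595 & 4.7543 \\
6.9382 & 5.1106 & 10.0403 & 5.6313 \\
6.8721 & 5.2241 & 8.4263 & 6.7099
\end{pmatrix} \]

Correlation matrix R

\[ R = \begin{pmatrix}
1 & 0.7346 & 0.7108 & 0.7040 \\
0.7346 & 1 & 0.6932 & 0.7086 \\
0.7108 & 0.6932 & 1 & 0.8393 \\
0.7040 & 0.7086 & 0.8393 & 1
\end{pmatrix} \]

\[ R\,\% = \begin{pmatrix}
100 & 73.46 & 71.08 & 70.40 \\
73.46 & 100 & 69.32 & 70.86 \\
71.08 & 69.32 & 100 & 83.93 \\
70.40 & 70.86 & 83.93 & 100
\end{pmatrix} \]

b)
The mean vector for the four variables, partitioned into $\bar{y}$ and $\bar{x}$, is

\[ \begin{pmatrix} \bar{y} \\ \bar{x} \end{pmatrix} = \begin{pmatrix} 185.72 \\ 151.12 \\ 183.84 \\ 149.24 \end{pmatrix} \]

D. Code made in Matlab

% Report 1.
% Covariance (S) and correlation (R) matrices.
% Each .xlsx file is expected to hold one observation per row and one
% variable per column.
clc, clear

% Exercise 33
datos1 = readmatrix('Ejercicio_33.xlsx'); % Data imported from an .xlsx document
S1 = cov(datos1)                 % Covariance matrix computed with cov(A)
D_s1 = sqrt(diag(diag(S1)))      % Diagonal matrix D_s
invD1 = inv(D_s1)                % Inverse matrix D_s^-1
Corr = invD1*S1*invD1            % Correlation matrix computed as D_s^-1*S*D_s^-1
R1 = corrcoef(datos1)            % Correlation matrix computed with corrcoef(A)

% Exercise 34
datos2 = readmatrix('Ejercicio_34.xlsx'); % Data imported from an .xlsx document
S2 = cov(datos2)                 % Covariance matrix computed with cov(A)
D_s2 = sqrt(diag(diag(S2)))      % Diagonal matrix D_s
invD2 = inv(D_s2)                % Inverse matrix D_s^-1
Corr2 = invD2*S2*invD2           % Correlation matrix computed as D_s^-1*S*D_s^-1
R2 = corrcoef(datos2)            % Correlation matrix computed with corrcoef(A)

% Exercise 35
datos3 = readmatrix('Ejercicio_35.xlsx'); % Data imported from an .xlsx document
S3 = cov(datos3)                 % Covariance matrix computed with cov(A)
D_s3 = sqrt(diag(diag(S3)))      % Diagonal matrix D_s
invD3 = inv(D_s3)                % Inverse matrix D_s^-1
Corr3 = invD3*S3*invD3           % Correlation matrix computed as D_s^-1*S*D_s^-1
R3 = corrcoef(datos3)            % Correlation matrix computed with corrcoef(A)

Program Run

S1 =

   1.0e+03 *

    0.0000    0.0002    0.0008   -0.0002    0.0022
    0.0002    0.0706    0.0262   -0.0240   -0.0208
    0.0008    0.0262    1.1064    0.3967    0.1084
   -0.0002   -0.0240    0.3967    2.3819    1.1426
    0.0022   -0.0208    0.1084    1.1426    2.1364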
D_s1 =

    0.1272         0         0         0         0
         0    8.3999         0         0         0
         0         0   33.2628         0         0
         0         0         0   48.8045         0
         0         0         0         0   46.2212

invD1 =

    7.8612         0         0         0         0
         0    0.1190         0         0         0
         0         0    0.0301         0         0
         0         0         0    0.0205         0
         0         0         0         0    0.0216

Corr =

    1.0000    0.2022    0.1860   -0.0344    0.3723
    0.2022    1.0000    0.0939   -0.0584   -0.0537
    0.1860    0.0939    1.0000    0.2444    0.0705
   -0.0344   -0.0584    0.2444    1.0000    0.5065
    0.3723   -0.0537    0.0705    0.5065    1.0000

R1 =

    1.0000    0.2022    0.1860   -0.0344    0.3723
    0.2022    1.0000    0.0939   -0.0584   -0.0537
    0.1860    0.0939    1.0000    0.2444    0.0705
   -0.0344   -0.0584    0.2444    1.0000    0.5065
    0.3723   -0.0537    0.0705    0.5065    1.0000

S2 =

    6.3300    6.1891    5.7770    5.5482
    6.1891    6.4493    6.1534    5.9234
    5.7770    6.1534    6.9180    6.9463
    5.5482    5.9234    6.9463    7.4647

D_s2 =

    2.5159         0         0         0
         0    2.5396         0         0
         0         0    2.6302         0
         0         0         0    2.7322

invD2 =

    0.3975         0         0         0
         0    0.3938         0         0
         0         0    0.3802         0
         0         0         0    0.3660

Corr2 =

    1.0000    0.9687    0.8730    0.8071
    0.9687    1.0000    0.9212    0.8537
    0.8730    0.9212    1.0000    0.9666
    0.8071    0.8537    0.9666    1.0000

R2 =

    1.0000    0.9687    0.8730    0.8071
    0.9687    1.0000    0.9212    0.8537
    0.8730    0.9212    1.0000    0.9666
    0.8071    0.8537    0.9666    1.0000

S3 =

   95.2933   52.8683   69.6617   46.1117
   52.8683   54.3600   51.3117   35.0533
   69.6617   51.3117  100.8067   56.5400
   46.1117   35.0533   56.5400   45.0233

D_s3 =

    9.7618         0         0         0
         0    7.3729         0         0
         0         0   10.0403         0
         0         0         0    6.7099

invD3 =

    0.1024         0         0         0
         0    0.1356         0         0
         0         0    0.0996         0
         0         0         0    0.1490

Corr3 =

    1.0000    0.7346    0.7108    0.7040
    0.7346    1.0000    0.6932    0.7086
    0.7108    0.6932    1.0000    0.8393
    0.7040    0.7086    0.8393    1.0000

R3 =

    1.0000    0.7346    0.7108    0.7040
    0.7346    1.0000    0.6932    0.7086
    0.7108    0.6932    1.0000    0.8393
    0.7040    0.7086    0.8393    1.0000
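The program forms inv(D_s) explicitly. Because $D_s$ is diagonal, the same correlation matrix can presumably be obtained by dividing each $s_{jk}$ by $s_j s_k$ elementwise. The following sketch compares the two routes on the S3 matrix printed in the program run; R_elem, R_inv and max_diff are illustrative names, and the entries of S3 are the four-decimal values displayed by Matlab.

```matlab
% Covariance matrix of Exercise 35, as displayed in the program run.
S3 = [95.2933  52.8683   69.6617  46.1117
      52.8683  54.3600   51.3117  35.0533
      69.6617  51.3117  100.8067  56.5400
      46.1117  35.0533   56.5400  45.0233];

s = sqrt(diag(S3));            % column vector of standard deviations

% Route 1: elementwise division, r_jk = s_jk / (s_j * s_k).
R_elem = S3 ./ (s * s');

% Route 2: the matrix product used in the report, D_s^-1 * S * D_s^-1.
R_inv = inv(diag(s)) * S3 * inv(diag(s));

max_diff = max(abs(R_elem(:) - R_inv(:)));
fprintf('largest difference between the two routes: %g\n', max_diff);
```

The elementwise form avoids building and inverting a matrix, which matters little for 4x4 data but keeps the code closer to the definition of $r_{jk}$.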

III. CONCLUSIONS
The exercises solved in this document illustrate how to obtain the covariance matrix and the correlation matrix, which measure and show the interdependence between associated variables. Matlab also proved to be a great help for these calculations. As the document shows, building each of these matrices by hand requires a lot of time and a moderate knowledge of the subject, whereas Matlab condenses the work into short and effective code, and the answers obtained through the formulas agree closely with those obtained with the software.

REFERENCES
[1] Anonymous, “Resistencia a la Insulina y la prediabetes”. Accessed: July 13, 2021. [Online]. Available: https://www.niddk.nih.gov/health-information/informacion-de-la-salud/diabetes/informacion-general/que-es/resistencia-insulina-prediabetes
[2] Anonymous, “Pruebas y diagnóstico de la diabetes”. Accessed: July 13, 2021. [Online]. Available: https://www.niddk.nih.gov/health-information/informacion-de-la-salud/diabetes/informacion-general/pruebas-diagnostico
[3] Anonymous, “Matriz de correlación”. Accessed: July 12, 2021. [Online]. Available: https://es.wikipedia.org/wiki/Matriz_de_correlaci%C3%B3n
[4] Anonymous, “Covarianza y Coeficiente de correlación de Pearson”. Accessed: July 12, 2021. [Online]. Available: http://agrega.juntadeandalucia.es/repositorio/21022018/61/es-an_2018022112_9230106/2_correlacin_covarianza_y_coeficiente_de_correlacin_de_pearson.html
[5] Anonymous, “Correlación lineal”. Accessed: July 14, 2021. [Online]. Available: https://es.mathworks.com/help/matlab/data_analysis/linear-correlation.html
[6] Anonymous, “Interpretar todos los estadísticos y gráficas para Análisis de elementos”. Accessed: July 14, 2021. [Online]. Available: https://support.minitab.com/es-mx/minitab/18/help-and-how-to/modeling-statistics/multivariate/how-to/item-analysis/interpret-the-results/all-statistics-and-graphs/
