2020–2021
1 / 29
Data matrices
Mean vector
Linear transformations
2 / 29
Data matrices
3 / 29
Datasets as matrices
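Data matrix $X$ : $n \times p$:
\[
X = (x_{ij})_{\substack{i=1,\dots,n \\ j=1,\dots,p}} =
\begin{pmatrix}
x_{11} & \cdots & x_{1j} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nj} & \cdots & x_{np}
\end{pmatrix}
\]
- $p$ columns: variables ($j = 1, \dots, p$)
\[
x^j = \begin{pmatrix} x_{1j} \\ \vdots \\ x_{nj} \end{pmatrix}
\]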
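As a small illustration of this indexing, the sketch below builds a hypothetical toy matrix (the values and the names `obs1`, ..., `a`, `b`, `c` are invented for the example and are not the food data) and extracts one element and one column.

```r
# Hypothetical toy data: n = 4 observations (rows), p = 3 variables (columns)
X <- matrix(c(1, 2, 3, 4,     # variable "a"
              2, 1, 4, 3,     # variable "b"
              5, 7, 6, 10),   # variable "c"
            nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))

dim(X)      # n and p
X[2, 3]     # element x_23: observation 2, variable 3
X[, "b"]    # column vector x^2: all observations of the second variable
```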
4 / 29
X <- read.delim(file = "data/food",
row.names = 1, comment.char = "#")
X
class(X)
[1] "data.frame"
5 / 29
Visualizing a data matrix?
plot(X) # graph not very informative... => better way?
[Figure: scatterplot matrix produced by plot(X), one panel per pair of variables; the diagonal labels include veget, fruit, meat, poult, milk and wine.]
6 / 29
Datasets as clouds of points
7 / 29
Mean vector
8 / 29
Sample means
$\bar{x}' = (\bar{x}_1, \dots, \bar{x}_p)$ : $1 \times p$
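Each entry of the mean vector is the column average $\bar{x}_j = \frac{1}{n} \sum_{i=1}^n x_{ij}$, so the whole row vector can also be written as $\frac{1}{n} 1_n' X$. A minimal sketch on hypothetical toy data (not the food data) compares `colMeans()` with this matrix form.

```r
# Hypothetical toy data: n = 4 observations, p = 3 variables
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
n <- nrow(X)

xbar <- colMeans(X)                         # mean of each variable
xbar_mat <- (1 / n) * t(rep(1, n)) %*% X    # (1/n) 1_n' X, a 1 x p matrix

xbar
all.equal(as.numeric(xbar_mat), as.numeric(xbar))
```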
9 / 29
Centering
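\[
1_n \bar{x}' =
\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
\begin{pmatrix} \bar{x}_1 & \cdots & \bar{x}_p \end{pmatrix}
=
\begin{pmatrix}
\bar{x}_1 & \cdots & \bar{x}_p \\
\vdots    &        & \vdots \\
\bar{x}_1 & \cdots & \bar{x}_p
\end{pmatrix}
: n \times p
\]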
10 / 29
Xc <- sweep(X, MARGIN = 2, STATS = colMeans(X), FUN = "-")
Xc
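The `sweep()` call subtracts the column means from every row, which amounts to computing $X - 1_n \bar{x}'$. The sketch below checks this on hypothetical toy data (not the food data) and confirms that the centered columns have mean zero.

```r
# Hypothetical toy data: n = 4 observations, p = 3 variables
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
n <- nrow(X)

# Centering with sweep(), as on the slide
Xc_sweep <- sweep(X, MARGIN = 2, STATS = colMeans(X), FUN = "-")

# Centering in matrix form: X - 1_n xbar'
Xc_mat <- X - matrix(1, n, 1) %*% t(colMeans(X))

all.equal(Xc_sweep, Xc_mat, check.attributes = FALSE)
colMeans(Xc_sweep)   # zero for every variable (up to rounding error)
```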
11 / 29
Covariance and correlation
12 / 29
Sample variance and covariance
- Sample covariance between variables $j$ and $k$:
\[
s_{jk} = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)
\]
\[
S = (s_{jk})_{\substack{j=1,\dots,p \\ k=1,\dots,p}} = \frac{1}{n} X_c' X_c : p \times p
\]
14 / 29
n <- nrow(X) # number of observations
S <- ((n-1)/n) * cov(X) # function cov(): division by n-1 rather than by n
round(S, 2) # round to two decimal places
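The rescaling by `(n-1)/n` can be checked against the matrix formula $S = \frac{1}{n} X_c' X_c$. The sketch below does so on hypothetical toy data (not the food data), using `crossprod(Xc)` as a shorthand for `t(Xc) %*% Xc`.

```r
# Hypothetical toy data: n = 4 observations, p = 3 variables
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
n <- nrow(X)
Xc <- sweep(X, MARGIN = 2, STATS = colMeans(X), FUN = "-")

S_direct <- crossprod(Xc) / n        # (1/n) Xc' Xc
S_cov    <- ((n - 1) / n) * cov(X)   # cov() divides by n - 1, so rescale

all.equal(S_direct, S_cov, check.attributes = FALSE)
S_direct
```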
15 / 29
Standardization
Variables may be expressed in different units and are therefore difficult
to compare.
Standardization or scaling: divide each (centered) variable by its
standard deviation $s_j = \sqrt{s_{jj}}$:
\[
X_{cs} = \left( \frac{x_{ij} - \bar{x}_j}{s_j} \right)_{\substack{i=1,\dots,n \\ j=1,\dots,p}}
\]
# centered and scaled variables
Xcs <- sweep(Xc, MARGIN = 2, STATS = sqrt(diag(S)), FUN = "/")
round(Xcs, 3)
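After standardization, every variable should have mean 0 and variance 1 (with divisor $n$). The sketch below verifies this on hypothetical toy data (not the food data). It also compares with base R's `scale()`, which standardizes with the $n-1$ divisor and therefore differs by the factor $\sqrt{n/(n-1)}$.

```r
# Hypothetical toy data: n = 4 observations, p = 3 variables
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
n <- nrow(X)
Xc <- sweep(X, MARGIN = 2, STATS = colMeans(X), FUN = "-")

s   <- sqrt(colMeans(Xc^2))                         # standard deviations, divisor n
Xcs <- sweep(Xc, MARGIN = 2, STATS = s, FUN = "/")  # centered and scaled

colMeans(Xcs)     # 0 for every variable (up to rounding error)
colMeans(Xcs^2)   # variance with divisor n: 1 for every variable

# scale() uses sd(), i.e. divisor n - 1; rescaling recovers Xcs
Xcs_scale <- scale(X) * sqrt(n / (n - 1))
all.equal(Xcs, Xcs_scale, check.attributes = FALSE)
```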
17 / 29
Correlation
Correlation between variables j and k:
\[
\rho_{jk} = \frac{s_{jk}}{s_j s_k}
= \frac{1}{n} \sum_{i=1}^{n} \frac{x_{ij} - \bar{x}_j}{s_j} \cdot \frac{x_{ik} - \bar{x}_k}{s_k}
\]
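The second expression says that the correlation matrix is the covariance matrix of the standardized variables, that is, $\frac{1}{n} X_{cs}' X_{cs}$. A minimal sketch on hypothetical toy data (not the food data) compares this with `cor()`, whose result does not depend on the choice of divisor.

```r
# Hypothetical toy data: n = 4 observations, p = 3 variables
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
n <- nrow(X)
Xc  <- sweep(X, MARGIN = 2, STATS = colMeans(X), FUN = "-")
Xcs <- sweep(Xc, MARGIN = 2, STATS = sqrt(colMeans(Xc^2)), FUN = "/")

R <- crossprod(Xcs) / n    # (1/n) Xcs' Xcs
all.equal(R, cor(X), check.attributes = FALSE)
round(R, 3)
```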
18 / 29
Correlations and angles
\[
\rho_{jk} = \cos \theta_{jk}
\]
with $\theta_{jk}$ the angle between the centered variables, seen as vectors in
$\mathbb{R}^n$.
ρjk   θjk    centered variables
 1    0      parallel
 0    π/2    orthogonal
−1    π      anti-parallel
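The link between correlation and angle can be checked numerically: the cosine of the angle between two centered vectors is their inner product divided by the product of their norms. The sketch below uses two hypothetical toy variables (not taken from the food data).

```r
# Two hypothetical toy variables observed on n = 4 units
a <- c(1, 2, 3, 4)
b <- c(2, 1, 4, 3)

# Centered variables, seen as vectors in R^n
u <- a - mean(a)
v <- b - mean(b)

cos_theta <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
theta     <- acos(cos_theta)     # angle in radians

cos_theta
all.equal(cos_theta, cor(a, b))
```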
19 / 29
Linear transformations
20 / 29
Linear transformation of a data matrix
21 / 29
Creating $\ell = 2$ new variables:
- vegan = sum of veget and fruit
- animal = sum of meat and poult
names(X)
A # coefficient matrix, p x l = 7 x 2
     [,1] [,2]
[1,] 0 0
[2,] 1 0
[3,] 1 0
[4,] 0 1
[5,] 0 1
[6,] 0 0
[7,] 0 0
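The 0/1 matrix printed above has one row per original variable and one column per new variable. One possible way to build such a matrix is sketched below. It assumes the variable order bread, veget, fruit, meat, poult, milk, wine (taken from the row labels of the cross-covariance output later in these slides), and it adds row and column names for readability, which the matrix on the slide does not appear to have.

```r
# Assumed variable order of the food data (see the lead-in)
vars <- c("bread", "veget", "fruit", "meat", "poult", "milk", "wine")

# Start from a p x l matrix of zeros, then mark the variables to be summed
A <- matrix(0, nrow = length(vars), ncol = 2,
            dimnames = list(vars, c("vegan", "animal")))
A[c("veget", "fruit"), "vegan"]  <- 1   # vegan  = veget + fruit
A[c("meat", "poult"),  "animal"] <- 1   # animal = meat  + poult

A
unname(A)   # same layout as the unnamed matrix on the slide
```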
22 / 29
Y <- as.data.frame(as.matrix(X) %*% A)
names(Y) <- c("vegan", "animal")
Y
vegan animal
w2 782 1963
e2 947 2094
m2 1329 2875
w3 904 2051
e3 1004 2059
m3 1532 3493
w4 1027 2258
e4 1183 2618
m4 1410 3515
w5 1199 2607
e5 1543 2949
m5 1984 3797
23 / 29
Transformed mean vector
\[
\bar{y}' = \bar{x}' A
\]
colMeans(Y)
vegan animal
1237.000 2689.917
t(colMeans(X)) %*% A # idem
[,1] [,2]
[1,] 1237 2689.917
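The identity $\bar{y}' = \bar{x}' A$ follows in one line from $Y = XA$ and the matrix form of the mean vector, $\bar{x}' = \frac{1}{n} 1_n' X$:

\[
\bar{y}' = \frac{1}{n} 1_n' Y = \frac{1}{n} 1_n' X A = \bar{x}' A .
\]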
24 / 29
Transformed covariance matrix
\[
S_Y = A' S_X A : \ell \times \ell
\]
((n-1)/n) * cov(Y)
vegan animal
vegan 106818.8 188667.5
animal 188667.5 378426.1
((n-1)/n) * t(A) %*% cov(X) %*% A # idem
[,1] [,2]
[1,] 106818.8 188667.5
[2,] 188667.5 378426.1
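A short derivation of this formula, using only $Y = XA$, $\bar{y}' = \bar{x}' A$ and $S_X = \frac{1}{n} X_c' X_c$: centering commutes with the linear transformation, and the rest is matrix algebra.

\[
Y_c = Y - 1_n \bar{y}' = XA - 1_n \bar{x}' A = (X - 1_n \bar{x}') A = X_c A ,
\]
\[
S_Y = \frac{1}{n} Y_c' Y_c = \frac{1}{n} A' X_c' X_c A = A' S_X A .
\]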
25 / 29
Cross-covariance matrix
\[
S_{XY} = S_X A : p \times \ell
\]
S %*% A
[,1] [,2]
bread 14201.000 18566.81
veget 57320.667 96248.67
fruit 49498.167 92418.83
meat 117931.500 232451.23
poult 70736.000 145974.85
milk 19350.667 22156.69
wine -9719.833 -17956.78
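The cross-covariance matrix collects the covariances between each original variable and each new variable. Since $S_{XY} = \frac{1}{n} X_c' Y_c = \frac{1}{n} X_c' X_c A$, it equals $S_X A$. The sketch below checks this on hypothetical toy data with an invented coefficient matrix `A` (neither comes from the food data), using the two-argument form of `cov()`.

```r
# Hypothetical toy data: n = 4 observations, p = 3 variables
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
n <- nrow(X)

# Hypothetical transformation: sum_ab = a + b, c_only = c
A <- cbind(sum_ab = c(1, 1, 0), c_only = c(0, 0, 1))
Y <- X %*% A

S_X  <- ((n - 1) / n) * cov(X)
S_XY <- ((n - 1) / n) * cov(X, Y)   # covariances between columns of X and of Y

all.equal(S_XY, S_X %*% A, check.attributes = FALSE)
S_XY
```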
26 / 29
Joint covariance matrix
27 / 29
Z <- cbind(X, Y)
Z
28 / 29
round(((n-1)/n) * cov(Z))
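Because $Z = (X, Y)$ stacks the original and the transformed variables side by side, its covariance matrix splits into four blocks: $S_X$, $S_{XY} = S_X A$, its transpose $A' S_X$, and $S_Y = A' S_X A$. The sketch below checks this block structure on hypothetical toy data with an invented matrix `A` (neither comes from the food data).

```r
# Hypothetical toy data and transformation
X <- matrix(c(1, 2, 3, 4,  2, 1, 4, 3,  5, 7, 6, 10), nrow = 4,
            dimnames = list(paste0("obs", 1:4), c("a", "b", "c")))
A <- cbind(sum_ab = c(1, 1, 0), c_only = c(0, 0, 1))
Y <- X %*% A
Z <- cbind(X, Y)

n <- nrow(X); p <- ncol(X); l <- ncol(A)
S_X <- ((n - 1) / n) * cov(X)
S_Z <- ((n - 1) / n) * cov(Z)

ix <- 1:p                  # positions of the X variables in Z
iy <- (p + 1):(p + l)      # positions of the Y variables in Z

all.equal(S_Z[ix, ix], S_X, check.attributes = FALSE)                # upper left
all.equal(S_Z[ix, iy], S_X %*% A, check.attributes = FALSE)          # upper right
all.equal(S_Z[iy, iy], t(A) %*% S_X %*% A, check.attributes = FALSE) # lower right
```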
29 / 29