Convention and guideline
• Always be aware of the dimensions of vectors and matrices and of their
conformability
• Typically, a vector written without a transpose (denoted by ' or t as a
superscript) is taken to be a column vector. A vector is often indicated by an undercurl
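The conformability rule can be checked directly in R. The following is a minimal sketch; the matrices `A` and `B` and the helper `conformable` are illustrative names, not part of the slides.

```r
# A is 3 x 2 and B is 2 x 4: the inner dimensions (2 and 2) match,
# so the product A %*% B is defined and has dimension 3 x 4.
A <- matrix(1:6, nrow = 3, ncol = 2)
B <- matrix(1:8, nrow = 2, ncol = 4)
dim(A %*% B)

# A plain R vector has no dim attribute; cbind() turns it into an
# explicit column vector, and t() gives the corresponding row vector.
x <- c(1, 2)
x_col <- cbind(x)
dim(x_col)        # 2 rows, 1 column
dim(t(x_col))     # 1 row, 2 columns

# Hypothetical helper: P %*% Q is defined only if ncol(P) equals nrow(Q).
conformable <- function(P, Q) ncol(P) == nrow(Q)
conformable(A, B)   # TRUE
conformable(B, A)   # FALSE: B has 4 columns, A has 3 rows
```

Checking `dim()` before multiplying is usually quicker than decoding the "non-conformable arguments" error that R raises otherwise.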
Summary Statistics for Multivariate Data: The Mean Vector
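• For a multivariate data set with p variables $X_1, \ldots, X_p$, the population mean vector is
$$\mu = E(X) = (E(X_1), \ldots, E(X_p))^t = (\mu_1, \ldots, \mu_p)^t$$
• An estimate of $\mu$ is given by the sample mean vector
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
• This can be calculated in R with the colMeans function.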
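As a small illustration, the sample mean vector can be computed both with `colMeans` and directly from the averaging formula. The data matrix `X` below is made up for the example (n = 4 observations in rows, p = 3 variables in columns).

```r
# Illustrative data: rows are observations, columns are variables.
# matrix() fills column by column, so the columns are
# (1,2,3,4), (2,4,6,8) and (5,5,5,5).
X <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8,
              5, 5, 5, 5), ncol = 3)
n <- nrow(X)

xbar_builtin <- colMeans(X)                         # built-in column means
xbar_formula <- as.vector(t(X) %*% rep(1, n)) / n   # (1/n) * sum of the observation vectors

xbar_builtin                                        # 2.5 5.0 5.0
all.equal(xbar_builtin, xbar_formula)               # TRUE
```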
24‐06‐2021
The population (variance)‐covariance matrix
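$$\Sigma = \operatorname{Cov}(X) =
\begin{pmatrix}
\sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,p} \\
\sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p,1} & \sigma_{p,2} & \cdots & \sigma_{p,p}
\end{pmatrix}, \quad \text{where}$$
$$\sigma_{i,i} = \sigma_i^2 = \operatorname{Var}(X_i), \qquad \sigma_{i,j} = \sigma_{j,i} = \operatorname{Cov}(X_i, X_j)$$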
It is a symmetric, nonnegative-definite matrix. It is positive-definite if and only if it is invertible (full rank).
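These properties can be checked numerically through the eigenvalues. The sketch below uses two hand-made matrices (not taken from any dataset) and a hypothetical helper `check_cov`. The second matrix is the covariance of (X1, X2, X1 + X2) for uncorrelated unit-variance X1 and X2, so it is singular.

```r
# Hypothetical helper: for a symmetric matrix, all eigenvalues >= 0 means
# nonnegative-definite; all eigenvalues > 0 means positive-definite.
# A small tolerance absorbs floating-point error around zero.
check_cov <- function(S, tol = 1e-8) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  c(symmetric  = isSymmetric(S),
    nonneg_def = all(ev >= -tol),
    pos_def    = all(ev > tol))
}

# Full-rank example: positive-definite, hence invertible.
Sigma_full <- matrix(c(4, 2, 0,
                       2, 3, 1,
                       0, 1, 2), nrow = 3, byrow = TRUE)

# Rank-deficient example: the third variable is the sum of the first two.
Sigma_sing <- matrix(c(1, 0, 1,
                       0, 1, 1,
                       1, 1, 2), nrow = 3, byrow = TRUE)

check_cov(Sigma_full)   # all three TRUE
check_cov(Sigma_sing)   # symmetric and nonnegative-definite, but not positive-definite
```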
Sample covariance matrix
An unbiased estimator of $\Sigma$ is the sample covariance matrix
$$S = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^t$$
The diagonal elements of S are the sample variances, while the off-diagonal elements
are the sample covariances between pairs of variables.
The sample covariance matrix can be found in R using the var function or the cov
function.
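To connect the formula with the built-in functions, the sketch below computes S from centered data with divisor n - 1 and compares it with `cov` and `var`. The data matrix `X` is made up for illustration.

```r
# Illustrative data: n = 4 observations (rows), p = 3 variables (columns).
X <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 9,
              5, 3, 4, 1), ncol = 3)
n <- nrow(X)

xbar <- colMeans(X)
Xc   <- sweep(X, 2, xbar)              # subtract each column mean from its column

S_formula <- t(Xc) %*% Xc / (n - 1)    # sum of outer products of centered rows, divided by n - 1
S_builtin <- cov(X)

all.equal(S_formula, S_builtin)        # TRUE
all.equal(var(X), cov(X))              # TRUE: for a matrix, var() and cov() agree
diag(S_builtin)                        # sample variances of the three variables
```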
Sample and population correlation matrix
The correlation matrix of a random vector has 1's on the diagonal, and its
off-diagonal elements are the pairwise correlations
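$$\rho_{i,j} = \operatorname{Cor}(X_i, X_j) = \frac{\sigma_{i,j}}{\sigma_i \sigma_j}$$
The sample correlation matrix replaces $\rho_{i,j}$ with the sample correlation coefficient $r_{i,j}$
and can be found by
$$R = D^{-1/2} S D^{-1/2}$$
where D is the diagonal matrix with elements $s_{i,i}$, the diagonal elements (sample variances) of S.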
The sample correlation matrix can be found in R using the cor function.
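The rescaling of S by the inverse square roots of its diagonal can be reproduced in a few lines and compared with `cor`. This is a sketch on made-up data; the names `D_inv_sqrt` and `R_mat` are illustrative.

```r
# Illustrative data: n = 4 observations (rows), p = 3 variables (columns).
X <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 9,
              5, 3, 4, 1), ncol = 3)

S <- cov(X)
D_inv_sqrt <- diag(1 / sqrt(diag(S)))     # D^(-1/2): reciprocal sample standard deviations
R_mat <- D_inv_sqrt %*% S %*% D_inv_sqrt  # rescale covariances to correlations

all.equal(R_mat, cor(X))                  # TRUE
all.equal(R_mat, cov2cor(S))              # TRUE: cov2cor() performs the same rescaling
diag(R_mat)                               # all ones
```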
Basic matrix algebra and its implementation in R
• Addition and multiplication of matrices
• Trace and determinant of a matrix
• Inverse of a matrix – solution of a system of linear equations
• Eigenvalues and eigenvectors: $A x = \lambda x$ with $x \neq 0$
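The operations listed above map onto base R as follows. This is a minimal sketch with small made-up matrices `A` and `B` and a right-hand side `b`.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0,
              2, 1), nrow = 2, byrow = TRUE)
b <- c(1, 2)

A + B                  # matrix addition (elementwise)
A %*% B                # matrix product; note that A * B would be elementwise

tr_A  <- sum(diag(A))  # trace: sum of the diagonal, here 2 + 3
det_A <- det(A)        # determinant, here 2 * 3 - 1 * 1

A_inv <- solve(A)      # inverse of A
x <- solve(A, b)       # solves A x = b without forming the inverse explicitly

e <- eigen(A)          # eigenvalues in e$values, eigenvectors as columns of e$vectors
v1 <- e$vectors[, 1]
all.equal(as.vector(A %*% v1), e$values[1] * v1)   # TRUE: A v = lambda v
```

For a square matrix, the eigenvalues sum to the trace and multiply to the determinant, which gives a quick sanity check on `eigen` output.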
Get started with a few multivariate datasets
Romano‐British pottery (EH)
Data on the chemical analysis of 45 pots from 3 regions (5 kilns)
For each pot (observation), there are 10 variables: 9 on chemical composition and 1
giving the kiln number.
Question: do the chemical profiles of the pots suggest different types of pots,
and if so, are these types related to kiln or region?
Implement cluster analysis
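One possible way to set up such an analysis is hierarchical clustering on standardized measurements, followed by a cross-tabulation of the clusters against the known grouping. The sketch below is an assumption about the workflow, not the slides' own method. It runs on synthetic stand-in data rather than the pottery measurements, and `cluster_profiles`, `chem` and `group` are hypothetical names. The real data are assumed to be available as `pottery` in the HSAUR2 package, in which case the nine chemical columns would replace `chem` and the kiln column would replace `group`.

```r
# Hypothetical helper: standardize the variables, compute Euclidean distances,
# run complete-linkage hierarchical clustering, and cut the tree into k clusters.
cluster_profiles <- function(chem, k) {
  z  <- scale(chem)
  hc <- hclust(dist(z), method = "complete")
  cutree(hc, k = k)
}

# Synthetic stand-in data (NOT the pottery measurements):
# two well-separated groups of 20 observations on 3 variables.
set.seed(1)
chem  <- rbind(matrix(rnorm(20 * 3, mean = 0),  ncol = 3),
               matrix(rnorm(20 * 3, mean = 10), ncol = 3))
group <- rep(c("A", "B"), each = 20)   # plays the role of the kiln label

cl <- cluster_profiles(chem, k = 2)
table(cluster = cl, group = group)     # do the clusters line up with the known groups?
```

Standardizing first keeps variables measured on larger scales from dominating the distances; other linkage methods or numbers of clusters may be just as reasonable for the real data.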
Practice with datasets
• Husband‐wife data
• US air pollution
• Pottery