INTRODUCTION TO MULTIVARIATE ANALYSIS
11/02/2023
1.2 Data Organization
Multivariate data are a collection of observations (or measurements) of:
$p$ variables ($k = 1, \dots, p$).
$n$ “items” ($j = 1, \dots, n$).
“Items” can also be thought of as subjects, examinees, individuals, or entities (when people are not under study).
In some disciplines (such as educational measurement), “items” are considered the variables collected per individual.
1.3 Data Organization
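$x_{jk}$ = measurement of the kth variable on the jth entity.
\[
\begin{array}{c|cccccc}
 & \text{Variable 1} & \text{Variable 2} & \cdots & \text{Variable } k & \cdots & \text{Variable } p \\ \hline
\text{Item 1} & x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
\text{Item 2} & x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } j & x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } n & x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{array}
\]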
1.4 Arrays
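To represent the entire collection of variables and entities, a rectangular array with one row per entity and one column per variable can be constructed:
\[
\mathbf{X} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{pmatrix}
\]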
Example 1.1:
• So, putting things all together, envision standing outside of the Kansas Union Bookstore, asking people for receipts. You are interested in looking at two variables:
Variable 1: the total amount of the purchase.
Variable 2: the number of books purchased.
You find four people, and here is what you observe:
Variable 1:  42  52  48  58
Variable 2:   4   5   4   3
\[
\mathbf{X} =
\begin{pmatrix}
42 & 4 \\
52 & 5 \\
48 & 4 \\
58 & 3
\end{pmatrix}
\]
Notice that for any entry $x_{jk}$ of the data array:
The first subscript represents the row location in the data array.
The second subscript represents the column location in the data array.
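The subscript convention can be checked against the Example 1.1 data. The following is a minimal Python sketch, not part of the original slides; the helper name `entry` is hypothetical, and because Python lists are 0-indexed the helper subtracts 1 from each subscript.

```python
# Data array from Example 1.1: one row per person (item), one column per variable.
X = [
    [42, 4],
    [52, 5],
    [48, 4],
    [58, 3],
]


def entry(X, j, k):
    """Return x_jk: the kth variable measured on the jth item (1-based subscripts)."""
    return X[j - 1][k - 1]


n = len(X)     # number of items (rows)
p = len(X[0])  # number of variables (columns)

print(n, p)            # 4 2
print(entry(X, 2, 1))  # 52: purchase total for the second person
print(entry(X, 4, 2))  # 3: number of books for the fourth person
```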
Correlations.
1.5.1 Population / Sample Mean Vector
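The population mean is the measure of central tendency for the population. Here, the population mean for variable $k$ is
\[
\bar{x}_k = \frac{1}{n} \sum_{j=1}^{n} x_{jk}
\]
An array of the means for all $p$ variables then looks like this (which we will come to know as the mean vector):
\[
\bar{\mathbf{x}} =
\begin{pmatrix}
\bar{x}_1 \\
\bar{x}_2 \\
\vdots \\
\bar{x}_p
\end{pmatrix}
\]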
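As an illustration, the mean vector for the Example 1.1 data can be computed column by column. This is a sketch in plain Python; the function name `mean_vector` is an assumption made for this example.

```python
# Data array from Example 1.1 (rows = people, columns = variables).
X = [[42, 4], [52, 5], [48, 4], [58, 3]]


def mean_vector(X):
    """Return the list of column means, one entry per variable."""
    n = len(X)     # number of items
    p = len(X[0])  # number of variables
    return [sum(row[k] for row in X) / n for k in range(p)]


x_bar = mean_vector(X)
print(x_bar)  # [50.0, 4.0]
```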
1.5.2 Population / Sample Variance Covariance
A variance measures the degree of spread (dispersion) in a variable’s values. Theoretically, a population variance is the average squared difference between a variable’s values and the mean for that variable. The population variance for variable $k$ is
\[
s_{kk} = \frac{1}{n} \sum_{j=1}^{n} \left( x_{jk} - \bar{x}_k \right)^2
\]
where $\bar{x}_k$ is the mean of variable $k$.
Note the “$kk$” subscript; this will be important because the equation that produces the variance for a single variable is a special case of the equation for the covariance of a pair of variables.
Also note the division by $n$. Reasons for this will become apparent in the near future.
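The population covariance is a measure of the association between pairs of variables in a population. Here, the population covariance between variables $i$ and $k$ is
\[
s_{ik} = \frac{1}{n} \sum_{j=1}^{n} \left( x_{ji} - \bar{x}_i \right) \left( x_{jk} - \bar{x}_k \right)
\]
where $\bar{x}_i$ and $\bar{x}_k$ are the means of variables $i$ and $k$. Setting $i = k$ gives the variance of variable $k$.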
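The link between variance and covariance can be made concrete with the Example 1.1 data. The sketch below assumes division by $n$, as in this section; the function names `column` and `covariance` are hypothetical. Calling `covariance` with the same variable twice returns that variable's variance.

```python
# Data array from Example 1.1 (rows = people, columns = variables).
X = [[42, 4], [52, 5], [48, 4], [58, 3]]


def column(X, k):
    """Return the values of variable k (1-based) across all items."""
    return [row[k - 1] for row in X]


def covariance(X, i, k):
    """Covariance between variables i and k (1-based), dividing by n."""
    n = len(X)
    xi, xk = column(X, i), column(X, k)
    mean_i, mean_k = sum(xi) / n, sum(xk) / n
    return sum((a - mean_i) * (b - mean_k) for a, b in zip(xi, xk)) / n


print(covariance(X, 1, 1))  # 34.0  (variance of variable 1)
print(covariance(X, 2, 2))  # 0.5   (variance of variable 2)
print(covariance(X, 1, 2))  # -1.5  (covariance between variables 1 and 2)
```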
1.5.3 Population / Sample Covariance Matrix
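• Making an array of all sample covariances gives us:
\[
\mathbf{S} =
\begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{pmatrix}
\]
The diagonal entries $s_{kk}$ are the variances, and the matrix is symmetric because $s_{ik} = s_{ki}$.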
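A possible way to assemble the full covariance matrix for the Example 1.1 data is sketched below. It assumes division by $n$, matching the variance definition in this section; the function name `covariance_matrix` is hypothetical.

```python
# Data array from Example 1.1 (rows = people, columns = variables).
X = [[42, 4], [52, 5], [48, 4], [58, 3]]


def covariance_matrix(X):
    """Return the p x p matrix of covariances, dividing by n."""
    n = len(X)
    p = len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in X) / n
            for k in range(p)
        ]
        for i in range(p)
    ]


S = covariance_matrix(X)
print(S)  # [[34.0, -1.5], [-1.5, 0.5]]
```

The diagonal holds the two variances, and the off-diagonal entries are equal because the matrix is symmetric.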
1.5.4 Sample Correlation
Sample covariances depend on the scale of the variables under study. For this reason, the correlation is often used to describe the association between two variables.
For a pair of variables, $i$ and $k$, the sample correlation is found by dividing the sample covariance by the product of the standard deviations of the variables:
\[
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}} \, \sqrt{s_{kk}}}
\]
where $s_{ik}$ is the covariance between variables $i$ and $k$, and $s_{ii}$ and $s_{kk}$ are their variances.
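The scale dependence of the covariance, and the scale invariance of the correlation, can be seen by rescaling one variable in the Example 1.1 data. In the sketch below, the factor of 100 applied to variable 1 is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import math

# Data array from Example 1.1 (rows = people, columns = variables).
X = [[42, 4], [52, 5], [48, 4], [58, 3]]


def covariance(X, i, k):
    """Covariance between variables i and k (1-based), dividing by n."""
    n = len(X)
    mean_i = sum(row[i - 1] for row in X) / n
    mean_k = sum(row[k - 1] for row in X) / n
    return sum((row[i - 1] - mean_i) * (row[k - 1] - mean_k) for row in X) / n


def correlation(X, i, k):
    """Correlation: covariance divided by the product of standard deviations."""
    s_ii = covariance(X, i, i)
    s_kk = covariance(X, k, k)
    return covariance(X, i, k) / (math.sqrt(s_ii) * math.sqrt(s_kk))


# Rescale variable 1 by an arbitrary factor of 100; variable 2 is unchanged.
X_scaled = [[100 * row[0], row[1]] for row in X]

print(covariance(X, 1, 2))                    # -1.5
print(covariance(X_scaled, 1, 2))             # -150.0
print(round(correlation(X, 1, 2), 4))         # -0.3638
print(round(correlation(X_scaled, 1, 2), 4))  # -0.3638
```

The covariance changes with the units of variable 1, while the correlation stays the same.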
Example 1.2: Find the mean vector, the variance-covariance matrix, and the correlation matrix for the data in Example 1.1.
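One way to work through Example 1.2 is sketched below. It assumes the divide-by-$n$ definitions used in this section; dividing by $n - 1$ instead would change the variances and covariances but not the correlation. The symbol $\mathbf{R}$ for the correlation matrix is notation introduced here for illustration.

```latex
% Means (n = 4)
\bar{x}_1 = \tfrac{1}{4}(42 + 52 + 48 + 58) = 50, \qquad
\bar{x}_2 = \tfrac{1}{4}(4 + 5 + 4 + 3) = 4

% Deviations from the means
% Variable 1: -8, 2, -2, 8      Variable 2: 0, 1, 0, -1

% Variances and covariance
s_{11} = \tfrac{1}{4}(64 + 4 + 4 + 64) = 34, \qquad
s_{22} = \tfrac{1}{4}(0 + 1 + 0 + 1) = 0.5

s_{12} = \tfrac{1}{4}\bigl[(-8)(0) + (2)(1) + (-2)(0) + (8)(-1)\bigr] = -1.5

% Correlation
r_{12} = \frac{-1.5}{\sqrt{34}\,\sqrt{0.5}} = \frac{-1.5}{\sqrt{17}} \approx -0.364

% Collected results
\bar{\mathbf{x}} = \begin{pmatrix} 50 \\ 4 \end{pmatrix}, \qquad
\mathbf{S} = \begin{pmatrix} 34 & -1.5 \\ -1.5 & 0.5 \end{pmatrix}, \qquad
\mathbf{R} = \begin{pmatrix} 1 & -0.364 \\ -0.364 & 1 \end{pmatrix}
```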