
Budi Yuniarto, SST, M.Si
Lecture IV
How about Multivariate Normality?

• The generalization of the well-known normal distribution to multiple variables is called the multivariate normal distribution (MVN).
• Many multivariate techniques rely on this distribution in
some manner.
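For reference, this is the standard form of the p-variate normal density $N_p(\boldsymbol\mu, \boldsymbol\Sigma)$, with mean vector $\boldsymbol\mu$ and positive definite covariance matrix $\boldsymbol\Sigma$. The bivariate normal density is the case $p = 2$.

```latex
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol\Sigma|^{1/2}}
\exp\!\left[-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol\mu)'\,\boldsymbol\Sigma^{-1}\,(\mathbf{x}-\boldsymbol\mu)\right],
\qquad -\infty < x_i < \infty,\quad i = 1, \dots, p
```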
Bivariate Normal Density
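A minimal, dependency-free sketch that evaluates the bivariate normal density ($p = 2$) from its means, standard deviations and correlation. The function name `bivariate_normal_pdf` and its parameter names are hypothetical. In practice a library routine such as `scipy.stats.multivariate_normal` would usually be used instead.

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Density of (X, Y) ~ N2 with means mu_x, mu_y, standard deviations
    sigma_x, sigma_y and correlation rho, evaluated at the point (x, y)."""
    if sigma_x <= 0 or sigma_y <= 0 or not -1 < rho < 1:
        raise ValueError("need positive standard deviations and -1 < rho < 1")
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    # (x - mu)' Sigma^{-1} (x - mu), written in terms of the standardized values
    q = (zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (1 - rho ** 2)
    # (2 pi)^{p/2} |Sigma|^{1/2} with p = 2 and |Sigma| = sigma_x^2 sigma_y^2 (1 - rho^2)
    norm_const = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2)
    return math.exp(-q / 2) / norm_const

# The peak of the standard bivariate normal (rho = 0) is 1 / (2 pi)
print(bivariate_normal_pdf(0, 0, 0, 0, 1, 1, 0))
```

With `rho = 0` the density factors into the product of two univariate normal densities. With `rho > 0` the points (1, 1) and (-1, -1) receive more density than (1, -1), which tilts the contours along the 45-degree line.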
Why Multivariate Normality?
• Although real data may never come exactly from an MVN distribution, the MVN often provides a useful approximation and has many convenient mathematical properties.
• Furthermore, because of the central limit theorem, the sampling distributions of many multivariate statistics approach an MVN distribution as the sample size increases.
• The lines of a contour plot connect points of equal probability density for the MVN distribution.
• These contours are ellipses (ellipsoids for p > 2), $(\mathbf{x}-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) = c^2$, and can be constructed from the eigenvalues and eigenvectors of the covariance matrix.
– The axes of each ellipse point in the directions of the eigenvectors.
– The half-lengths of the ellipse axes are $c\sqrt{\lambda_i}$, i.e., proportional to the square roots of the eigenvalues.
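For a 2 x 2 covariance matrix the eigen-decomposition has a closed form, so the contour axes $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$ can be computed directly. The sketch below assumes a symmetric, positive definite input given as nested lists, and the name `ellipse_axes` is hypothetical. A common choice is $c^2 = \chi^2_p(\alpha)$, which makes the ellipse enclose probability $1 - \alpha$.

```python
import math

def ellipse_axes(cov, c=1.0):
    """Axes of the contour (x - mu)' Sigma^{-1} (x - mu) = c**2 for a 2x2
    covariance matrix Sigma = [[a, b], [b, d]].

    Returns [(half_length, unit_direction), ...], the major axis first."""
    (a, b), (b_t, d) = cov
    if b != b_t:
        raise ValueError("covariance matrix must be symmetric")
    mid = (a + d) / 2
    rad = math.hypot((a - d) / 2, b)
    eigvals = (mid + rad, mid - rad)  # lambda_1 >= lambda_2
    if eigvals[1] <= 0:
        raise ValueError("covariance matrix must be positive definite")
    if b != 0:
        # (lam - d, b) solves (Sigma - lam I) v = 0 for each eigenvalue lam
        raw = [(lam - d, b) for lam in eigvals]
    elif a >= d:
        raw = [(1.0, 0.0), (0.0, 1.0)]  # diagonal Sigma: axes follow the coordinates
    else:
        raw = [(0.0, 1.0), (1.0, 0.0)]
    axes = []
    for lam, (vx, vy) in zip(eigvals, raw):
        norm = math.hypot(vx, vy)
        axes.append((c * math.sqrt(lam), (vx / norm, vy / norm)))
    return axes

for half_length, direction in ellipse_axes([[1.0, 0.5], [0.5, 1.0]], c=1.0):
    print(round(half_length, 4), tuple(round(u, 4) for u in direction))
```

For $\boldsymbol\Sigma = \begin{bmatrix}1 & 0.5\\ 0.5 & 1\end{bmatrix}$ the eigenvalues are 1.5 and 0.5. The major axis therefore lies along $(1, 1)/\sqrt{2}$ with half-length $c\sqrt{1.5}$, and the minor axis lies along $(-1, 1)/\sqrt{2}$ with half-length $c\sqrt{0.5}$.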
Illustration: Probability Contours
If X has a multivariate normal distribution, then:

1. Linear combinations of the components of X are normally distributed.

2. All subsets of the components of X have an MVN distribution.

3. Zero covariance implies that the corresponding components are independently distributed.

4. The conditional distributions of the components are MVN.
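Properties 1 and 2 mean that the parameters of a linear combination or a subset come straight from $\boldsymbol\mu$ and $\boldsymbol\Sigma$: $\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\boldsymbol\mu,\ \mathbf{a}'\boldsymbol\Sigma\mathbf{a})$, and a sub-vector keeps the matching entries of $\boldsymbol\mu$ and the matching block of $\boldsymbol\Sigma$. The sketch below uses plain nested lists. The function names and the numbers in `mu` and `sigma` are hypothetical.

```python
def linear_combination_params(a, mu, sigma):
    """Mean a'mu and variance a' Sigma a of the linear combination a'X."""
    p = len(mu)
    mean = sum(a[i] * mu[i] for i in range(p))
    var = sum(a[i] * sigma[i][j] * a[j] for i in range(p) for j in range(p))
    return mean, var

def marginal_params(idx, mu, sigma):
    """Mean vector and covariance matrix of the subset (X_i for i in idx)."""
    return [mu[i] for i in idx], [[sigma[i][j] for j in idx] for i in idx]

# Hypothetical parameters of a 3-variate normal X = (X1, X2, X3)
mu = [3.0, 1.0, 0.0]
sigma = [[2.0, 0.5, 0.0],
         [0.5, 1.0, 0.3],
         [0.0, 0.3, 1.5]]
print(linear_combination_params([1.0, -1.0, 0.0], mu, sigma))  # X1 - X2
print(marginal_params([0, 2], mu, sigma))                      # (X1, X3)
```

Here $X_1 - X_2 \sim N(2, 2)$. Because $\mathrm{Cov}(X_1, X_3) = 0$ in this hypothetical $\boldsymbol\Sigma$, property 3 says $X_1$ and $X_3$ are independent, which matches the diagonal covariance block returned for that subset.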


Illustration with Multiple Combinations
[Figures: conditional densities of X given Y, e.g. f(X|Y=50) and f(X|Y=30), plotted over the X-Y plane]
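For a bivariate normal (X, Y), property 4 takes an explicit form: $X \mid Y = y \sim N\!\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ \sigma_X^2(1 - \rho^2)\right)$. The sketch below computes the parameters behind slices such as f(X|Y=50) and f(X|Y=30). The parameter values are hypothetical, chosen only for illustration, and are not taken from the original figures.

```python
def conditional_x_given_y(y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of X | Y = y for a bivariate normal (X, Y)."""
    mean = mu_x + rho * (sigma_x / sigma_y) * (y - mu_y)
    var = sigma_x ** 2 * (1 - rho ** 2)  # the same for every y
    return mean, var

# Hypothetical parameters, used only to illustrate the two slices
params = dict(mu_x=20.0, mu_y=40.0, sigma_x=5.0, sigma_y=10.0, rho=0.6)
for y in (50, 30):
    m, v = conditional_x_given_y(y, **params)
    print(f"X | Y={y}: mean {m:.2f}, variance {v:.2f}")
```

The conditional variance does not depend on y. The two conditional densities therefore have the same shape and are shifted along X by $\rho\,\sigma_X/\sigma_Y$ per unit change in y.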
Just a review
Multivariate Case

Sampling Distribution of ∑
Law of Large Numbers
Central Limit Theorem
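For reference, these are the standard results usually reviewed under these headings. Here $\mathbf{X}_1, \dots, \mathbf{X}_n$ is a random sample, $\bar{\mathbf{X}}$ is the sample mean vector, $\mathbf{S}$ is the sample covariance matrix with divisor $n - 1$, and $W_{n-1}(\boldsymbol\Sigma)$ denotes a Wishart distribution with $n - 1$ degrees of freedom.

```latex
\begin{aligned}
&\text{Sampling distributions (sample from } N_p(\boldsymbol\mu, \boldsymbol\Sigma)\text{):}\quad
\bar{\mathbf{X}} \sim N_p\!\left(\boldsymbol\mu, \tfrac{1}{n}\boldsymbol\Sigma\right),\quad
(n-1)\,\mathbf{S} \sim W_{n-1}(\boldsymbol\Sigma),\quad
\bar{\mathbf{X}} \text{ and } \mathbf{S} \text{ independent} \\
&\text{Law of large numbers (population with finite } \boldsymbol\Sigma\text{):}\quad
\bar{\mathbf{X}} \xrightarrow{P} \boldsymbol\mu,\qquad \mathbf{S} \xrightarrow{P} \boldsymbol\Sigma \\
&\text{Central limit theorem (population with mean } \boldsymbol\mu \text{ and finite } \boldsymbol\Sigma\text{):}\quad
\sqrt{n}\,(\bar{\mathbf{X}} - \boldsymbol\mu) \xrightarrow{d} N_p(\mathbf{0}, \boldsymbol\Sigma)
\end{aligned}
```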
ASSESSING MULTIVARIATE
NORMALITY
Probability levels for Q-Q plots: $q_j = (j - 0.5)/n$, $j = 1, 2, \dots, n$.
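The probability levels $(j - 0.5)/n$ are paired with the ordered data. For one variable, the ordered observations are plotted against standard normal quantiles (a normal Q-Q plot). For multivariate data, a common check is the chi-square plot: the ordered squared generalized distances $d_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_j - \bar{\mathbf{x}})$ are plotted against chi-square quantiles with p degrees of freedom. The sketch below assumes p = 2, because the $\chi^2_2$ quantile has the closed form $-2\ln(1 - q)$. General p would need a chi-square quantile function, e.g. `scipy.stats.chi2.ppf`. All function names here are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def probability_levels(n):
    """Probability levels q_j = (j - 0.5) / n for j = 1, ..., n."""
    return [(j - 0.5) / n for j in range(1, n + 1)]

def normal_qq_pairs(sample):
    """(standard normal quantile, ordered observation) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    z = NormalDist()  # standard normal N(0, 1)
    return [(z.inv_cdf(q), x) for q, x in zip(probability_levels(len(ordered)), ordered)]

def chi_square_plot_pairs(points):
    """(chi-square quantile with 2 d.f., ordered d_j^2) pairs for bivariate data,
    where d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar) and S has divisor n - 1."""
    n = len(points)
    xbar = fmean(x for x, _ in points)
    ybar = fmean(y for _, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - ybar) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    # Quadratic form using S^{-1} = [[syy, -sxy], [-sxy, sxx]] / det
    d2 = sorted(
        (syy * (x - xbar) ** 2 - 2 * sxy * (x - xbar) * (y - ybar) + sxx * (y - ybar) ** 2) / det
        for x, y in points
    )
    # Closed-form chi-square quantile for 2 degrees of freedom
    chi2 = [-2 * math.log(1 - q) for q in probability_levels(n)]
    return list(zip(chi2, d2))

# Hypothetical bivariate sample
pts = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 4.0), (6.0, 7.0)]
for q, d in chi_square_plot_pairs(pts):
    print(round(q, 3), round(d, 3))
```

Points that lie roughly on a straight line (through the origin with slope 1, for the chi-square plot) are consistent with normality. Marked curvature or a few very large $d_j^2$ values suggest departures or outliers. One useful check on the computation: the $d_j^2$ always sum to $(n - 1)p$.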
Power Transformation
• Let x represent an arbitrary observation. The power family of transformations is indexed by a parameter λ; a given value of λ implies a particular transformation.
• For example, consider $x^\lambda$ with $\lambda = -1$. Since $x^{-1} = 1/x$, this choice of λ corresponds to the reciprocal transformation.
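• A sequence of possible transformations is
$$\ldots,\ x^{-1} = \frac{1}{x},\ \ln x\ (\lambda = 0),\ x^{1/4} = \sqrt[4]{x},\ x^{1/2} = \sqrt{x}\ \ (\text{shrink large values of } x);\qquad x^{2},\ x^{3},\ \ldots\ \ (\text{increase large values of } x)$$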
• To select a power transformation, an investigator looks at the marginal dot diagram or histogram and decides whether large values have to be "pulled in" or "pushed out".
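To see what "pulling in" large values means numerically, the sketch below applies a few powers from the ladder to a right-skewed sample and reports the sample skewness. It assumes strictly positive data, uses ln x for λ = 0, and measures skewness with the moment coefficient $m_3 / m_2^{3/2}$. The data and the function names are hypothetical.

```python
import math

def power_transform(x, lam):
    """Ladder-of-powers transform: x**lam, with ln(x) used for lam == 0 (x > 0)."""
    if x <= 0:
        raise ValueError("power transformations here assume positive data")
    return math.log(x) if lam == 0 else x ** lam

def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (divisor n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

data = [1, 2, 4, 8, 16, 32, 64, 128]  # hypothetical right-skewed sample
for lam in (1, 0.5, 0):
    transformed = [power_transform(x, lam) for x in data]
    print(lam, round(skewness(transformed), 3))
```

For this sample, moving down the ladder from λ = 1 to λ = 1/2 and then to ln x (λ = 0) reduces the skewness, and ln x makes these particular values equally spaced, so their skewness is 0.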
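• With multivariate observations, a power transformation must be selected for each of the variables. Let λ1, λ2, ..., λp be the power transformations for the p measured characteristics. Each λk can be selected by maximizing
$$\ell_k(\lambda_k) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_{jk}^{(\lambda_k)} - \overline{x_k^{(\lambda_k)}}\right)^{2}\right] + (\lambda_k - 1)\sum_{j=1}^{n}\ln x_{jk}$$
• where $x_{1k}, x_{2k}, \dots, x_{nk}$ are the n (positive) observations on the k-th variable, $x_{jk}^{(\lambda_k)} = (x_{jk}^{\lambda_k} - 1)/\lambda_k$ for $\lambda_k \neq 0$ and $x_{jk}^{(0)} = \ln x_{jk}$, and
$$\overline{x_k^{(\lambda_k)}} = \frac{1}{n}\sum_{j=1}^{n} x_{jk}^{(\lambda_k)}$$
is the arithmetic average of the transformed observations.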


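• The j-th transformed multivariate observation is
$$\mathbf{x}_j^{(\hat{\boldsymbol\lambda})} = \left[\frac{x_{j1}^{\hat\lambda_1} - 1}{\hat\lambda_1},\ \frac{x_{j2}^{\hat\lambda_2} - 1}{\hat\lambda_2},\ \dots,\ \frac{x_{jp}^{\hat\lambda_p} - 1}{\hat\lambda_p}\right]'$$
where $\hat\lambda_1, \dots, \hat\lambda_p$ are the selected (maximizing) values of $\lambda_1, \dots, \lambda_p$, and an entry is replaced by $\ln x_{jk}$ when $\hat\lambda_k = 0$.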
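The per-variable selection and the transformed observations can be sketched as follows. The sketch assumes strictly positive data and replaces a proper numerical optimizer with a simple grid search over λ in [-2, 2] in steps of 0.01. The function names and the data are hypothetical. For comparison, `scipy.stats.boxcox` and `scipy.stats.boxcox_llf` implement a one-variable version of this criterion.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform: (x**lam - 1) / lam, or ln x when lam == 0 (x > 0)."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def profile_loglik(xs, lam):
    """l(lam) = -(n/2) ln[(1/n) sum_j (x_j^(lam) - mean)^2] + (lam - 1) sum_j ln x_j."""
    n = len(xs)
    t = [box_cox(x, lam) for x in xs]
    t_bar = sum(t) / n
    s2 = sum((v - t_bar) ** 2 for v in t) / n
    return -n / 2 * math.log(s2) + (lam - 1) * sum(math.log(x) for x in xs)

def select_lambda(xs, grid=None):
    """Grid value of lambda that maximizes l(lambda) for one variable."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive observations")
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00, -1.99, ..., 2.00
    return max(grid, key=lambda lam: profile_loglik(xs, lam))

def transform_observations(rows, lambdas):
    """Apply lambdas[k] to the k-th coordinate of every observation row."""
    return [[box_cox(x, lam) for x, lam in zip(row, lambdas)] for row in rows]

# Hypothetical data: n = 8 observations on p = 2 positive variables
rows = [[1, 2.0], [2, 2.5], [4, 3.1], [8, 3.4], [16, 4.2], [32, 4.8], [64, 5.1], [128, 6.0]]
lambdas = [select_lambda([row[k] for row in rows]) for k in range(2)]
transformed = transform_observations(rows, lambdas)
print(lambdas)
```

Because each $\hat\lambda_k$ is chosen one variable at a time, this step only improves the marginal distributions. Marginal normality does not by itself guarantee joint normality, so the transformed data should still be checked, for example with a chi-square plot.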
