
Applied Multivariate Statistical Analysis

Wolfgang HÄRDLE
Leopold SIMAR

Ladislaus von Bortkiewicz Chair of


Statistics
Humboldt-Universität zu Berlin
and
C.O.R.E. / Institut de Statistique
Université Catholique de Louvain,
Belgium
http://lvb.wiwi.hu-berlin.de
http://www.stat.ucl.ac.be/ISpersonnel/simar

Comparison of Batches

Figure: An old Swiss 1000-franc bank note.



Example: Swiss bank data


The authorities have measured

X1 = length of the bill


X2 = height of the bill (left)
X3 = height of the bill (right)
X4 = distance of the inner frame to the lower border
X5 = distance of the inner frame to the upper border
X6 = length of the diagonal of the central picture.


Example (cont.)
The dataset consists of 200 measurements on Swiss bank notes. The first half of these bank notes are genuine, the other half are forged.
It is important to be able to decide whether a given bank note is genuine. We want to derive a good rule that separates the genuine from the counterfeit bank notes.
Which measurement is the most informative? We have to visualize the difference.


Boxplots

Boxplot
is a graphical technique for displaying the distribution of
variables.
helps us in seeing location, skewness, spread, tail length and
outlying points.
is particularly useful in comparing different batches.
is a graphical representation of the Five Number Summary.


City          Country       Pop. (×10,000)   Order statistic
Tokyo         Japan         3420             x(15)
Mexico City   Mexico        2280             x(14)
Seoul         South Korea   2230             x(13)
New York      USA           2190             x(12)
Sao Paulo     Brazil        2020             x(11)
Bombay        India         1985             x(10)
Delhi         India         1970             x(9)
Shanghai      China         1815             x(8)
Los Angeles   USA           1800             x(7)
Osaka         Japan         1680             x(6)
Jakarta       Indonesia     1655             x(5)
Calcutta      India         1565             x(4)
Cairo         Egypt         1560             x(3)
Manila        Philippines   1495             x(2)
Karachi       Pakistan      1430             x(1)

Table: The 15 largest world cities in 2006.



Five Number Summary

Upper quartile FU
Lower quartile FL
Median = deepest point
Extremes
Consider the order statistics.
Depth of a data value x(i): min{i, n − i + 1}
$$\text{depth of fourth} = \frac{[\text{depth of median}] + 1}{2}$$


Median

The order statistics {x(1), x(2), ..., x(n)} are the ordered values of x1, x2, ..., xn, where x(1) denotes the minimum and x(n) the maximum.

Median M:
$$M = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & n \text{ odd} \\[6pt] \frac{1}{2}\left\{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right\} & n \text{ even} \end{cases}$$


Construction of the Boxplot


Median: 1815 (depth 8)
Fourths (depth = 4.5): FL = 1610, FU = 2105
Extremes (depth = 1): 1430, 3420
F-spread: dF = FU − FL
Outside bars: FU + 1.5dF, FL − 1.5dF

1. Construct the box with borders at FU and FL.
2. Draw the median as a solid bar | and the mean as a dotted line.
3. Draw whiskers out to the most extreme data values within the outside bars.
4. Mark outliers by • if they are outside [FL − 1.5dF, FU + 1.5dF] and by ⋆ if they lie outside [FL − 3dF, FU + 3dF].
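These steps translate directly into a few lines of Python — a minimal sketch of the computation (not the MVAboxcity quantlet itself), using the city populations from the table above:

```python
import numpy as np

# Populations (in 10,000s) of the 15 largest world cities in 2006
pop = np.array([3420, 2280, 2230, 2190, 2020, 1985, 1970, 1815,
                1800, 1680, 1655, 1565, 1560, 1495, 1430])

x = np.sort(pop)                          # order statistics x_(1) <= ... <= x_(n)
n = len(x)
depth_median = (n + 1) / 2                # 8 for n = 15
depth_fourth = (np.floor(depth_median) + 1) / 2   # 4.5 for n = 15

median = np.median(x)                     # 1815
# A fourth at depth 4.5 averages the values at depths 4 and 5,
# counted from each end of the ordered sample.
k = int(np.floor(depth_fourth)) - 1       # zero-based index of depth 4
FL = (x[k] + x[k + 1]) / 2                # 1610
FU = (x[n - 1 - k] + x[n - 2 - k]) / 2    # 2105

dF = FU - FL                              # F-spread
bars = (FL - 1.5 * dF, FU + 1.5 * dF)     # outside bars
print(median, FL, FU, dF, bars)           # Tokyo (3420) lies outside the bars
```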


Figure: Boxplot for world cities. MVAboxcity


Figure: Boxplot for the mileage of U.S. American, Japanese and European cars (from left to right). MVAboxcar
Figure: Variable X6 (diagonal) of the bank notes, the genuine on the left. MVAboxbank6
Figure: Variable X1 (length) of the bank notes, the genuine on the left. MVAboxbank1

Summary: Boxplots

Median and mean bars indicate the central locations.
The relative location of the median (and mean) in the box is a measure of skewness.
The length of the box and whiskers is a measure of spread.
The length of the whiskers indicates the tail length of the distribution.


Summary: Boxplots

The outliers are marked by • if they are outside [FL − 1.5dF, FU + 1.5dF] and by ⋆ if they lie outside [FL − 3dF, FU + 3dF].
Boxplots do not indicate multi-modality or clusters.
If we compare the relative size and location of the boxes, we are comparing distributions.


Histograms

$$\hat f_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}$$

$$B_j(x_0, h) = [x_0 + (j-1)h,\; x_0 + jh), \qquad j \in \mathbb{Z}.$$


[., .) denotes a left closed and right open interval.
I{•} denotes the indicator function.
h is a smoothing parameter and controls the width of the
histogram bins.
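A direct translation of this definition into Python might look as follows — a sketch, not the MVAhisbank1 quantlet; it counts, for each query point x, the observations falling into the bin B_j(x0, h) that contains x:

```python
import numpy as np

def histogram_density(x, data, x0, h):
    """Histogram density estimate f_h(x) with origin x0 and binwidth h."""
    n = len(data)
    j_x = np.floor((x - x0) / h)         # index of the bin containing x
    j_data = np.floor((data - x0) / h)   # bin index of every observation
    count = np.sum(j_data == j_x)        # observations sharing x's bin
    return count / (n * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(histogram_density(0.0, sample, x0=-4.0, h=0.4))
```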


Figure: Diagonal of counterfeit bank notes. Histograms with x0 = 137.8 and h = 0.1 (upper left), h = 0.2 (lower left), h = 0.3 (upper right), h = 0.4 (lower right). MVAhisbank1
Figure: Diagonal of counterfeit bank notes. Histograms with h = 0.4 and origins x0 = 137.65 (upper left), x0 = 137.75 (lower left), x0 = 137.85 (upper right), x0 = 137.95 (lower right). MVAhisbank2

Summary: Histograms

Modes of the density are detected with a histogram.


Modes correspond to strong peaks in the histogram.
Histograms with the same h need not be identical. They also
depend on the origin x0 of the grid.
The influence of the origin x0 is drastic. Changing x0 creates
different looking histograms.


Summary: Histograms

Too large an h results in a flat and unstructured histogram.
Too small a binwidth h results in an unstable histogram.
There is an optimal binwidth
$$h_{opt} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}.$$
It is recommended to use averaged (shifted) histograms; these are kernel density estimates.


Kernel densities

The histogram (evaluated at the center of a bin) can be written as

$$\hat f_h(x) = n^{-1} (2h)^{-1} \sum_{i=1}^{n} I(|x - x_i| \le h)$$

Define $K(u) = \frac{1}{2} I(|u| \le 1)$. Then

$$\hat f_h(x) = n^{-1} h^{-1} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

K is the kernel.
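In Python, the general kernel density estimator can be sketched directly from this formula (a minimal illustration, not one of the MVA quantlets); the kernels are those listed in the table on the next slide:

```python
import numpy as np

def kde(x, data, h, K):
    """Kernel density estimate f_h(x) = n^-1 h^-1 sum K((x - x_i)/h)."""
    u = (x - data[:, None]) / h          # (n, m) matrix of scaled residuals
    return K(u).sum(axis=0) / (len(data) * h)

uniform = lambda u: 0.5 * (np.abs(u) <= 1)           # the kernel defined above
gaussian = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(1)
sample = rng.normal(size=200)
grid = np.linspace(-3, 3, 5)
print(kde(grid, sample, h=0.5, K=gaussian))
```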


Kernel functions

K(u)                                        Kernel
K(u) = ½ I(|u| ≤ 1)                         Uniform
K(u) = (1 − |u|) I(|u| ≤ 1)                 Triangle
K(u) = ¾ (1 − u²) I(|u| ≤ 1)                Epanechnikov
K(u) = (15/16) (1 − u²)² I(|u| ≤ 1)         Quartic (Biweight)
K(u) = (2π)^(−1/2) exp(−u²/2) = φ(u)        Gaussian

Table: Kernel functions.


Kernel functions

Figure: Kernel functions (uniform, triangle, Epanechnikov, quartic/biweight, Gaussian). MVAkernelfunctions
Figure: Densities of the diagonals of genuine and counterfeit bank notes. MVAdenbank

Choice of the bandwidth h


Silverman's rule of thumb for the Gaussian kernel
$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$$
is
$$h_G = 1.06\, \hat\sigma\, n^{-1/5}.$$
For the quartic kernel
$$K(u) = \frac{15}{16}(1 - u^2)^2\, I(|u| \le 1)$$
use
$$h_Q = 2.62\, h_G.$$
Sample standard deviation: $\hat\sigma = \sqrt{n^{-1} \sum_{i=1}^{n} (x_i - \bar x)^2}$.
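These two rules fit in a few lines of Python (a sketch under the formulas above, using the 1/n convention for σ̂ as on this slide):

```python
import numpy as np

def silverman_bandwidths(x):
    """Rule-of-thumb bandwidths h_G (Gaussian) and h_Q (quartic)."""
    n = len(x)
    sigma_hat = np.sqrt(np.mean((x - x.mean())**2))  # 1/n convention
    h_g = 1.06 * sigma_hat * n**(-1/5)
    return h_g, 2.62 * h_g

rng = np.random.default_rng(2)
print(silverman_bandwidths(rng.normal(size=200)))
```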
Figure: Contours of the density of (X5, X6) of genuine and counterfeit bank notes. MVAcontbank2
Figure: Contours of the density of (X4, X5, X6) of genuine and counterfeit bank notes. MVAcontbank3

Summary: Kernel densities

Kernel densities estimate distribution densities by the kernel


method.
The bandwidth h determines the degree of smoothness of the
estimate fb.
Kernel densities are smooth functions and they can graphically
represent distributions (up to 3 dimensions).


Summary: Kernel densities

A simple (but not necessarily correct) way to find a good bandwidth is to compute the rule-of-thumb bandwidth h_G = 1.06 σ̂ n^(−1/5). This bandwidth should only be used in combination with a Gaussian kernel φ.
Kernel density estimates are a good descriptive tool for seeing
modes, location, skewness, tails, asymmetry, etc.


Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other.

Rotation of data
Separation lines
Draftman’s plot
Brushing
Parallel coordinate plots


Figure: 2D scatterplot of X5 vs. X6 of the bank notes. Genuine notes are circles, counterfeit notes are triangles. MVAscabank56
Figure: 3D scatterplot of (X4, X5, X6) of the bank notes. Genuine notes are circles, counterfeit notes are triangles. MVAscabank456
Figure: Draftman's plot of the bank notes. The pictures in the left-hand column show (X3, X4), (X3, X5) and (X3, X6); in the middle we have (X4, X5) and (X4, X6); and in the lower right (X5, X6). The upper right half contains the corresponding density contour plots. MVAdraftbank4
Summary: Scatterplots

Scatterplots in two and three dimensions help us to see separated points, clouds or sub-clusters.
They help us to judge positive or negative dependence.
Draftman's plots (scatterplot matrices) are useful for detecting structures conditioned on values of other variables.
As the brush of a scatterplot matrix moves through the point cloud, we can study conditional dependence.


Chernoff-Flury Faces
Figure: Chernoff-Flury faces for observations 91 to 110 of the bank notes. MVAfacebank10
Six variables - face elements

X1 = 1, 19 (eye sizes)
X2 = 2, 20 (pupil sizes)
X3 = 4, 22 (eye slants)
X4 = 11, 29 (upper hair lines)
X5 = 12, 30 (lower hair lines)
X6 = 13, 14, 31, 32 (face lines and darkness of hair)


Figure: Flury faces for observations 1 to 50 of the bank notes. MVAfacebank50
Figure: Flury faces for observations 51 to 100 of the bank notes. MVAfacebank50
Figure: Flury faces for observations 101 to 150 of the bank notes. MVAfacebank50
Figure: Flury faces for observations 151 to 200 of the bank notes. MVAfacebank50
Summary: Faces

Faces can be used to detect subgroups in multivariate data.


Subgroups are characterized by similar looking faces.
Outliers are identified by extreme faces (e.g. dark hair, smile
or happy face).
If one element of X is unusual, the corresponding face
element changes significantly in shape.


Andrews’ Curves
Each multivariate observation Xi = (Xi,1, ..., Xi,p) ∈ R^p is transformed into a curve as follows.

p odd:
$$f_i(t) = \frac{X_{i,1}}{\sqrt 2} + X_{i,2}\sin(t) + X_{i,3}\cos(t) + \dots + X_{i,p-1}\sin\left(\frac{p-1}{2}\,t\right) + X_{i,p}\cos\left(\frac{p-1}{2}\,t\right)$$

p even:
$$f_i(t) = \frac{X_{i,1}}{\sqrt 2} + X_{i,2}\sin(t) + X_{i,3}\cos(t) + \dots + X_{i,p}\sin\left(\frac{p}{2}\,t\right)$$

The observation represents the coefficients of a so-called Fourier series, t ∈ [−π, π].
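A minimal Python sketch of this transformation (not the MVAandcur quantlet) handles both even and odd p with one loop:

```python
import numpy as np

def andrews_curve(obs, t):
    """Andrews' curve f_i(t) for one observation (p even or odd)."""
    f = obs[0] / np.sqrt(2) * np.ones_like(t)
    for k, x in enumerate(obs[1:], start=1):
        freq = (k + 1) // 2                     # frequencies 1,1,2,2,3,...
        f += x * (np.sin(freq * t) if k % 2 == 1 else np.cos(freq * t))
    return f

x96 = np.array([215.6, 129.9, 129.9, 9.0, 9.5, 141.7])  # observation 96
t = np.linspace(-np.pi, np.pi, 7)
print(andrews_curve(x96, t))
```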

Andrews’ Curves

Subgroups are characterized by similar curves.


Outliers are characterized by single curves.
Order plays an important role in the interpretation.


Let us take the 96th observation of the Swiss bank note dataset,

X96 = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)

The Andrews' curve is:
$$f_{96}(t) = \frac{215.6}{\sqrt 2} + 129.9\sin(t) + 129.9\cos(t) + 9.0\sin(2t) + 9.5\cos(2t) + 141.7\sin(3t)$$


Figure: Andrews' curves of observations 96–105 of the Swiss bank note data. The order of the variables is 1,2,3,4,5,6. MVAandcur
Let us take the 96th observation of the Swiss bank note dataset,

X96 = (215.6, 129.9, 129.9, 9.0, 9.5, 141.7)

The Andrews' curve using the reversed order of variables is:
$$f_{96}(t) = \frac{141.7}{\sqrt 2} + 9.5\sin(t) + 9.0\cos(t) + 129.9\sin(2t) + 129.9\cos(2t) + 215.6\sin(3t)$$


Figure: Andrews' curves of observations 96–105 of the Swiss bank note data. The order of the variables is 6,5,4,3,2,1. MVAandcur2
Summary: Andrews’ Curves

Outliers appear as single Andrews' curves that look different from the rest.
A subgroup is characterized by a set of similar curves.
The order of the variables plays an important role for interpretation.
The order of variables may be optimized by Principal Component Analysis.
For more than 20 observations we obtain a bad "signal-to-ink ratio", i.e. we cannot see the structure of so many overlaid curves.

Parallel Coordinate Plots

Parallel coordinate plots
are not based on an orthogonal coordinate system and
allow us to see more than four dimensions.

Idea
Instead of plotting observations in an orthogonal coordinate system
one draws their coordinates in a system of parallel axes. This way
of representation is however sensitive to the order of the variables.


Figure: Parallel coordinate plot of observations 96–105 of the bank data. MVAparcoo1


Figure: Parallel coordinate plot of the full bank dataset. Genuine bank notes are displayed as black lines, forged bank notes as red lines. MVAparcoo2
Summary: Parallel coordinate plots

Parallel coordinate plots overcome the visualisation problem of the Cartesian coordinate system for dimensions greater than 4.
Outliers are seen as outlying polygon curves.
The order of variables is still important for detection of
subgroups.
Subgroups may be screened by selective coloring in an
interactive manner.


A Short Excursion into Matrix Algebra

 
$$\mathcal{A}(n \times p) = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{np} \end{pmatrix}$$


Definition Notation
Transpose A>
Sum A+B
Difference A−B
Scalar product c ·A
Product A·B
Rank rank(A)
Trace tr(A)
Determinant det(A) = |A|
Inverse A−1
Generalised Inverse A− : AA− A = A

Table: Elementary matrix operations.


Name              Definition        Notation    Example
scalar            p = n = 1         a           3
column vector     p = 1             a           (1, 3)⊤
row vector        n = 1             a⊤          (1  3)
vector of ones    (1, ..., 1)⊤      1n          (1, 1)⊤
vector of zeros   (0, ..., 0)⊤      0n          (0, 0)⊤
square matrix     n = p             A(p × p)    (2 0; 0 2)

Table: Special matrices and vectors.



Name               Definition                 Notation     Example
diagonal matrix    aij = 0, i ≠ j, n = p      diag(aii)    (1 0; 0 2)
identity matrix    diag(1, ..., 1)            Ip           (1 0; 0 1)
unit matrix        aij = 1, n = p             1n 1n⊤       (1 1; 1 1)
symmetric matrix   aij = aji                               (1 2; 2 3)

Table: Special matrices and vectors.


Name                      Definition          Example
null matrix               aij = 0             (0 0; 0 0)
upper triangular matrix   aij = 0, i > j      (1 2 4; 0 1 3; 0 0 1)
idempotent matrix         A² = A              (½ ½; ½ ½)
orthogonal matrix         A⊤A = I = AA⊤       (1/√2  1/√2; 1/√2  −1/√2)

Table: Special matrices and vectors.


Properties of a Square Matrix


For any A(n × n) and B(n × n) and any scalar c
tr(A + B) = tr(A) + tr(B)
tr(cA) = c tr(A)
|cA| = c n |A|
tr(AB) = tr(BA)
|AB| = |BA|
|AB| = |A||B|
|A−1 | = |A|−1


Eigenvalues and Eigenvectors


Square matrix A(n × n), eigenvalue λ = Eval(A), eigenvector γ = Evec(A):
$$A\gamma = \lambda\gamma$$
Using the spectral decomposition, it can be shown that:
$$|A| = \prod_{j=1}^{n} \lambda_j, \qquad \operatorname{tr}(A) = \sum_{j=1}^{n} \lambda_j$$


Summary: Matrix Algebra

The determinant |A| is the product of the eigenvalues of A.
The inverse of a matrix A exists if |A| ≠ 0.
The trace tr(A) is the sum of the eigenvalues of A.
The sum of the traces of two matrices equals the trace of the sum of the two matrices.
The trace tr(AB) equals tr(BA).
rank(A) is the maximum number of linearly independent rows (columns) of A.


Spectral Decomposition

Every symmetric matrix A(p × p) can be written as
$$A = \Gamma \Lambda \Gamma^{\top} = \sum_{j=1}^{p} \lambda_j \gamma_j \gamma_j^{\top}$$
$$\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p), \qquad \Gamma = (\gamma_1, \dots, \gamma_p)$$


Covariance matrix
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
Eigenvalues:
$$\begin{vmatrix} 1-\lambda & \rho \\ \rho & 1-\lambda \end{vmatrix} = 0$$
λ1 = 1 + ρ, λ2 = 1 − ρ, Λ = diag(1 + ρ, 1 − ρ)
Eigenvectors:
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (1+\rho)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
MVAspecdecomp


x1 + ρx2 = x1 + ρx1
ρx1 + x2 = x2 + ρx2
⇒ x1 = x2.
$$\gamma_1 = \begin{pmatrix} 1/\sqrt 2 \\ 1/\sqrt 2 \end{pmatrix}, \qquad \gamma_2 = \begin{pmatrix} 1/\sqrt 2 \\ -1/\sqrt 2 \end{pmatrix}$$
$$\Gamma = (\gamma_1, \gamma_2) = \begin{pmatrix} 1/\sqrt 2 & 1/\sqrt 2 \\ 1/\sqrt 2 & -1/\sqrt 2 \end{pmatrix}$$
Check: A = ΓΛΓ⊤
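The check is easy to do numerically — a quick sketch with numpy (not the MVAspecdecomp quantlet itself), here with ρ = 0.5 as an assumed example value:

```python
import numpy as np

rho = 0.5
sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices
lam, gamma = np.linalg.eigh(sigma)
print(lam)                                   # [1 - rho, 1 + rho]

# Check the spectral decomposition A = Gamma Lambda Gamma^T
reconstructed = gamma @ np.diag(lam) @ gamma.T
print(np.allclose(reconstructed, sigma))     # True
```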


Eigenvectors

The direction of the first eigenvector is the main direction of the point cloud. The second eigenvector is orthogonal to the first one. This eigenvector direction is in general different from the least squares (LS) regression line.


Figure: Scatterplot of observed data (sample size n = 150) and the same data displayed in the coordinate system given by the eigenvectors of the covariance matrix.
Singular Value Decomposition (SVD)

A(n × p), rank(A) = r:
$$A = \Gamma \Lambda \Delta^{\top}$$
Γ(n × r), Δ(p × r), Γ⊤Γ = Δ⊤Δ = Ir, and Λ = diag(λ1^(1/2), ..., λr^(1/2)), λj > 0.
λj = Eval(A⊤A); Γ and Δ consist of the corresponding eigenvectors of AA⊤ and A⊤A.
A G-inverse of A may be defined as A⁻ = ΔΛ⁻¹Γ⊤; then AA⁻A = A.
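A short numpy sketch of this construction (note that numpy's SVD returns the singular values, i.e. the λj^(1/2) of the slide's notation); the matrix A below is just an assumed rank-2 example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],          # = 2 x first row, so rank(A) = 2
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)                   # numerical rank
G, L, D = U[:, :r], s[:r], Vt[:r].T     # Gamma, singular values, Delta

A_minus = D @ np.diag(1 / L) @ G.T      # G-inverse via the SVD
print(np.allclose(A @ A_minus @ A, A))  # True: A A^- A = A
```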

Summary: Spectral Decomposition

The spectral (Jordan) decomposition gives a representation of a symmetric matrix in terms of eigenvalues and eigenvectors.
The eigenvectors belonging to the largest eigenvalues point into the "main direction" of the data.
The Jordan decomposition makes it easy to compute powers of a matrix A: A^α = ΓΛ^αΓ⊤.
In particular, A⁻¹ = ΓΛ⁻¹Γ⊤ and A^(1/2) = ΓΛ^(1/2)Γ⊤.


Summary: Spectral Decomposition

The singular value decomposition (SVD) is a generalization of the Jordan decomposition to non-square matrices.
The direction of the first eigenvector of the covariance matrix of a two-dimensional point cloud is in general different from the least squares regression line.


Quadratic Forms

A quadratic form based on a symmetric matrix A(p × p) can be written as
$$Q(x) = x^{\top} A x = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} x_i x_j$$

Definiteness:
Q(x) > 0 for all x ≠ 0: positive definite (pd)
Q(x) ≥ 0 for all x ≠ 0: positive semidefinite (psd)

A is pd (psd) iff Q(x) = x⊤Ax is pd (psd).
Example:
Q(x) = x⊤Ax = x1² + x2², A = (1 0; 0 1). Eigenvalues λ1 = λ2 = 1: positive definite.
Q(x) = (x1 − x2)², A = (1 −1; −1 1). Eigenvalues λ1 = 2, λ2 = 0: positive semidefinite.
Q(x) = x1² − x2². Eigenvalues λ1 = 1, λ2 = −1: indefinite.


Theorem
If A is symmetric and Q(x) = x⊤Ax is the corresponding quadratic form, then there exists a transformation x ↦ Γ⊤x = y such that
$$x^{\top} A x = \sum_{i=1}^{p} \lambda_i y_i^2,$$
where λi are the eigenvalues of A.


Lemma

A > 0 ⇔ λi > 0,
A ≥ 0 ⇔ λi ≥ 0, i = 1, . . . , p.


Theorem (Theorem 2.5)
If A and B are symmetric and B > 0, then the maximum of x⊤Ax / x⊤Bx is given by the largest eigenvalue of B⁻¹A. More generally,
$$\max_x \frac{x^{\top}Ax}{x^{\top}Bx} = \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p = \min_x \frac{x^{\top}Ax}{x^{\top}Bx},$$
where λ1, ..., λp denote the eigenvalues of B⁻¹A. The vector which maximises (minimises) x⊤Ax / x⊤Bx is the eigenvector of B⁻¹A which corresponds to the largest (smallest) eigenvalue of B⁻¹A. If x⊤Bx = 1, we get
$$\max_x x^{\top}Ax = \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p = \min_x x^{\top}Ax$$
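The theorem can be verified numerically via the generalized symmetric eigenproblem — a sketch with assumed example matrices A and B:

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
B = np.array([[1.0, 0.2],
              [0.2, 1.0]])               # B > 0

# Generalized eigenproblem A x = lambda B x; its eigenvalues
# are exactly those of B^{-1} A.
lam, vecs = eigh(A, B)
x_max = vecs[:, -1]                      # eigenvector of the largest eigenvalue

ratio = (x_max @ A @ x_max) / (x_max @ B @ x_max)
print(np.isclose(ratio, lam[-1]))        # True: the ratio attains lambda_1
```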


Summary: Quadratic forms

A quadratic form can be described by a symmetric matrix A.
Quadratic forms can always be diagonalized.
Positive definiteness of a quadratic form is equivalent to positiveness of the eigenvalues of the matrix A.
The maximum and minimum of a quadratic form under constraints can be expressed in terms of eigenvalues.


Derivatives

For f : R^p → R and a (p × 1) vector x:
∂f(x)/∂x — the column vector of partial derivatives (∂f(x)/∂xj), j = 1, ..., p
∂f(x)/∂x⊤ — the row vector of the same derivatives
∂f(x)/∂x is called the gradient of f.


Second order derivatives:
∂²f(x)/∂x∂x⊤ — the (p × p) Hessian matrix of the second derivatives (∂²f(x)/∂xi∂xj), i = 1, ..., p, j = 1, ..., p.

Some useful formulae
For A(p × p), x(p × 1) ∈ R^p, a(p × 1) and A = A⊤:
$$\frac{\partial a^{\top} x}{\partial x} = \frac{\partial x^{\top} a}{\partial x} = a$$


Example:
f : R^p → R, f(x) = a⊤x, with a = (1, 2)⊤ and x = (x1, x2)⊤:
$$\frac{\partial a^{\top}x}{\partial x} = \frac{\partial (x_1 + 2x_2)}{\partial x} = \begin{pmatrix}1\\2\end{pmatrix} = a$$


Derivatives of the quadratic form

$$\frac{\partial x^{\top} A x}{\partial x} = 2Ax, \qquad \frac{\partial^2 x^{\top} A x}{\partial x\, \partial x^{\top}} = 2A$$


Summary: Derivatives

The column vector ∂f(x)/∂x is called the gradient.
The gradient ∂a⊤x/∂x = ∂x⊤a/∂x equals a.
The derivative of the quadratic form, ∂x⊤Ax/∂x, equals 2Ax.
The Hessian of f : R^p → R is the (p × p) matrix of second derivatives ∂²f(x)/∂xi∂xj.
The Hessian of the quadratic form x⊤Ax equals 2A.


Partitioned Matrices
For A(n × p), B(n × p) partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
with Aij (ni × pj), n1 + n2 = n and p1 + p2 = p:
$$A + B = \begin{pmatrix} A_{11}+B_{11} & A_{12}+B_{12} \\ A_{21}+B_{21} & A_{22}+B_{22} \end{pmatrix}$$
$$B^{\top} = \begin{pmatrix} B_{11}^{\top} & B_{21}^{\top} \\ B_{12}^{\top} & B_{22}^{\top} \end{pmatrix}$$
$$AB^{\top} = \begin{pmatrix} A_{11}B_{11}^{\top} + A_{12}B_{12}^{\top} & A_{11}B_{21}^{\top} + A_{12}B_{22}^{\top} \\ A_{21}B_{11}^{\top} + A_{22}B_{12}^{\top} & A_{21}B_{21}^{\top} + A_{22}B_{22}^{\top} \end{pmatrix}$$



For A(p × p) nonsingular, partitioned in such a way that A11 and A22 are square matrices:
$$A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix}$$
where
$$A^{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} \stackrel{def}{=} (A_{11\cdot 2})^{-1}$$
$$A^{12} = -(A_{11\cdot 2})^{-1} A_{12} A_{22}^{-1}$$
$$A^{21} = -A_{22}^{-1} A_{21} (A_{11\cdot 2})^{-1}$$
$$A^{22} = A_{22}^{-1} + A_{22}^{-1} A_{21} (A_{11\cdot 2})^{-1} A_{12} A_{22}^{-1}$$
If A11 is non-singular:
$$|A| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$$
If A22 is non-singular:
$$|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|$$
For
$$B = \begin{pmatrix} 1 & b^{\top} \\ a & A \end{pmatrix}$$
we have
$$|B| = |A - ab^{\top}| = |A|\,|1 - b^{\top}A^{-1}a|$$
$$(A - ab^{\top})^{-1} = A^{-1} + \frac{A^{-1}ab^{\top}A^{-1}}{1 - b^{\top}A^{-1}a}$$
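This rank-one update formula is easy to verify numerically — a sketch with randomly generated (assumed) A, a, b:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # a nonsingular matrix
a = rng.normal(size=(3, 1))
b = rng.normal(size=(3, 1))

Ainv = np.linalg.inv(A)
denom = 1 - (b.T @ Ainv @ a).item()

# Rank-one update formula for (A - a b^T)^{-1}
update = Ainv + (Ainv @ a @ b.T @ Ainv) / denom
print(np.allclose(update, np.linalg.inv(A - a @ b.T)))   # True
```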


Summary: Partitioned Matrices

For partitioned matrices A(n × p) = (A11 A12; A21 A22) and B(n × p) = (B11 B12; B21 B22) it holds that
$$A + B = \begin{pmatrix} A_{11}+B_{11} & A_{12}+B_{12} \\ A_{21}+B_{21} & A_{22}+B_{22} \end{pmatrix}.$$



Summary: Partitioned Matrices

The product AB⊤ equals
$$\begin{pmatrix} A_{11}B_{11}^{\top}+A_{12}B_{12}^{\top} & A_{11}B_{21}^{\top}+A_{12}B_{22}^{\top} \\ A_{21}B_{11}^{\top}+A_{22}B_{12}^{\top} & A_{21}B_{21}^{\top}+A_{22}B_{22}^{\top} \end{pmatrix}.$$



Summary: Partitioned Matrices

For A nonsingular with A11, A22 square matrices,
$$A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix}$$
with
$$A^{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} \stackrel{def}{=} (A_{11\cdot 2})^{-1}, \qquad A^{12} = -(A_{11\cdot 2})^{-1}A_{12}A_{22}^{-1},$$
$$A^{21} = -A_{22}^{-1}A_{21}(A_{11\cdot 2})^{-1}, \qquad A^{22} = A_{22}^{-1} + A_{22}^{-1}A_{21}(A_{11\cdot 2})^{-1}A_{12}A_{22}^{-1}.$$



Summary: Partitioned Matrices

For B = (1 b⊤; a A) and non-singular A we have
$$|B| = |A - ab^{\top}| = |A|\,|1 - b^{\top}A^{-1}a|,$$
$$(A - ab^{\top})^{-1} = A^{-1} + \frac{A^{-1}ab^{\top}A^{-1}}{1 - b^{\top}A^{-1}a}.$$



Geometrical Aspects

Distance function d on R^p × R^p with values in R+:
$$d^2(x, y) = (x - y)^{\top} A (x - y), \qquad A > 0$$
A = Ip yields the Euclidean distance.
$$E_d = \{x \in \mathbb{R}^p \mid (x - x_0)^{\top}(x - x_0) = d^2\}$$
Example: x ∈ R², x0 = 0: the unit circle x1² + x2² = 1.
Norm of a vector w.r.t. the metric Ip:
$$\|x\|_{I_p} = d(0, x) = \sqrt{x^{\top}x}$$

Figure: Distance d: d²(x, y) = (x − y)⊤(x − y).
Figure: Iso-distance sphere: A = I2, (x1 − x01)² + (x2 − x02)² = d².
Figure: Iso-distance ellipsoid: Ed = {x : (x − x0)⊤A(x − x0) = d²}, γj = Evec(A), A > 0.

Angle between Vectors

Scalar product:
⟨x, y⟩ = x⊤y,  ⟨x, y⟩_A = x⊤Ay
Norm of a vector:
‖x‖_{Ip} = d(0, x) = √(x⊤x),  ‖x‖_A = √(x⊤Ax)
Unit vectors: {x : ‖x‖ = 1}

Angle between Two Vectors

The angle θ between vectors x and y can be calculated from
$$\cos\theta = \frac{x^{\top}y}{\|x\|\,\|y\|}$$
Example: Angle = Correlation. For observations {xi}, {yi} with x̄ = ȳ = 0:
$$r_{XY} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}} = \cos\theta$$
Correlation corresponds to the angle between the centered data vectors x, y ∈ R^n.

Figure: Angle between vectors.
$$\cos\theta = \frac{x^{\top}y}{\|x\|\,\|y\|} = \frac{x_1 y_1 + x_2 y_2}{\|x\|\,\|y\|} = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2$$

Column space

X(n × p) data matrix:
C(X) = {x ∈ R^n | ∃ a ∈ R^p such that Xa = x}
Projection matrix: P(n × n) with P = P⊤ = P² (P is idempotent).
For b ∈ R^n, a = Pb is the projection of b on C(P).


Projection on C (X )

X(n × p), P = X(X⊤X)⁻¹X⊤:
PX = X; P is a projector: PP = P.
Q = In − P: Q² = Q, QX = 0.
The projection of x on the vector y is
$$p_x = y(y^{\top}y)^{-1}y^{\top}x = \frac{y^{\top}x}{\|y\|^2}\, y$$

Figure: Projection of x on y.
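The projector properties can be checked in a few lines of numpy — a sketch with a randomly generated (assumed) design matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 2))

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projector onto C(X)
Q = np.eye(6) - P

print(np.allclose(P @ P, P))             # P is idempotent
print(np.allclose(P @ X, X))             # P X = X
print(np.allclose(Q @ X, 0))             # Q X = 0
```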

Summary: Geometrical aspects

A distance between two p-dimensional points x, y is a quadratic form (x − y)⊤A(x − y) in the vector of differences (x − y). A distance defines the norm of a vector.
Iso-distance curves of a point x0 are all those points which have the same distance from x0. Iso-distance curves are ellipsoids whose principal axes are determined by the directions of the eigenvectors of A. The half-lengths of the principal axes are proportional to the inverses of the square roots of the eigenvalues of A.



Summary: Geometrical aspects

The angle between two vectors x and y w.r.t. the metric A is given by
$$\cos\theta = \frac{x^{\top}Ay}{\|x\|_A\,\|y\|_A}$$
For the Euclidean distance with A = I, the correlation between two centered data vectors x and y is given by the cosine of the angle between them, i.e. cos θ = r_XY.
The matrix P = X(X⊤X)⁻¹X⊤ is the projection onto the column space C(X) of X.
The projection of x ∈ R^n on y ∈ R^n is px = (y⊤x/‖y‖²) y.


Moving to Higher Dimensions

Covariance
Covariance is a measure of (linear) dependency between variables.
σXY = Cov(X , Y ) = E(XY ) − (E X )(E Y )
Covariance of X with itself:
σXX = Var(X ) = Cov(X , X )
Covariance matrix for a p-dimensional X:
$$\Sigma = \begin{pmatrix} \sigma_{X_1X_1} & \cdots & \sigma_{X_1X_p} \\ \vdots & \ddots & \vdots \\ \sigma_{X_pX_1} & \cdots & \sigma_{X_pX_p} \end{pmatrix}$$


Empirical versions:
$$s_{XY} = n^{-1} \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y), \qquad s_{XX} = n^{-1} \sum_{i=1}^{n} (x_i - \bar x)^2$$
Empirical covariance matrix:
$$\mathcal{S} = \begin{pmatrix} s_{X_1X_1} & \cdots & s_{X_1X_p} \\ \vdots & \ddots & \vdots \\ s_{X_pX_1} & \cdots & s_{X_pX_p} \end{pmatrix}$$
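A minimal sketch of this computation in Python; note that the slide's 1/n convention differs from numpy's default 1/(n−1):

```python
import numpy as np

def empirical_cov(X):
    """Empirical covariance matrix S with the 1/n factor used above."""
    Xc = X - X.mean(axis=0)           # center each column
    return Xc.T @ Xc / len(X)

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
S = empirical_cov(X)
# np.cov uses the 1/(n-1) convention instead:
print(np.allclose(S * len(X) / (len(X) - 1), np.cov(X, rowvar=False)))
```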


Example: Swiss bank data

X1 = length of the bill


X2 = height of the bill (left)
X3 = height of the bill (right)
X4 = distance of the inner frame to the lower border
X5 = distance of the inner frame to the upper border
X6 = length of the diagonal of the central picture.


For the full bank dataset X:
$$\mathcal{S} = \begin{pmatrix}
0.14 & 0.03 & 0.02 & -0.10 & -0.01 & 0.08 \\
0.03 & 0.12 & 0.10 & 0.21 & 0.10 & -0.21 \\
0.02 & 0.10 & 0.16 & 0.28 & 0.12 & -0.24 \\
-0.10 & 0.21 & 0.28 & 2.07 & 0.16 & -1.03 \\
-0.01 & 0.10 & 0.12 & 0.16 & 0.64 & -0.54 \\
0.08 & -0.21 & -0.24 & -1.03 & -0.54 & 1.32
\end{pmatrix}$$
s_{X1X1} = s11 = 0.14, s_{X4X5} = 0.16


Scatterplots with point clouds that are "upward-sloping" show variables with positive covariance.
Scatterplots with a "downward-sloping" structure show negative covariance.

Figure: Scatterplot of variables X4 vs. X5 of the full bank dataset. MVAscabank45

Example: “classic blue” pullover


Sales of “classic blue” pullovers in 10 periods.
X1 number of pullovers sold
X2 price in EUR
X3 advertisement cost in EUR
X4 presence of sales assistant in hours per period
Does price have a big influence on pullovers sold?
sX1 X2 = −80.02

Figure: Scatterplot of variables X2 vs. X1 of the pullovers dataset. MVAscapull1

Summary: Covariance

The covariance is a measure of dependence.


Covariance measures only linear dependence.
There are nonlinear dependencies that have zero covariance.
Zero covariance does not imply independence.
Independence implies zero covariance.
Covariance is scale dependent.


Summary: Covariance

Negative covariance corresponds to downward-sloping scatterplots.
Positive covariance corresponds to upward-sloping scatterplots.
The covariance of a variable with itself is its variance: Cov(X, X) = σ_XX.
For small n we should replace the factor 1/n in the computation of the covariance by 1/(n − 1).


Correlation

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$
The empirical version of ρ_XY:
$$r_{XY} = \frac{s_{XY}}{\sqrt{s_{XX}\, s_{YY}}}$$


Correlation matrix:
$$\mathcal{P} = \begin{pmatrix} \rho_{X_1X_1} & \cdots & \rho_{X_1X_p} \\ \vdots & \ddots & \vdots \\ \rho_{X_pX_1} & \cdots & \rho_{X_pX_p} \end{pmatrix}$$
Empirical correlation matrix:
$$\mathcal{R} = \begin{pmatrix} r_{X_1X_1} & \cdots & r_{X_1X_p} \\ \vdots & \ddots & \vdots \\ r_{X_pX_1} & \cdots & r_{X_pX_p} \end{pmatrix}$$


Example: Swiss bank data


For genuine bank notes:
$$\mathcal{R}_g = \begin{pmatrix}
1.00 & 0.41 & 0.41 & 0.22 & 0.05 & 0.03 \\
0.41 & 1.00 & 0.66 & 0.24 & 0.20 & -0.25 \\
0.41 & 0.66 & 1.00 & 0.25 & 0.13 & -0.14 \\
0.22 & 0.24 & 0.25 & 1.00 & -0.63 & -0.00 \\
0.05 & 0.20 & 0.13 & -0.63 & 1.00 & -0.25 \\
0.03 & -0.25 & -0.14 & -0.00 & -0.25 & 1.00
\end{pmatrix}$$


For forged bank notes:
$$\mathcal{R}_f = \begin{pmatrix}
1.00 & 0.35 & 0.24 & -0.25 & 0.08 & 0.06 \\
0.35 & 1.00 & 0.61 & -0.08 & -0.07 & -0.03 \\
0.24 & 0.61 & 1.00 & -0.05 & 0.00 & 0.20 \\
-0.25 & -0.08 & -0.05 & 1.00 & -0.68 & 0.37 \\
0.08 & -0.07 & 0.00 & -0.68 & 1.00 & -0.06 \\
0.06 & -0.03 & 0.20 & 0.37 & -0.06 & 1.00
\end{pmatrix}$$
The correlation between X4 and X5 is negative!


If X and Y are independent, then Cov(X, Y) = ρ(X, Y) = 0. The converse is not true in general.
Example: let X be a standard normally distributed random variable and Y = X², which is surely not independent of X. Then
$$\operatorname{Cov}(X, Y) = E(XY) - E(X)\,E(Y) = E(X^3) = 0$$
(because E(X) = 0 and E(X²) = 1), and therefore ρ(X, Y) = 0, too.


Test of Correlation

Fisher's Z-transformation (a variance stabilizing transformation):
$$W = \frac{1}{2}\log\left(\frac{1 + r_{XY}}{1 - r_{XY}}\right)$$
$$E(W) \approx \frac{1}{2}\log\left(\frac{1 + \rho_{XY}}{1 - \rho_{XY}}\right), \qquad \operatorname{Var}(W) \approx \frac{1}{n-3}$$
$$Z = \frac{W - E(W)}{\sqrt{\operatorname{Var}(W)}} \xrightarrow{\;L\;} N(0, 1)$$


Example: Car dataset
Correlation between mileage (X2) and weight (X8): n = 74, r_{X2X8} = −0.823.
H0: ρ = 0 vs. H1: ρ ≠ 0:
$$w = \frac{1}{2}\log\left(\frac{1 + r_{X_2X_8}}{1 - r_{X_2X_8}}\right) = -1.166, \qquad z = \frac{-1.166 - 0}{\sqrt{1/71}} = -9.825$$
H0: ρ = −0.75:
$$z = \frac{-1.166 - (-0.973)}{\sqrt{1/71}} = -1.627.$$
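Both z-values can be reproduced with a short Python function (a sketch; arctanh is exactly the Z-transformation above):

```python
import numpy as np
from scipy.stats import norm

def fisher_z_test(r, n, rho0=0.0):
    """Test H0: rho = rho0 via Fisher's Z-transformation."""
    w = np.arctanh(r)                    # = 0.5 * log((1+r)/(1-r))
    z = (w - np.arctanh(rho0)) * np.sqrt(n - 3)
    return z, 2 * norm.sf(abs(z))        # two-sided p-value

print(fisher_z_test(-0.823, 74))         # z = -9.825
print(fisher_z_test(-0.823, 74, -0.75))  # z = -1.627
```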

Figure: Mileage (X2) vs. weight (X8) of U.S. (star), European (plus) and Japanese (circle) cars. MVAscacar

Summary: Correlation

The correlation is a standardized measure of dependence.
The absolute value of the correlation is always less than or equal to one.
Correlation measures only linear dependence.
There are nonlinear dependencies that have zero correlation.
Zero correlation does not imply independence.

Summary: Correlation

Independence implies zero correlation.
Negative correlation corresponds to downward-sloping scatterplots.
Positive correlation corresponds to upward-sloping scatterplots.
Fisher's Z-transformation helps us in testing hypotheses on correlation.
For small samples, Fisher's Z-transformation can be improved by
$$W^* = W - \frac{3W + \tanh(W)}{4(n-1)}.$$


Summary Statistics

X(n × p) data matrix:
$$\mathcal{X} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}$$
xi = (xi1, ..., xip)⊤ ∈ R^p: the i-th observation of a p-dimensional random variable X ∈ R^p.


Mean:
$$\bar x = \begin{pmatrix} \bar x_1 \\ \vdots \\ \bar x_p \end{pmatrix} = n^{-1} \mathcal{X}^{\top} 1_n$$
Empirical covariance matrix:
$$\mathcal{S} = n^{-1}\mathcal{X}^{\top}\mathcal{X} - \bar x \bar x^{\top} = n^{-1}(\mathcal{X}^{\top}\mathcal{X} - n^{-1}\mathcal{X}^{\top}1_n 1_n^{\top}\mathcal{X}) = n^{-1}\mathcal{X}^{\top}\mathcal{H}\mathcal{X}$$
Centering matrix:
$$\mathcal{H} = \mathcal{I}_n - n^{-1} 1_n 1_n^{\top}$$
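The equality of the two forms of S is easy to confirm numerically — a sketch with random (assumed) data:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 4
X = rng.normal(size=(n, p))

H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
S = X.T @ H @ X / n                     # S = n^-1 X' H X

xbar = X.mean(axis=0)
S_direct = X.T @ X / n - np.outer(xbar, xbar)
print(np.allclose(S, S_direct))         # True: the two forms agree
```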


Empirical correlation matrix:
$$\mathcal{R} = \mathcal{D}^{-1/2}\,\mathcal{S}\,\mathcal{D}^{-1/2}$$
with D = diag(s_{XjXj}) and D^(−1/2) = diag(s_{XjXj}^(−1/2)) for j = 1, ..., p.


Linear Transformations

For a (q × p) matrix A:
$$\mathcal{Y} = \mathcal{X}\mathcal{A}^{\top} = (y_1, \dots, y_n)^{\top}$$
$$\bar y = n^{-1}\mathcal{Y}^{\top} 1_n = \mathcal{A}\bar x, \qquad \mathcal{S}_Y = n^{-1}\mathcal{Y}^{\top}\mathcal{H}\mathcal{Y} = \mathcal{A}\mathcal{S}_X\mathcal{A}^{\top}$$
Example: let x̄ = (1, 2)⊤ and y = 4x, x ∈ R². Then ȳ = 4x̄ = (4, 8)⊤.


Mahalanobis Transformation

$$\mathcal{Z} = (z_1, \dots, z_n)^{\top}, \qquad z_i = \mathcal{S}^{-1/2}(x_i - \bar x), \quad i = 1, \dots, n$$
$$\mathcal{S}_Z = n^{-1}\mathcal{Z}^{\top}\mathcal{H}\mathcal{Z} = \mathcal{I}_p, \qquad \bar z = 0$$
The Mahalanobis transformation leads to a standardized, uncorrelated, zero-mean data matrix Z.
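A minimal sketch of the transformation, computing S^(−1/2) through the spectral decomposition of the previous chapter (the data are an assumed random sample):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=300)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / len(X)

# S^{-1/2} via the spectral decomposition S = Gamma Lambda Gamma^T
lam, gamma = np.linalg.eigh(S)
S_inv_sqrt = gamma @ np.diag(lam ** -0.5) @ gamma.T

Z = (X - xbar) @ S_inv_sqrt               # z_i = S^{-1/2}(x_i - xbar)
SZ = Z.T @ Z / len(Z)
print(np.allclose(SZ, np.eye(2)))         # True: S_Z = I_p
```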


Summary: Summary Statistics

The center of gravity of a data matrix is given by its mean


vector x = n−1 X > 1n .
The dispersion of the observations in a data matrix is given by
the empirical covariance matrix S = n−1 X > HX .
The empirical correlation matrix is given by
R = D−1/2 SD−1/2 .


Summary: Summary Statistics

A linear transformation Y = X A> of a data matrix X has


mean Ax and empirical covariance ASX A> .
The Mahalanobis transformation is a linear transformation
zi = S −1/2 (xi − x) which gives a standardized, uncorrelated
data matrix Z.


One-sample t-test

We have iid observations x1, ..., xn. Assume that the observations stem from N(µ, σ²). Then x̄n ∼ N(µ, σ²/n), i.e.
$$\sqrt{n}\,\frac{\bar x_n - \mu}{\sigma} \sim N(0, 1).$$


H0: µ = µ0 vs. H1: µ ≠ µ0.
Assume that σ² is known; under H0,
$$\sqrt{n}\,\frac{\bar x_n - \mu_0}{\sigma} \sim N(0, 1).$$
Show that P(reject H0 | H0 is true) = α.


Usually σ² is not known and we have to estimate it:
$$\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2.$$
It can be shown that
$$\sqrt{n}\,\frac{\bar x_n - \mu}{\hat\sigma_n} \sim t_{n-1}.$$
Note: the t-distribution tn approaches N(0, 1) as n → ∞ (parameter n: degrees of freedom).


Test: H0: E(X) = µ0 vs. H1: E(X) ≠ µ0. We reject H0 if
$$\sqrt{n}\,\frac{|\bar x_n - \mu_0|}{\hat\sigma_n} > t_{1-\alpha/2;\, n-1},$$
where t_{1−α/2;n−1} is the 1 − α/2 quantile of the Student's t-distribution with n − 1 degrees of freedom.


Example: Car damage data (McCullagh and Nelder, 1989). The response variable C̄n is the average cost of claims (in British pounds).
H0: average costs = 200 vs. H1: average costs ≠ 200.
C̄n = 222.11, σ̂n = 123.22, n = 128:
$$\sqrt{n}\,\frac{\bar C_n - 200}{\hat\sigma_n} = 2.0301 > t_{0.975;\, n-1} = 1.9788$$
We reject the hypothesis that average costs are equal to 200.


Two-sample t-test

We have two iid samples y11, ..., y1n and y21, ..., y2m. Assume that Y1i ∼ N(µ1, σ²) and Y2j ∼ N(µ2, σ²).
H0: µ1 = µ2 vs. H1: µ1 ≠ µ2.
Pooled estimate of variance:
$$\hat\sigma_P^2 = \frac{1}{m+n-2}\left\{\sum_{i=1}^{n}(y_{1i} - \bar y_1)^2 + \sum_{j=1}^{m}(y_{2j} - \bar y_2)^2\right\}$$

Test statistic:
$$T = \sqrt{\frac{mn}{m+n}}\;\frac{(\bar y_1 - \bar y_2) - (\mu_1 - \mu_2)}{\hat\sigma_P} \sim t_{n+m-2}$$
Reject H0 if |T| > t_{1−α/2; n+m−2}.


Linear Model for Two Variables

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad E(\varepsilon_i) = 0, \quad \operatorname{Var}(\varepsilon_i) = \sigma^2, \quad i = 1, \dots, n$$
β0 = intercept, β1 = slope. Estimate (β0, β1) by least squares:
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{(\beta_0,\beta_1)} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
$$\hat\beta_1 = \frac{s_{XY}}{s_{XX}} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
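These closed-form estimates translate directly into Python — a sketch on simulated (assumed) data, not the MVAregpull quantlet:

```python
import numpy as np

def simple_ols(x, y):
    """Least squares estimates for y = beta0 + beta1 * x + eps."""
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))
    s_xx = np.mean((x - x.mean())**2)
    beta1 = s_xy / s_xx
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

rng = np.random.default_rng(8)
x = rng.uniform(80, 120, size=10)
y = 210.0 - 0.4 * x + rng.normal(scale=10, size=10)
print(simple_ols(x, y))
```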

Figure: Regression of sales (X1) on price (X2) of pullovers, β̂0 = 210.8, β̂1 = −0.36. MVAregpull
Figure: Regression of upper inner frame (X5) on lower inner frame (X4) for genuine bank notes. MVAregbank

Total variation

Regression equations: yi = β0 + β1xi + εi and ŷi = β̂0 + β̂1xi.
$$\underbrace{\sum_{i=1}^{n}(y_i - \bar y)^2}_{SSTO} = \underbrace{\sum_{i=1}^{n}(\hat y_i - \bar y)^2}_{SSTR} + \underbrace{\sum_{i=1}^{n}(y_i - \hat y_i)^2}_{SSE}$$
SSTO = SSTR + SSE
SSTO: variation in the response variable (total variation)
SSTR: variation explained by the linear regression
SSE: error sum of squares

Figure: Regression of sales (X1) on price (X2) of pullovers with highlighted distances. MVAregzoom

Coefficient of determination

$$r^2 = \frac{\sum_{i=1}^{n}(\hat y_i - \bar y)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2} = \frac{SSTR}{SSTO}$$
r² = 1: the variation is fully explained by the linear regression, i.e. y is a linear function of x.
$$r^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat y_i)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2}$$


Example: “Classic blue” pullover data


Regress sales on price: β̂0 = 210.774, β̂1 = −0.364, r² = 0.028.
The low r² shows that sales are not influenced very much by the price (in a linear way).
Regression of Y on X differs in general from regression of X on Y.


t-Test for β1: H0: β1 = 0 (ρ_XY = 0) vs. H1: β1 ≠ 0.
$$\operatorname{Var}(\hat\beta_1) = \frac{\hat\sigma^2}{n\, s_{XX}}, \qquad SE(\hat\beta_1) = \frac{\hat\sigma}{(n\, s_{XX})^{1/2}}, \qquad t = \frac{\hat\beta_1}{SE(\hat\beta_1)}$$
t_{1−α/2;n−2} is the 1 − α/2 quantile of the Student's t-distribution with n − 2 degrees of freedom.
Do not reject H0 if |t| ≤ t_{1−α/2;n−2}.


Example: Swiss bank data

Distance of the inner frame to the lower and to the upper border, i.e. X4 vs. X5. Why is a negative slope to be expected?
$$\hat\beta_0 = 14.666, \qquad \hat\beta_1 = \frac{s_{XY}}{s_{XX}} = \frac{-0.26347}{0.41321} = -0.626$$
|t| = |−8.064| > t_{0.975;98} = 1.9845


Summary: Linear Regression

The linear regression y = β0 + β1 x + ε models a linear


relation between two one-dimensional variables.
The sign of the slope βb1 is the same as that of the covariance
and the correlation of x and y .
A linear regression predicts values of Y given a possible
observation x of X .


Summary: Linear Regression

The coefficient of determination r 2 measures the amount of


variation in Y which is explained by a linear regression on X .
If the coefficient of determination is r 2 = 1, then all points lie
on one line.
The regression line of X on Y and the regression line of Y on
X are in general different.


Summary: Linear Regression

The t-test for the hypothesis β1 = 0 uses $t = \hat\beta_1 / SE(\hat\beta_1)$, where $SE(\hat\beta_1) = \hat\sigma/(n\, s_{XX})^{1/2}$.
The t-test rejects the null hypothesis β1 = 0 at significance level α if |t| ≥ t_{1−α/2;n−2}, the 1 − α/2 quantile of the Student's t-distribution with n − 2 degrees of freedom.
The standard error SE(β̂1) decreases with more spread in the X variable and increases with less.


Simple Analysis of Variance (ANOVA)

Assumptions

Average values of the response variable y are induced by one


simple factor
Factor takes on p values
For each factor level, we have m = n/p observations
All observations are independent


sample element   factor levels l = 1, ..., p
1                y11 · · · y1l · · · y1p
2                y21 · · · y2l · · · y2p
⋮                 ⋮        ⋮        ⋮
k                yk1 · · · ykl · · · ykp
⋮                 ⋮        ⋮        ⋮
m = n/p          ym1 · · · yml · · · ymp

Table: Observation structure of a simple ANOVA.


Simple ANOVA Model

ykl = µl + εkl for k = 1, . . . , m and l = 1, . . . , p. (1)

Note
I Each factor level has a mean value µl
I Observation ykl equals the sum of µl and a zero-mean random error εkl
I Linear regression is the special case m = 1, p = n and µi = α + βxi, where xi is the i-th level value of the factor

Example: “Classic blue” pullover data

Analyse the effect of three marketing strategies:


1. Advertisement in local newspapers
2. Presence of sales assistant
3. Luxury presentation in shop windows

p = 3 factors, 10 different shops and n = mp = 30 observations


shop k   marketing strategy (factor l)
         1    2    3
1        9   10   18
2       11   15   14
3       10   11   17
4       12   15    9
5        7   15   14
6       11   13   17
7       12    7   16
8       10   15   14
9       11   13   17
10      13   10   15

Table: Pullover sales as a function of marketing strategy.


Do all three strategies have the same mean effect?

Test

H0: µl = µ for l = 1, . . . , p vs. H1: µl ≠ µl′ for some l and l′

Alternative: one marketing strategy is better than the others


Decomposition of sums of squares:
$$\sum_{l=1}^{p}\sum_{k=1}^{m}(y_{kl} - \bar y)^2 = m\sum_{l=1}^{p}(\bar y_l - \bar y)^2 + \sum_{l=1}^{p}\sum_{k=1}^{m}(y_{kl} - \bar y_l)^2$$
Total variation (sum of squares = SS):
$$SS(\text{reduced}) = \sum_{l=1}^{p}\sum_{k=1}^{m}(y_{kl} - \bar y)^2, \qquad \bar y = n^{-1}\sum_{l=1}^{p}\sum_{k=1}^{m} y_{kl}$$
Variation under H1:
$$SS(\text{full}) = \sum_{l=1}^{p}\sum_{k=1}^{m}(y_{kl} - \bar y_l)^2, \qquad \bar y_l = m^{-1}\sum_{k=1}^{m} y_{kl}$$

F-test:
$$F = \frac{\{SS(\text{reduced}) - SS(\text{full})\}/\{df(r) - df(f)\}}{SS(\text{full})/df(f)}$$

Degrees of freedom
I Number of observations minus the number of parameters
I Full model: df(f) = n − p
I Reduced model: df(r) = n − 1


ANOVA Table

Source      SS              df      MS                       F-stat                        p-value
explained   SS(explained)   p − 1   SS(explained)/(p − 1)    {SS(explained)/(p − 1)}/MSE   p-value
full        SS(full)        n − p   SS(full)/(n − p) = MSE
reduced     SS(reduced)     n − 1

F ∼ F_{p−1, n−p}
Test: reject H0 if F > F_{1−α; p−1, n−p}, or if p-value < α


Example: “Classic blue” pullover data

Reduced model: H0: µl = µ, l = 1, 2, 3. Full model: H1: the µl differ.
df(r) = n − #parameters(r) = 30 − 1 = 29
df(f) = n − #parameters(f) = 30 − 3 = 27
SS(reduced) = 260.3, SS(full) = 157.7
$$F = \frac{(260.3 - 157.7)/(29 - 27)}{157.7/27} = 8.78 > F_{2, 27}(0.95) = 3.35$$
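The whole computation can be reproduced from the sales table above in a few lines of Python (a sketch, not the book's quantlet):

```python
import numpy as np
from scipy.stats import f as f_dist

# Pullover sales under the three marketing strategies (table above)
sales = np.array([[ 9, 10, 18], [11, 15, 14], [10, 11, 17], [12, 15,  9],
                  [ 7, 15, 14], [11, 13, 17], [12,  7, 16], [10, 15, 14],
                  [11, 13, 17], [13, 10, 15]])
m, p = sales.shape
n = m * p

ss_reduced = np.sum((sales - sales.mean())**2)        # 260.3
ss_full = np.sum((sales - sales.mean(axis=0))**2)     # 157.7

F = ((ss_reduced - ss_full) / (p - 1)) / (ss_full / (n - p))
print(F, f_dist.sf(F, p - 1, n - p))                  # 8.78, p ≈ 0.001
```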


SS       df    MS      F-stat   p-value
102.6     2    51.30   8.78     0.001
157.7    27     5.84
260.3    29


F -test in a linear regression model

Reduced model: yi = β0 + 0 · xi + εi.
$$SS(\text{reduced}) = \sum_{i=1}^{n}(y_i - \bar y)^2, \qquad SS(\text{full}) = \sum_{i=1}^{n}(y_i - \hat y_i)^2 = RSS$$
$$F = \frac{\{SS(\text{reduced}) - SS(\text{full})\}/1}{SS(\text{full})/(n-2)}$$


Explained Variation
$$\sum_{i=1}^{n}(\hat y_i - \bar y)^2 = \sum_{i=1}^{n}\left(\hat\beta_0 + \hat\beta_1 x_i - \bar y\right)^2 = \hat\beta_1^2 \sum_{i=1}^{n}(x_i - \bar x)^2 = \hat\beta_1^2\, n\, s_{XX}$$
$$F = \frac{\hat\beta_1^2\, n\, s_{XX}}{RSS/(n-2)} = \left(\frac{\hat\beta_1}{SE(\hat\beta_1)}\right)^2$$


Summary: ANOVA

Simple ANOVA models an output Y as a function of one


factor.
The reduced model is the hypothesis of equal means.
The full model is the alternative hypothesis of different means.
The F -test is based on a comparison of the sum of squares
under the full and the reduced models.


Summary: ANOVA

The degrees of freedom are calculated as the number of observations minus the number of parameters.
The F-statistic is
$$F = \frac{\{SS(\text{reduced}) - SS(\text{full})\}/\{df(r) - df(f)\}}{SS(\text{full})/df(f)}.$$
Reject the null hypothesis if the F-statistic is larger than the (1 − α)-quantile of the F_{df(r)−df(f), df(f)} distribution.
The F-test statistic for the slope of the linear regression model yi = β0 + β1xi + εi is the square of the t-test statistic.


Multiple Linear Model

y(n × 1), X(n × p), β = (β1, ..., βp)⊤.
Approximate y by a linear combination ŷ of the columns of X: find β̂ such that ŷ = Xβ̂ is the best fit of y = Xβ + ε (errors ε):
$$\hat\beta = \arg\min_{\beta}\,(y - \mathcal{X}\beta)^{\top}(y - \mathcal{X}\beta) = \arg\min_{\beta}\sum_{i=1}^{n}(y_i - x_i^{\top}\beta)^2 = (\mathcal{X}^{\top}\mathcal{X})^{-1}\mathcal{X}^{\top}y,$$
if X⊤X is of full rank.
Linear Model with Intercept

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n$$
can be written as
$$y = \mathcal{X}^* \beta^* + \varepsilon$$
where X* = (1n X) and
$$\hat\beta^* = \begin{pmatrix}\hat\beta_0\\ \hat\beta\end{pmatrix} = (\mathcal{X}^{*\top}\mathcal{X}^*)^{-1}\mathcal{X}^{*\top} y$$
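A sketch of this estimator in Python — adding the column of ones explicitly, on simulated (assumed) data:

```python
import numpy as np

def ols_with_intercept(X, y):
    """beta* = (X*' X*)^{-1} X*' y with X* = (1_n, X)."""
    Xs = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ y)

rng = np.random.default_rng(9)
X = rng.normal(size=(50, 3))
y = 65.0 - 0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(size=50)
print(ols_with_intercept(X, y))   # approximately (65.0, -0.2, 0.5, 0.8)
```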



Example: “Classic blue” pullover data


Approximate the sales as a linear function of the three other
variables: price (X2 ), advertisement (X3 ) and presence of sales
assistants (X4 )
Adding a column of ones to the data (in order to estimate also the
intercept β0 ) leads to

βb0 = 65.670, βb1 = −0.216, βb2 = 0.485, βb3 = 0.844.

Coefficient of determination: r 2 = 0.907


Remark: the coefficient of determination is influenced by the number of regressors. For a given sample size n, the r² value will increase by adding more regressors into the linear model. A corrected coefficient of determination for p regressors and a constant intercept is
$$r_{adj}^2 = r^2 - \frac{p(1 - r^2)}{n - (p + 1)}$$


Example: “Classic blue” pullover data


Corrected coefficient of determination:
$$r_{adj}^2 = 0.907 - \frac{3(1 - 0.907^2)}{10 - 3 - 1} = 0.818.$$
81.8% of the variation of the response variable is explained by the explanatory variables.


Simple ANOVA Model


Example: "Classic blue" pullover data
$$\mathcal{X} = \begin{pmatrix} 1_m & 0_m & 0_m \\ 0_m & 1_m & 0_m \\ 0_m & 0_m & 1_m \end{pmatrix}$$
m = 10, p = 3, n = mp = 30; X(n × p)
β = (µ1, µ2, µ3)⊤ parameter vector
y = Xβ + ε linear model


Reduced model (µ1 = µ2 = µ3 = µ):
β̂_{H0} = ȳ, df(r) = n − 1.
Full model (the µl differ):
β̂_{H1} = (X⊤X)⁻¹X⊤y, df(f) = n − 3.
$$SS(\text{reduced}) = \sum_{i=1}^{n}(y_i - \hat y_i)^2 = \|y - \mathcal{X}\hat\beta_{H_0}\|^2, \qquad SS(\text{full}) = \|y - \mathcal{X}\hat\beta_{H_1}\|^2$$



Simple ANOVA Model — F-test
$$F = \frac{\{SS(\text{reduced}) - SS(\text{full})\}/\{df(r) - df(f)\}}{SS(\text{full})/df(f)} = \frac{\{\|y - \mathcal{X}\hat\beta_{H_0}\|^2 - \|y - \mathcal{X}\hat\beta_{H_1}\|^2\}/\{df(r) - df(f)\}}{\|y - \mathcal{X}\hat\beta_{H_1}\|^2/df(f)}$$
We are comparing the lengths of projections into different column spaces.

Summary: Multiple Linear Model

The relation y = X β + ε models a linear relation between


a one-dimensional variable Y and a p-dimensional variable X .
Py gives the best linear regression fit of the vector y onto
C (X ). The least squares parameter estimator is
βb = (X > X )−1 X > y .
The simple ANOVA model can be written as a linear model.


Summary: Multiple Linear Model

The ANOVA model can be tested by comparing the lengths of the projection vectors.
The test statistic of the F-test can be written as
$$\frac{\{\|y - \mathcal{X}\hat\beta_{H_0}\|^2 - \|y - \mathcal{X}\hat\beta_{H_1}\|^2\}/\{df(r) - df(f)\}}{\|y - \mathcal{X}\hat\beta_{H_1}\|^2/df(f)}.$$
The adjusted coefficient of determination is
$$r_{adj}^2 = r^2 - \frac{p(1 - r^2)}{n - (p + 1)}.$$

Multivariate Distributions
Random vector X ∈ R^p. The (multivariate) distribution function is
$$F(x) = P(X \le x) = P(X_1 \le x_1, X_2 \le x_2, \dots, X_p \le x_p)$$
f(x) denotes the density of X, i.e.
$$F(x) = \int_{-\infty}^{x} f(u)\,du, \qquad \int_{-\infty}^{\infty} f(u)\,du = 1$$
$$P\{X \in (a, b)\} = \int_{a}^{b} f(x)\,dx$$

X = (X1, X2)⊤, X1 ∈ R^k, X2 ∈ R^(p−k).
Marginal density of X1:
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2$$
Conditional density of X2 (conditioned on X1 = x1):
$$f_{X_2|X_1=x_1}(x_2) = f(x_1, x_2)/f_{X_1}(x_1)$$


Example
$$f(x_1, x_2) = \begin{cases} \frac{1}{2}x_1 + \frac{3}{2}x_2 & 0 \le x_1, x_2 \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
f(x1, x2) is a density since
$$\int f(x_1, x_2)\,dx_1\,dx_2 = \frac{1}{2}\left[\frac{x_1^2}{2}\right]_0^1 + \frac{3}{2}\left[\frac{x_2^2}{2}\right]_0^1 = \frac{1}{4} + \frac{3}{4} = 1.$$

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-4

The marginal densities are

fX1(x1) = ∫ f(x1, x2) dx2 = ∫₀¹ {(1/2)x1 + (3/2)x2} dx2 = (1/2)x1 + 3/4;

fX2(x2) = ∫ f(x1, x2) dx1 = ∫₀¹ {(1/2)x1 + (3/2)x2} dx1 = (3/2)x2 + 1/4.

The conditional densities are

f(x2 | x1) = {(1/2)x1 + (3/2)x2} / {(1/2)x1 + 3/4}   and
f(x1 | x2) = {(1/2)x1 + (3/2)x2} / {(3/2)x2 + 1/4}.

These conditional pdf’s are nonlinear in x1 and x2 although the


joint pdf has a simple (linear) structure.

Applied Multivariate Statistical Analysis
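A quick numerical check of this example with scipy quadrature, confirming that the joint pdf integrates to one and that a marginal matches its closed form:

# Verify the example density f(x1, x2) = x1/2 + 3*x2/2 on the unit square.
from scipy import integrate

f = lambda x2, x1: 0.5 * x1 + 1.5 * x2          # dblquad integrates over the first argument
total = integrate.dblquad(f, 0, 1, 0, 1)[0]     # should equal 1
fX1 = lambda x1: integrate.quad(lambda x2: f(x2, x1), 0, 1)[0]
print(total, fX1(0.3), 0.5 * 0.3 + 0.75)        # marginal vs. x1/2 + 3/4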


Multivariate Distributions 4-5

Definition of independence

X1 , X2 are independent iff

f (x) = f (x1 , x2 ) = fX1 (x1 )fX2 (x2 )

Two random variables may have identical marginals but


different joint distribution.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-6

Example

f (x1 , x2 ) = 1, 0 < x1 , x2 < 1,

f (x1 , x2 ) = 1+α(2x1 −1)(2x2 −1), 0 < x1 , x2 < 1, −1 ≤ α ≤ 1.

fX1 (x1 ) = 1, fX2 (x2 ) = 1.


∫₀¹ {1 + α(2x1 − 1)(2x2 − 1)} dx2 = 1 + α(2x1 − 1)[x2² − x2]₀¹ = 1.

Applied Multivariate Statistical Analysis


Univariate estimates of the density of X4 (Lower Inner Frame, left) and X5 (Upper Inner Frame, right) of the bank notes. MVAdenbank2
Product of univariate density estimates for X4 and X5 of the bank notes. MVAdenbank3
Joint density estimate for X4 and X5 of the bank notes. MVAdenbank3
Multivariate Distributions 4-10

Summary: Distributions

The cumulative distribution function (cdf) is F(x) = P(X ≤ x).

If a probability density function (pdf) f exists, then F(x) = ∫_{−∞}^{x} f(u) du.

Let X = (X1, X2)> be partitioned in subvectors X1 and X2 with joint cdf F. Then FX1(x1) = P(X1 ≤ x1) is the marginal cdf of X1. The marginal pdf of X1 is fX1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-11

Summary: Distributions

Different joint pdf’s may have the same marginal pdf’s.


The conditional pdf of X2 given X1 = x1 is
f (x1 , x2 )
f (x2 | x1 ) = ·
fX1 (x1 )
Two random variables X1 , X2 are called independent iff
f (x1 , x2 ) = fX1 (x1 )fX2 (x2 ). This is equivalent to
f (x2 | x1 ) = fX2 (x2 ).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-12

Moments and Characteristic Functions

E X ∈ Rp denotes the p-dimensional vector of expected values of the random vector X:

E X = (E X1, . . . , E Xp)> = ∫ x f(x) dx = (∫ x1 f(x) dx, . . . , ∫ xp f(x) dx)> = µ.

The properties of the expected value follow from the properties of


the integral:
E (αX + βY ) = α E X + β E Y

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-13
If X and Y are independent, then

E(XY>) = ∫∫ x y> f(x, y) dx dy = ∫ x f(x) dx ∫ y> f(y) dy = E X E Y>

Definition of the covariance matrix (Σ)

Σ = Var(X ) = E(X − µ)(X − µ)>

We say that a random vector X has a distribution with the vector


of expected values µ and the covariance matrix Σ,

X ∼ (µ, Σ)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-14

Properties of the Covariance Matrix

Elements of Σ are variances and covariances of the components of


the random vector X :

Σ = (σXi Xj )
σXi Xj = Cov(Xi , Xj )
σXi Xi = Var(Xi )

Computational formula: Σ = E(XX > ) − µµ>


Covariance matrix is positive semidefinite, Σ ≥ 0
(variance a> Σa of any linear combination a> X cannot be
negative).
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-15

Properties of Variances and Covariances

Var(a>X) = a> Var(X) a = Σ_{i,j} ai aj σ_{XiXj}
Var(AX + b) = A Var(X) A>
Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)
Var(X + Y) = Var(X) + Cov(X, Y) + Cov(Y, X) + Var(Y)
Cov(AX, BY) = A Cov(X, Y) B>.

Applied Multivariate Statistical Analysis
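These identities are easy to confirm by simulation; a sketch with numpy (A, b and Σ are arbitrary choices for illustration):

# Monte Carlo check of Var(AX + b) = A Var(X) A'.
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([3.0, -1.0])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
Y = X @ A.T + b                                  # AX + b, row-wise
print(np.cov(Y, rowvar=False))                   # approx A Sigma A'
print(A @ Sigma @ A.T)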


Multivariate Distributions 4-16

Example
f(x1, x2) = (1/2)x1 + (3/2)x2 for 0 ≤ x1, x2 ≤ 1, and 0 otherwise.

The conditional densities are

f(x2 | x1) = {(1/2)x1 + (3/2)x2} / {(1/2)x1 + 3/4}   and
f(x1 | x2) = {(1/2)x1 + (3/2)x2} / {(3/2)x2 + 1/4}.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-17

µ1 = ∫∫ x1 f(x1, x2) dx1 dx2 = ∫₀¹∫₀¹ x1 {(1/2)x1 + (3/2)x2} dx1 dx2
   = ∫₀¹ x1 {(1/2)x1 + 3/4} dx1 = (1/2)[x1³/3]₀¹ + (3/4)[x1²/2]₀¹
   = 1/6 + 3/8 = (4 + 9)/24 = 13/24,

µ2 = ∫∫ x2 f(x1, x2) dx1 dx2 = ∫₀¹∫₀¹ x2 {(1/2)x1 + (3/2)x2} dx1 dx2
   = ∫₀¹ x2 {1/4 + (3/2)x2} dx2 = (1/4)[x2²/2]₀¹ + (3/2)[x2³/3]₀¹
   = 1/8 + 1/2 = (1 + 4)/8 = 5/8.
Applied Multivariate Statistical Analysis
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-18
Covariance Matrix

σ_X1X1 = E X1² − µ1² with

E X1² = ∫₀¹∫₀¹ x1² {(1/2)x1 + (3/2)x2} dx1 dx2 = (1/2)[x1⁴/4]₀¹ + (3/4)[x1³/3]₀¹ = 3/8

σ_X2X2 = E X2² − µ2² with

E X2² = ∫₀¹∫₀¹ x2² {(1/2)x1 + (3/2)x2} dx1 dx2 = (1/4)[x2³/3]₀¹ + (3/2)[x2⁴/4]₀¹ = 11/24

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-19

σ_X1X2 = E(X1X2) − µ1µ2 with

E(X1X2) = ∫₀¹∫₀¹ x1 x2 {(1/2)x1 + (3/2)x2} dx1 dx2
        = ∫₀¹ {x2/6 + (3/4)x2²} dx2
        = (1/6)[x2²/2]₀¹ + (3/4)[x2³/3]₀¹ = 1/12 + 1/4 = 1/3.

Hence σ_X1X2 = 1/3 − (13/24)(5/8) = −1/192 ≈ −0.0052 and

Σ = ⎛ 0.0815   −0.0052 ⎞
    ⎝ −0.0052   0.0677 ⎠

Applied Multivariate Statistical Analysis
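The moments above (including the negative covariance) can be verified by numerical integration; a sketch with scipy:

# Check mu1 = 13/24, mu2 = 5/8 and sigma_X1X2 = -1/192 by quadrature.
from scipy import integrate

f = lambda x2, x1: 0.5 * x1 + 1.5 * x2
E = lambda g: integrate.dblquad(lambda x2, x1: g(x1, x2) * f(x2, x1), 0, 1, 0, 1)[0]
mu1, mu2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)
s12 = E(lambda x1, x2: x1 * x2) - mu1 * mu2      # approx -0.0052
print(mu1, mu2, s12, -1 / 192)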


Multivariate Distributions 4-20

Conditional Expectations

Random vector X = (X1 , X2 )> , X1 ∈ Rk X2 ∈ Rp−k


Conditional expectation of X2, given X1 = x1:

E(X2 | x1) = ∫ x2 f(x2 | x1) dx2

and conditional expectation of X1, given X2 = x2:

E(X1 | x2) = ∫ x1 f(x1 | x2) dx1

The conditional expectation E(X2 | x1 ) is a function of x1 .


A typical example of this setup is simple linear regression, where
E(Y | X = x) = β0 + β1 x.
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-21

Error term in approximation:

U = X2 − E(X2 | X1 )

(1) E(U) = 0
(2) E(X2 | X1) is the best approximation of X2 by a function h(X1)
of X1 in the sense of mean squared error (MSE), where
MSE(h) = E[{X2 − h(X1)}>{X2 − h(X1)}] and
h : Rk −→ Rp−k.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-22

Summary: Moments

The expectation of a random vector X is µ = ∫ x f(x) dx, and the
covariance matrix is Σ = Var(X) = E(X − µ)(X − µ)>. We
denote X ∼ (µ, Σ).
Expectations are linear, i.e., E(αX + βY ) = α E X + β E Y . If
X , Y are independent then E(XY > ) = E X E Y > .

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-23

Summary: Moments

The covariance between two random vectors X , Y is ΣXY =


Cov(X , Y ) = E(X − E X )(Y − E Y )> = E(XY > ) − E X E Y > .
If X , Y are independent then Cov(X , Y ) = 0.
The conditional expectation E(X2 |X1 ) is the MSE best
approximation of X2 by a function of X1 .

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-24

Characteristic Functions

The characteristic function (cf) of a random vector X ∈ Rp is defined as

ϕX(t) = E(e^{i t>X}) = ∫ e^{i t>x} f(x) dx,  t ∈ Rp,

where i is the complex unit: i² = −1.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-25

Properties of the cf:

ϕX(0) = 1, |ϕX(t)| ≤ 1.

If ϕ is absolutely integrable (∫_{−∞}^{∞} |ϕ(x)| dx exists and is finite), then

f(x) = (2π)^{−p} ∫_{−∞}^{∞} e^{−i t>x} ϕX(t) dt.

If X = (X1, X2, . . . , Xp)>, then for t = (t1, t2, . . . , tp)>:

ϕX1(t1) = ϕX(t1, 0, . . . , 0), . . . , ϕXp(tp) = ϕX(0, . . . , 0, tp).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-26
For X1, . . . , Xp independent RVs and t = (t1, t2, . . . , tp)>:

ϕX(t) = ∏_{j=1}^{p} ϕXj(tj).

For X1, . . . , Xp independent RVs and t ∈ R:

ϕ_{X1+...+Xp}(t) = ∏_{j=1}^{p} ϕXj(t).

The characteristic function allows us to recover all the cross-product moments of any order: for jk ≥ 0, k = 1, . . . , p, and t = (t1, . . . , tp)> we have

E(X1^{j1} · · · Xp^{jp}) = (1/i^{j1+...+jp}) [∂^{j1+...+jp} ϕX(t) / (∂t1^{j1} · · · ∂tp^{jp})]_{t=0}.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-27

X ∈ R¹ follows the standard normal distribution:

fX(x) = (1/√(2π)) exp(−x²/2)

ϕX(t) = (1/√(2π)) ∫_{−∞}^{∞} e^{itx} exp(−x²/2) dx
      = exp(−t²/2) ∫_{−∞}^{∞} (1/√(2π)) exp{−(x − it)²/2} dx
      = exp(−t²/2),

since i² = −1 and (1/√(2π)) ∫ exp{−(x − it)²/2} dx = 1.

Applied Multivariate Statistical Analysis
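A short Monte Carlo sketch of this result, estimating E(e^{itX}) from a standard normal sample and comparing it with exp(−t²/2):

# Empirical characteristic function of N(0,1) vs. the closed form.
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)
for t in (0.5, 1.0, 2.0):
    phi_hat = np.mean(np.exp(1j * t * x))        # sample analogue of E exp(itX)
    print(t, phi_hat.real, np.exp(-t ** 2 / 2))  # imaginary part is approx 0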


Multivariate Distributions 4-28

Theorem (Cramér-Wold)
The distribution of X ∈ Rp is completely determined by the set of
all (one-dimensional) distributions of t > X , t ∈ Rp .
This theorem says that we can determine the distribution of X in
Rp by specifying all the one-dimensional distributions of the linear
combinations
Σ_{j=1}^{p} tj Xj = t>X,  t = (t1, t2, . . . , tp)>.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-29

Summary: Characteristic Functions

The characteristic function (cf) of a random vector X is ϕX(t) = E(e^{i t>X}).

The distribution of a p-dimensional random variable X is completely determined by all one-dimensional distributions of t>X, t ∈ Rp (Theorem of Cramér-Wold).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-30

Cumulants

For a random variable X with density f and finite moments of order k, the characteristic function ϕX(t) = E(e^{itX}) has the derivatives

(1/i^j) [∂^j log ϕX(t) / ∂t^j]_{t=0} = κj,  j = 1, . . . , k.

The values κj are called cumulants or semi-invariants since κj does


not change (for j > 1) under a shift transformation X ↦ X + a.
The cumulants are natural parameters for dimension reduction
methods, in particular the Projection Pursuit method.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-31

The relation between the first k moments m1, . . . , mk and the cumulants is given by

κk = (−1)^{k−1} det | m1    1                   0                   . . .   0
                      m2    m1                  1                   . . .   0
                      ⋮     ⋮                   ⋮                           ⋮
                      mk    C(k−1, 0) m_{k−1}   C(k−1, 1) m_{k−2}   . . .   C(k−1, k−2) m1 |

where C(n, j) denotes the binomial coefficient "n choose j".

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-32

Suppose that k = 1; then κ1 = m1. For k = 2 we obtain

κ2 = − det | m1  1
             m2  m1 | = m2 − m1².

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-33
For k = 3 we have to calculate

κ3 = det | m1  1   0
           m2  m1  1
           m3  m2  2m1 |

Expanding this determinant along the first column we arrive at:

κ3 = m1 det(m1 1; m2 2m1) − m2 det(1 0; m2 2m1) + m3 det(1 0; m1 1)
   = m1(2m1² − m2) − m2(2m1) + m3
   = m3 − 3m1m2 + 2m1³.

In a similar way one calculates

κ4 = m4 − 4m3m1 − 3m2² + 12m2m1² − 6m1⁴.

Applied Multivariate Statistical Analysis
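A sketch that evaluates these formulas on empirical moments; for an Exp(1) sample the cumulants should approach κj = (j − 1)!, i.e. 1, 1, 2, 6:

# Cumulants kappa_2..kappa_4 from sample moments m_1..m_4.
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, 500_000)
m1, m2, m3, m4 = (np.mean(x ** j) for j in range(1, 5))
k2 = m2 - m1 ** 2
k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
k4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4
print(k2, k3, k4)                                # approx 1, 2, 6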


Multivariate Distributions 4-34

In a similar fashion we find the moments from the cumulants:

m1 = κ1
m2 = κ2 + κ1²
m3 = κ3 + 3κ2κ1 + κ1³
m4 = κ4 + 4κ3κ1 + 3κ2² + 6κ2κ1² + κ1⁴

A very simple relationship can be observed between the semi-invariants and the central moments µk = E(X − µ)^k, where µ = m1 as defined before. We have, in fact, κ2 = µ2, κ3 = µ3, κ4 = µ4 − 3µ2².

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-35
Skewness γ3 and kurtosis γ4 are defined as

γ3 = E(X − µ)³/σ³
γ4 = E(X − µ)⁴/σ⁴

The skewness and kurtosis determine the shape of one-dimensional distributions. The skewness of a normal distribution is 0 and the kurtosis equals 3. The relation of these parameters to the cumulants is given by

γ3 = κ3/κ2^{3/2}
γ4 = κ4/κ2² + 3

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-36

Transformations

Suppose X ∼ fX. What is the pdf of Y = 3X?

X = u(Y), a one-to-one transformation u: Rp → Rp.

Jacobian: J = (∂xi/∂yj) = (∂ui(y)/∂yj)

fY(y) = abs(|J|) fX{u(y)}

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-37

Example

(x1, . . . , xp)> = u(y1, . . . , yp)

Y = 3X  →  X = (1/3)Y = u(y)

J = diag(1/3, . . . , 1/3),  abs(|J|) = (1/3)^p

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-38

Y = AX + b, A nonsingular

X = A−1 (Y − b)

J = A−1

fY (y ) = abs(|A|−1 )fX {A−1 (y − b)}

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-39

X = (X1, X2)> ∈ R² with density fX(x) = fX(x1, x2),

A = (1 1; 1 −1),  b = (0, 0)>.

Y = AX + b = (X1 + X2, X1 − X2)>

|A| = −2,  abs(|A|⁻¹) = 1/2,  A⁻¹ = −(1/2)(−1 −1; −1 1).

fY(y) = (1/2) fX{(y1 + y2)/2, (y1 − y2)/2}.

Applied Multivariate Statistical Analysis
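A numerical sketch of this transformation formula, taking X ∼ N2(0, I) as a concrete choice of fX so that Y = AX is N2(0, AA>) and both sides can be compared:

# f_Y(y) = abs(|A|^{-1}) f_X{A^{-1}(y - b)} for A = (1 1; 1 -1), b = 0.
import numpy as np
from scipy import stats

A = np.array([[1.0, 1.0], [1.0, -1.0]])
y = np.array([0.7, -0.2])
x = np.linalg.solve(A, y)                        # A^{-1} y
lhs = abs(1 / np.linalg.det(A)) * stats.multivariate_normal(cov=np.eye(2)).pdf(x)
rhs = stats.multivariate_normal(cov=A @ A.T).pdf(y)
print(lhs, rhs)                                  # the two values agree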


Multivariate Distributions 4-40

Summary: Transformations

If X has pdf fX(x), then the transformed random vector Y, X = u(Y), has pdf fY(y) = abs(|J|) · fX{u(y)}, where J denotes the Jacobian J = (∂ui(y)/∂yj).

In the case of a linear relation Y = AX + b, the pdfs of X and Y are related via fY(y) = abs(|A|⁻¹) fX{A⁻¹(y − b)}.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-41

Multinormal Distribution

The pdf of a multinormal is (assuming that Σ has full rank):

f(x) = |2πΣ|^{−1/2} exp{−(1/2)(x − µ)>Σ⁻¹(x − µ)}.

X ∼ Np (µ, Σ)
Expected value is E X = µ,
Covariance matrix of X is Var(X ) = Σ > 0.
(What is the meaning of the quadratic form (x − µ)> Σ−1 (x − µ)
in the formula for density?)
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-42

Geometry of the Np (µ, Σ) Distribution

The density of Np(µ, Σ) is constant on ellipsoids of the form

(x − µ)>Σ⁻¹(x − µ) = d²

If X ∼ Np(µ, Σ), then the variable Y = (X − µ)>Σ⁻¹(X − µ) is χ²_p distributed, since the Mahalanobis transformation yields Z = Σ^{−1/2}(X − µ) ∼ Np(0, Ip) and Y = Z>Z = Σ_{j=1}^{p} Zj².

Applied Multivariate Statistical Analysis
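A simulation sketch of this fact, reusing the parameters of the following figure:

# (X - mu)' Sigma^{-1} (X - mu) should be chi^2 with p = 2 degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu = np.array([3.0, 2.0])
Sigma = np.array([[1.0, -1.5], [-1.5, 4.0]])
X = rng.multivariate_normal(mu, Sigma, size=50_000)
d = X - mu
Y = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)   # squared Mahalanobis distances
print(np.mean(Y <= stats.chi2.ppf(0.95, df=2)))            # approx 0.95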


Normal sample and contour ellipses

Scatterplot of a normal sample and contour ellipses for µ = (3, 2)> and Σ = (1.0 −1.5; −1.5 4.0). MVAcontnorm
Multivariate Distributions 4-44

Singular Normal Distribution

Definition of the "Normal" distribution in the case that the matrix Σ is singular: we use its nonzero eigenvalues λ1, . . . , λk and the generalized inverse Σ⁻:

rank(Σ) = k < p

f(x) = {(2π)^{−k/2} / (λ1 · · · λk)^{1/2}} exp{−(1/2)(x − µ)>Σ⁻(x − µ)}

Σ⁻ = G-inverse of Σ

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-45

Summary: Multinormal Distribution

The pdf of a p-dimensional multinormal X ∼ Np (µ, Σ) is


 
−1/2 1 > −1
f (x) = |2πΣ| exp − (x − µ) Σ (x − µ) .
2
The contour curves of a multinormal are ellipsoids with
half-lengths proportional to √λi, where λi, i = 1, · · · , p,
denote the eigenvalues of Σ.
The Mahalanobis transformation transforms X ∼ Np (µ, Σ) to
Y = Σ−1/2 (X − µ) ∼ Np (0, Ip ). Vice versa, one can create
X ∼ Np (µ, Σ) from Y ∼ Np (0, Ip ) via X = Σ1/2 Y + µ.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-46

Summary: Multinormal Distribution

If the covariance matrix Σ is singular (i.e., rank(Σ) < p), then


it defines a singular normal distribution.
The density of a singular normal distribution is given by

f(x) = {(2π)^{−k/2} / (λ1 · . . . · λk)^{1/2}} exp{−(1/2)(x − µ)>Σ⁻(x − µ)},

where Σ⁻ denotes the G-inverse of Σ.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-47

Limit Theorems

The Central Limit Theorem (CLT) describes the (asymptotic) behaviour of the sample mean:

X1, X2, . . . , Xn i.i.d. with Xi ∼ (µ, Σ)

√n (x̄ − µ) →L Np(0, Σ) for n → ∞.

The CLT can be easily applied for testing.


Normal distribution plays a central role in statistics.

Applied Multivariate Statistical Analysis
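A sketch of the experiment behind the next figure (MVAcltbern): standardized means of Bernoulli samples approach N(0, 1) as n grows:

# CLT for Bernoulli(0.5): distribution of the standardized sample mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
p, reps = 0.5, 1000
for n in (5, 35):
    x = rng.binomial(1, p, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - p) / np.sqrt(p * (1 - p))
    print(n, stats.kstest(z, 'norm').statistic)  # KS distance shrinks with n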


The CLT for Bernoulli distributed random variables. Sample size n = 5 (left) and n = 35 (right). MVAcltbern
The CLT in the two-dimensional case. Sample size n = 5 (left) and n = 85 (right). MVAcltbern2
Multivariate Distributions 4-50

Σ̂ is a consistent estimator of Σ: Σ̂ →P Σ.

x̄ is asymptotically normal:

√n Σ̂^{−1/2} (x̄ − µ) →L Np(0, Ip) as n → ∞

Confidence interval for the (univariate) mean µ:

Xi ∼ N(µ, σ²)

√n (x̄ − µ)/σ̂ →L N(0, 1) as n → ∞

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-51

Define u_{1−α/2} as the 1 − α/2 quantile of the N(0, 1) distribution. Then we get the following 1 − α confidence interval:

C_{1−α} = [ x̄ − (σ̂/√n) u_{1−α/2} , x̄ + (σ̂/√n) u_{1−α/2} ]

P(µ ∈ C_{1−α}) → 1 − α for n → ∞.

Applied Multivariate Statistical Analysis


The standard normal cdf and the empirical distribution function for n = 100. MVAedfnormal
The standard normal cdf and the empirical distribution function for n = 1000. MVAedfnormal
The edf Fn and two bootstrap edf's Fn*. MVAedfbootstrap


Multivariate Distributions 4-55

Bootstrap confidence intervals

Empirical distribution function (edf)

Fn(x) = n⁻¹ Σ_{i=1}^{n} I(xi ≤ x)

Xi ∼ F,  Xi* ∼ Fn,  x̄* = mean of the bootstrap sample

sup_u | P*{√n (x̄* − x̄)/σ̂ < u} − P{√n (x̄ − µ)/σ < u} | →a.s. 0

Construction of confidence intervals is possible: the unknown distribution of x̄ can be approximated by the known distribution of x̄*.
Applied Multivariate Statistical Analysis
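A minimal bootstrap sketch (simulated skewed data; the basic bootstrap interval uses the quantiles of x̄* − x̄):

# Bootstrap approximation of the distribution of the sample mean.
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(1.0, 100)
B = 5000
xb = rng.choice(x, size=(B, x.size), replace=True).mean(axis=1)
lo, hi = np.quantile(xb - x.mean(), [0.025, 0.975])
print(x.mean() - hi, x.mean() - lo)              # basic bootstrap 95% interval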
Multivariate Distributions 4-56

Transformation of Statistics
If √n (t − µ) →L Np(0, Σ) and f = (f1, . . . , fq)>: Rp → Rq are real-valued functions differentiable at µ ∈ Rp, then f(t) is asymptotically normal with mean f(µ) and covariance D>ΣD, i.e.,

√n {f(t) − f(µ)} →L Nq(0, D>ΣD) for n → ∞,

where

D = (∂fj/∂ti)|_{t=µ}

is the (p × q) matrix of all partial derivatives.

This theorem can be applied e.g. to find the "variance stabilizing" transformation.
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-57

Example
Suppose

{Xi}_{i=1}^{n} ∼ (µ, Σ),  µ = (0, 0)>,  Σ = (1 0.5; 0.5 1),  p = 2.

By the CLT we have for n → ∞

√n (x̄ − µ) →L N(0, Σ).

What is the distribution of (x̄1² − x̄2, x̄1 + 3x̄2)>?

This means we consider f = (f1, f2)> with

f1(x1, x2) = x1² − x2,  f2(x1, x2) = x1 + 3x2,  q = 2.


Applied Multivariate Statistical Analysis
Multivariate Distributions 4-58
Then f(µ) = 0 and

D = (dij),  dij = (∂fj/∂xi)|_{x=µ} = (2x1 1; −1 3)|_{x=0} = (0 1; −1 3).

We have the covariance

D>ΣD = (0 −1; 1 3)(1 1/2; 1/2 1)(0 1; −1 3) = (1 −7/2; −7/2 13).

This yields

√n (x̄1² − x̄2, x̄1 + 3x̄2)> →L N2( (0, 0)>, (1 −7/2; −7/2 13) ).

Applied Multivariate Statistical Analysis
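A Monte Carlo sketch confirming the covariance matrix just derived (sampling x̄ directly from its limiting normal law):

# Delta-method check: n * Cov{f(xbar)} should approach D' Sigma D.
import numpy as np

rng = np.random.default_rng(8)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
n, reps = 2000, 50_000
xb = rng.multivariate_normal([0.0, 0.0], Sigma / n, size=reps)  # law of xbar
f = np.column_stack([xb[:, 0] ** 2 - xb[:, 1], xb[:, 0] + 3 * xb[:, 1]])
print(n * np.cov(f, rowvar=False))               # approx [[1, -3.5], [-3.5, 13]]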


Multivariate Distributions 4-59

Summary: Limit Theorems

If X1, . . . , Xn are i.i.d. random vectors with Xi ∼ (µ, Σ), then the distribution of √n (x̄ − µ) is asymptotically N(0, Σ) (Central Limit Theorem).

If X1, . . . , Xn are i.i.d. random variables with Xi ∼ (µ, σ²), then an asymptotic confidence interval can be constructed by the CLT:

x̄ ± (σ̂/√n) u_{1−α/2}.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-60

Summary: Limit Theorems

For small sample sizes the Bootstrap improves the precision of this confidence interval.

The Bootstrap estimates x̄* have the same asymptotic limit.

If t is a statistic that is asymptotically normal, i.e., √n (t − µ) →L Np(0, Σ), then this holds also for a function f(t), i.e., √n {f(t) − f(µ)} is asymptotically normal.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-61

Heavy-Tailed Distributions

Introduced by Pareto, studied by Paul Lévy


Applications: finance, medicine, seismology, engineering
I asset returns in financial markets
I stream flow in hydrology
I insurance
I precipitation and hurricane damage in meteorology
I earthquake prediction in seismology
I pollution
I material strength

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-62

Definition

A distribution is called heavy-tailed if it has higher probability density in its tail area than a normal distribution with the same mean µ and variance σ².

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-63

Distribution Comparison

Figure: Comparison of the pdf of a standard Gaussian (blue) and a Cauchy distribution (red) with location parameter 0 and scale parameter 1. MVAgausscauchy

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-64

Kurtosis

In terms of kurtosis, a heavy-tailed distribution has kurtosis greater than 3 and is called leptokurtic, in contrast to a mesokurtic distribution (kurtosis = 3) and a platykurtic distribution (kurtosis < 3).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-65

Generalised Hyperbolic Distribution

Introduced by Barndorff-Nielsen and first applied to model the grain-size distribution of wind-blown sand.
Applications:
stock price modelling,
market risk measurement.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-66

PDF of GH Distribution
The density of a one-dimensional generalised hyperbolic (GH) distribution for x ∈ R is

fGH(x; λ, α, β, δ, µ) = {√(α² − β²)/δ}^λ / {√(2π) Kλ(δ√(α² − β²))} · K_{λ−1/2}{α√(δ² + (x − µ)²)} / {√(δ² + (x − µ)²)/α}^{1/2−λ} · e^{β(x−µ)},

where Kλ is a modified Bessel function of the third kind with index λ:

Kλ(x) = (1/2) ∫₀^∞ y^{λ−1} e^{−(x/2)(y + 1/y)} dy

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-67

Parameters

The domain of variation of the parameters is µ ∈ R and

δ ≥ 0, |β| < α, if λ > 0
δ > 0, |β| < α, if λ = 0
δ > 0, |β| ≤ α, if λ < 0

where µ is a location and δ a scale parameter.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-68

Mean and Variance of GH Distribution

E[X] = µ + {δβ/√(α² − β²)} · K_{λ+1}(δ√(α² − β²)) / Kλ(δ√(α² − β²))

Var[X] = δ² [ K_{λ+1}(ζ) / {ζ Kλ(ζ)} + {β²/(α² − β²)} { K_{λ+2}(ζ)/Kλ(ζ) − (K_{λ+1}(ζ)/Kλ(ζ))² } ],

where ζ = δ√(α² − β²).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-69

Hyperbolic and Normal-Inverse Gaussian


Distributions
With specific values of λ we obtain different sub-classes of GH.
For λ = 1 we obtain the hyperbolic distribution (HYP):

fHYP(x; α, β, δ, µ) = {√(α² − β²) / (2αδ K1(δ√(α² − β²)))} exp{−α√(δ² + (x − µ)²) + β(x − µ)},

where x, µ ∈ R, δ ≥ 0 and |β| < α.

For λ = −1/2 we obtain the normal-inverse Gaussian distribution (NIG):

fNIG(x; α, β, δ, µ) = (αδ/π) · K1{α√(δ² + (x − µ)²)} / √(δ² + (x − µ)²) · exp{δ√(α² − β²) + β(x − µ)}.
Applied Multivariate Statistical Analysis
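The NIG sub-class is available in scipy; a sketch comparing the formula above with scipy.stats.norminvgauss, under our reading of scipy's parameterization (a = αδ, b = βδ, loc = µ, scale = δ) — verify this mapping before relying on it:

# NIG pdf: manual formula vs. scipy.stats.norminvgauss.
import numpy as np
from scipy import stats
from scipy.special import k1                     # modified Bessel function K_1

alpha, beta, delta, mu = 1.0, 0.0, 1.0, 0.0
x = np.linspace(-3, 3, 7)
s = np.sqrt(delta ** 2 + (x - mu) ** 2)
f_manual = (alpha * delta / np.pi * k1(alpha * s) / s
            * np.exp(delta * np.sqrt(alpha ** 2 - beta ** 2) + beta * (x - mu)))
f_scipy = stats.norminvgauss(alpha * delta, beta * delta, loc=mu, scale=delta).pdf(x)
print(np.max(np.abs(f_manual - f_scipy)))        # approx 0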
Multivariate Distributions 4-70

Figure: pdf (left) and cdf (right) of GH (λ = 0.5), HYP and NIG with α = 1, β = 0, δ = 1, µ = 0. MVAghdis

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-71

Student’s t-distribution

Introduced by Gosset (1908), who published under the pseudonym "Student" at the request of his employer.

Let X be a normally distributed random variable with mean 0 and variance σ², and let Y be such that Y²/σ² has a chi-square distribution with n degrees of freedom. Assume that X and Y are independent; then

t := X√n / Y

is distributed as Student's t with n degrees of freedom.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-72

PDF of Student’s t-distribution

The t-distribution has the density function

ft(x; n) = Γ{(n + 1)/2} / {√(nπ) Γ(n/2)} · (1 + x²/n)^{−(n+1)/2},

where n is the number of degrees of freedom, −∞ < x < ∞, and Γ is the gamma function

Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-73
Figure: pdf (left) and cdf (right) of the t-distribution with different degrees of freedom (t3 stands for the t-distribution with 3 degrees of freedom). MVAtdis

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-74

Mean, Variance, Skewness and Kurtosis

The mean, variance, skewness and kurtosis of Student's t-distribution (n > 4) are:

µ = 0
σ² = n/(n − 2)
Skewness = 0
Kurtosis = 3 + 6/(n − 4).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-75

Property

Student's t-distribution approaches the normal distribution as n increases, since

lim_{n→∞} ft(x; n) = (1/√(2π)) e^{−x²/2}.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-76

Tail of Student’s t-distribution


In the tail, the density ft(x; n) is proportional to |x|^{−(n+1)}.

Figure: Tails of pdf curves of t-distributions. With a higher degree of freedom, the t-distribution decays faster. MVAdistail
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-77

Laplace Distribution

The univariate Laplace distribution with mean zero was introduced


by Laplace (1774).
The Laplace distribution can be defined as the distribution of
differences between two independent variates with identical
exponential distributions.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-78

PDF and CDF of Laplace Distribution

The Laplace distribution with mean µ and scale parameter θ has the pdf

fLaplace(x; µ, θ) = {1/(2θ)} e^{−|x−µ|/θ}

and the cdf

FLaplace(x; µ, θ) = (1/2) {1 + sgn(x − µ)(1 − e^{−|x−µ|/θ})},

where sgn is the signum function.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-79

Mean, Variance, Skewness and Kurtosis

The mean, variance, skewness and kurtosis of the Laplace


distribution:

µ = µ
σ 2 = 2θ2
Skewness = 0
Kurtosis = 6

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-80

Figure: pdf (left) and cdf (right) of Laplace distributions with zero mean and different scale parameters (L1 stands for the Laplace distribution with θ = 1). MVAlaplacedis
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-81

Standard Laplace Distribution

The standard Laplace distribution has mean 0 and θ = 1:

f(x) = e^{−|x|}/2

F(x) = e^{x}/2 for x < 0, and 1 − e^{−x}/2 for x ≥ 0.

Applied Multivariate Statistical Analysis
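These closed forms are easily checked against scipy (scipy.stats.laplace uses scale = θ):

# Standard Laplace pdf and cdf vs. scipy.stats.laplace.
import numpy as np
from scipy import stats

x = np.linspace(-4, 4, 9)
f = np.exp(-np.abs(x)) / 2
F = np.where(x < 0, np.exp(x) / 2, 1 - np.exp(-x) / 2)
print(np.allclose(f, stats.laplace.pdf(x)), np.allclose(F, stats.laplace.cdf(x)))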


Multivariate Distributions 4-82

Cauchy Distribution

Named after Augustin Cauchy and Hendrik Lorentz.


Applications
in physics – the solution to the differential equation describing
forced resonance,
in spectroscopy – the description of the line shape of spectral
lines.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-83

PDF and CDF of the Cauchy Distribution

fCauchy(x; m, s) = 1 / {sπ (1 + ((x − m)/s)²)}

FCauchy(x; m, s) = 1/2 + (1/π) arctan{(x − m)/s},

where m and s are location and scale parameters, respectively.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-84

Standard Cauchy Distribution

The standard Cauchy distribution has m = 0 and s = 1:

fCauchy(x) = 1 / {π(1 + x²)}

FCauchy(x) = 1/2 + arctan(x)/π

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-85

Figure: pdf (left) and cdf (right) of Cauchy distributions with m = 0 and different scale parameters (C1 stands for the Cauchy distribution with s = 1). MVAcauchy

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-86

Mean, Variance, Skewness and Kurtosis

The mean, variance, skewness and kurtosis of the Cauchy distribution are all undefined, since the corresponding moment integrals do not converge. It does have a mode and a median, both equal to the location parameter m.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-87

Mixture Model

Mixture modelling concerns modelling a distribution by a mixture


(weighted sum) of different distributions.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-88

PDF of Mixture Model


The pdf of a mixture distribution is

f(x) = Σ_{l=1}^{n} wl pl(x)

under the constraints

0 ≤ wl ≤ 1,   Σ_{l=1}^{n} wl = 1,   ∫ pl(x) dx = 1,

where pl(x) is the pdf of the l-th component density and wl is its weight.


Applied Multivariate Statistical Analysis
Multivariate Distributions 4-89

Mean, Variance, Skewness and Kurtosis

µ = Σ_{l=1}^{n} wl µl

σ² = Σ_{l=1}^{n} wl {σl² + (µl − µ)²}

Skewness = Σ_{l=1}^{n} wl [ (σl/σ)³ SKl + 3σl²(µl − µ)/σ³ + {(µl − µ)/σ}³ ]

Kurtosis = Σ_{l=1}^{n} wl [ (σl/σ)⁴ Kl + 6(µl − µ)²σl²/σ⁴ + 4(µl − µ)σl³ SKl/σ⁴ + {(µl − µ)/σ}⁴ ],

where µl, σl, SKl and Kl correspond to the l-th distribution.
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-90

Gaussian Mixture Models


The pdf of a Gaussian mixture:

fGM(x) = Σ_{l=1}^{n} {wl/(√(2π) σl)} exp{−(x − µl)²/(2σl²)}.

When the Gaussian components have mean 0:

fGM(x) = Σ_{l=1}^{n} {wl/(√(2π) σl)} exp{−x²/(2σl²)},

with variance, skewness and kurtosis

σ² = Σ_{l=1}^{n} wl σl²,   Skewness = 0,   Kurtosis = Σ_{l=1}^{n} 3 wl (σl/σ)⁴
Applied Multivariate Statistical Analysis
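A sketch evaluating these formulas for a two-component zero-mean mixture (weights and standard deviations chosen for illustration):

# Zero-mean Gaussian mixture: pdf, variance and kurtosis.
import numpy as np
from scipy import stats

w = np.array([0.8, 0.2])
s = np.array([1.0, 3.0])                         # component standard deviations
pdf = lambda x: np.sum(w * stats.norm.pdf(np.asarray(x)[..., None], scale=s), axis=-1)
sigma2 = np.sum(w * s ** 2)
kurt = np.sum(3 * w * (s ** 2 / sigma2) ** 2)    # > 3: the mixture is leptokurtic
print(pdf([0.0, 1.0]), sigma2, kurt)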
Multivariate Distributions 4-91

Figure: pdf (left) and cdf (right) of a Gaussian mixture. MVAmixture

Remark The Gaussian Mixture is not in general a Gaussian


distribution.
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-92

Multivariate Generalised Hyperbolic


Distribution

The multivariate Generalised Hyperbolic Distribution (GHd) has the following pdf:

fGHd(x; λ, α, β, δ, ∆, µ) = ad · K_{λ−d/2}{α√(δ² + (x − µ)>∆⁻¹(x − µ))} / {α⁻¹√(δ² + (x − µ)>∆⁻¹(x − µ))}^{d/2−λ} · e^{β>(x−µ)},

ad = ad(λ, α, β, δ, ∆) = {√(α² − β>∆β)/δ}^λ / {(2π)^{d/2} Kλ(δ√(α² − β>∆β))}.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-93

Parameters of GHd

The domain of variation of the parameters:

λ ∈ R, β, µ ∈ Rd
δ > 0, α² > β>∆β
∆ ∈ R^{d×d} a positive definite matrix with |∆| = 1

For λ = (d + 1)/2 we obtain the multivariate hyperbolic (HYP) distribution; for λ = −1/2 we get the multivariate normal-inverse Gaussian (NIG) distribution.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-94

Second Parameterization

Blæsild and Jensen (1981) introduced a second parameterization


(ζ, Π, Σ), where

ζ = δ√(α² − β>∆β)
Π = β/√(α² − β>∆β)
Σ = δ²∆

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-95

Second Parameterization

The mean and variance of X ∼ GHd:

E[X] = µ + δ Rλ(ζ) Π∆^{1/2}

Var[X] = δ² { ζ⁻¹ Rλ(ζ) ∆ + Sλ(ζ) (Π∆^{1/2})>(Π∆^{1/2}) },

where

Rλ(x) = K_{λ+1}(x)/Kλ(x)
Sλ(x) = {K_{λ+2}(x) Kλ(x) − K²_{λ+1}(x)} / Kλ²(x)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-96

Multivariate t-distribution

If X ∼ Np(µ, Σ) and Y ∼ χ²_n are independent and X√(n/Y) = t − µ, then the pdf of t is

ft(t; n, Σ, µ) = Γ{(n + p)/2} / [ Γ(n/2) n^{p/2} π^{p/2} |Σ|^{1/2} {1 + (1/n)(t − µ)>Σ⁻¹(t − µ)}^{(n+p)/2} ].

The distribution of t is the noncentral t-distribution with n degrees of freedom and noncentrality parameter µ.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-97

Multivariate Laplace Distribution

Let g and G be the pdf and cdf of a d-dimensional Gaussian distribution Nd(0, Σ). The pdf and cdf of a multivariate Laplace distribution can be written as

fMLaplaced(x; m, Σ) = ∫₀^∞ g(z^{−1/2}x − z^{1/2}m) z^{−d/2} e^{−z} dz

FMLaplaced(x; m, Σ) = ∫₀^∞ G(z^{−1/2}x − z^{1/2}m) e^{−z} dz

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-98

PDF of Multivariate Laplace Distribution

The pdf can also be written as

fMLaplaced(x; m, Σ) = {2 e^{x>Σ⁻¹m} / ((2π)^{d/2} |Σ|^{1/2})} · {x>Σ⁻¹x / (2 + m>Σ⁻¹m)}^{λ/2} · Kλ(√{(2 + m>Σ⁻¹m)(x>Σ⁻¹x)}),

where λ = (2 − d)/2 and Kλ(x) is the modified Bessel function of the third kind

Kλ(x) = (1/2) (x/2)^λ ∫₀^∞ t^{−λ−1} e^{−t − x²/(4t)} dt,  x > 0

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-99

Mean and Variance of Multivariate Laplace


Distribution

E[X ] = m
Cov[X ] = Σ + mm>

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-100

Multivariate Mixture Model

A multivariate mixture model combines multivariate distributions; e.g. the pdf of a multivariate Gaussian mixture can be written as

f(x) = Σ_{l=1}^{n} {wl/|2πΣl|^{1/2}} exp{−(1/2)(x − µl)>Σl⁻¹(x − µl)}

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-101

Generalised Hyperbolic Distribution

The GH distribution has exponentially decaying tails:

fGH(x; λ, α, β, δ, µ = 0) ∼ x^{λ−1} e^{−(α−β)x} as x → ∞.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-102
Distribution comparison (left) and tail comparison (right)

Figure: Graphical comparison of the tail behavior of the Laplace, NIG, Cauchy and Gaussian distributions. For all distributions the means equal 0 and the variances equal 1. The NIG distribution (line) with λ = −1/2 decays second fastest in the tails and has the highest peak. The Cauchy (dots) distribution has the lowest peak and the fattest tails. MVAghadatail
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-103

Copulae vs Normal Distribution

1. The empirical marginal distributions are skewed and fat tailed.


2. Multivariate normal distribution does not consider the
possibility of extreme joint co-movement of asset returns.
The dependency structure of portfolio asset returns is different
from the Gaussian one.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-104

Advantages

1. Copulae are useful tools to simulate asset return distributions


in a more realistic way.
2. Copulae allow one to model the dependence structure independently of the marginal distributions:
I construct a multivariate distribution with different margins
I the dependence structure is given by the copula.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-105

Dependency Structures
Figure: Scatter plots of bivariate samples with different dependency structures and equal correlation coefficients.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-106

Varying Dependency
Figure: Standardized log returns of Bayer and Siemens, 20000103-20020101 (left) and 20040101-20060102 (right). MVAscalogret

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-107

Outline

1. Motivation ✓
2. Copulae
3. Parameter Estimation
4. Sampling from Copulae
5. Tail Dependence
6. Value-at-Risk with Copulae
7. Application

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-108

Copulae

A copula is a multivariate distribution function defined on the unit


cube [0, 1]d , with uniformly distributed margins.

P(X1 ≤ x1 , . . . , Xd ≤ xd ) = C {P(X1 ≤ x1 ), . . . , P(Xd ≤ xd )}


= C {F1 (x1 ), . . . , Fd (xd )}

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-109

Applications

1. medicine
2. hydrology
3. finance (portfolio selection, time series, risk management)

Figure: Number of documents on copula theory, 1971 - 2005, and breakdown by discipline of the 871 documents in the database (41% Finance, 28% Statistics, 10% Biostatistics, 8% Mathematics, 6% Insurance).

Applied Multivariate Statistical Analysis
Multivariate Distributions 4-110

F-volume

Let U1 and U2 be two sets in R̄ = R ∪ {+∞} ∪ {−∞} and


consider the function F : U1 × U2 −→ R.
The F -volume of a rectangle B = [x1 , x2 ] × [y1 , y2 ] ⊂ U1 × U2 is
defined as:

VF (B) = F (x2 , y2 ) − F (x1 , y2 ) − F (x2 , y1 ) + F (x1 , y1 ) (2)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-111

2-increasing Function

F is said to be a 2-increasing function if for every


B = [x1 , x2 ] × [y1 , y2 ] ⊂ U1 × U2 ,

VF (B) ≥ 0 (3)

Remark: Note that being a 2-increasing function neither implies nor is implied by being increasing in each argument.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-112

2-increasing Function

Lemma
Let U1 and U2 be non-empty sets in R and let F : U1 × U2 −→ R
be a two-increasing function. Let x1 , x2 be in U1 with x1 ≤ x2 , and
y1 , y2 be in U2 with y1 ≤ y2 . Then the function
t 7→ F (t, y2 ) − F (t, y1 ) is non-decreasing on U1 and the function
t 7→ F (x2 , t) − F (x1 , t) is non-decreasing on U2 .

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-113

Grounded Function

If U1 and U2 have a smallest element min U1 and min U2


respectively, then we say, that a function F : U1 × U2 −→ R is
grounded if :

for all x ∈ U1 : F (x, min U2 ) = 0 and (4)


for all y ∈ U2 : F (min U1 , y ) = 0 (5)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-114

Distribution Function

A distribution function is a function from R̄² to [0, 1] which:
is grounded
is 2-increasing
satisfies F(∞, ∞) = 1.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-115

Margins

If U1 and U2 have a greatest element max U1 and max U2


respectively, then we say, that a function F : U1 × U2 −→ R has
margins and that the margins of F are given by:

F (x) = F (x, max U2 ) for all x ∈ U1 (6)


F (y ) = F (max U1 , y ) for all y ∈ U2 (7)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-116

Bivariate Copulae

A 2-dimensional copula is a function C : [0, 1]2 → [0, 1] with the


following properties:
1. For every u ∈ [0, 1], C (0, u) = C (u, 0) = 0 (grounded).
2. For every u ∈ [0, 1], C (u, 1) = u and C (1, u) = u.
3. For every (u1 , u2 ), (v1 , v2 ) ∈ [0, 1] × [0, 1] with u1 ≤ v1 and
u2 ≤ v2 : C (v1 , v2 ) − C (v1 , u2 ) − C (u1 , v2 ) + C (u1 , u2 ) ≥ 0
(2-increasing).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-117

Fréchet-Hoeffding Bounds

1. every copula C satisfies

W (u1 , u2 ) ≤ C (u1 , u2 ) ≤ M(u1 , u2 )

2. upper and lower bounds are copulae

M(u1 , u2 ) = min(u1 , u2 )
W (u1 , u2 ) = max(u1 + u2 − 1, 0)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-118

Fréchet Copulae

Figure: M(u, v) = min(u, v), W(u, v) = max(u + v − 1, 0) and Π(u, v) = uv. SFEfrechet

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-119

Sklar’s Theorem in Two Dimensions

Let F be a two-dimensional distribution function with marginal distribution functions FX1 and FX2. Then a copula C exists such that for all x1, x2 ∈ R̄:

F(x1, x2) = C{FX1(x1), FX2(x2)}   (8)

Moreover, if FX1 and FX2 are continuous, then C is unique. Otherwise C is uniquely determined on the Cartesian product Im(FX1) × Im(FX2). Conversely, if C is a copula and FX1 and FX2 are distribution functions, then F defined by (8) is a two-dimensional distribution function with marginals FX1 and FX2.
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-120

Gauss Copula
C(u1, u2) = Φρ{Φ⁻¹(u1), Φ⁻¹(u2)}
          = ∫_{−∞}^{Φ⁻¹(u1)} ∫_{−∞}^{Φ⁻¹(u2)} {1/(2π√(1 − ρ²))} exp{−(x² − 2ρxy + y²)/(2(1 − ρ²))} dx dy

Figure: Gauss copula density, ρ = 0.4. MSRpdf cop Gauss


Applied Multivariate Statistical Analysis
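A sketch of sampling from the Gauss copula and coupling it with arbitrary margins (Sklar's theorem in reverse; the margins here are illustrative choices):

# Gauss copula sample: correlate normals, push through Phi, apply quantile functions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
rho = 0.4
R = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], R, size=10_000)
u = stats.norm.cdf(z)                            # uniform margins, Gauss dependence
x1 = stats.expon.ppf(u[:, 0])                    # exponential margin
x2 = stats.uniform.ppf(u[:, 1])                  # uniform margin
print(stats.spearmanr(x1, x2)[0])                # rank correlation is preserved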
Multivariate Distributions 4-121

t-Student Copula
C(u1, u2) = t_{ρ,ν}{tν⁻¹(u1), tν⁻¹(u2)}
          = ∫_{−∞}^{tν⁻¹(u1)} ∫_{−∞}^{tν⁻¹(u2)} {1/(2π√(1 − ρ²))} {1 + (x² − 2ρxy + y²)/(ν(1 − ρ²))}^{−(ν+2)/2} dx dy

Figure: t-Student copula density, ν = 3, ρ = 0.4. MSRpdf cop tStudent

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-122

Archimedean Copulae

Archimedean copula:

C(u, v) = ψ^{[−1]}{ψ(u) + ψ(v)}

for a continuous, decreasing and convex ψ with ψ(1) = 0, where

ψ^{[−1]}(t) = ψ⁻¹(t) for 0 ≤ t ≤ ψ(0), and 0 for ψ(0) < t ≤ ∞.

The function ψ is called the generator of the Archimedean copula.
For ψ(0) = ∞ we have ψ^{[−1]} = ψ⁻¹ and ψ is called a strict generator.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-123

Gumbel Copula
C(u, v) = exp[ −{(− log u)^θ + (− log v)^θ}^{1/θ} ]

Figure: Gumbel copula density, parameter θ = 2. MSRpdf cop Gumbel

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-124

Clayton Copula
C(u, v) = max{ (u^{−θ} + v^{−θ} − 1)^{−1/θ}, 0 }

Figure: Clayton copula density, θ = 2. MSRpdf cop Clayton

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-125

Frank Copula

C(u, v) = −(1/θ) log[ 1 + {(e^{−θu} − 1)(e^{−θv} − 1)}/(e^{−θ} − 1) ]

Figure: Frank copula density, θ = 2. MSRpdf cop Frank

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-126

Figure: Monte Carlo sample of 10,000 realizations of pseudo random variables with uniform marginals in [0, 1] and dependence structure given by the Clayton (left) and Gumbel (right) copula with θ = 3. MVAgumbelclayton

Applied Multivariate Statistical Analysis
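A sketch of the Marshall-Olkin algorithm that generates such Clayton samples: mix Exp(1) variables with a Gamma(1/θ) frailty and apply the Laplace transform (1 + t)^{−1/θ}:

# Clayton copula sampling (theta = 3, as in the figure).
import numpy as np

rng = np.random.default_rng(10)
theta, n = 3.0, 10_000
v = rng.gamma(1 / theta, 1.0, size=(n, 1))       # frailty variable
e = -np.log(rng.uniform(size=(n, 2)))            # two independent Exp(1) draws
u = (1 + e / v) ** (-1 / theta)                  # pair with Clayton dependence
print(np.corrcoef(u, rowvar=False))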


Multivariate Distributions 4-127

Transformations of Margins

If (X1, X2) have copula C and g1, g2 are two continuous increasing functions, then {g1(X1), g2(X2)} has the copula C, too.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-128

Product Copula

Independence implies that the product of the cdf’s FX1 and FX2
equals the joint distribution function F , i.e.:

F (x1 , x2 ) = FX1 (x1 )FX2 (x2 ) (9)

Thus, we obtain the independence or product copula


C = Π(u, v ) = uv .

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-129

Product Copula

Let X1 and X2 be random variables with continuous distribution


functions F1 and F2 and joint distribution function H.
Then X1 and X2 are independent if and only if CX1 X2 = Π.
According to Sklar’s Theorem, there exists a unique copula C with

P(X1 ≤ x1 , X2 ≤ x2 ) = H(x1 , x2 )
= C {F1 (x1 ), F2 (x2 )}
= F1 (x1 ) · F2 (x2 )

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-130

Partial Derivatives
Let C(u, v) be a copula. For any v ∈ I, the partial derivative ∂C(u, v)/∂v exists for almost all u ∈ I. For such u and v one has

∂C(u, v)/∂v ∈ I   (10)

The analogous statement is true for the partial derivative ∂C(u, v)/∂u:

∂C(u, v)/∂u ∈ I   (11)

Moreover, the functions

u ↦ Cv(u) := ∂C(u, v)/∂v   and   v ↦ Cu(v) := ∂C(u, v)/∂u

are defined and non-decreasing almost everywhere on I.
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-131

Copulae in d-Dimensions

Let U1 , U2 , . . . , Ud be non-empty sets in R and consider the


function F : U1 × U2 × . . . × Ud −→ R. For a = (a1 , a2 , . . . , ad )
and b = (b1 , b2 , . . . , bd ) with a ≤ b (i.e. ak ≤ bk for all k) let
B = [a, b] = [a1 , b1 ] × [a2 , b2 ] × . . . × [ad , bd ] be the d-box with
vertices c = (c1 , c2 , . . . , cd ). It is obvious, that each ck is either
equal to ak or to bk .

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-132

F -volume

The F-volume of a d-box
B = [a, b] = [a1, b1] × [a2, b2] × . . . × [ad, bd] ⊂ U1 × U2 × . . . × Ud
is defined as

VF(B) = Σ_c sgn(c) F(c),   (12)

where the sum runs over all vertices c of B and sgn(c) = 1 if ck = ak for an even number of k's, sgn(c) = −1 if ck = ak for an odd number of k's.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-133

d-increasing Function

F is said to be a d-increasing function if for all d-boxes B with


vertices in U1 × U2 × . . . × Ud holds:

VF (B) ≥ 0. (13)

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-134

Grounded Function

If U1 , U2 , . . . , Ud have a smallest element


min U1 , min U2 , . . . , min Ud respectively, then we say, that
a function F : U1 × U2 × . . . × Ud −→ R is grounded if :

F (x) = 0 for all x ∈ U1 × U2 × . . . × Ud (14)

such that xk = min Uk for at least one k.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-135

Multivariate Copula

A d-dimensional copula is a function C : [0, 1]d → [0, 1]:


1. C(u1, . . . , ui−1, 0, ui+1, . . . , ud) = 0 (at least one ui is 0);
2. For u ∈ [0, 1]^d: C(1, . . . , 1, ui, 1, . . . , 1) = ui (all coordinates except ui are 1);
3. For each u < v ∈ [0, 1]^d (ui < vi):

VC[u, v] = Σ_a sgn(a) C(a) ≥ 0,

where a runs over all vertices of [u, v]; sgn(a) = 1 if ak = uk for an even number of k's and sgn(a) = −1 if ak = uk for an odd number of k's (d-increasing).

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-136

Sklar’s Theorem
For a distribution function F with marginals FX1 . . . , FXd , there
exists a copula C : [0, 1]d → [0, 1], such that
F (x1 , . . . , xd ) = C {FX1 (x1 ), . . . , FXd (xd )} (15)
for all xi ∈ R, i = 1, . . . , d. If FX1, . . . , FXd are continuous, then C is
unique. If C is a copula and FX1 , . . . , FXd are cdfs, then the
function F defined in (15) is a joint cdf with marginals
FX1 , . . . , FXd .

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-137

A copula C and marginal distributions can be "coupled" together into a distribution function:

FX(x1, . . . , xd) = C{FX1(x1), . . . , FXd(xd)}

A (unique) copula is obtained by "decoupling" every (continuous) multivariate distribution function from its marginal distributions:

C(u1, . . . , ud) = FX{F⁻¹X1(u1), . . . , F⁻¹Xd(ud)},   uj = FXj(xj), j = 1, . . . , d

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-138

If C is absolutely continuous, there exists a copula density

c(u1, . . . , ud) = ∂^d C(u1, . . . , ud) / (∂u1 . . . ∂ud)

and the joint density fX is

fX(x1, . . . , xd) = c{FX1(x1), . . . , FXd(xd)} ∏_{j=1}^{d} fj(xj)

Applied Multivariate Statistical Analysis
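A numerical sketch of this factorization for the Gauss copula with standard normal margins, where the product c{F1(x1), F2(x2)} f1(x1) f2(x2) must reproduce the bivariate normal density:

# Joint density = copula density times marginal densities.
import numpy as np
from scipy import stats

rho = 0.4
R = np.array([[1.0, rho], [rho, 1.0]])
x = np.array([0.3, -1.2])
q = stats.norm.ppf(stats.norm.cdf(x))            # = x for standard normal margins
c = stats.multivariate_normal(cov=R).pdf(q) / np.prod(stats.norm.pdf(q))
f = c * np.prod(stats.norm.pdf(x))
print(f, stats.multivariate_normal(cov=R).pdf(x))  # identical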


Multivariate Distributions 4-139

Fréchet-Hoeffding Bounds, Product Copula


1. Every copula C satisfies

W^d(u1, . . . , ud) ≤ C(u1, . . . , ud) ≤ M^d(u1, . . . , ud)

2. Upper and lower bounds

M^d(u1, . . . , ud) = min(u1, . . . , ud)
W^d(u1, . . . , ud) = max(Σ_{i=1}^{d} ui − d + 1, 0)

3. Product copula Π^d(u1, . . . , ud) = ∏_{j=1}^{d} uj

4. The functions M^d and Π^d are d-copulae for all d ≥ 2; the function W^d is not a d-copula for any d > 2.

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-140

Multivariate Elliptical Copulae

Gauss
∫_{−∞}^{Φ⁻¹(u1)} . . . ∫_{−∞}^{Φ⁻¹(ud)} (2π)^{−d/2} |R|^{−1/2} exp(−(1/2) r>R⁻¹r) dr1 . . . drd,
where r = (r1, . . . , rd)>

t-Student
∫_{−∞}^{tν⁻¹(u1)} . . . ∫_{−∞}^{tν⁻¹(ud)} [Γ{(ν + d)/2} / {Γ(ν/2)(νπ)^{d/2}}] |R|^{−1/2} (1 + r>R⁻¹r/ν)^{−(ν+d)/2} dr1 . . . drd,
where r = (r1, . . . , rd)>

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-141

Multivariate Archimedean Copulae

Gumbel

C(u1, . . . , ud) = exp[ −{(− log u1)^θ + . . . + (− log ud)^θ}^{1/θ} ]

Cook-Johnson (Clayton)

C(u1, . . . , ud) = ( Σ_{j=1}^{d} uj^{−θ} − d + 1 )^{−1/θ}

Frank

C(u1, . . . , ud) = −(1/θ) log[ 1 + {(e^{−θu1} − 1) · · · (e^{−θud} − 1)}/(e^{−θ} − 1)^{d−1} ]
Applied Multivariate Statistical Analysis
Applied Multivariate Statistical Analysis
Multivariate Distributions 4-142

Dimensionality

In d dimensions:
1. Elliptical copulae: correlation matrix with d(d − 1)/2 parameters
2. Archimedean copulae: 1 parameter

Applied Multivariate Statistical Analysis


Multivariate Distributions 4-143

Conclusions

Pluses of copulae
flexible and wide range of dependence
easy to simulate, estimate, implement
explicit form of densities of copulae
modelling of fat tails, asymmetries
Minuses of copulae
Elliptical: correlation matrix, symmetry
Archimedean: too restrictive, single parameter, exchangeable
selection of copula

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-1

Theory of the Multinormal

Elementary Properties of the Multinormal


The pdf of X ∼ Np(µ, Σ) is given by

f(x) = |2πΣ|^{−1/2} exp{−(1/2)(x − µ)>Σ⁻¹(x − µ)}
The expectation and variance are respectively given by:

E(X ) = µ, Var(X ) = Σ

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-2

Linear transformations

Linear transformations turn normal random variables into normal


random variables.
X ∼ Np (µ, Σ), A(p × p), c ∈ Rp
Y = AX + c ∼ Np (Aµ + c, AΣA> ).

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-3

Theorem
X = (X1, X2)> ∼ Np(µ, Σ), X1 ∈ Rr, X2 ∈ Rp−r
X2.1 = X2 − Σ21Σ11⁻¹X1 with

Σ = ⎛ Σ11  Σ12 ⎞
    ⎝ Σ21  Σ22 ⎠

⇒ X1 ∼ Nr(µ1, Σ11) and X2.1 ∼ Np−r(µ2.1, Σ22.1) are independent, where

µ2.1 = µ2 − Σ21Σ11⁻¹µ1,
Σ22.1 = Σ22 − Σ21Σ11⁻¹Σ12.
Applied Multivariate Statistical Analysis
Theory of the Multinormal 5-4

Corollary
Let X = (X1 , X2 )> ∼ Np (µ, Σ). Then Σ12 = 0 if and only if X1 is
independent of X2 .

The independence of two linear transforms of a multinormal X can


be shown via the following corollary.

Corollary
If X ∼ Np (µ, Σ), A and B matrices, then AX and BX are
independent if and only if AΣB > = 0.

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-5

Theorem
If X ∼ Np (µ, Σ) and A(q × p), c ∈ Rq , q ≤ p, then Y = AX + c
is a q-variate Normal, i.e.,

Y ∼ Nq (Aµ + c, AΣA> ).

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-6

Theorem
The conditional distribution of X2 given X1 = x1 is normal with
mean µ2 + Σ21 Σ11^{−1} (x1 − µ1 ) and covariance Σ22.1 , i.e.,

(X2 | X1 = x1 ) ∼ Np−r (µ2 + Σ21 Σ11^{−1} (x1 − µ1 ), Σ22.1 ).

The conditional mean E(X2 | X1 = x1 ) is a LINEAR function of X1 !

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-7

Example
p = 2, r = 1, µ = (0, 0)> , Σ = ( 1     −0.8
                                  −0.8   2 )

Σ11 = 1, Σ21 = −0.8, Σ22.1 = 2 − (0.8)² = 1.36.

⇒ fX1 (x1 ) = (1/√(2π)) exp(−x1 ²/2)

⇒ f (x2 | x1 ) = (1/√(2π · 1.36)) exp{ −(x2 + 0.8x1 )² / (2 · 1.36) }.
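
A Monte Carlo check of this example (the conditioning window around x1 = 1 is an arbitrary choice):

```python
# Check the conditional moments of X2 | X1 = x1 by simulation.
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, -0.8],
                  [-0.8, 2.0]])
x = rng.multivariate_normal([0.0, 0.0], Sigma, size=500_000)
x1 = 1.0
sel = np.abs(x[:, 0] - x1) < 0.01      # condition on X1 close to x1
print(x[sel, 1].mean())                # approx -0.8 * x1 = -0.8
print(x[sel, 1].var())                 # approx Sigma_22.1 = 1.36
```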

Applied Multivariate Statistical Analysis


[Figure: Conditional Normal Densities f (X2 |X1 )]
Shifts in the conditional density. MVAcondnorm


Theory of the Multinormal 5-9

Theorem
If X1 ∼ Nr (µ1 , Σ11 ) and (X2 |X1 = x1 ) ∼ Np−r (Ax1 + b, Ω) where
Ω does not depend on x1 , then

X = (X1 , X2 )> ∼ Np (µ, Σ),

where

µ = ( µ1
      Aµ1 + b )
and
Σ = ( Σ11         Σ11 A>
      AΣ11    Ω + AΣ11 A> ).

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-10

Conditional Approximations

Best approximation of X2 ∈ Rp−r by a function of X1 ∈ Rr :

X2 = E(X2 |X1 ) + U = µ2 + Σ21 Σ11^{−1} (X1 − µ1 ) + U
   = β0 + BX1 + U

with B = Σ21 Σ11^{−1} , β0 = µ2 − Bµ1 and U ∼ Np−r (0, Σ22.1 ).

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-11

Consider the case where X2 ∈ R, i.e., r = p − 1.


Now B is a (1 × r ) row vector β > such that:

X2 = β0 + β > X1 + U.

This means that the best MSE approximation of X2 by a function


of X1 is a hyperplane.

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-12

Σ = ( Σ11  σ12
      σ21  σ22 )

with σ12 ∈ Rr and σ22 ∈ R.

Marginal variance of X2 :

σ22 = β > Σ11 β + σ22.1 = σ21 Σ11^{−1} σ12 + σ22.1 .

Squared multiple correlation between X2 and the r variables X1 :

ρ²2.1...r = σ21 Σ11^{−1} σ12 / σ22 .

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-13

Example: classic blue pullover data

Suppose that X1 (sales), X2 (price), X3 (advertisement) and X4


(sales assistants) are normally distributed with
   
µ = (172.7, 104.6, 104.0, 93.8)>  and  (lower triangle of the symmetric Σ)

     ( 1037.21
Σ =  (  −80.02   219.84
     ( 1430.70    92.10  2624.00
     (  271.44   −91.58   210.30   177.36 ).

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-14

The conditional distribution of X1 given (X2 X3 X4 ) is univariate


normal with mean
 
µ1 + σ12 Σ22^{−1} (X2 − µ2 , X3 − µ3 , X4 − µ4 )> = 65.7 − 0.2X2 + 0.5X3 + 0.8X4

and variance

σ11.2 = σ11 − σ12 Σ22^{−1} σ21 = 96.761

The multiple correlation is ρ²1.234 = σ12 Σ22^{−1} σ21 / σ11 = 0.907.

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-15
The correlation matrix between the 4 variables is given by

     (  1
P =  ( −0.168    1
     (  0.867    0.121   1
     (  0.633   −0.464   0.308   1 ).

The conditional distribution of (X1 , X2 ) given (X3 , X4 ) is bivariate
normal with mean:

( µ1 ) + ( σ13  σ14 ) ( σ33  σ34 )^{−1} ( X3 − µ3 )  =  (  32.516 + 0.467X3 + 0.977X4
( µ2 )   ( σ23  σ24 ) ( σ43  σ44 )      ( X4 − µ4 )     ( 153.644 + 0.085X3 − 0.617X4 )

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-16

and covariance matrix:

( σ11  σ12 ) − ( σ13  σ14 ) ( σ33  σ34 )^{−1} ( σ31  σ32 )  =  ( 104.006
( σ21  σ22 )   ( σ23  σ24 ) ( σ43  σ44 )      ( σ41  σ42 )     ( −33.574  155.592 ).

This covariance matrix allows us to compute the partial correlation
between X1 and X2 for a fixed level of X3 and X4 :

ρX1 X2 |X3 X4 = −33.574 / √(104.006 · 155.592) = −0.264.

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-17

Mahalanobis Transform
If X ∼ Np (µ, Σ) then the Mahalanobis transform is

Y = Σ−1/2 (X − µ) ∼ Np (0, Ip )

and it holds

Y > Y = (X − µ)> Σ−1 (X − µ) ∼ χ2p .

Y is a random vector and Y > Y is a scalar.


Y > Y can be used for testing (assuming that Σ is known).
Normally, we do not know Σ. The tests in this situation can
be carried out using Wishart and Hotelling distributions.
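
A small sketch of the transform and of the χ2p law of Y > Y (parameter values are illustrative):

```python
# Mahalanobis transform: Y = Sigma^{-1/2}(X - mu), so Y'Y ~ chi^2_p.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = rng.multivariate_normal(mu, Sigma, size=100_000)
w, V = np.linalg.eigh(Sigma)
Sinv_half = V @ np.diag(w ** -0.5) @ V.T   # Sigma^{-1/2} via eigendecomposition
y = (x - mu) @ Sinv_half                   # standardized rows
d2 = (y ** 2).sum(axis=1)                  # Y'Y, squared Mahalanobis distances
print(d2.mean(), chi2(df=2).mean())        # both approx p = 2
```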

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-18

Summary: Elementary Properties

If X ∼ Np (µ, Σ) then a linear transformation


AX + c, A(q × p), c ∈ Rq has distribution
Nq (Aµ + c, AΣA> ).
Two linear transformations AX and BX of X ∼ Np (µ, Σ) are
independent if and only if AΣB > = 0.
If X1 and X2 are partitions of X ∼ Np (µ, Σ) then the
conditional distribution of X2 given X1 = x1 is normal again.

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-19

Summary: Elementary Properties

In the multivariate normal case, X1 is independent of X2 if


and only if Σ12 = 0.
 conditional expectation of (X2 |X1 ) is a linear function if
The
X1
X2 ∼ Np (µ, Σ).
The multiple correlation coefficient is defined as
σ Σ−1 σ12
ρ22.1...r = 21 σ11
22
.
The multiple correlation coefficient is the percentage of the
variance of X2 explained by the linear approximation
β0 + β > X1 .

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-20

Wishart Distribution
X ∼ Np (µ, Σ), µ = 0
X (n × p) data matrix

M(p × p) = X > X ∼ Wp (Σ, n)


Example (Wishart is a generalization of χ2 ):
p = 1, xi ∼ N1 (0, σ 2 ), i = 1, . . . , n

X = (x1 , . . . , xn )> ,   M = X > X = ∑_{i=1}^{n} xi ² ∼ σ 2 χ2n = W1 (σ 2 , n)
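
A numerical illustration of the p = 1 case (values are arbitrary): since E(χ2n ) = n, draws of M average to nσ 2 .

```python
# W1(sigma^2, n) draws as row-wise sums of squares of N(0, sigma^2) data.
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 20, 4.0, 100_000
X = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
M = (X ** 2).sum(axis=1)          # reps draws from W1(sigma^2, n)
print(M.mean(), n * sigma2)       # both approx 80
```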

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-21

Linear Transformation of the Data Matrix

Theorem

M ∼ Wp (Σ, n), B(p × q)

⇒ B > MB ∼ Wq (B > ΣB, n)

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-22

Wishart and χ2p - Distribution

Theorem

M ∼ Wp (Σ, n) , a ∈ Rp , a> Σa ≠ 0

⇒ (a> Ma) / (a> Σa) ∼ χ2n

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-23

Theorem (Cochran)

X (n × p) data matrix from a Np (0, Σ) distribution

nS = X > HX ∼ Wp (Σ, n − 1)
S is the sample covariance matrix
x̄ and S are independent

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-24

Summary: Wishart Distribution

The Wishart distribution is a generalization of the


χ2 -distribution. In particular W1 (σ 2 , n) = σ 2 χ2n .
The empirical covariance matrix S has a (1/n) Wp (Σ, n − 1)
distribution.
In the normal case, x̄ and S are independent.
For M ∼ Wp (Σ, m), (a> Ma)/(a> Σa) ∼ χ2m .

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-25

Hotelling’s T 2 -Distribution

Assume that random vector Y ∼ Np (0, I) is independent of


random matrix M ∼ Wp (I, n).

n Y > M−1 Y ∼ T 2 (p, n)

Hotelling’s T 2 is a generalization of Student’s t-distribution


The critical values of Hotelling’s T 2 can be calculated using the
F -distribution:

T 2 (p, n) = { np/(n − p + 1) } Fp,n−p+1
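
This relation yields T 2 critical values directly from F quantiles; a small helper (ours, not a book quantlet):

```python
# Critical value of T^2(p, n) at level alpha via the F transformation.
from scipy.stats import f

def t2_critical(p, n, alpha=0.05):
    return n * p / (n - p + 1) * f.ppf(1 - alpha, p, n - p + 1)

print(t2_critical(p=2, n=50))   # 95% critical value of T^2(2, 50)
```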

Applied Multivariate Statistical Analysis


Theory of the Multinormal 5-26

Summary: Hotelling’s T 2 -Distribution

Hotelling’s T 2 -distribution is a generalization of the
t-distribution. In particular T 2 (1, n) = t ²n .
(n − 1)(x − µ)> S −1 (x − µ) has a T 2 (p, n − 1) distribution.
The relation between Hotelling’s T 2 - and Fisher’s
F -distribution is given by T 2 (p, n) = { np/(n − p + 1) } Fp,n−p+1 .

Applied Multivariate Statistical Analysis


Theory of Estimation 6-1

Theory of Estimation

In parametric statistics, θ is a k-variate vector θ ∈ Rk characterizing


the unknown properties of the population pdf f (x; θ)

The aim will be to estimate θ from the sample X through
estimators θ̂ which are functions of the sample: θ̂ = θ̂(X ).

We must derive the sampling distribution of θ̂ to analyze its
properties (is it related to the unknown quantity θ it is supposed to
estimate?).

We will utilise maximum likelihood theory.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-2

The Likelihood Function

X ∼ f (x; θ) pdf of an i.i.d. sample {xi }ni=1 with parameter θ


Likelihood function
n
Y
L(X ; θ) = f (xi ; θ)
i=1

MLE
θ̂ = arg max_θ L(X ; θ)
log-likelihood
ℓ(X ; θ) = log L(X ; θ)

Applied Multivariate Statistical Analysis


Theory of Estimation 6-3

Example
Sample {xi }ni=1 from Np (µ, I), i.e. from the pdf
 
f (x; θ) = (2π)^{−p/2} exp{ −(1/2)(x − θ)> (x − θ) }

where θ = µ ∈ Rp is the mean vector parameter.


The log-likelihood is

ℓ(X ; θ) = ∑_{i=1}^{n} log{f (xi ; θ)} = log (2π)^{−np/2} − (1/2) ∑_{i=1}^{n} (xi − θ)> (xi − θ).

The term (xi − θ)> (xi − θ) equals

(xi − x̄)> (xi − x̄) + (x̄ − θ)> (x̄ − θ) + 2(x̄ − θ)> (xi − x̄).
Applied Multivariate Statistical Analysis
Theory of Estimation 6-4

Example
If we sum up this term over i = 1, . . . , n we see that

∑_{i=1}^{n} (xi − θ)> (xi − θ) = ∑_{i=1}^{n} (xi − x̄)> (xi − x̄) + n(x̄ − θ)> (x̄ − θ).

Hence

ℓ(X ; θ) = log(2π)^{−np/2} − (1/2) ∑_{i=1}^{n} (xi − x̄)> (xi − x̄) − (n/2)(x̄ − θ)> (x̄ − θ).

Only the last term depends on θ and is obviously maximized for

θb = µ
b = x̄.

Thus x̄ is the MLE.


Applied Multivariate Statistical Analysis
Theory of Estimation 6-5

Example (MLE’s from a Normal Distribution)


{xi }ni=1 is a sample from a normal distribution Np (µ, Σ)
Due to the symmetry of Σ, the unknown parameter θ is in fact
{p + p(p + 1)/2}-dimensional.
Then

L(X ; θ) = |2πΣ|^{−n/2} exp{ −(1/2) ∑_{i=1}^{n} (xi − µ)> Σ−1 (xi − µ) }

and

ℓ(X ; θ) = −(n/2) log |2πΣ| − (1/2) ∑_{i=1}^{n} (xi − µ)> Σ−1 (xi − µ).

Applied Multivariate Statistical Analysis


Theory of Estimation 6-6

Example (MLE’s from a Normal Distribution - cont’d)


The term (xi − µ)> Σ−1 (xi − µ) equals

(xi − x̄)> Σ−1 (xi − x̄) + (x̄ − µ)> Σ−1 (x̄ − µ) + 2(x̄ − µ)> Σ−1 (xi − x̄).

If we sum up this term over i = 1, . . . , n we see that

∑_{i=1}^{n} (xi − µ)> Σ−1 (xi − µ)
  = ∑_{i=1}^{n} (xi − x̄)> Σ−1 (xi − x̄) + n(x̄ − µ)> Σ−1 (x̄ − µ).

Applied Multivariate Statistical Analysis


Theory of Estimation 6-7

Example (MLE’s from a Normal Distribution - cont’d)


Note that

(xi − x̄)> Σ−1 (xi − x̄) = tr{ (xi − x̄)> Σ−1 (xi − x̄) }
                        = tr{ Σ−1 (xi − x̄)(xi − x̄)> }.

We sum this up over the index i:

∑_{i=1}^{n} (xi − µ)> Σ−1 (xi − µ)
  = tr{ Σ−1 ∑_{i=1}^{n} (xi − x̄)(xi − x̄)> } + n(x̄ − µ)> Σ−1 (x̄ − µ)
  = tr{ Σ−1 nS } + n(x̄ − µ)> Σ−1 (x̄ − µ).

Applied Multivariate Statistical Analysis


Theory of Estimation 6-8

Example (MLE’s from a Normal Distribution - cont’d)


Thus the log-likelihood function for Np (µ, Σ) is

ℓ(X ; θ) = −(n/2) log |2πΣ| − (n/2) tr{Σ−1 S} − (n/2)(x̄ − µ)> Σ−1 (x̄ − µ)

We can easily see that the third term is maximized by µ = x̄.
The MLE’s are given by

µ̂ = x̄ ,   Σ̂ = S.

Note that the unbiased covariance estimator Su = {n/(n − 1)} S is not the
MLE!
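
A sketch contrasting the two covariance estimators on simulated data (parameter values are arbitrary):

```python
# MLE of Sigma divides by n; the unbiased estimator divides by n - 1.
import numpy as np

rng = np.random.default_rng(4)
x = rng.multivariate_normal([0, 0], [[2, 1], [1, 3]], size=50)
mu_hat = x.mean(axis=0)                 # MLE of mu: the sample mean
S = np.cov(x, rowvar=False, ddof=0)     # MLE of Sigma
S_u = np.cov(x, rowvar=False)           # unbiased: S_u = n/(n-1) * S
```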

Applied Multivariate Statistical Analysis


Theory of Estimation 6-9

Example (Linear Regression Model)


Linear regression model yi = β > xi + εi ; i = 1, . . . , n, with εi i.i.d.
N(0, σ 2 ) and xi ∈ Rp .
Here θ = (β > , σ) is a (p + 1)-dimensional parameter vector.
Denote

y = (y1 , . . . , yn )> ,   X = (x1 , . . . , xn )> .

Then

L(y ; θ) = ∏_{i=1}^{n} (2πσ 2 )^{−1/2} exp{ −(yi − β > xi )² / (2σ 2 ) }

and

Applied Multivariate Statistical Analysis


Theory of Estimation 6-10

Example (Linear Regression Model - cont’d)

ℓ(y ; θ) = log{(2π)^{−n/2} σ^{−n}} − (1/(2σ 2 )) ∑_{i=1}^{n} (yi − β > xi )²
         = −(n/2) log(2π) − n log σ − (1/(2σ 2 )) (y − X β)> (y − X β)
         = −(n/2) log(2π) − n log σ − (1/(2σ 2 )) (y > y + β > X > X β − 2β > X > y )

Differentiating w.r.t. the parameters yields

∂ℓ/∂β = −(1/(2σ 2 )) (2X > X β − 2X > y )
∂ℓ/∂σ = −n/σ + (1/σ 3 ) (y − X β)> (y − X β).
Applied Multivariate Statistical Analysis
Theory of Estimation 6-11

Example (Linear Regression Model - cont’d)



∂ℓ/∂β is the vector of the derivatives w.r.t. all components of β (the
gradient).
Since the first equation only depends on β, we start by deriving β̂:

X > X β̂ = X > y  =⇒  β̂ = (X > X )−1 X > y

Now we plug β̂ into the second equation, which gives

n/σ̂ = (1/σ̂ 3 )(y − X β̂)> (y − X β̂)  =⇒  σ̂ ² = (1/n) ||y − X β̂||² ,

|| • || denoting the Euclidean vector norm.
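
A sketch of the two MLE's on simulated data (design and true values are arbitrary):

```python
# MLE of beta is the least squares solution; MLE of sigma^2 is RSS / n.
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # (X'X)^{-1} X'y
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / n   # ||y - X beta_hat||^2 / n
```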

Applied Multivariate Statistical Analysis


Theory of Estimation 6-12

Example (Linear Regression Model - cont’d)


We see that the MLE β̂ is identical with the least squares
estimator.
The variance estimator

σ̂ ² = (1/n) ∑_{i=1}^{n} (yi − β̂ > xi )²

is the residual sum of squares (RSS) divided by n, generalized
to the case of multivariate xi .

Applied Multivariate Statistical Analysis


Theory of Estimation 6-13

Example (Linear Regression Model - cont’d)


Note that in a fixed design situation where the xi are considered as
being fixed, we have

E(y ) = X β and Var(y ) = σ 2 In .

Then, using the properties of moments, we have

b = (X > X )−1 X > E(y ) = β,


E(β)

b = σ 2 (X > X )−1 .
Var(β)

Applied Multivariate Statistical Analysis


Theory of Estimation 6-14

Summary: Likelihood Function

If {xi }ni=1 is an i.i.d. sample from a distribution with pdf
f (x; θ), then L(X ; θ) = ∏_{i=1}^{n} f (xi ; θ) is the likelihood function.
The maximum likelihood estimator (MLE) is the value of θ
which maximizes L(X ; θ). Equivalently one can maximize the
log-likelihood ℓ(X ; θ).

Applied Multivariate Statistical Analysis


Theory of Estimation 6-15

Summary: Likelihood Function

The MLE’s of µ, Σ from a Np (µ, Σ) distribution are µ̂ = x̄
and Σ̂ = S. Note that the MLE for Σ is not unbiased.
The MLE’s in a linear model y = X β + ε, ε ∼ Nn (0, σ 2 I) are
given by the least squares estimator β̂ = (X > X )−1 X > y and
σ̂ ² = (1/n) ||y − X β̂||². E(β̂) = β and Var(β̂) = σ 2 (X > X )−1 .

Applied Multivariate Statistical Analysis


Theory of Estimation 6-16

Cramer-Rao Lower bound


One typical property we want for an estimator is unbiasedness:
b = θ. (x̄ is an unbiased estimator of µ and S is a biased
E(θ)
estimator of Σ in finite sample).
We look for an unbiased estimator with the smallest possible
variance.
The Cramer-Rao lower bound will achieve this and it provides
the asymptotic optimality property of maximum likelihood
estimators.
The Cramer-Rao theorem involves the score function and its
properties, which are first derived.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-17

Score Function and Fisher Information

The score function is

s(X ; θ) = ∂ℓ(X ; θ)/∂θ

The covariance matrix Fn = Var{s(X ; θ)} is called the Fisher
information matrix.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-18

Example (Score Function and Fisher Information)


Suppose that X ∼ Np (θ, I). Then

s(X ; θ) = ∂ℓ(X ; θ)/∂θ
         = −(1/2) ∂/∂θ { ∑_{i=1}^{n} (xi − θ)> (xi − θ) }
         = n(x̄ − θ),

hence the information matrix is Fn = Var{n(x̄ − θ)} = nIp .
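
A simulation check of Fn = nIp (sample size and dimension are arbitrary):

```python
# The sample covariance of the score n(xbar - theta) over replications
# approximates the Fisher information n * I_p.
import numpy as np

rng = np.random.default_rng(6)
n, p, theta = 25, 2, np.zeros(2)
scores = np.array([n * (rng.normal(size=(n, p)).mean(axis=0) - theta)
                   for _ in range(20_000)])
print(np.cov(scores, rowvar=False))    # approx 25 * I_2
```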

Applied Multivariate Statistical Analysis


Theory of Estimation 6-19

Theorem
If s = s(X ; θ) is the score function and if θ̂ = t = t(X , θ) is any
function of X and θ, then under regularity conditions

E(st > ) = ∂ E(t > )/∂θ − E( ∂t > /∂θ ).

Corollary
If s = s(X ; θ) is the score function, and θ̂ = t = t(X ) is any
unbiased estimator of θ (i.e., E(t) = θ), then

E(st > ) = Cov(s, t) = Ik .

Applied Multivariate Statistical Analysis


Theory of Estimation 6-20

Note that the score function has mean zero:

E{s(X ; θ)} = 0.

Hence, E(ss > ) = Var(s) = Fn and it follows that

Fn = −E{ ∂²ℓ(X ; θ)/(∂θ ∂θ> ) }.

Remark
If x1 , · · · , xn are i.i.d., then Fn = nF1 , where F1 is the Fisher
information matrix for sample size n = 1.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-21

All estimators which are unbiased and attain the Cramer-Rao lower
bound are minimum variance estimators.

Theorem (Cramer-Rao)
If θ̂ = t = t(X ) is any unbiased estimator for θ then under
regularity conditions

Var(t) ≥ Fn−1 ,

where
Fn = E{s(X ; θ)s(X ; θ)> } = Var{s(X ; θ)}

is the Fisher information matrix

Applied Multivariate Statistical Analysis


Theory of Estimation 6-22

Proof.
Consider the correlation ρY ,Z between Y and Z where Y = a> t,
Z = c > s, and s is the score and the vectors a, c ∈ Rp . By the
Corollary Cov(s, t) = I and thus

Cov(Y , Z ) = a> Cov(t, s)c = a> c


Var(Z ) = c > Var(s)c = c > Fn c.

Hence,

ρ²Y ,Z = Cov²(Y , Z ) / {Var(Y ) Var(Z )} = (a> c)² / {a> Var(t)a · c > Fn c} ≤ 1.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-23

cont’d.
In particular, this holds for any c ≠ 0. Therefore it holds also for
the maximum of the left-hand side with respect to c. Since

max_c { c > aa> c / (c > Fn c) } = max_{c > Fn c=1} c > aa> c

and

max_{c > Fn c=1} c > aa> c = a> Fn−1 a

Applied Multivariate Statistical Analysis


Theory of Estimation 6-24

By the maximization theorem in the chapter on Matrix Algebra we
have

a> Fn−1 a / (a> Var(t)a) ≤ 1   ∀ a ∈ Rp , a ≠ 0,

i.e.,

a> {Var(t) − Fn−1 }a ≥ 0   ∀ a ∈ Rp , a ≠ 0,

which is equivalent to Var(t) ≥ Fn−1 .

Applied Multivariate Statistical Analysis


Theory of Estimation 6-25

Asymptotic Sampling Distribution of the MLE

Maximum likelihood estimators (MLE’s) attain the lower bound as
the sample size n goes to infinity. The next theorem states this
and, in addition, gives the asymptotic sampling distribution of the
maximum likelihood estimator, which turns out to be multinormal.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-26

Theorem
Suppose that the sample {xi }ni=1 is i.i.d. If θ̂ is the MLE for
θ ∈ Rk , i.e., θ̂ = arg max_θ L(X ; θ), then under some regularity
conditions, as n → ∞:

√n (θ̂ − θ) −→ Nk (0, F1−1 ) in distribution,

where F1 denotes the Fisher information for sample size n = 1.


As a consequence we see that (under regularity conditions) the
MLE is asymptotically unbiased, efficient (minimum variance) and
normally distributed.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-27
It follows that asymptotically

n(θ̂ − θ)> F1 (θ̂ − θ) −→ χ2p in distribution.

If F̂1 is a consistent estimator of F1 , then also

n(θ̂ − θ)> F̂1 (θ̂ − θ) −→ χ2p .

This expression may be useful to test hypotheses about θ and to
construct confidence regions for θ in a very general setup. It is
clear that

P{ n(θ̂ − θ)> F̂1 (θ̂ − θ) ≤ χ²1−α;p } ≈ 1 − α,

where χ²ν;p denotes the ν-quantile of a χ2p random variable. So the
ellipsoid { θ ∈ Rp : n(θ̂ − θ)> F̂1 (θ̂ − θ) ≤ χ²1−α;p } provides an
asymptotic (1 − α)-confidence region for θ.
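
A sketch of the resulting Wald-type region check (F̂1 is assumed to be a consistent estimate supplied by the user):

```python
# Membership test for the asymptotic (1 - alpha) confidence ellipsoid.
import numpy as np
from scipy.stats import chi2

def in_confidence_region(theta_hat, theta0, F1_hat, n, alpha=0.05):
    d = theta_hat - theta0
    stat = n * d @ F1_hat @ d            # n (theta_hat - theta0)' F1_hat (...)
    return stat <= chi2.ppf(1 - alpha, df=len(theta_hat))
```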
Applied Multivariate Statistical Analysis
Theory of Estimation 6-28

Summary: Cramer-Rao Lower bound

The score function is the derivative s(X ; θ) = ∂ℓ(X ; θ)/∂θ of
the log-likelihood with respect to θ. The covariance matrix of
s(X ; θ) is the Fisher information matrix.
Any unbiased estimator θ̂ = t = t(X ) has a variance that is
bounded below by the inverse of the Fisher information. Thus
an estimator which attains this lower bound is a minimum
variance estimator.

Applied Multivariate Statistical Analysis


Theory of Estimation 6-29

Summary: Cramer-Rao Lower bound

MLE’s attain the lower bound in an asymptotic sense, i.e.,

√n (θ̂ − θ) −→ N(0, F1−1 ) in distribution,

if θ̂ is the MLE, θ̂ = arg max_θ L(X ; θ).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-1

Likelihood Ratio Test

Suppose that the distribution of {xi }ni=1 , xi ∈ Rp , depends on a


parameter vector θ. Then

H0 : θ ∈ Ω0
H1 : θ ∈ Ω1 .

The hypothesis H0 corresponds to the “reduced model” and H1 to


the “full model”.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-2

Example
Xi ∼ Np (θ, I)

H0 : θ = θ 0
H1 : no constraints for θ

or equivalently to Ω0 = {θ0 }, Ω1 = Rp .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-3

Likelihood Ratio
Define L∗j = max_{θ∈Ωj} L(X ; θ), the maximum of the likelihood under each of
the hypotheses, and

λ(X ) = L∗0 / L∗1 .

Likelihood Ratio Test
Rejection region: R = {x : λ(x) < c}, with c chosen such that

sup_{θ∈Ω0} Pθ (x ∈ R) = α

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-4

Theorem (Wilks)
If Ω1 ⊂ Rq is a q-dimensional space and if Ω0 ⊂ Ω1 is an
r -dimensional subspace, then under regularity conditions for
n→∞
L
∀ θ ∈ Ω0 : −2 log λ −→ χ2q−r .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-5

Test problem 1
X1 , . . . , Xn , i.i.d. with Xi ∼ Np (µ, Σ)

H0 : µ = µ0 , Σ known, H1 : no constraints.

Ω0 = {µ0 }, r = 0, Ω1 = Rp , q = p

−2 log λ = 2(ℓ∗1 − ℓ∗0 ) = n(x − µ0 )> Σ−1 (x − µ0 ) ∼ χ2p

Rejection region R: {x ∈ Rn such that −2 log λ > χ²0.95;p }

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-6

Example (Bank Data)

µ0 = (214.9, 129.9, 129.7, 8.3, 10.1, 141.5)> .

x = (214.8, 130.3, 130.2, 10.5, 11.1, 139.4)> .

−2 log λ = 2(ℓ∗1 − ℓ∗0 ) = n(x − µ0 )> Σ−1 (x − µ0 ) = 7362.32

The LR test statistic −2 log λ ∼ χ26 is highly significant.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-7
Test problem 2
Xi ∼ Np (µ, Σ) i.i.d.

H0 : µ = µ 0 , Σ unknown, H1 : no constraints.

Under H0 it can be shown that

ℓ∗0 = ℓ(µ0 , S + dd > ), d = (x − µ0 ),

and under H1 we have

ℓ∗1 = ℓ(x, S).

This leads to

−2 log λ = 2(ℓ∗1 − ℓ∗0 ) = n log(1 + d > S −1 d).   (16)

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-8
Test problem 2 cont’d
Note that this statistic depends on (n − 1)d > S −1 d which has,
under H0 , a Hotelling’s T 2 -distribution. Therefore,

(n − 1)(x̄ − µ0 )> S −1 (x̄ − µ0 ) ∼ T 2 (p, n − 1). (17)

or equivalently

{ (n − p)/p } (x̄ − µ0 )> S −1 (x̄ − µ0 ) ∼ Fp,n−p

So the rejection region may be defined as

{ (n − p)/p } (x̄ − µ0 )> S −1 (x̄ − µ0 ) > F1−α;p,n−p .
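
A sketch of the exact test as a function (ours; S is the MLE covariance, matching nS ∼ Wp (Σ, n − 1)):

```python
# One-sample Hotelling T^2 test of H0: mu = mu0, Sigma unknown.
import numpy as np
from scipy.stats import f

def hotelling_one_sample(x, mu0, alpha=0.05):
    n, p = x.shape
    d = x.mean(axis=0) - mu0
    S = np.cov(x, rowvar=False, ddof=0)         # MLE covariance
    fstat = (n - p) / p * d @ np.linalg.solve(S, d)   # ~ F_{p, n-p} under H0
    return fstat > f.ppf(1 - alpha, p, n - p)   # True: reject H0
```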

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-9

Test problem 2 cont’d


Alternatively we have, under H0 , the asymptotic distribution

−2 log λ −→ χ2p ,

leading to the rejection region

n log{ 1 + (x̄ − µ0 )> S −1 (x̄ − µ0 ) } > χ²1−α;p

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-10

Confidence region for µ

{ (n − p)/p } (x̄ − µ)> S −1 (x̄ − µ) ∼ Fp,n−p

{ µ ∈ Rp | (µ − x̄)> S −1 (µ − x̄) ≤ { p/(n − p) } F1−α;p,n−p }

is a confidence region at level (1 − α) for µ; it is the interior of an
iso-distance ellipsoid in Rp .
When p is large, ellipsoids are difficult to handle in practice. One is
thus interested in finding confidence intervals for µ1 , µ2 , . . . , µp so
that the simultaneous confidence over all the intervals reaches the
desired level, say 1 − α.
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-11

Simultaneous Confidence Intervals for a> µ

An obvious confidence interval for a given a> µ is based on:

| √(n − 1) (a> µ − a> x̄) / √(a> Sa) | ≤ t1−α/2;n−1

or equivalently

t 2 (a) = (n − 1) {a> (µ − x̄)}² / (a> Sa) ≤ F1−α;1,n−1

which provides the (1 − α) confidence interval for a> µ:

a> x̄ − √( F1−α;1,n−1 a> Sa/(n − 1) ) ≤ a> µ ≤ a> x̄ + √( F1−α;1,n−1 a> Sa/(n − 1) ).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-12

Using the Theorem on the maximum of quadratic forms we see that

max_a t 2 (a) = (n − 1)(x̄ − µ)> S −1 (x̄ − µ) ∼ T 2 (p, n − 1),

which implies that simultaneous confidence intervals for all
possible linear combinations a> µ, a ∈ Rp , of the elements of µ are
given by:

( a> x̄ − Kα √(a> Sa), a> x̄ + Kα √(a> Sa) ),

where Kα = √( { p/(n − p) } F1−α;p,n−p ).
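
A sketch computing such a simultaneous interval for one direction a (ours, not a book quantlet):

```python
# Simultaneous (1 - alpha) confidence interval for a'mu via K_alpha.
import numpy as np
from scipy.stats import f

def simultaneous_ci(x, a, alpha=0.05):
    n, p = x.shape
    S = np.cov(x, rowvar=False, ddof=0)         # MLE covariance, as above
    k = np.sqrt(p / (n - p) * f.ppf(1 - alpha, p, n - p))
    center = a @ x.mean(axis=0)
    half = k * np.sqrt(a @ S @ a)
    return center - half, center + half
```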

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-13

Example
95% confidence region for µf , the mean of the forged banknotes, is
given by the ellipsoid:
 
{ µ ∈ R6 | (µ − x̄f )> Sf−1 (µ − x̄f ) ≤ (6/94) F0.95;6,94 }

95% simultaneous c.i. are given by (using F0.95;6,94 = 2.1966)

214.692 ≤ µ1 ≤ 214.954
130.205 ≤ µ2 ≤ 130.395
130.082 ≤ µ3 ≤ 130.304
10.108 ≤ µ4 ≤ 10.952
10.896 ≤ µ5 ≤ 11.370
139.242 ≤ µ6 ≤ 139.658
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-14

Example (cont’d)
Comparison with µ0 = (214.9, 129.9, 129.7, 8.3, 10.1, 141.5)>
shows that almost all components (except the first one) are
responsible for the rejection of µ0 .
In addition, choosing e.g. a> = (0, 0, 0, 1, −1, 0) gives the c.i.
−1.211 ≤ µ4 − µ5 ≤ 0.005, which shows that for the forged bills the
lower border is essentially smaller than the upper border.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-15

Test problem 3
Xi ∼ Np (µ, Σ)

H0 : Σ = Σ 0 , µ unknown, H1 : no constraints.

−2 log λ = 2(ℓ∗1 − ℓ∗0 ) = n tr(Σ0−1 S) − n log |Σ0−1 S| − np.

−2 log λ −→ χ2m ,   m = p(p + 1)/2

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-16

Example (US companies data)

S = 107 × ( 1.6635  1.2410
            1.2410  1.3747 )   (energy sector)

We want to test if Var(X1 , X2 )> = 107 × ( 1.2248  1.1425
                                            1.1425  1.5112 ) = Σ0
(where Σ0 is the variance of the manufacturing sector).
The LR test statistic −2 log λ = 2.7365 is not significant for χ23 .
Hence, we do not reject the null hypothesis H0 and we cannot
conclude that Σ ≠ Σ0 .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-17
Test problem 4
Yi ∼ N1 (β > xi , σ 2 ), xi ∈ Rp

H0 : β = β0 , σ 2 unknown, H1 : no constraints.

−2 log λ = 2(ℓ∗1 − ℓ∗0 ) = n log( ||y − X β0 ||² / ||y − X β̂||² ) −→ χ2p

Recall that

F = { (n − p)/p } ( ||y − X β0 ||² / ||y − X β̂||² − 1 ) ∼ Fp,n−p

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-18

Example (Classic blue pullover example)

Test H0 : (α, β)> = (211, 0)> in the regression yi = α + βxi + εi , with

y = (y1 , . . . , y10 )> = (x1,1 , . . . , x10,1 )> ,   X = ( 1  x1,2
                                                            ..   ..
                                                            1  x10,2 ).

The test statistic for the LR test is −2 log λ = 4.55, which is not
significant under the χ22 distribution. However, the exact F -test
statistic F = 5.93 is significant under the F2,8 distribution
(F2,8;0.95 = 4.46).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-19

Summary: Hypothesis Testing

The hypotheses H0 : θ ∈ Ω0 against H1 : θ ∈ Ω1 can be tested
by means of the likelihood ratio test (LRT).
The likelihood ratio (LR) is the quotient λ(X ) = L∗0 /L∗1 where
the L∗j are the maxima of the likelihood in each of the
hypotheses.
The test statistic in the LRT is λ(X ) or equivalently its
logarithm log λ(X ).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-20

Summary: Hypothesis Testing

If Ω1 is q-dimensional and Ω0 ⊂ Ω1 r -dimensional, then the
asymptotic distribution of −2 log λ is χ2q−r . This allows H0 to
be tested against H1 by calculating the test statistic
−2 log λ = 2(ℓ∗1 − ℓ∗0 ) where ℓ∗j = log L∗j .
The hypothesis H0 : µ = µ0 for X ∼ Np (µ, Σ), Σ known,
leads to −2 log λ = n(x − µ0 )> Σ−1 (x − µ0 ) ∼ χ2p .
The hypothesis H0 : µ = µ0 for X ∼ Np (µ, Σ), Σ unknown,
leads to −2 log λ = n log{1 + (x − µ0 )> S −1 (x − µ0 )} −→ χ2p ,
and
(n − 1)(x̄ − µ0 )> S −1 (x̄ − µ0 ) ∼ T 2 (p, n − 1).
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-21

Summary: Hypothesis Testing

The hypothesis H0 : Σ = Σ0 for X ∼ Np (µ, Σ), µ unknown,
leads to −2 log λ = n tr(Σ0−1 S) − n log |Σ0−1 S| − np −→
χ2m , m = p(p + 1)/2.
The hypothesis H0 : β = β0 for Yi ∼ N1 (β > xi , σ 2 ), σ 2
unknown, leads to −2 log λ = n log( ||y − X β0 ||² / ||y − X β̂||² ) −→ χ2p .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-22

Linear Hypothesis
We present a general procedure which allows a linear hypothesis to
be tested.
Linear hypotheses are of the form Aµ = a with known matrices
A(q × p) and a(q × 1) with q ≤ p.
Example
Suppose that X1 ∼ N(µ1 , σ) and X2 ∼ N(µ2 , σ) are independent
and that you want to test the hypothesis H0 : µ1 = µ2 .
This can be written as the linear hypothesis

H0 : Aµ = (1  −1) (µ1 , µ2 )> = 0.
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-23

Test problem 5
Xi ∼ Np (µ, Σ)

H0 : Aµ = a, Σ known, H1 : no constraints.

The results of Test Problems 1 and 2 can be used directly on
µy , the mean of Yi = AXi .
Indeed Yi ∼ Nq (µy , Σy ) where µy = Aµ and Σy = AΣA> .
Accordingly we have ȳ = Ax̄, Sy = ASA> , d = Ax̄ − a, and

n(Ax̄ − a)> (AΣA> )−1 (Ax̄ − a) ∼ χ2q

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-24

Example
We consider hypotheses on the partitioned mean µ = (µ1 , µ2 )> :

H0 : µ1 = µ2 , H1 : no constraints,

for X ∼ N2p ( (µ1 , µ2 )> , ( Σ  0
                              0  Σ ) ) with known Σ.
This is equivalent to A = (Ip , −Ip ), a = (0, . . . , 0)> , and leads to

−2 log λ = n(x̄1 − x̄2 )> (2Σ)−1 (x̄1 − x̄2 ) ∼ χ2p .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-25

Example
Another example is the test whether µ1 = 0, i.e.

H0 : µ1 = 0, H1 : no constraints,

for X ∼ N2p ( (µ1 , µ2 )> , ( Σ  0
                              0  Σ ) ) with known Σ.
This is equivalent to Aµ = a with A = (Ip , 0) and a = (0, . . . , 0)> .
Hence

−2 log λ = n x̄1> Σ−1 x̄1 ∼ χ2p .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-26

Test problem 6
Xi ∼ Np (µ, Σ)

H0 : Aµ = a, Σ unknown, H1 : no constraints.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-27

Example
Consider the bank data set and test if µ4 = µ5 , i.e., if the lower
border mean equals the upper border mean for the forged bills.

A = (0 0 0 1 −1 0),   a = 0.

The test statistic is

99(Ax̄)> (ASf A> )−1 (Ax̄) ∼ T 2 (1, 99) = F1,99 .

The observed value is 13.638 which is significant.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-28

Repeated Measurements
Frequently, n independent sampling units are observed under p
different experimental conditions (different treatments, . . . ).
X1 , . . . , Xn are i.i.d. with Xi ∼ Np (µ, Σ) given p repeated measures.

The hypothesis of interest in that case is the following: there are
no treatment effects, H0 : µ1 = µ2 = . . . = µp . This hypothesis is
a direct application of Test Problem 6:

H0 : C µ = 0, where C ((p − 1) × p) = ( 1 −1  0 · · ·  0
                                        0  1 −1 · · ·  0
                                        ..  ..  ..  ..  ..
                                        0 · · ·  0  1 −1 )
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-29

Repeated Measurements
Note that in many cases one of the experimental conditions is the
“control” (a placebo, standard drug or reference condition). In this
case,

C ((p − 1) × p) = ( 1 −1  0 · · ·  0
                    1  0 −1 · · ·  0
                    ..  ..  ..  ..  ..
                    1  0  0 · · · −1 )

The null hypothesis will be rejected when

{ (n − p + 1)/(p − 1) } x̄ > C > (CSC > )−1 C x̄ > F1−α;p−1,n−p+1

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-30

Repeated Measurements

Simultaneous confidence intervals for linear combinations of the
mean of Yi = CXi have already been derived. For all a ∈ Rp−1 , with
probability (1 − α) we have:

a> C µ ∈ a> C x̄ ± √( { (p − 1)/(n − p + 1) } F1−α;p−1,n−p+1 a> CSC > a ).

The row sums of the elements of C are zero: C 1p = 0, therefore
a> C is a vector whose elements sum to zero. This is called a
contrast.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-31

Repeated Measurements

Let b = C > a; then b > 1p = ∑_{j=1}^{p} bj = 0, so the result above
provides, for all contrasts b > µ of µ, simultaneous confidence
intervals of level (1 − α):

b > µ ∈ b > x̄ ± √( { (p − 1)/(n − p + 1) } F1−α;p−1,n−p+1 b > Sb ).

Contrasts are e.g. b > = (1, −1, 0, 0), (1, 0, 0, −1),
(1, −1/3, −1/3, −1/3).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-32

Example
40 children were randomly chosen and then followed from grade
level 8 to 11; the scores are obtained from a test of their vocabulary.

x̄ > = (1.086, 2.544, 2.851, 3.420)

     ( 2.902
S =  ( 2.438  3.049
     ( 2.963  2.775  4.281
     ( 2.183  2.319  2.939  3.162 ).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-33

Example (cont’d)
The matrix C providing successive differences of µj is:

C = ( 1 −1  0  0
      0  1 −1  0
      0  0  1 −1 ).

The test statistic is Fobs = 53.134, which is significant for F3,37 .

We have the following simultaneous 95% confidence intervals

−1.958 ≤ µ1 − µ2 ≤ −0.959
−0.949 ≤ µ2 − µ3 ≤ 0.335
−1.171 ≤ µ3 − µ4 ≤ 0.036.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-34

Example (cont’d)
The rejection of H0 is mainly due to the difference between the
first and the second year performance of the children. The following
confidence intervals for the contrasts below may also be of interest:

−2.283 ≤ µ1 − (1/3)(µ2 + µ3 + µ4 ) ≤ −1.423
−1.777 ≤ (1/3)(µ1 + µ2 + µ3 ) − µ4 ≤ −0.742
−1.479 ≤ µ2 − µ4 ≤ −0.272

i.e., µ1 is different from the average of the 3 other years and µ4
turns out to be better than µ2 .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-35
Test Problem 7
Suppose Y1 , . . . , Yn are independent with Yi ∼ N1 (β > xi , σ 2 ), xi ∈ Rp .
H0 : Aβ = a, σ 2 unknown, H1 : no constraints.
The constrained maximum likelihood estimators under H0 are

β̃ = β̂ − (X > X )−1 A> {A(X > X )−1 A> }−1 (Aβ̂ − a)

for β and σ̃ 2 = (1/n)(y − X β̃)> (y − X β̃); β̂ denotes the
unconstrained MLE as before. The LR statistic is

−2 log λ = 2(ℓ∗1 − ℓ∗0 ) = n log( ||y − X β̃||² / ||y − X β̂||² ) −→ χ2q
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-36

Example (“classic blue” pullovers)

Let us test if β = 0 in the regression of sales on prices. It holds that

β = 0 ←→ (0 1) (α, β)> = 0.

The LR statistic here is

−2 log λ = 0.142

which is not significant for the χ21 distribution. The F -test statistic

F = 0.231

is also not significant.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-37

Example (“classic blue” pullovers cont’d)


We can assume independence of sales on prices (alone).
Multivariate regression in the “classic blue” pullovers example.
Parameter estimates in the model

X1 = α + β1 X2 + β2 X3 + β3 X4 + ε

are

α̂ = 65.670, β̂1 = −0.216, β̂2 = 0.485, β̂3 = 0.844.

Let us test now the hypothesis


H0 : β1 = −(1/2) β2
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-38

Example (“classic blue” pullovers cont’d)


This is equivalent to

(0  1  1/2  0) (α, β1 , β2 , β3 )> = 0.

The LR statistic in this case is equal to

−2 log λ = 0.006,

the F statistic is
F = 0.007.

Hence, in both cases we will not reject our hypothesis.


Applied Multivariate Statistical Analysis
Hypothesis Testing 7-39
Test Problem 8 (Comparison of two means)
Suppose Xi1 ∼ Np (µ1 , Σ), i = 1, · · · , n1 , and
Xj2 ∼ Np (µ2 , Σ), j = 1, · · · , n2 , all the variables being
independent.

H0 : µ1 = µ2 , H1 : no constraints.

Both samples provide the statistics x̄k and Sk , k = 1, 2. Let
δ = µ1 − µ2 ; we have

(x̄1 − x̄2 ) ∼ Np ( δ, { (n1 + n2 )/(n1 n2 ) } Σ )

n1 S1 + n2 S2 ∼ Wp (Σ, n1 + n2 − 2).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-40
The rejection region will thus be given by:

{ n1 n2 (n1 + n2 − p − 1) / (p(n1 + n2 )²) } (x̄1 − x̄2 )> S −1 (x̄1 − x̄2 )
  ≥ F1−α;p,n1 +n2 −p−1

A (1 − α) · 100% confidence region for δ is given by the ellipsoid
centered at (x̄1 − x̄2 ):

(δ − (x̄1 − x̄2 ))> S −1 (δ − (x̄1 − x̄2 ))
  ≤ { p(n1 + n2 )² / ((n1 + n2 − p − 1)(n1 n2 )) } F1−α;p,n1 +n2 −p−1 ,

and the simultaneous confidence intervals for all linear
combinations a> δ of the elements of δ are given by

a> δ ∈ a> (x̄1 − x̄2 ) ± √( { p(n1 + n2 )² / ((n1 + n2 − p − 1)(n1 n2 )) } F1−α;p,n1 +n2 −p−1 a> Sa ).
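
A sketch of the two-sample rejection rule above (ours; S is the weighted average of the within-group MLE covariances):

```python
# Two-sample test of H0: mu1 = mu2 with a common covariance matrix.
import numpy as np
from scipy.stats import f

def two_sample_test(x1, x2, alpha=0.05):
    n1, p = x1.shape
    n2, _ = x2.shape
    d = x1.mean(axis=0) - x2.mean(axis=0)
    S1 = np.cov(x1, rowvar=False, ddof=0)
    S2 = np.cov(x2, rowvar=False, ddof=0)
    S = (n1 * S1 + n2 * S2) / (n1 + n2)
    fstat = (n1 * n2 * (n1 + n2 - p - 1) / (p * (n1 + n2) ** 2)
             * d @ np.linalg.solve(S, d))
    return fstat > f.ppf(1 - alpha, p, n1 + n2 - p - 1)   # True: reject H0
```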
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-41

Example
We want to compare the means of the assets (X1 ) and of the sales
(X2 ) of the two sectors energy (group 1) and manufacturing (group
2). We have the following statistics: n1 = 15, n2 = 10, p = 2,

x̄1 = (4084, 2580.5)> ,   x̄2 = (4307.2, 4925.2)>

and

S1 = 107 × ( 1.6635  1.2410
             1.2410  1.3747 ),

S2 = 107 × ( 1.2248  1.1425
             1.1425  1.5112 ),

so that S = 107 × ( 1.4880  1.2016
                    1.2016  1.4293 ).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-42

Example
The observed value of the test statistic is Fobs = 2.7036. Since
F0.95;2,22 = 3.4434, the hypothesis of equal means of the two
groups is not rejected, although it would be rejected at a less severe
level (p-value = 0.0892). The 95% simultaneous confidence
intervals for the differences are given by

−4628.6 ≤ µ1a − µ2a ≤ 4182.2
−6662.4 ≤ µ1s − µ2s ≤ 1973.0.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-43

Example
Let us compare the vectors of means of the forged and the genuine
bank notes. The matrices Sf and Sg were already calculated and,
since here nf = ng = 100, S is simply the mean of Sf and Sg :
S = (1/2)(Sf + Sg ).

x̄g> = (214.97, 129.94, 129.72, 8.305, 10.168, 141.52)
x̄f> = (214.82, 130.3, 130.19, 10.53, 11.133, 139.45)

The test statistic is Fobs = 391.92, which is highly significant for
F6,193 .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-44

Example
The 95% simultaneous confidence intervals for the differences
δj = µgj − µfj , j = 1, . . . , p are:

−0.0443 ≤ δ1 ≤ 0.3363
−0.5186 ≤ δ2 ≤ −0.1954
−0.6416 ≤ δ3 ≤ −0.3044
−2.6981 ≤ δ4 ≤ −1.7519
−1.2952 ≤ δ5 ≤ −0.6348
1.8072 ≤ δ6 ≤ 2.3268

All the components (except for the first) show a significant
difference in the means, the main effects coming from the lower
border (X4 ) and the diagonal (X6 ).
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-45

Test Problem 9 (Comparison of Covariance Matrices)


Let Xih ∼ Np (µh , Σh ), i = 1 · · · , Nh ; h = 1, · · · , k
all variables being independent,

H0 : Σ1 = Σ2 = · · · = Σk , H1 : no constraints.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-46

Each subsample provides Sh , an estimator of Σh , with

nh Sh ∼ Wp (Σh , nh − 1).

Under H0 , ∑_{h=1}^{k} nh Sh ∼ Wp (Σ, n − k), where Σ is the common
covariance matrix and n = ∑_{h=1}^{k} nh . Let S = (n1 S1 + · · · + nk Sk )/n
be the weighted average of the Sh (it is in fact the MLE of Σ when
H0 is true). The likelihood ratio test leads to the statistic

−2 log λ = n log |S| − ∑_{h=1}^{k} nh log |Sh |

which under H0 is approximately distributed as a χ2m where
m = (1/2)(k − 1)p(p + 1).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-47

Example
We come back to the US companies data, where the means of assets
and sales were compared for companies from the energy and
manufacturing sectors. The test of Σ1 = Σ2 leads to the value of the
test statistic

−2 log λ = 0.9076

which is not significant (p-value for χ23 : 0.82). We cannot
reject H0 , so the comparison of the means above is valid.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-48

Test Problem 10 (Comparison of two means, unequal


covariance matrices, large samples)
Suppose Xi1 ∼ Np (µ1 , Σ1 ),i = 1, · · · , n1 and
Xj2 ∼ Np (µ2 , Σ2 ),j = 1, · · · , n2 , all the variables being
independent.

H0 : µ 1 = µ 2 , H1 : no constraints.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-49

 
(x̄1 − x̄2 ) ∼ Np ( δ, Σ1 /n1 + Σ2 /n2 ).

Therefore,

(x̄1 − x̄2 )> ( Σ1 /n1 + Σ2 /n2 )−1 (x̄1 − x̄2 ) ∼ χ2p .

Since Si is a consistent estimator of Σi , i = 1, 2, we have

(x̄1 − x̄2 )> ( S1 /n1 + S2 /n2 )−1 (x̄1 − x̄2 ) −→ χ2p    (18)

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-50

Example
Let us compare the forged and the genuine bank notes again (n1
and n2 are large). The test statistic turns out to be 2436.8 which
is highly significant. The 95% simultaneous confidence intervals
are now:
−0.0389 ≤ δ1 ≤ 0.3309
−0.5140 ≤ δ2 ≤ −0.2000
−0.6368 ≤ δ3 ≤ −0.3092
−2.6846 ≤ δ4 ≤ −1.7654
−1.2858 ≤ δ5 ≤ −0.6442
1.8146 ≤ δ6 ≤ 2.3194
showing that all the components except the first are different from
zero, the larger difference coming from X6 (length of the diagonal)
and X4 (lower border).
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-51

Profile analysis

p measures are reported in the same units.

For instance, measures of blood pressure at p different
moments, one group being the control group and the other
the group receiving a new treatment.

One is then interested in comparing the profiles of the groups, a
profile being just the vector of means of the p responses (the
comparison may be visualized in a two-dimensional graph using the
parallel coordinate plot).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-52

Profile Analysis

[Figure: population profiles of Group 1 and Group 2 — mean response
plotted against treatments 1, . . . , 5.]

Figure: Population profiles MVAprofil


Applied Multivariate Statistical Analysis
Hypothesis Testing 7-53

The following questions are of interest:


1) Are the profiles similar in the sense of being parallel (which
means no interaction between the treatments and the
groups)?
2) If the profiles are parallel, are they at the same level?
3) If the profiles are parallel, is there any treatment effect (are
the profiles horizontal)?
The above questions are easily translated in terms of linear
constraints on the means and a test statistic is obviously obtained.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-54

Parallelism
Let C be a ((p − 1) × p) matrix defined as

C = ( 1 −1  0 · · ·  0
      0  1 −1 · · ·  0
      0 · · ·  0  1 −1 ).

The hypothesis to be tested is H0(1) : C (µ1 − µ2 ) = 0. Under H0(1) ,

(n1 + n2 − 2) { n1 n2 /(n1 + n2 )² } (C (x̄1 − x̄2 ))> (C SC > )−1 C (x̄1 − x̄2 )
  ∼ T 2 (p − 1, n1 + n2 − 2),

where S is the pooled covariance matrix. The hypothesis is rejected
if

{ n1 n2 (n1 + n2 − p) / ((n1 + n2 )² (p − 1)) } (C (x̄1 − x̄2 ))> (C SC > )−1 C (x̄1 − x̄2 ) > F1−α;p−1,n1 +n2 −p .
Applied Multivariate Statistical Analysis
Hypothesis Testing 7-55

Equality of two levels

The question of equality of the two levels is meaningful only if the
two profiles are parallel. In the case of interaction (rejection of
H0(1) ), the two populations react differently to the treatments and
the question of the level has no meaning.
The equality of the two levels is written as:

H0(2) : 1p> (µ1 − µ2 ) = 0

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-56

{ n1 n2 (n1 + n2 − 2) / (n1 + n2 )² } { 1p> (x̄1 − x̄2 ) }² / ( 1p> S 1p )
  ∼ T 2 (1, n1 + n2 − 2) = F1,n1 +n2 −2 .

The rejection region is thus

{ n1 n2 (n1 + n2 − 2) / (n1 + n2 )² } { 1p> (x̄1 − x̄2 ) }² / ( 1p> S 1p ) > F1−α;1,n1 +n2 −2 .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-57

Treatment effect

If the parallelism between the profiles has been rejected, then two
independent analyses should be done on the two groups using the
repeated measurement approach (see above). But if the parallelism
is accepted, we can exploit the information contained in both
groups (possibly at different levels) to test for a treatment effect,
i.e. the horizontality of the two profiles.
This may be written as:

H0(3) : C (µ1 + µ2 ) = 0.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-58
It is easy to prove that H0(3) together with H0(1) implies that

C { (n1 µ1 + n2 µ2 ) / (n1 + n2 ) } = 0.

So under parallel, horizontal profiles we have

√(n1 + n2 ) C x̄ ∼ Np (0, C ΣC > ).

We obtain

(n1 + n2 − 2)(C x̄)> (C SC > )−1 C x̄ ∼ T 2 (p − 1, n1 + n2 − 2).

This leads to the rejection region of H0(3) :

{ (n1 + n2 − p) / (p − 1) } (C x̄)> (C SC > )−1 C x̄ > F1−α;p−1,n1 +n2 −p .

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-59

Example
Wechsler Adult Intelligence Scale (WAIS) for 2 categories of
people: in group 1 are n1 = 37 people who do not present a senile
factor, group 2 are those (n2 = 12) presenting a senile factor. The
four WAIS subtests are X1 (information), X2 (similarities), X3
(arithmetic) and X4 (picture completion). The relevant statistics
are

x̄1> = (12.57 9.57 11.49 7.97)


x̄2> = (8.75 5.33 8.50 4.75)

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-60

Example

     ( 11.164
S1 = (  8.840  11.759
     (  6.210   5.778  10.790
     (  2.020   0.529   1.743   3.594 )

     (  9.688
S2 = (  9.583  16.722
     (  8.875  11.083  12.083
     (  7.021   8.167   4.875  11.688 )

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-61

Example
The test statistic for testing the parallelism of the two profiles is
Fobs = 0.4634, which is not significant (p-value = 0.71), so we
can accept the parallelism.
The second test (equality of the levels of the 2 profiles) gives
Fobs = 17.2146, which is highly significant (p-value ' 10−4 ):
the global level of the test for the non-senile people is superior to
that of the senile group.
Finally, the final test (horizontality of the average profile) gives
Fobs = 53.317, which is also highly significant (p-value ' 10−14 ).
There are significant differences among the means of the different
subtests.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-62

Summary: Linear Hypothesis

Hypotheses about µ can often be written as Aµ = a, with


matrix A, and vector a.
The hypothesis H0 : Aµ = a for X ∼ Np (µ, Σ) with Σ known
leads to −2 log λ = n(Ax − a)> (AΣA> )−1 (Ax − a) ∼ χ2q ,
where q is the number of elements in a.

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-63

Summary: Linear Hypothesis

The hypothesis H0 : Aµ = a for X ∼ Np (µ, Σ) with Σ
unknown leads to
−2 log λ = n log{1 + (Ax − a)> (ASA> )−1 (Ax − a)} −→ χ2q ,
where q is the number of elements in a, and we have the exact
test (n − 1)(Ax̄ − a)> (ASA> )−1 (Ax̄ − a) ∼ T 2 (q, n − 1).

Applied Multivariate Statistical Analysis


Hypothesis Testing 7-64

Summary: Linear Hypothesis

The hypothesis H0 : Aβ = a for Yi ∼ N1 (β > xi , σ


2 ) with σ 2
2
unknown leads to −2 log λ = n2 log ||y −X β̃||2 − 1 −→ χ2q ,
||y −X β̂||
with q being the length of a and with
 n  o−1  
Aβ̂ − a A X > X −1 A> A β̂ − a
n−p
 >   ∼ Fq,n−p .
q
y − X β̂ y − X β̂

Applied Multivariate Statistical Analysis


Regression Models 8-1

Regression Models

Linear Regression

y = Xβ + ε

X (n × p) explanatory variable
y (n × 1) response

Applied Multivariate Statistical Analysis


Regression Models 8-2

Example

Let x1 , x2 be two factors that explain the variation of the response y :

yi = β0 + β1 xi1 + β2 xi2 + β3 xi1 ² + β4 xi2 ² + β5 xi1 xi2 + εi ,
i = 1, . . . , n

    ( 1  x11  x12  x11 ²  x12 ²  x11 x12
X = ( 1  x21  x22  x21 ²  x22 ²  x21 x22
    ( ..   ..   ..    ..     ..     ..
    ( 1  xn1  xn2  xn1 ²  xn2 ²  xn1 xn2 )

Applied Multivariate Statistical Analysis


Regression Models 8-3

Example

Figure: 3-D response surface MVAresponsesurface

Applied Multivariate Statistical Analysis


Regression Models 8-4

ANOVA Models
One factor (p levels) model

yk` = µ + α` + εk` , k = 1, . . . , n` , and ` = 1, . . . , p

Pullover example: p = 3 marketing strategies, y = X β + ε, with

    ( 1   1   0
    ( 1   1   0
    ( 1   0   1
X = ( 1   0   1
    ( 1  −1  −1
    ( 1  −1  −1 )
Applied Multivariate Statistical Analysis
Regression Models 8-5

Multiple-Factors Models

Example: 3 marketing strategies, 2 locations

A1 A2 A3
B1 18 15 5 8 8 10 14
B2 15 20 25 30 10 12 20 25

Table: A two factor ANOVA data set, factor A, three levels of the
marketing strategy and factor B, two levels for the location. The figures
represent the resulting sales during the same period.

Applied Multivariate Statistical Analysis


Regression Models 8-6

General Two Factor Model

yijk = µ + αi + γj + (αγ)ij + εijk

i = 1, . . . , r , j = 1, . . . , s, k = 1, . . . , nij

∑_{i=1}^{r} αi = 0,   ∑_{j=1}^{s} γj = 0
∑_{i=1}^{r} (αγ)ij = 0,   ∑_{j=1}^{s} (αγ)ij = 0

For the marketing data: r = 3, s = 2

Interactions: (αγ)ij

Applied Multivariate Statistical Analysis


Regression Models 8-7

Example

(αγ)11 > 0 The effect of A1 (advertisement in local newspaper)


more successful in location B1 (commercial centre)

(αγ)31 < 0 A3 (luxury presentation) less effective in B1 than in


B2 (non-commercial centre)

Applied Multivariate Statistical Analysis


Regression Models 8-8

Model without Interactions

y = (18 15 15 20 25 30 5 8 8 10 12 10 14 20 25)>

    ( 1 1  1  1  1  1 1 1 1  1  1  1  1  1  1
X = ( 1 1  1  1  1  1 0 0 0  0  0 −1 −1 −1 −1
    ( 0 0  0  0  0  0 1 1 1  1  1 −1 −1 −1 −1
    ( 1 1 −1 −1 −1 −1 1 1 1 −1 −1  1  1 −1 −1 )>

Applied Multivariate Statistical Analysis


Regression Models 8-9

Model with Interactions

    ( 1 1  1  1  1  1 1 1 1  1  1  1  1  1  1
    ( 1 1  1  1  1  1 0 0 0  0  0 −1 −1 −1 −1
X = ( 0 0  0  0  0  0 1 1 1  1  1 −1 −1 −1 −1
    ( 1 1 −1 −1 −1 −1 1 1 1 −1 −1  1  1 −1 −1
    ( 1 1 −1 −1 −1 −1 0 0 0  0  0 −1 −1  1  1
    ( 0 0  0  0  0  0 1 1 1 −1 −1 −1 −1  1  1 )>

Applied Multivariate Statistical Analysis


Regression Models 8-10

Example

βb p-values
µ 15.25
α1 4.25 0.0218
α2 -6.25 0.0033
γ1 -3.42 0.0139
(αγ)11 0.42 0.7922
(αγ)21 1.42 0.8096

Table: The values of βb in the full model with interactions for the
marketing data (RSSfull = 158)

Applied Multivariate Statistical Analysis


Regression Models 8-11

ANCOVA Models

Regression models where some of the variables are qualitative


and others are continuous

Example: Consider the Car data and analyse the effect of weight
(W ) and displacement (D) on the mileage (M). Test if the origin
of the car (C ) has some effect on the response and if the effect of
the continuous variables is the same for different factor levels.

Applied Multivariate Statistical Analysis


Regression Models 8-12

Example

βb p-values βe p-values
µ 41.0066 0.0000 43.4031 0.0000
W -0.0073 0.0000 -0.0074 0.0000
D 0.0118 0.2250 0.0081 0.4140
C -0.9675 0.1250

Table: Estimation of the effects of weight and displacement on the


mileage MVAcareffect

Applied Multivariate Statistical Analysis


Regression Models 8-13

Example

µ p-values W p-values D p-values


c=1 40.043 0.0000 -0.0065 0.0000 0.0058 0.3790
c=2 47.557 0.0005 0.0081 0.3666 -0.3582 0.0160
c=3 44.174 0.0002 0.0039 0.7556 -0.2650 0.3031

Table: Different factor levels on the response MVAcareffect

Applied Multivariate Statistical Analysis


Regression Models 8-14

Categorical Responses

The response variable is categorical (qualitative)


Observe counts yk for class k = 1, . . . , K
Likelihood

L = { n! / ( ∏_{k=1}^{K} yk ! ) } ∏_{k=1}^{K} (mk /n)^{yk}

Idea: make log mk linear in X

Applied Multivariate Statistical Analysis


Regression Models 8-15

Two-Way Tables

yjk is the number of observations in cell (j, k)

Multinomial likelihood

L = { n! / ( ∏_{j=1}^{J} ∏_{k=1}^{K} yjk ! ) } ∏_{j=1}^{J} ∏_{k=1}^{K} (mjk /n)^{yjk}

No interaction

log mjk = µ + αj + γk for j = 1, . . . , J, k = 1, . . . , K

log m = X β

Applied Multivariate Statistical Analysis


Regression Models 8-16

Model without Interaction

   
log m = (log m11 , log m12 , log m13 , log m21 , log m22 , log m23 )> ,

    ( 1  1  1  0
    ( 1  1  0  1
X = ( 1  1 −1 −1      β = (β0 , β1 , β2 , β3 )>
    ( 1 −1  1  0
    ( 1 −1  0  1
    ( 1 −1 −1 −1 ),

Applied Multivariate Statistical Analysis


Regression Models 8-17

Model without Interaction

Likelihood

ℓ(β) = ∑_{j=1}^{J} ∑_{k=1}^{K} yjk log mjk   s.t. ∑_{j,k} mjk = n

α1 = β1 , α2 = −β1
γ1 = β2 , γ2 = β3 , γ3 = −(β2 + β3 )

Applied Multivariate Statistical Analysis


Regression Models 8-18

Model with Interactions

log mjk = µ + αj + γk + (αγ)jk , j = 1, . . . , J, k = 1, . . . , K

∑_{k=1}^{K} (αγ)jk = 0, for j = 1, . . . , J
∑_{j=1}^{J} (αγ)jk = 0, for k = 1, . . . , K

Applied Multivariate Statistical Analysis


Regression Models 8-19

Testing with Count Data

yk count data
m̂k value predicted by the model
Pearson chi-square

χ² = ∑_{k=1}^{K} (yk − m̂k )² / m̂k

Deviance

G ² = 2 ∑_{k=1}^{K} yk log(yk /m̂k )
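
A direct computation of both statistics (the counts and fitted values below are invented for illustration):

```python
# Pearson X^2 and deviance G^2 for observed vs. model-fitted counts.
import numpy as np

y = np.array([21.0, 32.0, 70.0, 43.0, 19.0])      # observed counts
m_hat = np.array([25.0, 30.0, 65.0, 45.0, 20.0])  # fitted by some model
pearson = np.sum((y - m_hat) ** 2 / m_hat)        # X^2
deviance = 2 * np.sum(y * np.log(y / m_hat))      # G^2
```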

Applied Multivariate Statistical Analysis


Regression Models 8-20

Testing with Count Data

Both statistics are asymptotically χ2 distributed

Degrees of freedom

d.f . = # free cells − # free parameters estimated

Test
H0 : reduced model with r degrees of freedom
H1 : full model with f degrees of freedom

Applied Multivariate Statistical Analysis


Regression Models 8-21

Testing with Count Data

G ²H0 − G ²H1 ∼ χ2r −f

Reject H0 when the p-value

P( χ2r −f > G ²H0 − G ²H1 observed )

is small.

Applied Multivariate Statistical Analysis


Regression Models 8-22

Example

2 × 2 × 5 table of n = 5833 counts on prescribed drugs

M A1 A2 A3 A4 A5
DY 21 32 70 43 19
DN 683 596 705 295 99
F A1 A2 A3 A4 A5
DY 46 89 169 98 51
DN 738 700 847 336 196

Table: A Three-way Contingency Table: top table for men and bottom
table for women MVAdrug
Applied Multivariate Statistical Analysis
Regression Models 8-23

Example

βb0 intercept 5.0089 βb10 0.0205


βb1 gender: M −0.2867 βb11 0.0482
βb2 drug: DY −1.0660 βb12 drug*age −0.4983
βb3 age −0.0080 βb13 −0.1807
βb4 0.2151 βb14 0.0857
βb5 0.6607 βb15 0.2766
βb6 −0.0463 βb16 gender*drug*age −0.0134
βb7 gender*drug −0.1632 βb17 −0.0523
βb8 gender*age 0.0713 βb18 −0.0112
βb9 −0.0092 βb19 −0.0102

Applied Multivariate Statistical Analysis


Regression Models 8-24

Example

βb0 intercept 5.0051 βb8 gender*age 0.0795


βb1 gender: M −0.2919 βb9 0.0321
βb2 drug: DY −1.0717 βb10 0.0265
βb3 age −0.0030 βb11 0.0534
βb4 0.2358 βb12 drug*age −0.4915
βb5 0.6649 βb13 −0.1576
βb6 −0.0425 βb14 0.0917
βb7 gender*drug −0.1734 βb15 0.2822

Table: Coefficients estimates based on the saturated model (previous


slide) and ML method (current slide) MVAdrug3waysTab
Applied Multivariate Statistical Analysis
Regression Models 8-25

Logit Models

p (xi ) = P(yi = 1 | xi ) = exp( β0 + ∑_{j=1}^{p} βj xij ) / { 1 + exp( β0 + ∑_{j=1}^{p} βj xij ) }

The log odds ratio is linear:

log{ p (xi ) / (1 − p (xi )) } = β0 + ∑_{j=1}^{p} βj xij

Applied Multivariate Statistical Analysis


Regression Models 8-26

Logit Models

Likelihood function

L(β0 , β) = ∏_{i=1}^{n} p (xi )^{yi} {1 − p (xi )}^{1−yi}

Log-likelihood function

ℓ(β0 , β) = ∑_{i=1}^{n} [ yi log p (xi ) + (1 − yi ) log{1 − p (xi )} ]
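
A sketch maximizing ℓ(β0 , β) by plain gradient ascent (step size and iteration count are arbitrary; statistical software would use Newton-Raphson/IRLS instead):

```python
# Fit the logit model by gradient ascent on the log-likelihood,
# whose gradient is X'(y - p).
import numpy as np

def fit_logit(X, y, lr=0.1, steps=5000):
    Xd = np.column_stack([np.ones(len(y)), X])    # prepend intercept column
    b = np.zeros(Xd.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xd @ b))         # p(x_i)
        b += lr * Xd.T @ (y - p) / len(y)         # ascend the log-likelihood
    return b
```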

Applied Multivariate Statistical Analysis


Regression Models 8-27

Example

β̂ p-values
β0 3.6042 0.0660
β3 -0.2031 0.0037
β4 -0.0205 0.0183
β5 -1.1841 0.3108

Table: Estimation of the financial characteristics on bankrupt banks with


logit model MVAbankrupt

Applied Multivariate Statistical Analysis


Regression Models 8-28

Summary: Regression Models

In contingency tables, the categories are defined by the


qualitative variables.
The saturated model has all of the interaction terms, and 0
degrees of freedom.
The non-saturated model is a reduced model since it fixes
some parameters to be zero.

Applied Multivariate Statistical Analysis


Regression Models 8-29

Summary: Regression Models

Two statistics to test the full model against the reduced
model are:

X ² = ∑_{k=1}^{K} (yk − m̂k )² / m̂k

G ² = 2 ∑_{k=1}^{K} yk log(yk /m̂k )

Applied Multivariate Statistical Analysis


Regression Models 8-30

Summary: Regression Models

The logit models allow the column categories to be a


quantitative variable, and quantify the effect of the column
category by using fewer parameters and incorporating more
flexible relationships than just a linear one.
The logit model is equivalent to a log-linear model:

log[ p (xi )/{1 − p (xi )} ] = β0 + ∑_{j=1}^{p} βj xij

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-1

Decomposition of Data Matrices by Factors

The aim of MVA is dimension reduction


Data matrix X (n × p) – n observations, p variables:
1. Each row (observation) is a vector xi> = (xi1 , . . . , xip ) ∈ Rp

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-2

2. Each column (variable) is a vector


x[j] = (x[1j] , . . . , x[nj] )> ∈ Rn




 


Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-3

Summary: Introduction

Each row (individual) of X is a p-dimensional vector. From


this point of view X can be considered as a cloud of n points
in Rp .
Each column (variable) of X is a n-dimensional vector. From
this point of view X can be considered as a cloud of p points
in Rn .

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-4

The p-dimensional Point Cloud


X (n × p) data matrix, n individuals with observations ∈ Rp
subspace of dimension = 1


    

        
 
 
      
                  
[Diagram: the point cloud of individuals and its projection onto the line F1 with direction u1]
Projection of the vector x_i on F_1 (direction u_1): \( \|p_{x_i}\| = x_i^\top \frac{u_1}{\|u_1\|} = x_i^\top u_1 \) since \(\|u_1\| = 1\).

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-5


 

 



 



  


Figure: Representation of the individuals x1, . . . , xn as a two-dimensional point cloud.
Applied Multivariate Statistical Analysis
Decomposition of Data Matrices by Factors 9-6
Define “best line” by minimization of
\[
\sum_{i=1}^n \|x_i - p_{x_i}\|^2 \tag{19}
\]

A closer look: p_{x_i} = x_i^\top u_1 and \|x_i - p_{x_i}\|^2 = \|x_i\|^2 - \|p_{x_i}\|^2 by Pythagoras' theorem, so minimizing (19) is equivalent to
\[
\max \sum_{i=1}^n \|p_{x_i}\|^2 .
\]
With the identification constraint \|u_1\| = 1 and p_{x_i} = x_i^\top u_1 this becomes
\[
\max_{\|u_1\| = 1} u_1^\top (X^\top X)\, u_1 .
\]

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-7

Theorem
The vector u1 which minimizes (19) is the eigenvector of X^\top X
associated with the largest eigenvalue \lambda_1 of X^\top X.

If the data are centered (\bar{x} = 0), then n^{-1} X^\top X is the covariance matrix.

By the Theorem, the first factor is the first eigenvector of the covariance
matrix!

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-8

Subspaces of Dimension 2
second factor direction = second eigenvector!
zj = X uj factorial variable
uj = j-th eigenvector, j = 1, 2

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-9

Summary: The p-dim Point Cloud

The p-dimensional point cloud of individuals can be


graphically represented by projecting on spaces of smaller
dimension.
The first factor direction is u1 and defines a line F1 through
the origin. This line is found by minimizing the orthogonal
distances.
The factor u1 equals the eigenvector of X > X corresponding
to its largest eigenvalue. The coordinates for representing the
point cloud on a straight line are z1 = X u1 .

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-10

Summary: The p-dim Point Cloud


The second factor direction is u2 , where u2 denotes the
eigenvector of X > X corresponding to its second largest
eigenvalue.
The coordinates for representing the point cloud on a plane
are given by z1 = X u1 , z2 = X u2 .
The factor directions 1, . . . , q are u1 , . . . , uq , which denote the
eigenvectors of X > X corresponding to the q largest
eigenvalues.
The coordinates for representing the point cloud of individuals
on a q-dimensional subspace are given by
z1 = X u1 , . . . , zq = X uq .
Applied Multivariate Statistical Analysis
Decomposition of Data Matrices by Factors 9-11

Fitting the n-dimensional Point Cloud

Algebraically the same problem as with the p-dimensional case


We replace X by X > !
i.e. max v1> (X X > )v1 for first factor. v1 is first eigenvector of
X X >!

Applied Multivariate Statistical Analysis







 
 
 



 




Figure: Representation of the variables x[1], . . . , x[p] as a two-dimensional point cloud.
Decomposition of Data Matrices by Factors 9-13

Summary: The n-dim Point Cloud

The n-dimensional point cloud of variables can be graphically


represented by projecting on spaces of smaller dimension.
The first factor direction is v1 and defines a line G1 through
the origin. The factor v1 equals the eigenvector of X X >
corresponding to its largest eigenvalue.
The coordinates for representing the point cloud on a straight
line are w1 = X > v1 .

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-14

Summary: The n-dim Point Cloud


The second factor direction is v2 , where v2 denotes the
eigenvector of X X > corresponding to its second largest
eigenvalue.
The coordinates for representing the point cloud on a plane
are given by w1 = X > v1 , w2 = X > v2 .
The factor directions 1, . . . , q are v1 , . . . , vq , which denote the
eigenvectors of X X > corresponding to the q largest
eigenvalues.
The coordinates for representing the point cloud of variables
on a q-dimensional subspace are given by
w1 = X > v1 , . . . , wq = X > vq .
Applied Multivariate Statistical Analysis
Decomposition of Data Matrices by Factors 9-15

Relations Between Fitting Subspaces

Consider the eigenvector equations in R^n: (X X^\top) v_k = \mu_k v_k
for k \le r, where r = rank(X X^\top) = rank(X) \le \min(p, n). Multiplying by X^\top,
\[
X^\top (X X^\top) v_k = \mu_k X^\top v_k \quad\text{or}\quad (X^\top X)(X^\top v_k) = \mu_k (X^\top v_k),
\]
so that u_k = c_k X^\top v_k.
Now consider the eigenvector equation in R^p: (X^\top X) u_k = \lambda_k u_k.
Multiplying by X gives, analogously, v_k = d_k X u_k.

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-16

Theorem (Duality Relations)


Let r be the rank of X . For k ≤ r , the eigenvalues λk of X > X
and X X > are the same and the eigenvectors (uk and vk ,
respectively) are related by

\[
u_k = \frac{1}{\sqrt{\lambda_k}}\, X^\top v_k , \qquad
v_k = \frac{1}{\sqrt{\lambda_k}}\, X u_k .
\]
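
The duality relations are easy to verify numerically. A small NumPy check (the simulated matrix X is an illustrative assumption):

import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4
X = rng.normal(size=(n, p))

lam, U = np.linalg.eigh(X.T @ X)       # eigenpairs of X'X  (p x p)
mu, V = np.linalg.eigh(X @ X.T)        # eigenpairs of XX'  (n x n)
lam, U = lam[::-1], U[:, ::-1]         # sort descending
mu, V = mu[::-1], V[:, ::-1]

assert np.allclose(lam, mu[:p])        # the non-zero eigenvalues coincide
for k in range(p):                     # u_k = X'v_k / sqrt(lambda_k), up to sign
    u_from_v = X.T @ V[:, k] / np.sqrt(lam[k])
    assert np.allclose(np.abs(u_from_v), np.abs(U[:, k]))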

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-17

Summary:Relations Between Subspaces

The matrices X > X and X X > have the same non-zero


eigenvalues λ1 , . . . , λr , r = rank(X ).
The eigenvectors of X^\top X and X X^\top can be calculated from
each other by
\[
u_k = \frac{1}{\sqrt{\lambda_k}}\, X^\top v_k \quad\text{and}\quad v_k = \frac{1}{\sqrt{\lambda_k}}\, X u_k .
\]
The coordinates for representing the variables (columns) of X
on a q-dimensional subspace are more easily calculated as
\[
w_k = \sqrt{\lambda_k}\, u_k .
\]

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-18

Practical Computation

Compute the eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λp and the


corresponding eigenvectors u1 , . . . , up of X > X .
Representation of the n individuals by plotting
\[
z_1 = X u_1 \ \text{ versus } \ z_2 = X u_2
\]
Representation of the p variables by plotting
\[
w_1 = \sqrt{\lambda_1}\, u_1 \ \text{ versus } \ w_2 = \sqrt{\lambda_2}\, u_2
\]
(duality relations!)
\[
\tau_q = \frac{\lambda_1 + \lambda_2 + \dots + \lambda_q}{\lambda_1 + \lambda_2 + \dots + \lambda_p}
\]
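
A compact NumPy sketch of this recipe (simulated data; all names are illustrative):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)                 # centered data matrix (n x p)

lam, U = np.linalg.eigh(X.T @ X)       # eigenvalues/eigenvectors of X'X
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]       # lambda_1 >= ... >= lambda_p

Z = X @ U                              # z_k = X u_k : coordinates of the individuals
W = U * np.sqrt(lam)                   # w_k = sqrt(lambda_k) u_k : the variables
tau_2 = lam[:2].sum() / lam.sum()      # proportion of inertia of the first two factors
print(tau_2)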

Applied Multivariate Statistical Analysis


Decomposition of Data Matrices by Factors 9-19

Food Families

[Figure: two panels - the food categories plotted as W[,1] vs. W[,2] and the family types plotted as Z[,1] vs. Z[,2]]

Coordinates on the first two factorial axes:

bread  -0.49873  -0.84162      ma2   1.12853   0.14353
veget  -0.96975  -0.13310      em2   0.74582   0.70752
fruit  -0.92913   0.27791      ca2   0.04654   0.28641
meat   -0.96210   0.19107      ma3   0.80563  -0.12763
poult  -0.91125   0.26590      em3   0.66887   0.06423
milk   -0.58434  -0.70690      ca3  -0.66886   0.53487
wine    0.42820  -0.64815      ma4   0.36826  -0.54151
                               em4   0.09955  -0.24969
                               ca4  -0.63184   0.68533
                               ma5  -0.08726  -1.09621
                               em5  -0.77026  -0.44656
                               ca5  -1.70498   0.03971

Figure: Representation of food expenditures and family types in two
dimensions MVAdecofood

Applied Multivariate Statistical Analysis
Decomposition of Data Matrices by Factors 9-20

Summary: Practical Computation

The practical implementation consists of computing the


eigenvalues λ1 , . . . , λp and the eigenvectors u1 , . . . , up of
X >X .
The representation of the n individuals is obtained by plotting
z1 = X u1 vs. z2 = X u2 (eventually vs. z3 = X u3 ).
The representation of the p variables is obtained by plotting
w_1 = \sqrt{\lambda_1}\, u_1 vs. w_2 = \sqrt{\lambda_2}\, u_2 (eventually vs. w_3 = \sqrt{\lambda_3}\, u_3).
The quality of the factorial representation can be appreciated
by τq which is the percentage of inertia explained by the first
q factors.
Applied Multivariate Statistical Analysis
Principal Components Analysis 10-1

Principal Components Analysis

Objective:
Reduce the dimension of a p-variate random variable X
through linear combinations.
These linear combinations should create the largest spread
among the values of X, i.e. we are looking for the linear
combinations with the largest variances.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-2

Principal Components Analysis

Standardised Linear Combination (SLC)


Consider the standardised linear combination (SLC), a weighted
average of X:
\[
\delta^\top X = \sum_{j=1}^p \delta_j X_j , \qquad \|\delta\| = 1 , \ \text{i.e.}\ \sum_{j=1}^p \delta_j^2 = 1 \quad (\text{standardised})
\]

\delta = (\delta_1, \dots, \delta_p)^\top is the weighting vector; it determines the direction
of the SLC line.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-3

Choose the SLC by finding the δ which maximizes the variance of


the projections δ > X :

\[
\max_{\{\delta : \|\delta\| = 1\}} \operatorname{Var}(\delta^\top X) = \max_{\{\delta : \|\delta\| = 1\}} \delta^\top \operatorname{Var}(X)\, \delta .
\]

Solution via Theorem 2: the maximizing vector (the best direction)


is the eigenvector γ1 corresponding to the largest eigenvalue λ1 of
the covariance matrix Σ = Var(X ).

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-4

We get:
first PC: Y_1 = \gamma_1^\top X
second PC: Y_2 = \gamma_2^\top X
... and so on, with \gamma_i \perp \gamma_j for all i \ne j.

In general:
The PC transformation of a random variable X with E(X ) = µ,
Var(X ) = Σ = ΓΛΓ> is:

Y = Γ> (X − µ)

Note: the variable X is centered to get a zero mean PC.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-5

[Figure: an arbitrary direction in the data (top panel) and the projections onto it (bottom panel)]

Explained variance      0.50520
Total variance          1.96569
Explained percentage    0.25701

Figure: An arbitrary SLC    MVApcasimu

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-6
[Figure: the direction of largest variance in the data (top panel) and the projections onto it (bottom panel)]

Explained variance      1.46049
Total variance          1.96569
Explained percentage    0.74299

Figure: The most interesting SLC    MVApcasimu

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-7
Example
Bivariate normal distribution N(0, \Sigma) with
\[
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} , \quad \rho > 0 .
\]
The eigenvalues of this matrix are \lambda_1 = 1 + \rho and \lambda_2 = 1 - \rho with
corresponding eigenvectors
\[
\gamma_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} , \qquad \gamma_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} .
\]
The PC transformation is thus
\[
Y = \Gamma^\top (X - \mu) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} X
\quad\text{or}\quad
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} X_1 + X_2 \\ X_1 - X_2 \end{pmatrix} .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-8

The first principal component is
\[
Y_1 = \frac{1}{\sqrt{2}} (X_1 + X_2)
\]
and the second is
\[
Y_2 = \frac{1}{\sqrt{2}} (X_1 - X_2) .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-9

Compute the variances of these PCs:
\[
\operatorname{Var}(Y_1) = \operatorname{Var}\left\{ \frac{1}{\sqrt{2}} (X_1 + X_2) \right\} = \frac12 \operatorname{Var}(X_1 + X_2)
\]
\[
= \frac12 \{ \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + 2 \operatorname{Cov}(X_1, X_2) \}
= \frac12 (1 + 1 + 2\rho) = 1 + \rho = \lambda_1 .
\]
Similarly we find that \operatorname{Var}(Y_2) = \lambda_2 = 1 - \rho.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-10

Theorem
Let X ∼ (µ, Σ) and let Y = Γ> (X − µ) be the PC transformation.
Then, for j=1,...,p:

E(Y_j) = 0
Var(Y_j) = \lambda_j
Cov(Y_i, Y_j) = 0 for i \ne j
Var(Y_1) \ge \dots \ge Var(Y_p) \ge 0
\sum_{j=1}^p Var(Y_j) = tr(\Sigma)
\prod_{j=1}^p Var(Y_j) = |\Sigma| .

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-11

Theorem
There exists no SLC that has larger variance than λ1 = Var(Y1 ).

Theorem
If Y = a> X is a SLC that is uncorrelated with the first k PCs of
X , then Var(Y ) is maximized by a = γk+1 .

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-12

Summary: SLC

A standardised linear combination (SLC) is a weighted


average \delta^\top X = \sum_{j=1}^p \delta_j X_j, where \delta is a vector of length 1.
Maximizing the variance of δ > X leads to the choice δ = γ1 ,
the eigenvector corresponding to the largest eigenvalue λ1 of
Σ = Var(X ).
This is a projection of X into the one-dimensional space,
where the components of X are weighted by the elements of
γ1 .
Y1 = γ1> (X − µ) is called the first principal component (PC).

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-13

Summary: SLC

This projection can be generalized to higher dimensions.
The PC transformation is the linear transformation
\[
Y = \Gamma^\top (X - \mu) ,
\]
where \Sigma = \operatorname{Var}(X) = \Gamma \Lambda \Gamma^\top and \mu = \operatorname{E}(X).
Y_1, Y_2, \dots, Y_p are called the first, second, ..., p-th PCs.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-14

Summary: SLC

The PCs have zero means, variances \operatorname{Var}(Y_j) = \lambda_j, and zero
covariances. From \lambda_1 \ge \dots \ge \lambda_p it follows that
\operatorname{Var}(Y_1) \ge \dots \ge \operatorname{Var}(Y_p). It holds that \sum_{j=1}^p \operatorname{Var}(Y_j) = \operatorname{tr}(\Sigma)
and \prod_{j=1}^p \operatorname{Var}(Y_j) = |\Sigma|.
If Y = a> X is a SLC which is not correlated with the first k
PCs of X then the variance of Y is maximized by choosing a
to be the (k + 1)-st PC.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-15

Principal Components in Practice

In practice, \mu becomes \bar{x} and \Sigma is replaced by S = G L G^\top:
\[
Y = (X - 1_n \bar{x}^\top)\, G
\]
\[
S_Y = n^{-1} Y^\top H Y = n^{-1} G^\top (X - 1_n \bar{x}^\top)^\top H (X - 1_n \bar{x}^\top)\, G
= n^{-1} G^\top X^\top H X\, G = G^\top S G = L
\]

L = diag(`1 , . . . , `p ) is the matrix of eigenvalues of S.
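
A quick numerical check of these identities, as a sketch with simulated data:

import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=100)

S = np.cov(X, rowvar=False, bias=True)     # empirical covariance (divisor n)
ell, G = np.linalg.eigh(S)
ell, G = ell[::-1], G[:, ::-1]             # descending eigenvalues l_1 >= l_2

Y = (X - X.mean(axis=0)) @ G               # Y = (X - 1_n xbar') G
S_Y = np.cov(Y, rowvar=False, bias=True)
assert np.allclose(S_Y, np.diag(ell))      # S_Y = L = diag(l_1, ..., l_p)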

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-16

y_1 = (X - 1_n \bar{x}^\top)\, g_1

g_1 = first eigenvector of S
g_2 = second eigenvector of S
g_3 = third eigenvector of S
...

The PC technique is sensitive to scale changes, hence the PC


transformation should be applied to data that have approximately
the same scale in each variable.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-17

Example
Consider the bank notes data set. The mean vector of X is
\[
\bar{x} = (214.9, 130.1, 129.9, 9.4, 10.6, 140.5)^\top ,
\]

the vector of eigenvalues of S is:

` = (2.985, 0.931, 0.242, 0.194, 0.085, 0.035)> .

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-18

Eigenvectors gj are given by the columns of G:


 
\[
G = \begin{pmatrix}
-0.044 & 0.011 & 0.326 & 0.562 & -0.753 & 0.098 \\
0.112 & 0.071 & 0.259 & 0.455 & 0.347 & -0.767 \\
0.139 & 0.066 & 0.345 & 0.415 & 0.535 & 0.632 \\
0.768 & -0.563 & 0.218 & -0.186 & -0.100 & -0.022 \\
0.202 & 0.659 & 0.557 & -0.451 & -0.102 & -0.035 \\
-0.579 & -0.489 & 0.592 & -0.258 & 0.085 & -0.046
\end{pmatrix}
\]

The first column of G is the first eigenvector; it gives the weights used
for the original variables in the first PC.
Figure 24 shows the PC plots: genuine bank notes are marked by "o",
counterfeit ones by "+".

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-19

[Figure: four panels - First vs. Second PC, Second vs. Third PC, First vs. Third PC, and the eigenvalues of S - genuine notes "o", counterfeit "+"]

Figure: Principal components of the bank data MVApcabank

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-20

Example (Scaling)
Rescaling of variables: X1 , X2 , X3 , and X6 are now measured in cm,
X4 and X5 remain in mm.
This leads to:

x = (21.49, 13.01, 12.99, 9.4, 10.6, 14.05)> ,

and
` = (2.101, 0.623, 0.005, 0.002, 0.001, 0.0004)> .

The result clearly differs from the preceding example (see Figure
25): 1. PC is domintaed by X4 , 2. PC by X5 . The other variables
have much less weight.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-21

[Figure: four panels - First vs. Second PC, Second vs. Third PC, First vs. Third PC, and the eigenvalues of S]

Figure: Principal components of the rescaled bank data MVApcabankr

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-22

Summary: PC’s in Practice

The scale of the variables should be roughly the same for PCA.
For the practical implementation of principal components
analysis (PCA) we replace µ by the mean x and Σ by the
empirical covariance S. Then we compute the eigenvalues
`1 , . . . , `p and the eigenvectors g1 , . . . , gp of S.
The graphical representation of the PCs is obtained by
plotting the first PC vs. the second (and eventually vs. the
third).
The components of the eigenvectors g_i are the weights of
the original variables in the PCs.
Applied Multivariate Statistical Analysis
Principal Components Analysis 10-23

Interpretation of the PCs

Variance explained by the first q PCs


\[
\psi = \frac{\lambda_1 + \dots + \lambda_q}{\sum_{j=1}^p \lambda_j}
= \frac{\sum_{j=1}^q \operatorname{Var}(Y_j)}{\sum_{j=1}^p \operatorname{Var}(Y_j)}
= \frac{\sum_{j=1}^q \operatorname{Var}(Y_j)}{\operatorname{tr}(\Sigma)}
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-24

Covariance between the PC vector Y and the original vector X :

Cov(X , Y ) = E(XY > ) − E X E Y >


= E(XX > Γ) − µµ> Γ = Var(X )Γ
= ΣΓ
= ΓΛΓ> Γ
= ΓΛ

correlation between variable Xi and the PC Yj :


\[
\rho_{X_i Y_j} = \gamma_{ij} \left( \frac{\lambda_j}{\sigma_{X_i X_i}} \right)^{1/2} .
\]
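
A sketch of how these correlations are computed from data; they are exactly the quantities drawn in the correlation-circle plot further below (simulated data, illustrative names):

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]                   # induce some correlation

S = np.cov(X, rowvar=False)
lam, G = np.linalg.eigh(S)
lam, G = lam[::-1], G[:, ::-1]

# r_{X_i Y_j} = g_ij (l_j / s_{X_i X_i})^{1/2}
R_XY = G * np.sqrt(lam) / np.sqrt(np.diag(S))[:, None]
assert np.allclose((R_XY**2).sum(axis=1), 1.0)   # each row lies on the unit sphere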

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-25
Example
bank data:

eigenvalue   proportion of variance   cumulated proportion

2.985 0.67 0.67


0.931 0.21 0.88
0.242 0.05 0.93
0.194 0.04 0.97
0.085 0.02 0.99
0.035 0.01 1.00

MVApcabanki
Applied Multivariate Statistical Analysis
Principal Components Analysis 10-26

Swiss Bank Notes

[Figure: scree plot - relative proportion of variance explained, plotted against the index 1-6]

Figure: Relative proportion of variance explained by PCs MVApcabanki

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-27

\[
\psi_q = \frac{\lambda_1 + \dots + \lambda_q}{\sum_{j=1}^p \lambda_j}
\]

Plotting procedure (a code sketch follows below):
1. Compute the covariance matrix
2. Compute the eigenvalues
3. Standardize the eigenvalues by the sum of eigenvalues
4. Plot the proportions on the y-axis
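
A sketch of this procedure (simulated data; matplotlib assumed to be available):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))

S = np.cov(X, rowvar=False)                  # 1. covariance matrix
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # 2. eigenvalues, descending
prop = lam / lam.sum()                       # 3. standardized by their sum
plt.plot(np.arange(1, 7), prop, "o")         # 4. proportions on the y-axis
plt.xlabel("Index"); plt.ylabel("Variance Explained")
plt.show()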

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-28

Swiss Bank Notes

[Figure: correlations r of X1-X6 with the First PC (x-axis) and Second PC (y-axis), drawn inside the unit circle]

Figure: The correlation of the original variables with the PCs MVApcabanki

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-29

Summary: Interpretation of PCs

The weighting of the PCs tells us in which directions,


expressed in original coordinates, we have the best variance
explanation. Note that the PCA is not scale invariant.
A measure of how well the first q PCs explain variation is
given by the relative proportion
\[
\psi_q = \sum_{j=1}^q \lambda_j \Big/ \sum_{j=1}^p \lambda_j .
\]
A good graphical representation of the ability of the PCs to
explain the variation in the data is the scree plot of these
proportions.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-30

Summary: Interpretation of PCs

The correlation between a PC Y_j and an original variable X_i is
\[
\rho_{X_i Y_j} = \gamma_{ij} \left( \frac{\lambda_j}{\sigma_{X_i X_i}} \right)^{1/2} .
\]
For the data matrix this translates to
\[
r_{X_i Y_j}^2 = \frac{\ell_j\, g_{ij}^2}{s_{X_i X_i}} ,
\]
which can be considered as the proportion of variance of X_i
explained by Y_j.
A plot of r_{X_i Y_1} vs. r_{X_i Y_2} shows which of the original variables
are most correlated with the PCs, namely those which are
near the periphery of the circle of radius 1.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-31

Asymptotic Properties of the PCs

Theorem
Let \Sigma > 0 with distinct eigenvalues, and let U \sim m^{-1} W_p(\Sigma, m)
with spectral decompositions \Sigma = \Gamma \Lambda \Gamma^\top and U = G L G^\top. Then
\[
\text{(a)}\quad \sqrt{m}\,(\ell - \lambda) \xrightarrow{L} N_p(0, 2\Lambda^2) ,
\]
\[
\text{(b)}\quad \sqrt{m}\,(g_j - \gamma_j) \xrightarrow{L} N_p(0, V_j) , \quad V_j = \lambda_j \sum_{k \ne j} \frac{\lambda_k}{(\lambda_k - \lambda_j)^2}\, \gamma_k \gamma_k^\top ,
\]
\[
\text{(c)}\quad \operatorname{Cov}(g_j, g_k) = V_{jk} , \ \text{where the } (r,s)\text{-element of } V_{jk} \text{ is } -\frac{\lambda_j \lambda_k\, \gamma_{rk} \gamma_{sj}}{m (\lambda_j - \lambda_k)^2} ,
\]
(d) the elements in \ell are asymptotically independent of the elements in G.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-32
Example
Let X_1, \dots, X_n be a sample drawn from N_p(\mu, \Sigma) and
nS \sim W_p(\Sigma, n-1). Then, from the previous theorem,
\[
\sqrt{n-1}\,(\ell_j - \lambda_j) \xrightarrow{L} N(0, 2\lambda_j^2) , \quad j = 1, \dots, p .
\]
Since the variance 2\lambda_j^2 is unknown, use the log transformation and
the Transformation Theorem (Theorem 4.11):
\[
\sqrt{\frac{n-1}{2}}\,(\log \ell_j - \log \lambda_j) \xrightarrow{L} N(0, 1) .
\]
A two-sided confidence interval is then given by
\[
\log(\ell_j) - 1.96 \sqrt{\frac{2}{n-1}} \ \le\ \log \lambda_j \ \le\ \log(\ell_j) + 1.96 \sqrt{\frac{2}{n-1}} .
\]
Applied Multivariate Statistical Analysis
Principal Components Analysis 10-33

With the bank data: n = 200, \ell_1 = 2.98. Therefore
\[
\log(2.98) \pm 1.96 \sqrt{\frac{2}{200-1}} = \log(2.98) \pm 0.1965 .
\]
Confidence interval: P\{\lambda_1 \in (2.448, 3.62)\} \approx 0.95.
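
The same interval in a few lines of NumPy (values taken from this slide):

import numpy as np

n, l1 = 200, 2.98
half_width = 1.96 * np.sqrt(2.0 / (n - 1))
ci = np.exp([np.log(l1) - half_width, np.log(l1) + half_width])
print(ci)    # approximately (2.448, 3.62)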

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-34

Variance explained by first q PCs

\[
\psi = \frac{\lambda_1 + \dots + \lambda_q}{\sum_{j=1}^p \lambda_j} , \qquad
\widehat{\psi} = \frac{\ell_1 + \dots + \ell_q}{\sum_{j=1}^p \ell_j} .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-35

From the theorem and the non-linearity of \psi in \lambda, apply the
Transformation Theorem 4.11 and get:
\[
\sqrt{n-1}\,(\widehat{\psi} - \psi) \xrightarrow{L} N(0, \mathcal{D}^\top V \mathcal{D})
\]
with V = 2\Lambda^2, \mathcal{D} = (d_1, \dots, d_p)^\top and
\[
d_j = \frac{\partial \psi}{\partial \lambda_j} =
\begin{cases}
\dfrac{1 - \psi}{\operatorname{tr}(\Sigma)} & \text{for } 1 \le j \le q , \\[6pt]
\dfrac{-\psi}{\operatorname{tr}(\Sigma)} & \text{for } q+1 \le j \le p .
\end{cases}
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-36

Theorem
\[
\sqrt{n-1}\,(\widehat{\psi} - \psi) \xrightarrow{L} N(0, \omega^2) ,
\]
where
\[
\omega^2 = \mathcal{D}^\top V \mathcal{D}
= \frac{2}{\{\operatorname{tr}(\Sigma)\}^2} \left\{ (1-\psi)^2 (\lambda_1^2 + \dots + \lambda_q^2) + \psi^2 (\lambda_{q+1}^2 + \dots + \lambda_p^2) \right\}
= \frac{2 \operatorname{tr}(\Sigma^2)}{\{\operatorname{tr}(\Sigma)\}^2} (\psi^2 - 2\beta\psi + \beta)
\]
and
\[
\beta = \frac{\lambda_1^2 + \dots + \lambda_q^2}{\lambda_1^2 + \dots + \lambda_p^2} .
\]
Remark: use tr(\Lambda) = tr(\Sigma) and tr(\Lambda^2) = tr(\Sigma^2) to simplify
the calculation!
Applied Multivariate Statistical Analysis
Principal Components Analysis 10-37
Example
The first PC for the Swiss bank notes resolves 67% of the variation.
Test whether the true proportion could be 75%.
The confidence interval at confidence level 1 - \alpha = 0.95 is given by
\[
0.668 \pm 1.96 \sqrt{\frac{\widehat{\omega}^2}{n-1}} .
\]
To get \widehat{\omega}^2:
\[
\widehat{\beta} = \frac{\ell_1^2}{\ell_1^2 + \dots + \ell_p^2} = 0.902 , \qquad
\operatorname{tr}(S) = 4.472 , \qquad
\operatorname{tr}(S^2) = \sum_{j=1}^p \ell_j^2 = 9.883 .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-38

\[
\widehat{\omega}^2 = \frac{2 \operatorname{tr}(S^2)}{\{\operatorname{tr}(S)\}^2} (\widehat{\psi}^2 - 2\widehat{\beta}\widehat{\psi} + \widehat{\beta}) = 0.142
\]
So:
\[
0.668 \pm 1.96 \sqrt{\frac{0.142}{199}} = (0.615, 0.720) .
\]
The hypothesis that \psi = 75\% would thus be rejected!
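
A sketch reproducing this test from the eigenvalues of S quoted earlier for the bank data:

import numpy as np

ell = np.array([2.985, 0.931, 0.242, 0.194, 0.085, 0.035])
n, q = 200, 1

psi_hat = ell[:q].sum() / ell.sum()                 # approx. 0.668
beta_hat = (ell[:q]**2).sum() / (ell**2).sum()      # approx. 0.902
omega2 = (2 * (ell**2).sum() / ell.sum()**2
          * (psi_hat**2 - 2 * beta_hat * psi_hat + beta_hat))   # approx. 0.142
half = 1.96 * np.sqrt(omega2 / (n - 1))
print(psi_hat - half, psi_hat + half)               # approx. (0.615, 0.720)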

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-39

Summary:Asymptotic Properties of PCs

The eigenvalues \ell_j and eigenvectors g_j are asymptotically
normally distributed, in particular
\[
\sqrt{n-1}\,(\ell - \lambda) \xrightarrow{L} N_p(0, 2\Lambda^2) .
\]
For the eigenvalues it holds that
\[
\sqrt{\frac{n-1}{2}}\,(\log \ell_j - \log \lambda_j) \xrightarrow{L} N(0, 1) .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-40

Summary:Asymptotic Properties of PCs

The asymptotic normal distribution allows us to construct
confidence intervals and tests for the proportion of variance
which is explained by the first q PCs.
It holds for the estimate \widehat{\psi} of \psi that
\[
\sqrt{n-1}\,(\widehat{\psi} - \psi) \xrightarrow{L} N(0, \omega^2) .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-41

Normalized Principal Components Analysis

The PCA depends on the scale of Xj .


Standardisation of the components Xj , X data matrix:

Correction for the mean:

XC = HX

the centered data matrix (recall H = I_n - n^{-1} 1_n 1_n^\top)

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-42
Correction for scale:

XS = HX D−1/2 ,

D = diag(s_{X_1 X_1}, \dots, s_{X_p X_p}), \ \bar{x}_S = 0, \ S_{X_S} = R, the correlation matrix.

PCA of X_S is called NPCA:
\[
R = G_R L_R G_R^\top , \qquad L_R = \operatorname{diag}(\ell_1^R, \dots, \ell_p^R) .
\]
The NPCs are
\[
Z = X_S G_R = (z_1, \dots, z_p) .
\]
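
A minimal NPCA sketch, i.e. PCA applied to the standardized data matrix (simulated data; names illustrative):

import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 0.1])   # very different scales

Xc = X - X.mean(axis=0)                # correction for the mean
Xs = Xc / Xc.std(axis=0)               # correction for scale: Xs = HX D^{-1/2}
R = (Xs.T @ Xs) / len(X)               # the correlation matrix of X

lR, GR = np.linalg.eigh(R)
lR, GR = lR[::-1], GR[:, ::-1]
Z = Xs @ GR                            # the NPCs z_1, ..., z_p
assert np.allclose(np.cov(Z, rowvar=False, bias=True), np.diag(lR))   # S_Z = L_R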

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-43
Note that the NPCs fulfill
\[
\bar{z} = 0 , \qquad S_Z = G_R^\top S_{X_S} G_R = G_R^\top R\, G_R = L_R .
\]
Covariance and correlation:
\[
S_{X_S, Z} = \frac{1}{n} X_S^\top Z = G_R L_R
\]
\[
R_{X_S, Z} = G_R L_R L_R^{-1/2} = G_R L_R^{1/2}
\]
\[
r_{X_i Z_j} = r_{X_{Si} Z_j} = \sqrt{\ell_j}\, g_{R,ij} , \qquad \sum_{j=1}^p r_{X_i Z_j}^2 = 1 .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-44

French Food Data

The data set consists of the average expenditures on food for


several different types of families in France (manual workers =
MA, employees = EM, managers = CA) with different numbers of
children (2,3,4 or 5 children). The data is taken from Lebart,
Morineau and Fénelon (1982).

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-45

French Food data


[Figure: the family types (MA2-MA5, EM2-EM5, CA2-CA5) plotted on the axes "First Factor - Families" vs. "Second Factor - Families"]

Figure: Representation of the individuals MVAnpcafood

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-46

French Food data

[Figure: the goods (bread, milk, wine, vegetables, meat, poultry, fruits) plotted on the axes "First Factor - Goods" vs. "Second Factor - Goods"]

Figure: Representation of the variables MVAnpcafood

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-47

Summary: NPC’s Analysis

NPCA is PCA applied to the standardized (normalized) data


matrix XS .
The graphical representation provides the same kind of picture
as in PCA but here in the relative position of individuals, each
variable has the same weight (in the PCA, the variable with
the largest variance has the largest weight).
The quality of the representation is appreciated through
\[
\psi = \frac{\ell_1 + \ell_2 + \dots + \ell_q}{\sum_{j=1}^p \ell_j} .
\]
Applied Multivariate Statistical Analysis
Principal Components Analysis 10-48

Principal Components as a Factorial Method

The empirical PCs (normalized or not) turn out to be equivalent to


the factors that one would obtain by decomposing the appropriate
data matrix into its factors

the PCs are the factors representing the rows of the centered
data matrix;
the NPCs correspond to the factors of the standardized data
matrix.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-49

Obtain representations of the individuals (the rows of X ) and of


the variables (the columns of X ) in spaces of smaller dimension.
Work with

XC = HX .

The spectral decomposition of XC> XC is related to that of SX :

XC> XC = X > H> HX = nSX = nGLG > .

The factorial variables are obtained by projecting XC on G,

Y = XC G = (y1 , . . . , yp ).

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-50

Since H X_C = X_C, it immediately follows that
\[
\bar{y} = 0 , \qquad S_Y = G^\top S_X G = L = \operatorname{diag}(\ell_1, \dots, \ell_p) .
\]

The scatterplot of the individuals on the factorial axes are thus


centered around the origin and are more spread out in the first
direction (first PC has variance `1 ) than in the second direction
(second PC has variance `2 ).

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-51

Duality Relations

The projections of the columns of XC onto the eigenvectors vk of


XC XC> are
\[
X_C^\top v_k = \frac{1}{\sqrt{n \ell_k}}\, X_C^\top X_C\, g_k = \sqrt{n \ell_k}\, g_k .
\]
Projections on the first p axes are the columns of
\[
X_C^\top V = \sqrt{n}\, G L^{1/2} .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-52

Geometric Representation

Observe that
\[
x_{C[j]}^\top x_{C[k]} = n\, s_{X_j X_k} , \qquad \|x_{C[j]}\|^2 = n\, s_{X_j X_j} ,
\]
where x_{C[j]} and x_{C[k]} denote the j-th and k-th columns of X_C. If
\theta_{jk} is the angle between x_{C[j]} and x_{C[k]}, then
\[
\cos \theta_{jk} = \frac{x_{C[j]}^\top x_{C[k]}}{\|x_{C[j]}\|\, \|x_{C[k]}\|} = r_{X_j X_k} .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-53

Quality of the Representations

Quality of the representation is given by


\[
\psi = \frac{\ell_1 + \ell_2 + \dots + \ell_q}{\sum_{j=1}^p \ell_j} .
\]

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-54

It is better to compute the angle \vartheta_{ik} between the representation of
an individual i and the k-th PC or NPC axis:
\[
\cos \vartheta_{ik} = \frac{y_i^\top e_k}{\|y_i\|\, \|e_k\|} = \frac{y_{ik}}{\|x_{Ci}\|}
\]
for the PCs, or analogously
\[
\cos \zeta_{ik} = \frac{z_i^\top e_k}{\|z_i\|\, \|e_k\|} = \frac{z_{ik}}{\|x_{Si}\|}
\]
for the NPCs, where e_k denotes the k-th unit vector e_k = (0, \dots, 1, \dots, 0)^\top.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-55

An individual i will be represented on the k-th PC axis if its


corresponding angle is small, i.e., if cos2 ϑik for k = 1, . . . , p is
close to one. Note that for each individual i,
\[
\sum_{k=1}^p \cos^2 \vartheta_{ik} = \frac{y_i^\top y_i}{x_{Ci}^\top x_{Ci}} = \frac{x_{Ci}^\top G G^\top x_{Ci}}{x_{Ci}^\top x_{Ci}} = 1 .
\]
The values \cos^2 \vartheta_{ik} are sometimes called the relative contributions
of the k-th axis to the representation of the i-th individual.

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-56

Summary: PC as a Factorial Method

NPCs are PCs applied to the standardized (normalized) data


matrix XS .
The graphical representation of NPCs provides a similar type
of picture as that of PCs, the difference being in the relative
position of individuals
The quality of the representation of a variable can be
evaluated by the percentage of Xi ’s variance that is explained
by a PC, i.e., rX2i Yj .

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-57

Common Principal Components (CPC)

Joint dimension reduction technique


Estimation of PCs simultaneously in different groups
Identical space spanned by eigenvectors

Flury (1988)
HCPC : Σi = ΓΛi Γ> , i = 1, ..., k
Σi population covariance matrix for group i
Γ = (γ1 , ..., γp ) transformation matrix
Λi = diag(λi1 , ..., λip ) eigenvalue matrix

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-58

Example

CPC analysis for the implied volatility surfaces of the DAX index in
1999. Day-by-day surface smoothing.

Three groups (maturities in months): τ = 1, τ = 2 and τ = 3

Moneyness range: 0.85 − 1.10

Applied Multivariate Statistical Analysis


Principal Components Analysis 10-59

[Figure: "PCP for CPCA, 3 eigenvectors" - loadings plotted against moneyness]

Figure: Factor loadings of the first (thick), the second (medium), and the
third (thin) PC MVAcpcaiv

Applied Multivariate Statistical Analysis


Factor Analysis 11-1

Factor Analysis

The essential purpose is to describe the covariance


relationships among many variables in terms of a few
(unobservable) factors.
It can be considered an extension of principal component
analysis . . . the approximation based on the factor analysis
model is more elaborate.

Applied Multivariate Statistical Analysis


Factor Analysis 11-2

The Factor Analysis Model

Observe x = (x1 , . . . , xp )> ∈ Rp and represent it by k ”factors”:

P
k
xj = qj` f` + µj
`=1

factors

aim at k small!
The random variables fj are ”unobserved underlying factors”.
Usually, factors cannot be uniquely determined. The choice of the
factors depends on the situation.
Applied Multivariate Statistical Analysis
Factor Analysis 11-3

Example

In matrix form the factor model can be rewritten:

X = QF + µ

Let Var (X ) = Σ, E [F ] = 0 and Var (F ) = Ik


Suppose that only the first k eigenvalues are positive, i.e.,
\lambda_{k+1} = \dots = \lambda_p = 0. Then the (singular) covariance matrix is
\[
\Sigma = \sum_{\ell=1}^k \lambda_\ell \gamma_\ell \gamma_\ell^\top
= (\Gamma_1\ \Gamma_2) \begin{pmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \Gamma_1^\top \\ \Gamma_2^\top \end{pmatrix} .
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-4
The PCs are
\[
Y = \Gamma^\top (X - \mu) \quad\Rightarrow\quad X - \mu = \Gamma Y = \Gamma_1 Y_1 + \Gamma_2 Y_2 ,
\]
where
\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} \Gamma_1^\top \\ \Gamma_2^\top \end{pmatrix} (X - \mu)
\qquad\text{with}\qquad
\begin{pmatrix} \Gamma_1^\top \\ \Gamma_2^\top \end{pmatrix} (X - \mu) \sim \left( 0, \begin{pmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{pmatrix} \right) .
\]
In other words, Y_2 has a singular distribution with mean and
covariance matrix equal to zero.
Applied Multivariate Statistical Analysis
Factor Analysis 11-5

Therefore
\[
X = \Gamma_1 \Lambda_1^{1/2}\, \Lambda_1^{-1/2} Y_1 + \mu .
\]
Defining
\[
Q = \Gamma_1 \Lambda_1^{1/2} , \qquad F = \Lambda_1^{-1/2} Y_1 ,
\]
we obtain the factor model.

Applied Multivariate Statistical Analysis


Factor Analysis 11-6

Factor Analysis model

We split the effects into common and specific ones.

X = QF + U + µ

Q = (p × k) loadings
F = (k × 1) common factors
U = (p × 1) specific factors
The object of factor analysis is to find the loadings Q and the
variance Ψ of the specific factors U. The estimates are deduced
from the covariance structure of X .
Applied Multivariate Statistical Analysis
Factor Analysis 11-7

Assumptions:

E [F ] = 0
Var(F ) = Ik
E [U] = 0
Var(U) = Ψ = diag(ψ11 , . . . , ψpp )
Cov(U_i, U_j) = 0, i \ne j
Cov(F, U) = 0
\[
\sigma_{X_j X_j} = \operatorname{Var}(X_j) = \sum_{\ell=1}^k q_{j\ell}^2 + \psi_{jj}
\]
`=1

Applied Multivariate Statistical Analysis


Factor Analysis 11-8
Define
\[
h_j^2 = \sum_{\ell=1}^k q_{j\ell}^2 \quad \text{(communality)} , \qquad \psi_{jj} \quad \text{(specific variance)} .
\]
Notice that \operatorname{Var}(X_j) = h_j^2 + \psi_{jj}, i.e., the communality is the part of
the variance of X_j explained by the factors; the specific variance is the
unexplained part. The goal of FA is to explain as much as possible.
Decomposition of covariance
All we know about the factors, factor loadings, and specific
variances is that
\[
\Sigma = QQ^\top + \Psi
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-9

Note that the covariance matrix of this model can be written as

\[
\operatorname{Var}(X) = \Sigma = \operatorname{E}(X - \mu)(X - \mu)^\top - \operatorname{E}(X - \mu)\operatorname{E}(X - \mu)^\top
\]
\[
= \operatorname{E}\{(QF + U)(QF + U)^\top\}
= \operatorname{E}(QFF^\top Q^\top) + \operatorname{E}(QFU^\top) + \operatorname{E}(UF^\top Q^\top) + \operatorname{E}(UU^\top)
\]
\[
= Q \operatorname{E}(FF^\top) Q^\top + \operatorname{E}(UU^\top)
= Q \operatorname{Var}(F) Q^\top + \operatorname{Var}(U) = QQ^\top + \Psi
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-10

Interpretation of the Factors

The covariance between X and F is obtained via
\[
\Sigma_{XF} = \operatorname{E}\{(QF + U) F^\top\} = Q .
\]

The correlation is
PXF = D −1/2 Q, (20)

where D = diag(σX1 X1 , . . . , σXp Xp ). It is possible to consider which


of the original variables X1 , . . . , Xp play a role in the unobserved
common factors F1 , . . . , Fk .

Applied Multivariate Statistical Analysis


Factor Analysis 11-11

Invariance of Scale
Assume that we have the following FA model for X:
\[
\operatorname{Var}(X) = Q_X Q_X^\top + \Psi_X .
\]
What happens if we change the scale of X, i.e. Y = CX with
C = \operatorname{diag}(c_1, \dots, c_p)?
\[
\operatorname{Var}(Y) = C \Sigma C^\top = C Q_X Q_X^\top C^\top + C \Psi_X C^\top
\]
Hence the k-factor model is also true for Y, with
\[
Q_Y = C Q_X , \qquad \Psi_Y = C \Psi_X C^\top .
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-12

Non-Uniqueness of Factor Loadings

The factor loadings are not unique! For an orthogonal G:
\[
X = (QG)(G^\top F) + U + \mu .
\]
This is a k-factor model with factor loadings QG and common
factors G^\top F. In practical analysis, we choose the rotation G which
gives a "desirable" interpretation.
For estimation purposes, the non-uniqueness can be resolved by
imposing additional constraints, e.g. that Q^\top \Psi^{-1} Q is diagonal.

Applied Multivariate Statistical Analysis


Factor Analysis 11-13

Number of parameters in the model


There are pk + p parameters
(pk parameters from Q and p parameters from \Psi).
There are \frac12 k(k-1) constraints
(e.g. Q^\top \Psi^{-1} Q is diagonal):
\[
d = \#\text{ parameters for } \Sigma \text{ unconstrained} - \#\text{ parameters for } \Sigma \text{ constrained}
= \frac12 p(p+1) - \left( pk + p - \frac12 k(k-1) \right)
= \frac12 (p-k)^2 - \frac12 (p+k) .
\]
d < 0: infinity of exact solutions
d > 0: look for approximate solutions
Applied Multivariate Statistical Analysis
Factor Analysis 11-14

Example
p = 3, k = 1 \Rightarrow d = 0
\[
\Sigma = \begin{pmatrix}
q_1^2 + \psi_{11} & & \\
q_1 q_2 & q_2^2 + \psi_{22} & \\
q_1 q_3 & q_2 q_3 & q_3^2 + \psi_{33}
\end{pmatrix}
\]
d = 0 yields a unique numerical solution. It need not be
consistent with statistical thinking.

Applied Multivariate Statistical Analysis


Factor Analysis 11-15

Example (cont.)
Here
\[
Q = \begin{pmatrix} q_1 \\ q_2 \\ q_3 \end{pmatrix}
\qquad\text{and}\qquad
\Psi = \begin{pmatrix} \psi_{11} & 0 & 0 \\ 0 & \psi_{22} & 0 \\ 0 & 0 & \psi_{33} \end{pmatrix} .
\]
We have
\[
q_1^2 = \frac{\sigma_{12}\sigma_{13}}{\sigma_{23}} ; \qquad
q_2^2 = \frac{\sigma_{12}\sigma_{23}}{\sigma_{13}} ; \qquad
q_3^2 = \frac{\sigma_{13}\sigma_{23}}{\sigma_{12}}
\]
and
\[
\psi_{11} = \sigma_{11} - q_1^2 ; \qquad \psi_{22} = \sigma_{22} - q_2^2 ; \qquad \psi_{33} = \sigma_{33} - q_3^2 .
\]
In this particular case (k = 1), the only rotation is defined by
G = -1, so the other solution for the loadings is provided by -Q.

Applied Multivariate Statistical Analysis


Factor Analysis 11-16

Example
Suppose now p = 2 and k = 1; then d < 0.
\[
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
= \begin{pmatrix} q_1^2 + \psi_{11} & q_1 q_2 \\ q_1 q_2 & q_2^2 + \psi_{22} \end{pmatrix}
\]
We have an infinity of solutions: for any \alpha with \rho < \alpha < 1, a solution
is provided by
\[
q_1 = \alpha ; \quad q_2 = \rho/\alpha ; \quad \psi_{11} = 1 - \alpha^2 ; \quad \psi_{22} = 1 - (\rho/\alpha)^2 .
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-17

Summary: Factor Analysis Model


The FA model aims to describe the dependencies between the
p variables in a data set by a lower number k < p of latent
factors, i.e. it assumes X = QF + U + µ.
The random vector F (k-dimensional) contains the common
factors, U (p-dimensional) the specific factors, Q(p × k) the
loadings matrix.
It is supposed that F and U are uncorrelated and have mean
zero and uncorrelated components, i.e., F ∼ (0, I),
U ∼ (0, Ψ) with a diagonal Ψ, Cov(F , U) = 0.
This leads to the covariance structure Σ = QQ> + Ψ.

Applied Multivariate Statistical Analysis


Factor Analysis 11-18

Summary: Factor Analysis Model


The interpretation of the factor F is obtained through the
correlation PXF = D −1/2 Q.
A normalized analysis is obtained from the model P = QQ^\top + \Psi.
The interpretation of the factors is then given directly by the
loadings Q: P_{XF} = Q.
The factor analysis model is scale invariant. The loadings are
not unique (only up to multiplication by an orthogonal
matrix).
The non-uniqueness of the model is determined through the
degrees of freedom d = \frac12 (p-k)^2 - \frac12 (p+k).

Applied Multivariate Statistical Analysis


Factor Analysis 11-19

Estimation of the Factor Model


Decomposition of the covariance matrix:
\[
S = \widehat{Q}\widehat{Q}^\top + \widehat{\Psi}
\]
with \widehat{Q}\widehat{Q}^\top the common-factor part and \widehat{\Psi} the specific-factor part.
Given an estimate \widehat{Q} of Q,
\[
\widehat{\psi}_{jj} = s_{X_j X_j} - \sum_{\ell=1}^k \widehat{q}_{j\ell}^{\,2} ,
\]
and \widehat{h}_j^2 = \sum_{\ell=1}^k \widehat{q}_{j\ell}^{\,2} is an estimate of the communality. The problem
can be solved exactly in the ideal case d = 0.
Applied Multivariate Statistical Analysis
Factor Analysis 11-20

Factor Model for the Correlation Matrix

It is often easier to make the calculations for the standardized model.
Define
\[
Y = H X D^{-1/2}
\]
(H the centering matrix) and find a decomposition of the form
\[
R = \widehat{Q}_Y \widehat{Q}_Y^\top + \widehat{\Psi}_Y
\]
with \widehat{Q}_Y = D^{-1/2} \widehat{Q}_X and \widehat{\Psi}_Y = D^{-1} \widehat{\Psi}_X.

Applied Multivariate Statistical Analysis


Factor Analysis 11-21

Example
Data set consists of the averaged marks (from 1 low to 7 high) for
31 car types. We consider price, security and easy handling.
 
\[
R = \begin{pmatrix} 1 & 0.975 & 0.613 \\ 0.975 & 1 & 0.620 \\ 0.613 & 0.620 & 1 \end{pmatrix} .
\]
We look for one factor, i.e. k = 1. Here
\[
d = \frac12 (p-k)^2 - \frac12 (p+k) = \frac12 (3-1)^2 - \frac12 (3+1) = 0 ,
\]
so there is an exact solution!

Applied Multivariate Statistical Analysis


Factor Analysis 11-22

Example
The equation
\[
R = \begin{pmatrix}
1 & r_{X_1 X_2} & r_{X_1 X_3} \\
& 1 & r_{X_2 X_3} \\
& & 1
\end{pmatrix}
= \begin{pmatrix}
\widehat{q}_1^2 + \widehat{\psi}_{11} & \widehat{q}_1 \widehat{q}_2 & \widehat{q}_1 \widehat{q}_3 \\
& \widehat{q}_2^2 + \widehat{\psi}_{22} & \widehat{q}_2 \widehat{q}_3 \\
& & \widehat{q}_3^2 + \widehat{\psi}_{33}
\end{pmatrix}
\]
yields the communalities \widehat{h}_i^2 = \widehat{q}_i^2, with
\[
\widehat{q}_1^2 = \frac{r_{X_1 X_2}\, r_{X_1 X_3}}{r_{X_2 X_3}} , \qquad
\widehat{q}_2^2 = \frac{r_{X_1 X_2}\, r_{X_2 X_3}}{r_{X_1 X_3}} , \qquad
\widehat{q}_3^2 = \frac{r_{X_1 X_3}\, r_{X_2 X_3}}{r_{X_1 X_2}} .
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-23

Example
Together with \widehat{\psi}_{11} = 1 - \widehat{q}_1^2, \widehat{\psi}_{22} = 1 - \widehat{q}_2^2 and \widehat{\psi}_{33} = 1 - \widehat{q}_3^2 we
get the solution
\[
\widehat{q}_1 = 0.982 , \quad \widehat{q}_2 = 0.993 , \quad \widehat{q}_3 = 0.624 ,
\]
\[
\widehat{\psi}_{11} = 0.035 , \quad \widehat{\psi}_{22} = 0.014 , \quad \widehat{\psi}_{33} = 0.610 .
\]

Since the first two communalities are close to one we can conclude
that the first two variables, namely price and security, are explained
by the factor very well.
This factor might be interpreted as a “price+security” factor.
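
The exact k = 1 solution can be reproduced in a few lines (correlations taken from the slide):

import numpy as np

r12, r13, r23 = 0.975, 0.613, 0.620

q1 = np.sqrt(r12 * r13 / r23)               # 0.982
q2 = np.sqrt(r12 * r23 / r13)               # 0.993
q3 = np.sqrt(r13 * r23 / r12)               # 0.624
psi = 1 - np.array([q1, q2, q3])**2         # (0.035, 0.014, 0.610)
print(q1, q2, q3, psi)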

Applied Multivariate Statistical Analysis


Factor Analysis 11-24

The Maximum Likelihood Method

Log-likelihood function ` for a data matrix X of observations for


X ∼ Np (µ, Σ):
\[
\ell(X; \mu, \Sigma) = -\frac{n}{2} \log|2\pi\Sigma| - \frac12 \sum_{i=1}^n (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)
\]
\[
= -\frac{n}{2} \log|2\pi\Sigma| - \frac{n}{2} \operatorname{tr}(\Sigma^{-1} S) - \frac{n}{2} (\bar{x} - \mu)^\top \Sigma^{-1} (\bar{x} - \mu) .
\]
Evaluated at its maximum \widehat{\mu} = \bar{x}:
\[
\ell(X; \widehat{\mu}, \Sigma) = -\frac{n}{2} \left\{ \log|2\pi\Sigma| + \operatorname{tr}(\Sigma^{-1} S) \right\} .
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-25

Substituting \Sigma = QQ^\top + \Psi gives
\[
\ell(X; \widehat{\mu}, Q, \Psi) = -\frac{n}{2} \left[ \log\{|2\pi(QQ^\top + \Psi)|\} + \operatorname{tr}\{(QQ^\top + \Psi)^{-1} S\} \right] .
\]
We require that Q^\top \Psi^{-1} Q be a diagonal matrix.
The maximum likelihood estimates of Q and \Psi are obtained by
an iterative numerical algorithm.

Applied Multivariate Statistical Analysis


Factor Analysis 11-26

Large Sample Test for the Number of


Common Factors
Test
H0 : Σ = QQ> + Ψ
H1 : Σ arbitrary (positive definite) matrix
The likelihood ratio statistic is
\[
-2\Lambda = -2 \log \left( \frac{\text{maximized likelihood under } H_0}{\text{maximized likelihood}} \right)
= -2 \log \left( \frac{|\widehat{Q}\widehat{Q}^\top + \widehat{\Psi}|}{|S_n|} \right)^{-n/2}
\]
and with a suitable multiplier it is asymptotically \chi^2_{\frac12 [(p-k)^2 - p - k]} distributed.

Applied Multivariate Statistical Analysis


Factor Analysis 11-27

The Principal Component Method

Decompose \operatorname{Var}(X) = S = \Gamma \Lambda \Gamma^\top and retain the first k eigenvectors
to build
\[
\widehat{Q} = [\sqrt{\lambda_1}\, \gamma_1, \dots, \sqrt{\lambda_k}\, \gamma_k] .
\]
Omitting the remaining p - k eigenvectors should not cause a big error
if the corresponding eigenvalues \lambda_i, i = k+1, \dots, p, are small.
The specific variances are estimated by the diagonal elements of
\[
\widehat{\Psi} = S - \widehat{Q}\widehat{Q}^\top .
\]
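
A sketch of the principal component method (simulated data; the number of factors k is chosen by the user):

import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
X[:, 2] += X[:, 0]                           # give the variables common structure

S = np.cov(X, rowvar=False)
lam, Gam = np.linalg.eigh(S)
lam, Gam = lam[::-1], Gam[:, ::-1]

k = 2
Q_hat = Gam[:, :k] * np.sqrt(lam[:k])        # [sqrt(l_1) g_1, ..., sqrt(l_k) g_k]
Psi_hat = np.diag(np.diag(S - Q_hat @ Q_hat.T))   # specific variances
residual = S - (Q_hat @ Q_hat.T + Psi_hat)        # diagonal is zero by construction
print(np.abs(residual).max())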

Applied Multivariate Statistical Analysis


Factor Analysis 11-28

Error of Approximation

Residual matrix S − (Q̂Q̂> + Ψ̂)


[ diag is 0 but off-diag not]
Analytically:
Xh i2
S − (Q̂Q̂> + Ψ̂) ≤ `ˆ2k+1 + . . . + `ˆ2p
i,j
i,j

gives an estimate of error of the approximation.


This gives simple criterion for the choice of number of factors.

Applied Multivariate Statistical Analysis


Factor Analysis 11-29

Example
This example uses a consumer-preference study from Johnson and
Wichern (1998). Customers were asked to rate several attributes
of a new product. The responses were tabulated and the following
correlation matrix R was constructed:

Attribute (Variable):
1 Taste, 2 Good buy for money, 3 Flavor, 4 Suitable for snack,
5 Provides lots of energy.
\[
R = \begin{pmatrix}
1.00 & 0.02 & 0.96 & 0.42 & 0.01 \\
0.02 & 1.00 & 0.13 & 0.71 & 0.85 \\
0.96 & 0.13 & 1.00 & 0.50 & 0.11 \\
0.42 & 0.71 & 0.50 & 1.00 & 0.79 \\
0.01 & 0.85 & 0.11 & 0.79 & 1.00
\end{pmatrix}
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-30

Example
λ1 = 2.85 and λ2 = 1.81 of R are the only eigenvalues greater
than one.
\[
\frac{\lambda_1 + \lambda_2}{p} = \frac{2.85 + 1.81}{5} = 0.93
\]

Estimated factor Specific


loadings Communalities variances
Variable q̂1 q̂2 ĥj2 ψ̂jj = 1 − ĥj2
1. Taste 0.56 0.82 0.98 0.02
2. Good buy for money 0.78 -0.53 0.88 0.12
3. Flavor 0.65 0.75 0.98 0.02
4. Suitable for snack 0.94 -0.11 0.89 0.11
5. Provides lots of energy 0.80 -0.54 0.93 0.07
Eigenvalues 2.85 1.81
Cumulative proportion of total (standardized) 0.571 0.932
sample variance

Table: Estimated factor loadings, communalities, and specific variances


Applied Multivariate Statistical Analysis
Factor Analysis 11-31

Example
 
\[
\widehat{Q}\widehat{Q}^\top + \widehat{\Psi} =
\begin{pmatrix}
0.56 & 0.82 \\ 0.78 & -0.53 \\ 0.65 & 0.75 \\ 0.94 & -0.11 \\ 0.80 & -0.54
\end{pmatrix}
\begin{pmatrix}
0.56 & 0.78 & 0.65 & 0.94 & 0.80 \\ 0.82 & -0.53 & 0.75 & -0.11 & -0.54
\end{pmatrix}
+
\begin{pmatrix}
0.02 & 0 & 0 & 0 & 0 \\
0 & 0.12 & 0 & 0 & 0 \\
0 & 0 & 0.02 & 0 & 0 \\
0 & 0 & 0 & 0.11 & 0 \\
0 & 0 & 0 & 0 & 0.07
\end{pmatrix}
=
\begin{pmatrix}
1.00 & 0.01 & 0.97 & 0.44 & 0.00 \\
0.01 & 1.00 & 0.11 & 0.79 & 0.91 \\
0.97 & 0.11 & 1.00 & 0.53 & 0.11 \\
0.44 & 0.79 & 0.53 & 1.00 & 0.81 \\
0.00 & 0.91 & 0.11 & 0.81 & 1.00
\end{pmatrix} .
\]
The communalities (0.98, 0.88, 0.98, 0.89, 0.93) indicate that the
two factors account for a large percentage of the sample variance
of each variable.
Applied Multivariate Statistical Analysis
Factor Analysis 11-32

Method of Principal Factors

We start with an estimate of the communality, either
1) \tilde{h}_j^2 = the square of the multiple correlation coefficient of X_j with
the other variables, i.e. \rho^2(V, W\widehat{\beta}) with V = X_j, W = (X_\ell)_{\ell \ne j} and
\widehat{\beta} the OLS estimate of the regression of V on W, or
2) \tilde{h}_j^2 = \max_{\ell \ne j} |r_{X_j X_\ell}|,
where R = (r_{X_j X_\ell}) is the correlation matrix.

Applied Multivariate Statistical Analysis


Factor Analysis 11-33

Algorithm of Principal Factors Method


R = QQ^\top + \Psi, hence R - \Psi = QQ^\top.
Set \tilde{\psi}_{jj} = 1 - \tilde{h}_j^2 and construct R - \tilde{\Psi}. Decompose
\[
R - \tilde{\Psi} = \sum_{\ell=1}^p \lambda_\ell \gamma_\ell \gamma_\ell^\top
\]
and take
\[
\widehat{q}_\ell = \sqrt{\lambda_\ell}\, \gamma_\ell , \quad \ell = 1, \dots, k , \qquad
\widehat{Q} = \Gamma_1 \Lambda_1^{1/2}
\]
with \Gamma_1 = (\gamma_1, \dots, \gamma_k) and \Lambda_1 = \operatorname{diag}(\lambda_1, \dots, \lambda_k). Finally,
\[
\widehat{\psi}_{jj} = 1 - \sum_{\ell=1}^k \widehat{q}_{j\ell}^{\,2} .
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-34

Car Marks Data

[Figure: car-marks loadings on the first two factors; labeled points: service, value, price, security, easy, economic, sporty, look]

Figure: Loadings of the car marks, factor analysis with k = 2
MVAfactcarm

Applied Multivariate Statistical Analysis


Factor Analysis 11-35

Rotation

The factors can be rotated without any loss of information.
Usually, we rotate the factors in a way which provides a reasonable
interpretation.
In the simplest case of k = 2 factors a rotation matrix G is given by
\[
G(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} ,
\]
which represents a clockwise rotation of the coordinate axes by the
angle \theta (then \widehat{Q}^* = \widehat{Q}\, G(\theta)).

Applied Multivariate Statistical Analysis


Factor Analysis 11-36

Rotation - Varimax method

The varimax method tries to find a "reasonable rotation"
automatically.
The interpretation of the loadings would be very simple if the
variables split into disjoint sets, each of which is associated with
one factor.
The idea of the varimax method is to find the rotation which
maximizes the sum of the variances of the squared loadings \widehat{q}_{ij}^*
within each column of \widehat{Q}^* (a code sketch follows below).
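
One common way to implement the varimax criterion is the SVD-based iteration sketched below; this is an illustrative implementation, not necessarily the book's MVA quantlet:

import numpy as np

def varimax(Q, max_iter=100, tol=1e-6):
    # rotate the loadings Q (p x k) to maximize the sum of the variances
    # of the squared loadings within each column
    p, k = Q.shape
    R = np.eye(k)                  # accumulated rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        L = Q @ R
        B = Q.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt
        if s.sum() < d_old * (1 + tol):
            break
        d_old = s.sum()
    return Q @ R

# usage on the loadings of the marketing example quoted in the slides;
# columns may come out permuted or sign-flipped relative to q~1, q~2
Q = np.array([[0.56, 0.82], [0.78, -0.53], [0.65, 0.75],
              [0.94, -0.11], [0.80, -0.54]])
print(np.round(varimax(Q), 2))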

Applied Multivariate Statistical Analysis


Factor Analysis 11-37

Example
Let us return to the marketing example of Johnson and Wichern
(1998). The basic factor loadings of the first and the second factor
given in Table 16 are hard to interpret directly. Applying the varimax
rotation we obtain the loadings q̃1 = (0.02, 0.94, 0.13, 0.84, 0.97)^⊤ and
q̃2 = (0.99, −0.01, 0.98, 0.43, −0.02)^⊤. The high loadings, indicated as
bold entries, show that variables 2, 4, 5 define factor 1, a nutritional
factor. Variables 1 and 3 define factor 2, which might be referred to
as a taste factor.

Applied Multivariate Statistical Analysis


Factor Analysis 11-38

Strategy for Factor Analysis

1. Perform a principal component factor analysis, look for


suspicious observations, try varimax rotation
2. Perform maximum likelihood factor analysis including varimax
rotation
3. Compare the factor analyses: do the loadings group in the
same manner?
4. Repeat the previous steps for other number of common factors
5. For large data sets, split them in half and perform a factor
analysis on each part. Compare the solutions.

Applied Multivariate Statistical Analysis


Factor Analysis 11-39

Summary: Estimating the Factor Model

In practice Q and \Psi have to be estimated from
S = \widehat{Q}\widehat{Q}^\top + \widehat{\Psi}. The number of degrees of freedom is
d = \frac12 (p-k)^2 - \frac12 (p+k).
If d = 0, an exact solution exists. In practice,
unfortunately, d > 0.
The maximum-likelihood method supposes a normal
distribution for the data; a solution can be found by numerical
algorithms only.

Applied Multivariate Statistical Analysis


Factor Analysis 11-40

Summary: Estimating the Factor Model

The method of principal factors is a two-stage method which
calculates \widehat{Q} from the reduced correlation matrix R - \tilde{\Psi}, where \tilde{\Psi} is a
pre-estimate of \Psi. The final estimate of \Psi is found via
\[
\widehat{\psi}_{ii} = 1 - \sum_{j=1}^k \widehat{q}_{ij}^{\,2} .
\]

The principal component method is based on an


approximation Qb to Q.

A better interpretation of the factors can be found by rotating


the factors.

Applied Multivariate Statistical Analysis


Factor Analysis 11-41

Factor Scores

Factor scores are estimates of the unobserved factors F_\ell,
\ell = 1, \dots, k, for each individual x_i, i = 1, \dots, n.

Scores are useful for interpretation and diagnostic analysis.

There are three methods to estimate the factor scores, here


we present the regression method.

Applied Multivariate Statistical Analysis


Factor Analysis 11-42

Factor Scores Estimation: the Regression


Method

Consider the joint distribution of (X - \mu) and F. Applying regression
analysis to the factor model X = QF + U + \mu, the joint covariance
matrix is
\[
\operatorname{Var}\begin{pmatrix} X - \mu \\ F \end{pmatrix}
= \begin{pmatrix} QQ^\top + \Psi & Q \\ Q^\top & I_k \end{pmatrix} .
\]
Note that QQ^\top + \Psi = \Sigma.
The matrix has dimension (p + k) \times (p + k).

Applied Multivariate Statistical Analysis


Factor Analysis 11-43

The conditional distribution of F \mid X is multinormal (Theorem 5.1), with
\[
\operatorname{E}(F \mid X = x) = Q^\top \Sigma^{-1} (x - \mu)
\]
\[
\operatorname{Var}(F \mid X = x) = I_k - Q^\top \Sigma^{-1} Q
\]

Applied Multivariate Statistical Analysis


Factor Analysis 11-44

In practice, we replace the unknown Q, \Sigma and \mu by the corresponding
estimators, leading to the estimated individual factor scores
\[
\widehat{f}_i = \widehat{Q}^\top S^{-1} (x_i - \bar{x}) .
\]
Note: the original sample covariance matrix S is a more robust estimator
of \Sigma than \widehat{Q}\widehat{Q}^\top + \widehat{\Psi} against an incorrect determination of the number
of factors.

Applied Multivariate Statistical Analysis


Factor Analysis 11-45
Use R instead of S if the variables are standardized (also in the joint
covariance matrix):
\[
\widehat{f}_i = \widehat{Q}^\top R^{-1} z_i ,
\]
where
\[
z_i = D_S^{-1/2} (x_i - \bar{x}) , \qquad D_S = \operatorname{diag}(s_{11}, \dots, s_{pp}) ,
\]
and \widehat{Q} is obtained from R.
If the factors are rotated by the orthogonal matrix G, the factor scores
have to be rotated accordingly:
\[
\widehat{f}_i^{\,*} = G^\top \widehat{f}_i
\]
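
A sketch of the regression method for the scores, stacking all individuals at once (simulated data; the loadings here come from the principal component method):

import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(150, 5))
X[:, 1] += X[:, 0]; X[:, 3] += X[:, 2]       # some common structure

S = np.cov(X, rowvar=False)
lam, Gam = np.linalg.eigh(S)
lam, Gam = lam[::-1], Gam[:, ::-1]
Q_hat = Gam[:, :2] * np.sqrt(lam[:2])        # estimated loadings, k = 2

Xc = X - X.mean(axis=0)
F_hat = Xc @ np.linalg.solve(S, Q_hat)       # rows are f_i = Q' S^{-1} (x_i - xbar)
print(F_hat[:5])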

Applied Multivariate Statistical Analysis


Factor Analysis 11-46

[Figure: three panels "ML - Factor Loadings" - pairwise plots of the loadings q1, q2, q3 with the variables X1-X14 labeled]

Figure: Factor loadings - Maximum Likelihood estimation
MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-47

[Figure: three panels "ML - Factor Loadings - Rot" - pairwise plots of the rotated loadings q1, q2, q3 with the variables X1-X14 labeled]

Figure: Factor loadings - Maximum Likelihood estimation with Varimax
rotation MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-48

[Figure: three panels "ML - Factor Scores" - pairwise scatterplots of the factor scores f1, f2, f3 with the individual observations labeled by index]
659
154
660
−2

151
657
152
658648
142
653
147
150
656
149
655146
652
650
144
148
654
143
649
145
651
1008
502 627
121
1012
506
1011
505
1009
503 628
122
126
632
1010
504 123
629
125
631630
124 633
127

−2 −1 0 1 2
f1

Figure: Factor scores - Maximum Likelihood estimation MVAfacthous

Applied Multivariate Statistical Analysis
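The figure above is produced by the MVAfacthous quantlet. As a rough sketch of the same pipeline (not the quantlet itself), the Python fragment below fits a k = 3 factor model by maximum likelihood and plots two score dimensions; the random matrix stands in for the real standardized data, and all names are placeholders.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FactorAnalysis

# Stand-in for the real standardized (n x 14) data matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Maximum likelihood factor analysis with k = 3 common factors
fa = FactorAnalysis(n_components=3).fit(X)
scores = fa.transform(X)                    # (n x 3) matrix of factor scores

# Scatter plot of (f1, f2), each point labeled by its observation number
fig, ax = plt.subplots()
for i, (f1, f2) in enumerate(scores[:, :2], start=1):
    ax.text(f1, f2, str(i), fontsize=6, ha="center", va="center")
ax.set_xlim(scores[:, 0].min() - 0.2, scores[:, 0].max() + 0.2)
ax.set_ylim(scores[:, 1].min() - 0.2, scores[:, 1].max() + 0.2)
ax.set_xlabel("f1")
ax.set_ylabel("f2")
ax.set_title("ML - Factor Scores")
plt.show()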


Factor Analysis 11-49

ML − Factor Scores − Rotated

[Scatter plot: three panels of the Varimax-rotated ML factor scores (f2 vs. f1, f2 vs. f3, f3 vs. f1); each observation is plotted as its index number.]

Figure: Factor scores - Maximum Likelihood estimation with Varimax rotation MVAfacthous

Applied Multivariate Statistical Analysis
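Varimax looks for an orthogonal matrix R that maximizes the variance of the squared loadings within each column of QR; the factor scores rotate with the same R. Below is a minimal hand-rolled version of the standard iterative SVD algorithm, not the quantlet's own routine; the loadings matrix Q is a placeholder.

import numpy as np

def varimax(Q, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation matrix R maximizing the varimax criterion for Q @ R."""
    p, k = Q.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Q @ R
        # gradient of the (normalized) varimax criterion
        G = Q.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        crit = s.sum()
        if crit - crit_old < tol * max(crit, 1e-12):
            break
        crit_old = crit
    return R

# Placeholder 14 x 3 loadings; in practice take Q from the ML fit above
Q = np.random.default_rng(1).normal(size=(14, 3))
R = varimax(Q)
Q_rot = Q @ R        # rotated loadings
# scores_rot = scores @ R would give the rotated factor scores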


Factor Analysis 11-50

PC − Factor Loadings

[Three panels of the principal component factor loadings for the variables X1–X14: q2 vs. q1, q3 vs. q2, and q3 vs. q1.]

Figure: Factor loadings - Principal Component estimation MVAfacthous

Applied Multivariate Statistical Analysis
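In the principal component method the loadings come straight from the spectral decomposition of the empirical correlation matrix: the j-th column of Q is the j-th eigenvector scaled by the square root of its eigenvalue. A sketch under the same placeholder-data assumption as before:

import numpy as np

X = np.random.default_rng(2).normal(size=(500, 14))   # placeholder data
R_corr = np.corrcoef(X, rowvar=False)                 # empirical correlation matrix

eigval, eigvec = np.linalg.eigh(R_corr)               # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]                      # re-sort descending
eigval, eigvec = eigval[order], eigvec[:, order]

k = 3
Q = eigvec[:, :k] * np.sqrt(eigval[:k])               # (p x k) loadings matrix
communality = (Q ** 2).sum(axis=1)                    # explained variance per variable
psi = 1.0 - communality                               # specific variances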


Factor Analysis 11-51

PC − Factor Loadings − Rot

[Three panels of the Varimax-rotated principal component loadings for X1–X14: q2 vs. q1, q3 vs. q2, and q3 vs. q1.]

Figure: Factor loadings - Principal Component estimation with Varimax rotation MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-52

PC − Factor Scores

[Scatter plot: three panels of the principal component factor scores (f2 vs. f1, f2 vs. f3, f3 vs. f1); each observation is plotted as its index number.]

Figure: Factor scores - Principal Component estimation MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-53

PC − Factor Scores − Rotated

[Scatter plot: three panels of the Varimax-rotated principal component factor scores (f2 vs. f1, f2 vs. f3, f3 vs. f1); each observation is plotted as its index number.]

Figure: Factor scores - Principal Component estimation with Varimax rotation MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-54

PF − Factor Loadings

[Three panels of the Principal Factors loadings for X1–X14: q2 vs. q1, q3 vs. q2, and q3 vs. q1.]

Figure: Factor loadings - Principal Factors estimation MVAfacthous

Applied Multivariate Statistical Analysis
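The principal factors method applies the same eigendecomposition to a reduced correlation matrix whose diagonal holds estimated communalities, iterating until the communalities stabilize. A minimal fixed-point sketch, initialized with squared multiple correlations (placeholder data again, not the quantlet's routine):

import numpy as np

X = np.random.default_rng(3).normal(size=(500, 14))   # placeholder data
R_corr = np.corrcoef(X, rowvar=False)
p, k = R_corr.shape[0], 3

# Initial communalities: squared multiple correlations
h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R_corr))
for _ in range(100):
    R_red = R_corr.copy()
    np.fill_diagonal(R_red, h2)                       # reduced correlation matrix
    eigval, eigvec = np.linalg.eigh(R_red)
    top = np.argsort(eigval)[::-1][:k]                # k largest eigenvalues
    Q = eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))
    h2_new = (Q ** 2).sum(axis=1)                     # updated communalities
    if np.max(np.abs(h2_new - h2)) < 1e-6:
        break
    h2 = h2_new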


Factor Analysis 11-55

PF − Factor Loadings − Rot

[Three panels of the Varimax-rotated Principal Factors loadings for X1–X14: q2 vs. q1, q3 vs. q2, and q3 vs. q1.]

Figure: Factor loadings - Principal Factors estimation with Varimax rotation MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-56

PF − Factor Loadings − Man. Rot.

[Three panels of the manually rotated Principal Factors loadings for X1–X14: q2 vs. q1, q3 vs. q2, and q3 vs. q1.]

Figure: Factor loadings - Principal factors estimation with manual rotation MVAfacthous

Applied Multivariate Statistical Analysis
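A manual rotation, in contrast to Varimax, fixes the orthogonal matrix by hand, e.g. by rotating the first two factors through a chosen angle with a Givens rotation. A sketch of that idea; the angle and the loadings below are placeholders, not the values behind the figure:

import numpy as np

theta = np.deg2rad(35.0)                              # hand-picked angle (placeholder)
G = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])  # rotates factors 1 and 2 only

Q = np.random.default_rng(4).normal(size=(14, 3))     # placeholder loadings
Q_man = Q @ G                                         # manually rotated loadings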


Factor Analysis 11-57

PF − Factor Scores

[Scatter plot: three panels of the Principal Factors scores (f2 vs. f1, f2 vs. f3, f3 vs. f1); each observation is plotted as its index number.]

Figure: Factor scores - Principal Factors estimation MVAfacthous

Applied Multivariate Statistical Analysis


Factor Analysis 11-58

[Three scatterplots "PF − Factor Scores − Rotated": f2 vs. f1, f2 vs. f3, and f3 vs. f1, with observation indices as point labels]

Figure: Factor scores - Principal Factors estimation with Varimax rotation MVAfacthous

Applied Multivariate Statistical Analysis


Cluster Analysis 12-1

Cluster Analysis

Find groups (clusters) in a multivariate dataset.


A group should be as homogeneous as possible.
Differences between groups should be as large as possible.

Cluster analysis is a set of tools for building groups (clusters)


from multivariate data objects.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-2

Application in Medicine

Cluster Analysis is used for medical imaging


Differentiate between various types of tissue and blood in a
3D image on PET scans
Allows accurate measurement of the rate at which a radioactive
tracer is delivered to the region of interest

and IMRT segmentation


Divide a fluence map into distinct regions for conversion into
fields in MLC-based radiation therapy

Applied Multivariate Statistical Analysis


Cluster Analysis 12-3

Application in Biology

Movement and habitat preferences of various bird species are


indicators of environmental health
Cluster movement into segments of motion
Map the spatial distribution of behaviours
Explore where, when and for how long birds from different
colonies engaged in each activity, during different stages of
the breeding season

Applied Multivariate Statistical Analysis


Cluster Analysis 12-4

Application in Economic Growth Analysis

Formalizing differences in economic growth paths for various
groups of countries w.r.t. the diversity of their products and how
related these products are to each other
The product space is a network based clustering of products
with high values of similarity and tradability
Visualization of heterogeneity between countries via maximum
spanning trees (MST)

Applied Multivariate Statistical Analysis


Cluster Analysis 12-5

Application in Tourism Management

Track travel behaviour and preferences of certain groups w.r.t.


age, origin etc.
Destination mapping with cluster analysis supports efficient
development and management of attractions, accessibility and
amenities of specific areas

Applied Multivariate Statistical Analysis


Cluster Analysis 12-6

Application in Text Mining

Analyzing warranty or insurance claims, diagnostic interviews,


etc.
Interpret semantic spaces described by extracted words or
documents and map these words into a common space, computed
from the (possibly transformed) word frequencies.
Use cluster analysis to identify groups of documents, i.e. groups
of similar input texts.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-7

Application in Accounting and Insurance

Identify anomalies in accounting and insurance fraud.


Use CA to automate fraud filtering during an audit.
Identify suspicious transactions or observations.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-8

Cluster Analysis in 2 Steps

Cluster analysis can be divided into two fundamental steps.

1. Choice of a proximity measure


Find a way to define which elements are close to each
other
2. Choice of a group-building algorithm
Find a way to assign objects to the clusters on the basis
of the chosen proximity measure

Applied Multivariate Statistical Analysis


Cluster Analysis 12-9

Example: U.S. Health Data

This is a data set consisting of 56 measurements of 22 variables. It


states for one year (2005) the reported number of deaths in the 56
states and associated regions of the U.S. classified according to 15
categories:

X1 : State (Area) X12 : Nephritis


X2 : Total number of deaths (All) X13 : Accidents
X3 : Human immunodeficiency virus (HIV) X14 : Vehicle Accidents
X4 : Malignant neoplasms (Malignant) X15 : Suicide
X5 : Diabetes X16 : Assault
X6 : Alzheimer X17 : Firearms
X7 : Heart X18 : Population
X8 : Cerebrovascular (TIA) X19 : Area km2
X9 : Influenza X20 : Region
X10 : Respiratory Diseases X21 : Division
X11 : Liver X22 : state abbreviations (ANSI)

from Härdle and Hlávka (SMS), 2015 Springer

Applied Multivariate Statistical Analysis


Cluster Analysis 12-10
Example: U.S. Health Data

Division (X21)        Region (X20)

New England      1    Northeast  1
Mid-Atlantic     2    Midwest    2
E N Central      3    South      3
W N Central      4    West       4
S Atlantic       5
E S Central      6
W S Central      7
Mountain         8
Pacific          9

[Map of the U.S. with state abbreviations omitted]

Applied Multivariate Statistical Analysis


Cluster Analysis 12-11
Example: U.S. Health Data
Maine New York New Hampshire
X1 All 12868 152427 10194
X2 HIV 11 1644 13
X3 Malignant 3218 35556 2549
X4 Diabetes 385 4051 310
X5 Alzheimer 476 2065 376
X6 Heart 2941 51985 2530
X7 TIA 693 6622 497
X8 Influenza 352 5521 273
X9 Respiratory Diseases 830 6818 630
X10 Liver 116 1224 114
X11 Nephritis 250 2360 173
X12 Accidents 579 4645 477
X13 Vehicle.Accidents 192 1530 162
X14 Suicide 175 1189 162
X15 Assault 22 901 19
X16 Firearms 109 1019 88
X17 Population 1321505 19254630 1309940
X18 Area.km2 79883 122057 23187
X19 Region Northeast Northeast Northeast
X20 Division New England Mid Atlantic New England
X21 ANSI ME NY NH

Applied Multivariate Statistical Analysis


Cluster Analysis 12-12

Data Mining

Extract (unknown) structures or "knowledge"


Segmentation, Classification, Clustering
Statistical learning algorithms
I Support Vector Machines (SVM)
I Supervised learning
I Calibration by crossvalidation

Applied Multivariate Statistical Analysis


Cluster Analysis 12-13

Summary: Cluster Analysis

Cluster analysis is a set of tools and methods for building


groups (clusters) from multivariate data objects.
Two fundamental steps:
I The choice of a proximity measure
I The choice of a group-building algorithm
maximize simultaneously:
I homogeneity within groups
I heterogeneity between groups

Applied Multivariate Statistical Analysis


Cluster Analysis 12-14

Proximity between Objects


X (n × p) with n measurements (objects) of p variables.
Proximity (similarity) between objects is measured by a matrix
D(n × n):

D = \begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1n} \\
d_{21} & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{n1} & d_{n2} & \cdots & d_{nn}
\end{pmatrix}
dij = proximity between the i-th and the j-th object
Applied Multivariate Statistical Analysis
Cluster Analysis 12-15

Proximity

D contains measures of similarity or dissimilarity.

If the dij are distance values, they measure dissimilarity.

If the dij are proximity measures, they measure similarity.

Example for a distance:

d_{ij} = \|x_i - x_j\|_2, \qquad x_i, x_j \in \mathbb{R}^p

Applied Multivariate Statistical Analysis


Cluster Analysis 12-16

Distance and Similarity

Distance and similarity are dual. If $d_{ij}$ is a distance, then

$d_{ij}' = \max_{i,j}\{d_{ij}\} - d_{ij}$ is a proximity measure.

Nominal values (like binary variables) yield in general


proximity values.
Metric values lead (in general) to distance matrices.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-17
Example: US health 2005 data: ME, NH, NY
Euclidean distance

D = \begin{pmatrix} 0.0 & 850.1 & 59775.4 \\ & 0.0 & 60532.1 \\ & & 0.0 \end{pmatrix}

Mannheim distance

D = \begin{pmatrix} 0.0 & 1811.0 & 108574.0 \\ & 0.0 & 110381.0 \\ & & 0.0 \end{pmatrix}

Maximum distance

D = \begin{pmatrix} 0.0 & 669.0 & 49044.0 \\ & 0.0 & 49455.0 \\ & & 0.0 \end{pmatrix}

SMSdisthealth05
Applied Multivariate Statistical Analysis
Cluster Analysis 12-18

Projects

Which other distance / proximity measures do you know?

Applied Multivariate Statistical Analysis


Cluster Analysis 12-19

Proximity with binary data

Example
Consider a data matrix $\mathcal{X}$ with column means $\bar{x}_k$. Define the
indicator variables $y_{ik} = I(x_{ik} > \bar{x}_k)$, $i = 1, \ldots, n$. How do these
"deviations from the means" cluster?

Applied Multivariate Statistical Analysis


Cluster Analysis 12-20
Example: Everitt and Dunn (1998) binary drug data

M    16-29   30-44   45-64   65-74   75+
N      683     596     705     295    99
Y       21      32      70      43    19
F    16-29   30-44   45-64   65-74   75+
N      738     700     847     336   196
Y       46      89     169      98    51

Table: A three-way contingency table: count data for prescribed
psychotropic drugs, top table for men and bottom for women, in the age
categories 16-29, 30-44, 45-64, 65-74 and 75 years and older; Y: taking
drugs, N: not taking drugs, n = 5833.
MVAdrugsim
Applied Multivariate Statistical Analysis
Cluster Analysis 12-21

Proximity with binary data

Basic information on similarity between binary objects:


$(x_i, x_j)$; $x_i^\top = (x_{i1}, \ldots, x_{ip})$, $x_j^\top = (x_{j1}, \ldots, x_{jp})$, $x_{ik}, x_{jk} \in \{0, 1\}$.

Four possible cases for each coordinate k:
$x_{ik} = x_{jk} = 1$,
$x_{ik} = 0,\ x_{jk} = 1$,
$x_{ik} = 1,\ x_{jk} = 0$,
$x_{ik} = x_{jk} = 0$.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-22

a_1 = \sum_{k=1}^{p} I(x_{ik} = x_{jk} = 1),
a_2 = \sum_{k=1}^{p} I(x_{ik} = 0, x_{jk} = 1),
a_3 = \sum_{k=1}^{p} I(x_{ik} = 1, x_{jk} = 0),
a_4 = \sum_{k=1}^{p} I(x_{ik} = x_{jk} = 0).

The following proximity measures are used in practice:

d_{ij} = \frac{a_1 + \delta a_4}{a_1 + \delta a_4 + \lambda (a_2 + a_3)}
Applied Multivariate Statistical Analysis
Cluster Analysis 12-23

Name                    δ    λ     Definition
Jaccard                 0    1     a1 / (a1 + a2 + a3)
Tanimoto                1    2     (a1 + a4) / {a1 + 2(a2 + a3) + a4}
Simple Matching (M)     1    1     (a1 + a4) / p
Russel and Rao (RR)     -    -     a1 / p
Dice                    0    0.5   2 a1 / {2 a1 + (a2 + a3)}
Kulczynski              -    -     a1 / (a2 + a3)

Table: Common similarity coefficients.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-24

Example: Let us consider a binary data set computed from the car
data set.

y_{ik} = \begin{cases} 1 & \text{if } x_{ik} > \bar{x}_k, \\ 0 & \text{otherwise}, \end{cases}

for $i = 1, \ldots, n$; $k = 1, \ldots, p$.

Consider data points 17–19: Renault 19, Rover, and Toyota


Corolla.
This leads to (3 × 3) matrices.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-25
Jaccard measure

D = \begin{pmatrix} 1.000 & 0.000 & 0.400 \\ & 1.000 & 0.167 \\ & & 1.000 \end{pmatrix},

Simple Matching

D = \begin{pmatrix} 1.000 & 0.000 & 0.625 \\ & 1.000 & 0.375 \\ & & 1.000 \end{pmatrix},

Tanimoto measure

D = \begin{pmatrix} 1.000 & 0.000 & 0.455 \\ & 1.000 & 0.231 \\ & & 1.000 \end{pmatrix}

SMScarsim
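A minimal R sketch (illustrative helper, not the SMScarsim quantlet itself) that computes the general coefficient d_ij from the counts a1, ..., a4 for two binary vectors; x and y are made-up toy data.

# Similarity of two binary vectors via the counts a1, ..., a4 defined above;
# delta and lambda select the coefficient (see the table of coefficients).
binary_sim <- function(x, y, delta, lambda) {
  a1 <- sum(x == 1 & y == 1)
  a2 <- sum(x == 0 & y == 1)
  a3 <- sum(x == 1 & y == 0)
  a4 <- sum(x == 0 & y == 0)
  (a1 + delta * a4) / (a1 + delta * a4 + lambda * (a2 + a3))
}

x <- c(1, 0, 1, 1, 0)                      # toy binary observations
y <- c(1, 1, 1, 0, 0)
binary_sim(x, y, delta = 0, lambda = 1)    # Jaccard
binary_sim(x, y, delta = 1, lambda = 2)    # Tanimoto
binary_sim(x, y, delta = 1, lambda = 1)    # Simple Matching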
Applied Multivariate Statistical Analysis
Cluster Analysis 12-26

Distance measures for continuous variables

A great variety of distance measures can be generated by


$L_r$-norms, $r \ge 1$,

d_{ij} = \|x_i - x_j\|_r = \left\{ \sum_{k=1}^{p} |x_{ik} - x_{jk}|^r \right\}^{1/r}. \qquad (1)

xik denotes the value of the k-th variable on object i.
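In R this family is available directly: dist() implements the L1, L2 and general Minkowski (Lr) cases. A small sketch with three toy points:

x <- rbind(c(0, 0), c(1, 0), c(5, 5))    # three points in R^2
dist(x, method = "manhattan")            # L1 norm (r = 1)
dist(x, method = "euclidean")            # L2 norm (r = 2)
dist(x, method = "minkowski", p = 4)     # L4 norm (r = 4)
dist(x)^2                                # squared Euclidean distances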

Applied Multivariate Statistical Analysis


Cluster Analysis 12-27

Example
French railway metric
Let (X, d) be a metric space (France), and fix $x_h \in X$ (Paris). Set
r = 1 in (1) and define a new metric $d_h$ on X by letting

d_h(x_i, x_j) \stackrel{\text{def}}{=} \begin{cases} 0 & x_i = x_j \\ d_{ih} + d_{hj} & \text{otherwise} \end{cases}

for $x_i, x_j \in X$. Then $(X, d_h)$ is again a metric space.

Quiz 1: What is the Mannheim metric?

Applied Multivariate Statistical Analysis


Cluster Analysis 12-28

Figure: Map of the French railroad network

Source: Runde, V. (2005), A Taste of Topology. Ch. 2.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-29

Example
Mannheim metric
Let (X, d) be a metric space (Mannheim). Set r = 1 in (1) and
define a new metric $d_m$ on X by letting

d_m(x_i, x_j) \stackrel{\text{def}}{=} d_{ij}

for $x_i, x_j \in X$. Then $(X, d_m)$ is again a metric space.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-30

[Sketch: in the Mannheim street grid, the direct (L2) path from A = 0 to B = (x1, x2) versus the axis-parallel (LM) path]

L_2 = (x_1^2 + x_2^2)^{1/2}
L_M = L_{\text{Mannheim}} = |x_1| + |x_2| = L_1 \text{ metric}

Quiz 2: What is the Karlsruhe metric?

Applied Multivariate Statistical Analysis


Cluster Analysis 12-31

Figure: Map of Mannheim around 1800


Source: http://historic-cities.huji.ac.il
Applied Multivariate Statistical Analysis
Cluster Analysis 12-32

Example
Karlsruhe metric
Let (X , d) be a metric space (Karlsruhe). Set r = 1 in (1) and
represent $x_i, x_j$ in polar coordinates: $x_i = (d_{ih}, \varphi_{ih})$, $x_j = (d_{jh}, \varphi_{jh})$,
where $d_{ih}, d_{jh}$ are the distances from a fixed point $x_h$ and $\varphi_{ih}, \varphi_{jh}$
the angles from a fixed direction. Define:

d_k(x_i, x_j) \stackrel{\text{def}}{=} \begin{cases} \min(d_{ih}, d_{jh}) \cdot \delta(x_i, x_j) + |d_{ih} - d_{jh}| & 0 \le \delta(x_i, x_j) \le 2 \\ d_{ih} + d_{jh} & \delta(x_i, x_j) > 2 \end{cases}

for $\delta(x_i, x_j) = \min(|\varphi_{ih} - \varphi_{jh}|, 2\pi - |\varphi_{ih} - \varphi_{jh}|)$. Then $(X, d_k)$ is
again a metric space.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-33

Figure: Examples of Polar Coordinates

Source: Wikipedia

Applied Multivariate Statistical Analysis


Cluster Analysis 12-34

Figure: Map of Karlsruhe

Source: Proceedings of the 14th International Workshop on Graph-Theoretic Concepts in Computer Science,

Voronoi Diagrams in the Moscow Metric, Rolf Klein (1989), Springer-Verlag

Applied Multivariate Statistical Analysis


Cluster Analysis 12-35
Example
$x_1 = (0, 0)$, $x_2 = (1, 0)$ and $x_3 = (5, 5)$

Distance matrix for the $L_1$-norm:

D_1 = \begin{pmatrix} 0 & 1 & 10 \\ 1 & 0 & 9 \\ 10 & 9 & 0 \end{pmatrix},

Squared $L_2$-(Euclidean) norm:

D_2 = \begin{pmatrix} 0 & 1 & 50 \\ 1 & 0 & 41 \\ 50 & 41 & 0 \end{pmatrix}.

The third observation $x_3$ receives much more weight in the
$L_2$-norm than in the $L_1$-norm.
Applied Multivariate Statistical Analysis
Cluster Analysis 12-36

An underlying hypothesis for application of Lr distances is that the


variables are measured on the same scale.
If this is not the case, use a more general metric ($A > 0$):

d_{ij}^2 = \|x_i - x_j\|_A = (x_i - x_j)^\top A (x_i - x_j). \qquad (2)

Example
$A = \operatorname{diag}(s_{X_1 X_1}^{-1}, \ldots, s_{X_p X_p}^{-1})$ gives

d_{ij}^2 = \sum_{k=1}^{p} \frac{(x_{ik} - x_{jk})^2}{s_{X_k X_k}}.

The distances do not depend on the scale of variables.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-37

Mahalanobis Distance
Take $A = \Sigma^{-1}$ or $S^{-1}$ in (2):

d_{ij}^2 = (x_i - x_j)^\top S^{-1} (x_i - x_j).

Example

X \sim N_2 \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma \right), \qquad Y \sim N_2 \left( \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \Sigma \right)

\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \rho = 0.9

Applied Multivariate Statistical Analysis


Cluster Analysis 12-38
Example

[Plot of the two normal populations X and Y]

Applied Multivariate Statistical Analysis


Cluster Analysis 12-39
Example
n = 100, \quad \mu_x = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mu_y = \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \quad \rho = 0.9

[Scatterplot of both samples: second coordinate (x2, y2) against first coordinate (x1, y1)]

SMSmdmv

Calculate the distance matrix D with $A = I_{100}^{-1}$ and with $A = S_{100}^{-1}$.
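A minimal R sketch of this exercise (seed and simulation setup are arbitrary assumptions): simulate the two populations and compare the distance between the sample means for A = I and for the inverse empirical covariance.

set.seed(1)
n     <- 100
rho   <- 0.9
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
R     <- chol(Sigma)                       # Sigma = t(R) %*% R
x <- matrix(rnorm(2 * n), n, 2) %*% R      # X ~ N2((0,0)', Sigma)
y <- matrix(rnorm(2 * n), n, 2) %*% R
y[, 1] <- y[, 1] + 4                       # Y ~ N2((4,0)', Sigma)

d <- colMeans(x) - colMeans(y)
S <- cov(rbind(x, y))                      # empirical covariance of the combined sample
sqrt(sum(d^2))                             # A = I:      Euclidean distance
sqrt(drop(t(d) %*% solve(S) %*% d))        # A = S^{-1}: Mahalanobis distance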

Applied Multivariate Statistical Analysis


Cluster Analysis 12-40

Example: French Food expenditures

The data set consists of the average expenditures on food for


several different types of families in France (manual workers =
MA, employees = EM, managers = CA) with different numbers of
children (2, 3, 4, or 5 family members).

Applied Multivariate Statistical Analysis


Cluster Analysis 12-41

Example: French Food expenditures


Euclidean proximity (L2-norm):

D = 10^4 \cdot \begin{pmatrix}
0.00 & 5.82 & 58.19 & 3.54 & 5.15 & 151.44 & 16.91 & 36.15 & 147.99 & 51.84 & 102.56 & 271.83 \\
& 0.00 & 41.73 & 4.53 & 2.93 & 120.59 & 13.52 & 25.39 & 116.31 & 43.68 & 76.81 & 226.87 \\
& & 0.00 & 44.14 & 40.10 & 24.12 & 29.95 & 8.17 & 25.57 & 20.81 & 20.30 & 88.62 \\
& & & 0.00 & 0.76 & 127.85 & 5.62 & 21.70 & 124.98 & 31.21 & 72.97 & 231.57 \\
& & & & 0.00 & 121.05 & 5.70 & 19.85 & 118.77 & 30.82 & 67.39 & 220.72 \\
& & & & & 0.00 & 96.57 & 48.16 & 1.80 & 60.52 & 28.90 & 29.56 \\
& & & & & & 0.00 & 9.20 & 94.87 & 11.07 & 42.12 & 179.84 \\
& & & & & & & 0.00 & 46.95 & 6.17 & 18.76 & 113.03 \\
& & & & & & & & 0.00 & 61.08 & 29.62 & 31.86 \\
& & & & & & & & & 0.00 & 15.83 & 116.11 \\
& & & & & & & & & & 0.00 & 53.77 \\
& & & & & & & & & & & 0.00
\end{pmatrix}

Applied Multivariate Statistical Analysis


Cluster Analysis 12-42

Example: Using $A = \operatorname{diag}(s_{X_1 X_1}^{-1}, \ldots, s_{X_7 X_7}^{-1})$.
⇒ Mahalanobis proximity D (L2-norm):

D = \begin{pmatrix}
0.00 & 6.85 & 10.04 & 1.68 & 2.66 & 24.90 & 8.28 & 8.56 & 24.61 & 21.55 & 30.68 & 57.48 \\
& 0.00 & 13.11 & 6.59 & 3.75 & 20.12 & 13.13 & 12.38 & 15.88 & 31.52 & 25.65 & 46.64 \\
& & 0.00 & 8.03 & 7.27 & 4.99 & 9.27 & 3.88 & 7.46 & 14.92 & 15.08 & 26.89 \\
& & & 0.00 & 0.64 & 20.06 & 2.76 & 3.82 & 19.63 & 12.81 & 19.28 & 45.01 \\
& & & & 0.00 & 17.00 & 3.54 & 3.81 & 15.76 & 14.98 & 16.89 & 39.87 \\
& & & & & 0.00 & 17.51 & 9.79 & 1.58 & 21.32 & 11.36 & 13.40 \\
& & & & & & 0.00 & 1.80 & 17.92 & 4.39 & 9.93 & 33.61 \\
& & & & & & & 0.00 & 10.50 & 5.70 & 7.97 & 24.41 \\
& & & & & & & & 0.00 & 24.75 & 11.02 & 13.07 \\
& & & & & & & & & 0.00 & 9.13 & 29.78 \\
& & & & & & & & & & 0.00 & 9.39 \\
& & & & & & & & & & & 0.00
\end{pmatrix}

Applied Multivariate Statistical Analysis


Cluster Analysis 12-43

Cluster Analysis for Contingency Tables

If $\mathcal{X}$ is a contingency table:
row i is characterized by the conditional frequencies $x_{ij}/x_{i\bullet}$,
column j is characterized by the conditional frequencies $x_{ij}/x_{\bullet j}$,
where $x_{i\bullet} = \sum_{j=1}^{p} x_{ij}$, $x_{\bullet j} = \sum_{i=1}^{n} x_{ij}$, $x_{\bullet\bullet} = \sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}$.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-44

Cluster Analysis for Contingency Tables

Distance between rows $i_1$ and $i_2$:

d^2(i_1, i_2) = \sum_{j=1}^{p} \frac{1}{x_{\bullet j}/x_{\bullet\bullet}} \left( \frac{x_{i_1 j}}{x_{i_1 \bullet}} - \frac{x_{i_2 j}}{x_{i_2 \bullet}} \right)^2

Similarly for columns $j_1$ and $j_2$:

d^2(j_1, j_2) = \sum_{i=1}^{n} \frac{1}{x_{i\bullet}/x_{\bullet\bullet}} \left( \frac{x_{i j_1}}{x_{\bullet j_1}} - \frac{x_{i j_2}}{x_{\bullet j_2}} \right)^2
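A minimal R sketch of the row distance above (the column version is analogous); the helper name chisq_row_dist is illustrative. Applied to the men's panel of the drug data, it gives the chi-square-type distance between the N and Y row profiles.

chisq_row_dist <- function(X, i1, i2) {
  xi_dot <- rowSums(X)                  # x_{i.}
  xdot_j <- colSums(X)                  # x_{.j}
  xdotdot <- sum(X)                     # x_{..}
  sum((X[i1, ] / xi_dot[i1] - X[i2, ] / xi_dot[i2])^2 / (xdot_j / xdotdot))
}

X <- rbind(N = c(683, 596, 705, 295, 99),  # men, not taking drugs
           Y = c( 21,  32,  70,  43, 19))  # men, taking drugs
chisq_row_dist(X, 1, 2)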

Applied Multivariate Statistical Analysis


Cluster Analysis 12-45
Example: Cluster Analysis for US Crime data
This is a data set consisting of 50 measurements of 7 variables. It
states for one year (1985) the reported number of crimes in the 50
states of the United States classified according to 7 categories
(X3 –X9 ):

variable description
X1 : land area (land)
X2 : population 1985 (popu 1985)
X3 : murder (murd)
X4 : rape
X5 : robbery (robb)
X6 : assault (assa)
X7 : burglary (burg)
X8 : larceny (larc)
X9 : auto theft (auto)
X10 : U.S. states region number (reg)
X11 : U.S. states division number (div)

Applied Multivariate Statistical Analysis


Cluster Analysis 12-46
Example
Used are just the 7 crime variables. The matrix is interpreted as
contingency table. Derived is the distance matrix D:

D = \begin{pmatrix}
0.000 & 0.004 & 0.002 & 0.230 & 0.142 & 0.034 & \ldots & 0.009 \\
& 0.000 & 0.011 & 0.172 & 0.098 & 0.014 & \ldots & 0.001 \\
& & 0.000 & 0.272 & 0.176 & 0.051 & \ldots & 0.019 \\
& & & 0.000 & 0.010 & 0.087 & \ldots & 0.146 \\
& & & & 0.000 & 0.037 & \ldots & 0.079 \\
& & & & & 0.000 & \ldots & 0.008 \\
& & & & & & \ddots & \vdots \\
& & & & & & & 0.000
\end{pmatrix}

MVAclususcrime
Applied Multivariate Statistical Analysis
Cluster Analysis 12-47

Q-Correlation Distance

Apart from the Lr -distance measures, the Q-correlation coefficient


can be used
d_{ij} = \frac{\sum_{k=1}^{p} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\left\{ \sum_{k=1}^{p} (x_{ik} - \bar{x}_i)^2 \sum_{k=1}^{p} (x_{jk} - \bar{x}_j)^2 \right\}^{1/2}},

where $\bar{x}_i$ denotes the mean of $x_{i1}, x_{i2}, \ldots, x_{ip}$.
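Since d_ij is just the Pearson correlation between two observation rows (computed across the p variables), it can be obtained in R by correlating the transposed data matrix; the data here are arbitrary toy values.

x <- matrix(rnorm(5 * 10), nrow = 5)   # 5 objects measured on 10 variables
D <- cor(t(x))                         # D[i, j] equals d_ij above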

Applied Multivariate Statistical Analysis


Cluster Analysis 12-48

Summary: Proximity between Objects

The proximity between data points is measured by a distance


or similarity matrix D whose components dij give the
similarity coefficient or the distance between two points xi , xj .
There exists a variety of similarity (distance) measures for
binary data (e.g. Jaccard, Tanimoto, Simple Matching
coefficients) and for continuous data (e.g., Lr -distances).
The nature of the data may dictate the choice of a particular
metric A for defining the distance (standardization, χ2-metric,
etc.).

Applied Multivariate Statistical Analysis


Cluster Analysis 12-49

Cluster algorithms

Main types of clustering methods


hierarchical algorithms
partitioning algorithms

Both types of techniques find groups and assign elements to clusters.


Main difference: in hierarchical clustering this assignment is not
changed; in partitioning techniques the assignment may change
during the algorithm.

Applied Multivariate Statistical Analysis


Cluster Analysis 12-50

Spectral Clustering

Originating from Graph Theory


Partitional, hierarchical technique based on proximity measure
Model-free (no assumptions such as compactness or normality) and data-driven
Computationally efficient, easy to implement
Suitable for large data

Applied Multivariate Statistical Analysis


Cluster Analysis 12-51

Group Building
Cost of partitioning Y into groups P and Q, Y = P + Q:

\text{Cut}(P, Q) = \sum_{x_j \in P,\; x_k \in Q} d_{jk}

the sum of all "neglected" similarities between elements of P and Q.

Normalized cut:

\text{Ncut}(P, Q) = \frac{\text{Cut}(P, Q)}{\sum_{x_j \in P,\; x_k \in Y} d_{jk}} + \frac{\text{Cut}(P, Q)}{\sum_{x_j \in Q,\; x_k \in Y} d_{jk}}

Minimizing Ncut amounts to minimizing Cut(P, Q) while keeping the
total connections $\sum_{x_j \in P,\; x_k \in Y} d_{jk}$ (and likewise for Q) large.
Applied Multivariate Statistical Analysis
Cluster Analysis 12-52

Normalized cut (NCUT)

Shi and Malik (2000) showed:


\operatorname*{argmin}_{Y = P+Q} \text{Ncut}(P, Q) = \operatorname*{argmin}_{y} \frac{y^\top Z y}{y^\top V y} \stackrel{\text{def}}{=} \operatorname*{argmin}_{y} Q(y),

s.t. y is an indicator vector and $y^\top V 1_n = 0$, where

$V = \operatorname{diag}\left( \sum_{j=1}^{n} d_{1j}, \ldots, \sum_{j=1}^{n} d_{nj} \right)$ and Z is a Laplacian matrix

Z = I_n - V^{-1/2} D V^{-1/2}.

If y is relaxed to take real values, the solution to $\operatorname*{argmin}_y Q(y)$ is given
by $Z\gamma = \lambda\gamma$ (look at a small eigenvalue).
Applied Multivariate Statistical Analysis
Cluster Analysis 12-53

Laplacian matrix properties


1. $\forall a \in \mathbb{R}^n$: $a^\top Z a = \frac{1}{2} \sum_{i,j=1}^{n} d_{ij} \left( \frac{a_i}{\sqrt{\nu_i}} - \frac{a_j}{\sqrt{\nu_j}} \right)^2$
2. 0 is an eigenvalue of Z with eigenvector $V^{1/2} 1_n$
3. Z is positive semi-definite and has n eigenvalues $0 = \lambda_1 \le \ldots \le \lambda_n$

Proof:
Ad.1 Recall $Z = I_n - V^{-1/2} D V^{-1/2}$. Let $b = V^{-1/2} a$, i.e. $b_i = a_i / \sqrt{\nu_i}$. Then

a^\top Z a = \sum_{i=1}^{n} a_i^2 - \sum_{i,j=1}^{n} \frac{a_i a_j}{\sqrt{\nu_i \nu_j}} d_{ij} = \sum_{i=1}^{n} \nu_i b_i^2 - \sum_{i,j=1}^{n} b_i b_j d_{ij}
= \frac{1}{2} \left( \sum_{i=1}^{n} \nu_i b_i^2 - 2 \sum_{i,j=1}^{n} b_i b_j d_{ij} + \sum_{i=1}^{n} \nu_i b_i^2 \right) = \frac{1}{2} \sum_{i,j=1}^{n} d_{ij} (b_i - b_j)^2

Applied Multivariate Statistical Analysis


Cluster Analysis 12-54

cont’d

Ad.2
One has to show: $(I_n - V^{-1/2} D V^{-1/2}) V^{1/2} 1_n = 0_n$.
By definition of V it follows that 0 and $1_n$ are the eigenvalue and
eigenvector of $I_n - V^{-1} D$, respectively:

(I_n - V^{-1} D) 1_n = 0_n
(V^{1/2} - V^{-1/2} D) 1_n = 0_n

Ad.3
Follows from (1)


Applied Multivariate Statistical Analysis


Cluster Analysis 12-55

Normalized cut (NCUT)

How to find $P^*$ and $Q^*$, $(P^*, Q^*) = \operatorname*{argmin}_{Y=P+Q} \text{Ncut}(P, Q)$?

1. Construct the Laplacian matrix Z from $D = \{d_{jk}\}_{j,k=1,\ldots,n}$
2. Spectral decomposition of Z: $Z = \Gamma \Lambda \Gamma^\top$
3. Take the eigenvector $\gamma_{n-1}$ that corresponds to the
second-smallest eigenvalue $\lambda_{n-1}$ in $\Lambda$
4. Calculate the median $m(\gamma_{n-1})$ over all coordinates of $\gamma_{n-1}$
5. $\forall x_k \in Y$: if the k-th coordinate of $\gamma_{n-1}$ is $< m(\gamma_{n-1})$, then $x_k \in P$, otherwise
$x_k \in Q$
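A compact R sketch of steps 1-5 (illustrative helper, not one of the book's quantlets); it assumes a symmetric similarity matrix D with positive row sums, such as the 6 × 6 matrix of the following example. Eigenvectors are only defined up to sign, so P and Q may come out swapped.

ncut_split <- function(D) {
  v <- rowSums(D)                              # diagonal of V
  Z <- diag(nrow(D)) - diag(1 / sqrt(v)) %*% D %*% diag(1 / sqrt(v))
  g <- eigen(Z, symmetric = TRUE)$vectors[, nrow(D) - 1]  # 2nd-smallest eigenvalue
  split(seq_len(nrow(D)), g < median(g))       # indices of P and Q
}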

Applied Multivariate Statistical Analysis


Cluster Analysis 12-56
Example
[Scatterplot of 8 points, labeled 1-8, in the plane spanned by price consciousness (x-axis) and brand loyalty (y-axis)]
Figure: The 8 points example MVAclus8p

Applied Multivariate Statistical Analysis


Cluster Analysis 12-57

Example
Build the Laplacian matrix Z for the 8 points:

Z = \begin{pmatrix}
3.26 & 1.95 & 4.07 & 3.47 & 1.06 & 1.91 & 0.71 & 0.33 \\
1.95 & 2.47 & 1.46 & 0.90 & 2.40 & 2.28 & 0.13 & 0.10 \\
4.07 & 1.46 & 3.56 & 4.22 & 1.01 & 2.14 & 1.40 & 0.71 \\
3.47 & 0.90 & 4.22 & 3.22 & 0.62 & 1.55 & 1.64 & 0.77 \\
1.06 & 2.40 & 1.01 & 0.62 & 2.42 & 3.24 & 0.24 & 0.30 \\
1.91 & 2.28 & 2.14 & 1.55 & 3.24 & 3.23 & 0.83 & 0.87 \\
0.71 & 0.13 & 1.40 & 1.64 & 0.24 & 0.83 & 2.17 & 2.41 \\
0.33 & 0.10 & 0.71 & 0.77 & 0.30 & 0.97 & 2.41 & 1.73
\end{pmatrix}

Applied Multivariate Statistical Analysis


Cluster Analysis 12-58
Example: 8 points

[Scatterplot of the 8 points, split into the two clusters found by the NCut algorithm]

Figure: The 8 points example; second-smallest eigenvalue: 0.9, with
corresponding eigenvector $(-0.68, -0.68, 0.15, 0.10, 0.13, 0.02, 0.08, 0.08)^\top$ MVAclus8psc

Applied Multivariate Statistical Analysis


Cluster Analysis 12-59

Example
Calculate Cut and Ncut for a division of $x_1, \ldots, x_6$, given by a
similarity matrix D, into 2 subsets P and Q, where |P| = 1 or
$P = \{x_1, x_2, x_3\}, \{x_1, x_3, x_4\}, \{x_1, x_4, x_5\}, \{x_1, x_5, x_6\}, \{x_1, x_2, x_4\}, \{x_1, x_2, x_5\}$.

D = \begin{pmatrix}
1.0 & 0.8 & 0.6 & 0.0 & 0.1 & 0.0 \\
0.8 & 1.0 & 0.8 & 0.0 & 0.0 & 0.0 \\
0.6 & 0.8 & 1.0 & 0.2 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.2 & 1.0 & 0.8 & 0.7 \\
0.1 & 0.0 & 0.0 & 0.8 & 1.0 & 0.8 \\
0.0 & 0.0 & 0.0 & 0.7 & 0.8 & 1.0
\end{pmatrix}

Applied Multivariate Statistical Analysis


Cluster Analysis 12-60

Example

P      {x1}  {x2}  {x3}  {x4}  {x5}  {x6}
Cut     2.3   1.6   1.6   1.7   1.7   1.5
Ncut    1.5   1.3   1.3   1.4   1.4   1.3

P       P1    P2    P3    P4    P5    P6
Cut     0.3   3.2   3.1   2.9   2.8   3.0
Ncut    0.1   1.8   1.8   1.4   1.2   1.6

Table: Cut and Ncut for subsets: |P| = 1 (upper panel) and
P1 = {x1, x2, x3}, P2 = {x1, x3, x4}, P3 = {x1, x4, x5}, P4 = {x1, x5, x6},
P5 = {x1, x2, x4} and P6 = {x1, x2, x5} (lower panel).
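A minimal R sketch of the two quantities (function names are illustrative); with the similarity matrix D above, P1 = {x1, x2, x3} gives Cut = 0.3 and Ncut ≈ 0.08, consistent with the table's rounded 0.1.

D <- matrix(c(1.0, 0.8, 0.6, 0.0, 0.1, 0.0,
              0.8, 1.0, 0.8, 0.0, 0.0, 0.0,
              0.6, 0.8, 1.0, 0.2, 0.0, 0.0,
              0.0, 0.0, 0.2, 1.0, 0.8, 0.7,
              0.1, 0.0, 0.0, 0.8, 1.0, 0.8,
              0.0, 0.0, 0.0, 0.7, 0.8, 1.0), 6, 6)

cut_val <- function(D, P) {
  Q <- setdiff(seq_len(nrow(D)), P)
  sum(D[P, Q])                             # "neglected" similarities
}
ncut_val <- function(D, P) {
  Q <- setdiff(seq_len(nrow(D)), P)
  cut_val(D, P) / sum(D[P, ]) + cut_val(D, P) / sum(D[Q, ])
}

cut_val(D, 1:3)    # Cut(P1, Q1)  = 0.3
ncut_val(D, 1:3)   # Ncut(P1, Q1) ~ 0.08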

Applied Multivariate Statistical Analysis


Cluster Analysis 12-61

Example
Build the Laplacian matrix Z of the similarity matrix D:

Z = \begin{pmatrix}
1.5 & -0.8 & -0.6 & 0.0 & -0.1 & 0.0 \\
-0.8 & 1.6 & -0.8 & 0.0 & 0.0 & 0.0 \\
-0.6 & -0.8 & 1.6 & -0.2 & 0.0 & 0.0 \\
-0.8 & 0.0 & -0.2 & 2.5 & -0.8 & -0.7 \\
-0.1 & 0.0 & 0.0 & -0.8 & 1.7 & -0.8 \\
0.0 & 0.0 & 0.0 & -0.7 & -0.8 & 1.5
\end{pmatrix}

Applied Multivariate Statistical Analysis


Cluster Analysis 12-62

Example
Find the eigenvalues Λ and eigenvectors Γ of the Laplacian matrix Z:

\Gamma = \begin{pmatrix}
0.4 & 0.2 & 0.1 & 0.4 & -0.2 & -0.9 \\
0.4 & 0.2 & 0.1 & 0.0 & 0.4 & 0.3 \\
0.4 & 0.2 & -0.2 & 0.0 & -0.2 & 0.6 \\
0.4 & -0.4 & 0.9 & 0.2 & -0.4 & -0.6 \\
0.4 & -0.7 & -0.4 & -0.8 & -0.6 & -0.2 \\
0.4 & -0.7 & -0.2 & 0.5 & 0.8 & 0.9
\end{pmatrix}

\Lambda = \operatorname{diag}\{(0, 0.3, 2.2, 2.3, 2.5, 3)\}

How do we construct clusters?

Applied Multivariate Statistical Analysis


Cluster Analysis 12-63

Example

Second-smallest eigenvalue: $\lambda_{n-1} = 0.3$

Corresponding eigenvector: $\gamma_{n-1} = (0.2, 0.2, 0.2, -0.4, -0.7, -0.7)^\top$
$\operatorname{median}(\gamma_{n-1}) = -0.1$

Split at the value −0.1 (or at 0):

Cluster P (coordinates above the median): $x_1, x_2, x_3$ with 0.2 > −0.1
Cluster Q (coordinates below the median): $x_4, x_5, x_6$ with −0.4, −0.7, −0.7

Applied Multivariate Statistical Analysis


Cluster Analysis 12-64

Normalized cut (NCUT) spectral clustering


Hierarchically divide Y into a pre-specified number of clusters K (top-down):

1. Find the division $P^*$ and $Q^*$: $(P^*, Q^*) = \operatorname*{argmin}_{Y=P+Q} \text{Ncut}(P, Q)$
2. Decide if the current partition should be subdivided, i.e. set $Y_K = P^*, Q^*$
3. Recursively partition the segmented parts, if necessary
Applied Multivariate Statistical Analysis
Empirical Results 13-65

Empirical Results: Clustering


Figure: Simulated set of points to be clustered. Specclust


Applied Multivariate Statistical Analysis
Empirical Results 13-66


Figure: Parcellation results for the simulated data into 4 clusters by NCut
algorithm based on the Euclidean distance. Specclust

Applied Multivariate Statistical Analysis


Empirical Results 13-67

fMRI


Figure: fMRI image observed every 2 sec, 12 horizontal slices of the
brain's scan, 91 × 109 × 91 (x, y, z) data points of size 22 MB; voxel
resolution: 2 × 2 × 2 mm³
Applied Multivariate Statistical Analysis
Empirical Results 13-68

Proximity between Voxels


$Y_{t,j}$ - BOLD signal observed at voxel j with
3D coordinates $X_j = (x_j, y_j, z_j)$, $j = 1, \ldots, J$

Proximity measure $w_{jk}$ between $Y_j$ and $Y_k$:

w_{jk} = \begin{cases} \max\{\operatorname{Corr}_t(Y_j, Y_k), 0\}, & \text{for } \|X_j - X_k\| < C \\ 0, & \text{otherwise} \end{cases}

C - fixed distance, such that $\{\tilde{u} : \|X_{\tilde{u}} - X_k\| < C\}$ is a 3D
neighborhood (3 3mm); $\operatorname{Corr}_t$ - Pearson correlation over 2 × 1400

Applied Multivariate Statistical Analysis


Empirical Results 13-69

Empirical Results: Clustering

Number of clusters: 1000; cluster index s, s = 1, . . . , 1000


I 200: interpretability (anatomical atlases i.e. Talairach)
I 1000: more accurate functional connectivity patterns

min   max   mean    median   Total
  1   353   207.4      208    1000

Table: Descriptive statistics of clustering results averaged over subjects.


Computational time: 19 × 30 hours

Applied Multivariate Statistical Analysis


Empirical Results 13-70


Figure: Parcellation results for the 1st subject’s brain into 1000 clusters
by NCut algorithm.
Applied Multivariate Statistical Analysis
Empirical Results 13-71
Example: orange spiral data


Figure: Spectral clustering with 4 centers for simulated spiral data.


MVAspecclustspiral

Applied Multivariate Statistical Analysis


Empirical Results 13-72

Hierarchical Algorithms, Agglomerative


Techniques
Agglomerative algorithms:
1. Construct the finest partition, i.e. each point is one cluster.
2. Compute the distance matrix D.
DO
3. Find the two clusters with the closest distance.
4. Unite the two clusters into one cluster.
5. Compute the distance between the new groups and obtain a
reduced distance matrix D.
UNTIL all clusters are agglomerated.
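In R this whole loop is available as hclust(); a minimal sketch on the 8 points example (coordinates taken from the quantlet code shown later in the k-means section):

eight <- cbind(c(-3, -2, -2, -2, 1, 1, 2, 4),
               c( 0,  4, -1, -2, 4, 2, -4, -3))
eight <- eight[c(8, 7, 3, 1, 4, 2, 6, 5), ]     # reorder as in the slides
d <- dist(eight)^2                              # squared Euclidean distances
plot(hclust(d, method = "single"))              # or "complete", "average",
                                                # "centroid", "ward.D"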
Applied Multivariate Statistical Analysis
Empirical Results 13-73

After unification of P and Q one obtains the following distance to


another group (object) R

d(R, P + Q) =
δ1 d(R, P) + δ2 d(R, Q) + δ3 d(P, Q) + δ4 |d(R, P) − d(R, Q)|

$\delta_j$ - weighting factors

Denote by $n_P = \sum_{i=1}^{n} I(x_i \in P)$ the number of objects in group P.

Applied Multivariate Statistical Analysis


Empirical Results 13-74

Name                           δ1                   δ2                   δ3                  δ4
Single linkage                 1/2                  1/2                  0                   -1/2
Complete linkage               1/2                  1/2                  0                   1/2
Average linkage (unweighted)   1/2                  1/2                  0                   0
Average linkage (weighted)     nP/(nP+nQ)           nQ/(nP+nQ)           0                   0
Centroid                       nP/(nP+nQ)           nQ/(nP+nQ)           -nP nQ/(nP+nQ)^2    0
Median                         1/2                  1/2                  -1/4                0
Ward                           (nR+nP)/(nR+nP+nQ)   (nR+nQ)/(nR+nP+nQ)   -nR/(nR+nP+nQ)      0

Table: Computations of group distances.

where $n_P = \sum_{i=1}^{n} I(x_i \in P)$ denotes the number of objects in
group P.
Applied Multivariate Statistical Analysis
Empirical Results 13-75

Example:
x1 = (0, 0), x2 = (1, 0), x3 = (5, 5) and the squared Euclidean
distance matrix with single linkage weighting.
Recall:

D_2 = \begin{pmatrix} 0 & 1 & 50 \\ 1 & 0 & 41 \\ 50 & 41 & 0 \end{pmatrix}

The algorithm starts with N = 3 clusters:
$P = \{x_1\}$, $Q = \{x_2\}$, $R = \{x_3\}$.

Applied Multivariate Statistical Analysis


Empirical Results 13-76

The single linkage distance between the remaining two clusters:

d(R, P + Q) = \frac{1}{2} d(R, P) + \frac{1}{2} d(R, Q) - \frac{1}{2} |d(R, P) - d(R, Q)|
            = \frac{1}{2} d_{13} + \frac{1}{2} d_{23} - \frac{1}{2} |d_{13} - d_{23}|
            = \frac{50}{2} + \frac{41}{2} - \frac{1}{2} \cdot |50 - 41|
            = 41

The reduced distance matrix is then \begin{pmatrix} 0 & 41 \\ 41 & 0 \end{pmatrix}.

Applied Multivariate Statistical Analysis


Empirical Results 13-77

Dendrogram

a graphical representation of the sequence of clustering


displays the observations, the sequence of clusters and the
distances between the clusters
On the vertical axis the index of points is given.
The horizontal axis gives the distance between the clusters.

Applied Multivariate Statistical Analysis


Empirical Results 13-78

[Scatterplot of the 8 points, labeled 1-8 (price consciousness vs. brand loyalty)]

Figure: The 8 points example MVAclus8p

Applied Multivariate Statistical Analysis


Empirical Results 13-79

Example: The distance matrix D (squared Euclidean distances) is

D = \begin{pmatrix}
0 & 10 & 53 & 73 & 50 & 98 & 41 & 65 \\
& 0 & 25 & 41 & 20 & 80 & 37 & 65 \\
& & 0 & 2 & 1 & 25 & 18 & 34 \\
& & & 0 & 5 & 17 & 20 & 32 \\
& & & & 0 & 36 & 25 & 45 \\
& & & & & 0 & 13 & 9 \\
& & & & & & 0 & 4 \\
& & & & & & & 0
\end{pmatrix}

Applied Multivariate Statistical Analysis


Empirical Results 13-80
[Single linkage dendrogram; vertical axis: squared Euclidean distance]
Figure: The dendrogram for the 8 points example, single linkage
algorithm, squared Euclidean distance. If we decide to cut the tree at
level 10 we define three clusters: {1, 2}, {3, 4, 5} and {6, 7, 8}. MVAclus8p

Applied Multivariate Statistical Analysis


Empirical Results 13-81

Single Linkage Algorithm

defines as the distance between two groups the smallest value of


the individual distances.

d(R, P + Q) = min{d(R, P), d(R, Q)}

It is also called the Nearest Neighbor algorithm.


A consequence of this construction is that single linkage tends to
build large groups.

Applied Multivariate Statistical Analysis


Empirical Results 13-82

[Scatterplot of the 8 points with single linkage dendrogram (squared Euclidean distance)]

Figure: Single linkage algorithm on squared Euclidean distance for 8


point example with dendrogram. SMSclus8pd

Applied Multivariate Statistical Analysis


Empirical Results 13-83

Complete Linkage Algorithm

Tries to correct this by considering the largest (individual)


distances.

d(R, P + Q) = max{d(R, P), d(R, Q)}

It is also called Farthest Neighbor algorithm.


This algorithm will cluster groups where all the points are
proximate, since it compares the largest distances.

Applied Multivariate Statistical Analysis


Empirical Results 13-84

[Scatterplot of the 8 points with complete linkage dendrogram (squared Euclidean distance)]
Figure: Complete linkage algorithm on squared Euclidean distance for 8
point example with dendrogram. SMSclus8p

Applied Multivariate Statistical Analysis


Empirical Results 13-85

Average Linkage Algorithm

Compromise between the nearest and farthest neighbor rules: use
the average distance

d(P, Q) = \frac{1}{n_P n_Q} \sum_{i=1}^{n_P} \sum_{j=1}^{n_Q} d(x_i, y_j)

Centroid Algorithm
is very similar and uses the natural geometrical distance between R
and the weighted center of gravity of P and Q.

Applied Multivariate Statistical Analysis


Empirical Results 13-86

[Scatterplot of the 8 points with average linkage dendrogram (squared Euclidean distance)]
Figure: Average linkage algorithm on squared Euclidean distance for 8
point example with dendrogram. SMSclus8pa

Applied Multivariate Statistical Analysis


Empirical Results 13-87

Centroid Algorithm

[Sketch: triangle with vertex R above the base P-Q; the arrow marks the weighted center of gravity of P + Q]

Centroid algorithm uses the distance between R and the weighted


center of gravity of P and Q.

Applied Multivariate Statistical Analysis


Empirical Results 13-88

[Scatterplot of the 8 points with centroid linkage dendrogram (squared Euclidean distance)]
Figure: Centroid algorithm on squared Euclidean distance for 8 point
example with dendrogram. SMSclus8pc

Applied Multivariate Statistical Analysis


Empirical Results 13-89

Ward clustering algorithm

Main difference between Ward and linkage procedures is the


unification procedure.
The Ward algorithm does not merge the groups with the
smallest distance; instead, it joins the groups whose union
increases a given measure of heterogeneity the least.
The aim of the Ward procedure is to unify groups such that
the variation inside these groups is not increased too
drastically: the resulting groups are as homogeneous as
possible.

Applied Multivariate Statistical Analysis


Empirical Results 13-90

Ward clustering algorithm


Measure (inertia) of heterogeneity for group R:

I_R = \frac{1}{n_R} \sum_{i=1}^{n_R} d^2(x_i, \bar{x}_R)

with $\bar{x}_R$ the center of gravity (mean) of R.

$I_R$ is a measure of the group's dispersion around its center of gravity.

When two objects or groups P and Q are joined, the new group
P + Q has a larger inertia $I_{P+Q}$:

I_P + I_Q \le I_{P+Q}
Applied Multivariate Statistical Analysis
Empirical Results 13-91

The corresponding increase of inertia is given by (see the table of
group distances)

\Delta(P, Q) = \frac{n_P n_Q}{n_P + n_Q} d^2(P, Q)

Ward algorithm idea: "join the groups that give the smallest
increase in Δ(P, Q)"

Unification of P and Q: the Ward algorithm is related to the
centroid algorithm, with the 'inertial' distance Δ rather than the
'geometric' distance d².

Applied Multivariate Statistical Analysis


Empirical Results 13-92

20 Swiss bank notes

[Scatterplot of 20 randomly chosen bank notes in the plane of the first two principal components]
first PC
Figure: PCA for 20 randomly chosen bank notes MVAclusbank

Applied Multivariate Statistical Analysis


Empirical Results 13-93

[Left: Ward dendrogram for the 20 bank notes (squared Euclidean distance); right: PCA scatterplot with cut height 60]

(a) The dendrogram for the 20 bank (b) PCA for 20 randomly chosen bank
notes, Ward algorithm notes

Figure: MVAclusbank

Applied Multivariate Statistical Analysis


Empirical Results 13-94

French Food data

[Scatterplot of the family types (MA2-MA5, EM2-EM5, CA2-CA5) in the plane of the first two principal components]

Figure: PCA for the standardized French food expenditures


MVAclusfood

Applied Multivariate Statistical Analysis


Empirical Results 13-95

[Left: Ward dendrogram for the standardized French food expenditures (squared Euclidean distance); right: PCA scatterplot with cut height 60]

(a) The dendrogram for the (b) PCA for the standardized French
standardized French food expenditures, food expenditures
Ward algorithm
Figure: MVAclusfood
Applied Multivariate Statistical Analysis
Empirical Results 13-96

Example: US health data 2005


Perform the cluster analysis of the U.S. health 2005 data set.
Interest in the numbers of deaths related to diseases. Use
Euclidean distance with Ward clustering.

Clusters States
1 IL IN MI MO OH MA NJ GA NC TN VA
2 IA KS MN WI CT PR AL AR KY LA MD
MS OK SC WV AZ CO NV OR WA
3 NE ND SD ME NH RI VT VI GU AS MP
DE DC AK HI ID MT NM UT WY
4 NY PA FL TX CA

Applied Multivariate Statistical Analysis


Empirical Results 13-97
[Ward dendrogram for the US health data: leaves are the states, labeled with their cluster number; horizontal axis: Euclidean distance]

Figure: Cluster analysis of U.S. health data set using Ward algorithm and
Euclidean distance. SMSclushealth05
Applied Multivariate Statistical Analysis
Empirical Results 13-98

HIV Malignant Diabetes Alzheimer Heart


1 315.73 16597.09 2203.45 2104.36 19044.73
2 157.30 7727.10 1147.10 1150.15 8723.75
3 22.90 1626.70 237.65 221.05 1759.85
4 1198.60 38957.40 5219.80 4487.40 47907.80
TIA Influenza Respiratory Diseases Liver Nephritis
1 4261.00 1837.18 3827.82 721.73 1500.82
2 2066.75 883.25 1918.45 372.50 657.75
3 422.10 182.40 405.50 84.85 105.70
4 9716.80 4519.60 8725.00 2140.40 2619.00

Table: The averages of the U.S. health data set within the 4 clusters.
SMSclushealth05

Applied Multivariate Statistical Analysis


Empirical Results 13-99

Cluster 1: IL, IN, MI, MO, OH (Midwest), MA, NJ (Northeast),


GA, NC, TN, VA (South)

Medium to large states: population >60 mio.


Medium numbers of HIV, diabetes and heart related deaths.
High numbers of cancer (malignancy) related deaths.

Cluster 4: NY, PA, FL, TX, CA (regional inhomogeneous)


Large states: population >12 mio.
Highest numbers of HIV, cancer, diabetes and heart related
deaths.

Applied Multivariate Statistical Analysis


Empirical Results 13-100
US health

[Scatterplot of the states in the plane of the first two principal components, labeled with state abbreviation and cluster number]

Figure: Plot of the first two principal components of the U.S. health
2005 data. SMSclushealth05

Applied Multivariate Statistical Analysis


Empirical Results 13-101

Example: US cereal data


The dataset UScereal is from the R-package MASS: 65
measurements, 11 variables on cereals on the US market from
1993. Use the variables calories, protein, fat and sugars. Use
Euclidean distances with Ward clustering on standardized data.
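A sketch of the described analysis (not the SMScluscereal code itself); ward.D2 is R's Ward method for Euclidean distances.

library(MASS)                                  # contains the UScereal data
x  <- scale(UScereal[, c("calories", "protein", "fat", "sugars")])
hc <- hclust(dist(x), method = "ward.D2")      # Ward clustering, Euclidean distance
plot(hc)                                       # dendrogram
cutree(hc, k = 4)                              # cluster membership after cutting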

Applied Multivariate Statistical Analysis


Empirical Results 13-102
Example: US cereal data
C1: 100% Bran C23: Frosted Flakes C44: Oatmeal Raisin Crisp
C2: All-Bran C24: Frosted Mini-Wheats C45: Post Nat. Raisin Bran
C3: All-Bran with Extra Fiber C25: Fruit & Fibre: Dates Walnuts C46: Product 19
C4: Apple Cinnamon Cheerios and Oats C47: Puffed Rice
C5: Apple Jacks C26: Fruitful Bran C48: Quaker Oat Squares
C6: Basic 4 C27: Fruity Pebbles C49: Raisin Bran
C7: Bran Chex C28: Golden Crisp C50: Raisin Nut Bran
C8: Bran Flakes C29: Golden Grahams C51: Raisin Squares
C9: Cap’n’Crunch C30: Grape Nuts Flakes C52: Rice Chex
C10: Cheerios C31: Grape-Nuts C53: Rice Krispies
C11: Cinnamon Toast Crunch C32: Great Grains Pecan C54: Shredded Wheat ’n’Bran
C12: Clusters C33: Honey Graham Ohs C55: Shredded Wheat spoon size
C13: Cocoa Puffs C34: Honey Nut Cheerios C56: Smacks
C14: Corn Chex C35: Honey-comb C57: Special K
C15: Corn Flakes C36: Just Right Fruit & Nut C58: Total Corn Flakes
C16: Corn Pops C37: Kix C59: Total Raisin Bran
C17: Count Chocula C38: Life C60: Total Whole Grain
C18: Cracklin’ Oat Bran C39: Lucky Charms C61: Triples
C19: Crispix C40: Mueslix Crispy Blend C62: Trix
C20: Crispy Wheat & Raisins C41: Multi-Grain Cheerios C63: Wheat Chex
C21: Double Chex C42: Nut&Honey Crunch C64: Wheaties
C22: Froot Loops C43: Nutri-Grain Almond-Raisin C65: Wheaties Honey Gold

Applied Multivariate Statistical Analysis


Empirical Results 13-103

[Ward dendrogram for the US cereal data: leaves are the 65 cereals C1-C65; horizontal axis: Euclidean distance]

Figure: Cluster analysis of U.S. cereal dataset using Ward algorithm and
Euclidean distance. SMScluscereal

Applied Multivariate Statistical Analysis
Empirical Results 13-104
[Scatterplot "65 US cereals, cut height 15": cereals in the plane of the first two principal components]

Figure: Plot of the first two principal components of the U.S. cereal
dataset. SMScluscereal

Applied Multivariate Statistical Analysis


Empirical Results 13-105

k - means Clustering

Fix k clusters a priori
Assign observations to the cluster j with mean $\bar{x}_j$
Computationally hard / iterative
Standard algorithms do not yield a unique allocation
Time investment: $O(n^{\rho k + 1} \log n)$

Applied Multivariate Statistical Analysis


Empirical Results 13-106

k - means Clustering

Minimize the within-cluster sum of squares w.r.t.
$S = \{S_1, \ldots, S_k\}$, $\bigcup_{j=1}^{k} S_j = \{1, 2, \ldots, n\}$:

\hat{S} = \operatorname*{argmin}_{S} \sum_{j=1}^{k} \sum_{i \in S_j} \|x_i - \mu_j\|_2^2 \qquad (3)

The k - means standard algorithm is iterative starting from


random partitions/points.

Applied Multivariate Statistical Analysis


Empirical Results 13-107

Standard Algorithm
Fix an initial set of centers $\{\mu_j^{(t)}\}_{j=1}^{k}$, $t = 1$.

Assign: $\hat{j}(i) = \operatorname*{argmin}_j \|x_i - \mu_j^{(t)}\|^2$;
$x_i$ then belongs to cluster $\hat{j}(i)$, resulting in a (new)
partition $\bigcup_{j=1}^{k} S_j^{(t)} = \{1, \ldots, n\}$

Update: $\mu_j^{(t+1)} = \left( \# S_j^{(t)} \right)^{-1} \sum_{i \in S_j^{(t)}} x_i$

Iterate: assign, update until convergence.
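A from-scratch R sketch of these two steps (illustrative; it assumes no cluster runs empty during the iterations):

lloyd <- function(x, k, iter = 100) {
  mu <- x[sample(nrow(x), k), , drop = FALSE]      # random initial centers
  for (it in seq_len(iter)) {
    d2 <- sapply(seq_len(k),                       # squared distances to centers
                 function(j) colSums((t(x) - mu[j, ])^2))
    cl <- max.col(-d2)                             # assign: nearest center
    mu <- apply(x, 2, function(v) tapply(v, cl, mean))  # update: cluster means
  }
  list(cluster = cl, centers = mu)
}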


Applied Multivariate Statistical Analysis
Empirical Results 13-108
graphics.off()
rm(list = ls(all = TRUE))

# the 8 points example, reordered as in the earlier slides
eight   = cbind(c(-3, -2, -2, -2, 1, 1, 2, 4), c(0, 4, -1, -2, 4, 2, -4, -3))
eight   = eight[c(8, 7, 3, 1, 4, 2, 6, 5), ]
results = kmeans(eight, 2, algorithm = "Lloyd")   # k = 2, Lloyd's algorithm

[Scatterplot of the 8 points colored by the two k-means clusters]

Figure: SMSclus8km
Applied Multivariate Statistical Analysis
Empirical Results 13-109

Example: US health data 2005


For the 2005 US health dataset, a k-means clustering with 4
clusters was performed, using the standard algorithm.

Applied Multivariate Statistical Analysis


Empirical Results 13-110

US health data

[Scatterplot of the states in the plane of the first two principal components, colored by k-means cluster]

Figure: SMScluskmhealth

Applied Multivariate Statistical Analysis


Empirical Results 13-111

Example: US cereal data


For the US cereal dataset (see Slide 102 for the data description),
a k-means clustering was also performed. This time 3 clusters were
used, again with the standard algorithm.

Applied Multivariate Statistical Analysis


Empirical Results 13-112

US cereal data

[Scatterplot of the 65 cereals in the plane of the first two principal components, colored by k-means cluster]

Figure: SMScluskmcereal

Applied Multivariate Statistical Analysis


Empirical Results 13-113

Example: Quantlet data


Keywords of the Quantlets for the book MVA and the project BCS.
Given are data which indicate the similarity of keywords between
Quantlets. Using the keywords as variables, the k-means algorithm
is applied to analyse how similar the Quantlets in MVA and BCS are.
The algorithm builds 4 clusters, which are represented in a
2-dimensional graph via multidimensional scaling.

Applied Multivariate Statistical Analysis


Empirical Results 13-114

Metric MDS for BCS quantlets / Metric MDS for MVA quantlets

[Two scatterplots of the quantlets after metric multidimensional scaling, points marked by cluster (Cluster1-Cluster4)]

Figure: The clusters of the quantlets in BCS and MVA


MVAQnetClusKmeans
[Two further panels: Metric MDS for MVA quantlets / Metric MDS for All quantlets]

Applied Multivariate Statistical Analysis
Empirical Results 13-115

Example: Quantlet data

The clusters of BSC are corresponding to


1: random numbers 2: visualization
3: distributions 4: dimension reduction

The clusters of MVA refer to


1: cluster analysis 2: multidimensional scaling
3: principal component analysis 4: scatterplot

Applied Multivariate Statistical Analysis


Empirical Results 13-116

Minimum spanning trees (MST)


From an undirected graph G with positive edge weights, construct
a set of edges that connects all vertices.
[Scatterplot of the 8 points with all pairwise edges, annotated with their squared Euclidean edge weights]

Figure: 8 points example for minimum spanning tree. SMSclus8pmst

Applied Multivariate Statistical Analysis


Empirical Results 13-117

Spanning tree

Definition
A spanning tree is a connected, acyclic subgraph of G.
Try all possible spanning trees? NO!
Kruskal Algorithm
Prim Algorithm
Clustering into k-groups
Important in power networks, engineering, combinatorial
optimization.

Applied Multivariate Statistical Analysis


Empirical Results 13-118

Be greedy!

Kruskal: Sort the edges in ascending order of cost.
         Add the next edge to the tree T unless a
         cycle is created.
Prim:    Start from any vertex; repeatedly add the
         cheapest edge with exactly one endpoint in T.
Applied Multivariate Statistical Analysis


Empirical Results 13-119

Spanning tree
MST: Given connected graph G with positive edge weights, find a
min weight set of edges that connects all of the vertices.
Def.: A spanning tree of a graph G is a subgraph T that is
connected and acyclic.

[Scatterplot of the 8 points showing only the minimum spanning tree edges with their weights]

SMSclus8pmst2
Property: MST of G is always a spanning tree.
Applied Multivariate Statistical Analysis
Source of slides: http://www.cs.princeton.edu/~rs/AlgsDS07/14MST.pdf
Empirical Results 13-134

Summary: Cluster algorithms

The class of clustering algorithms can be divided into two


types: hierarchical and partitioning algorithms.
Hierarchical algorithms start with the finest (coarsest) possible
partition and put groups together (splits groups) from step to
step.
Partitioning algorithms start from a preliminary clustering and
exchange group elements until a certain score is reached.
The agglomerative procedure depends on the definition of the
distance between two clusters. Often used distances are single
linkage, complete linkage, Ward distance.
Applied Multivariate Statistical Analysis
Empirical Results 13-135

Summary: Cluster Algorithms

The process of the unification of clusters can be graphically


represented by a dendrogram.
Hierarchical agglomerative techniques are frequently used in
practice. They start from the finest possible structure (all
data points form clusters), compute the distance matrix for
these clusters and join the clusters with the smallest distance.
This step is repeated until all points are united in one cluster.

Applied Multivariate Statistical Analysis


Empirical Results 13-136

Summary:

After reading this chapter you should understand:


Basic concepts of Cluster Analysis
How the clustering algorithms work
Different types of clustering algorithms
The numerical aspects of Clustering
Important: The distinction between proximity and dissimilarity.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-1

Discriminant Analysis

Discriminant analysis is used in situations where the clusters


are known a priori.
The aim of discriminant analysis is to classify one or more
observations into these known groups, minimising the error of
misclassification.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-2

Discriminant Rule

A partition of the sample space:

\bigcup_{j=1}^{J} R_j = \mathbb{R}^p

Each region $R_j$ corresponds to population $\Pi_j$.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-3

ML Discriminant Rule

Density $f_j$ of population $\Pi_j$.
Allocate x to the $\Pi_j$ that gives the largest likelihood:

L_j(x) = f_j(x) = \max_i f_i(x)

This gives us a partition of the sample space.


Rj are constructed in such a way that random variables from
population Πj are likely to take values in Rj .
If some x lies in Rj , then “there is a good chance” that it in reality
comes from the population Πj .

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-4
The sets Rj for the ML discriminant rule:

$R_j = \{x : L_j(x) > L_i(x) \text{ for } i = 1, \ldots, J,\ i \neq j\}.$

By classifying the observations into groups we encounter a misclassification error. For J = 2 groups, the probability of putting x into group 2 although it is from Π1 is

$p_{21} = P(X \in R_2 \mid \Pi_1) = \int_{R_2} f_1(x)\,dx.$

Similarly, the conditional probability of classifying an object into Π1 although it comes from Π2 is

$p_{12} = P(X \in R_1 \mid \Pi_2) = \int_{R_1} f_2(x)\,dx.$

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-5

Expected cost of misclassification (ECM)

Misclassified observations create costs C(i|j) when an X belonging to group Πj falls into Ri.
πj = prior probability of population Πj
pij = probability that x of Πj falls into region Ri
For two populations Π1 and Π2 ,

ECM = C (2|1)p21 π1 + C (1|2)p12 π2 .

Look for a classification rule that minimizes the ECM!

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-6

Theorem
The rule minimizing the ECM is given by

$R_1 = \left\{ x : \frac{f_1(x)}{f_2(x)} \ge \left(\frac{C(1|2)}{C(2|1)}\right) \left(\frac{\pi_2}{\pi_1}\right) \right\}$

$R_2 = \left\{ x : \frac{f_1(x)}{f_2(x)} < \left(\frac{C(1|2)}{C(2|1)}\right) \left(\frac{\pi_2}{\pi_1}\right) \right\}$
The ML rule is a special case of the ECM rule for equal
misclassification costs and equal prior probabilities πj .

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-7

Proof via example from credit scoring.


$\Pi_1 \overset{\text{def}}{=}$ bad clients that result in the cost C(2|1) if classified as group 2;
C(1|2) is the cost of losing a good client classified as bad.
Let γ denote the gain if a client is classified as good and turns out to be a good credit partner. The total gain of the discriminant rule "client is good if he falls in R2" is

$G(R_2) = -C(2|1)\pi_1 \int I(x \in R_2) f_1(x)\,dx - C(1|2)\pi_2 \int \{1 - I(x \in R_2)\} f_2(x)\,dx + \gamma\, \pi_2 \int I(x \in R_2) f_2(x)\,dx$

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-8

Proof cont’d.
$G(R_2) = -C(1|2)\pi_2 + \int I(x \in R_2)\left[-C(2|1)\pi_1 f_1(x) + \{C(1|2)+\gamma\}\pi_2 f_2(x)\right] dx$

Since the first term in this equation is constant, the maximum is obviously obtained for

$R_2 = \{\, x : -C(2|1)\pi_1 f_1(x) + \{C(1|2)+\gamma\}\pi_2 f_2(x) \ge 0 \,\}$

This is equivalent to

$R_2 = \left\{ x : \frac{f_2(x)}{f_1(x)} \ge \frac{C(2|1)\,\pi_1}{\{C(1|2)+\gamma\}\,\pi_2} \right\}$

This corresponds to the set R2 above for a gain of γ = 0.


Applied Multivariate Statistical Analysis
Discriminant Analysis 14-9

Example
Suppose x ∈ {0, 1} and

$\Pi_1 : P(X = 0) = P(X = 1) = \tfrac{1}{2},$
$\Pi_2 : P(X = 0) = \tfrac{1}{4} = 1 - P(X = 1).$

The sample space is the set {0, 1}.
Allocate x = 0 to Π1 and x = 1 to Π2 .
Hence R1 = {0}, R2 = {1}, R1 ∪ R2 = {0, 1}.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-10

Example
Consider two normal populations

Π1 : N(µ1 , σ12 ),
Π2 : N(µ2 , σ22 ).

Then

$L_i(x) = (2\pi\sigma_i^2)^{-1/2} \exp\left\{ -\frac{1}{2} \left( \frac{x-\mu_i}{\sigma_i} \right)^2 \right\}.$

x from Π1 , if x ∈ R1 = {x : L1 (x) > L2 (x)}.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-11

Example (cont’d)
Note that $L_1(x) > L_2(x)$ iff

$\frac{\sigma_2}{\sigma_1} \exp\left\{ -\frac{1}{2}\left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - \left(\frac{x-\mu_2}{\sigma_2}\right)^2 \right] \right\} > 1,$

i.e., iff

$x^2\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right) - 2x\left(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}\right) + \left(\frac{\mu_1^2}{\sigma_1^2} - \frac{\mu_2^2}{\sigma_2^2}\right) < 2\log\left(\frac{\sigma_2}{\sigma_1}\right).$

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-12

Example
Suppose that µ1 = 0, σ1 = 1 and µ2 = 1, σ2 = 1/2. Then

$R_1 = \left\{ x : x < \tfrac{1}{3}\left(4 - \sqrt{4 + 6\log 2}\right) \text{ or } x > \tfrac{1}{3}\left(4 + \sqrt{4 + 6\log 2}\right) \right\}, \qquad R_2 = \mathbb{R} \setminus R_1.$

If σ1 = σ2 then (for µ1 < µ2)

x from Π1, if $x \in R_1 = \{x : x \le \tfrac{1}{2}(\mu_1 + \mu_2)\}$,
x from Π2, if $x \in R_2 = \{x : x > \tfrac{1}{2}(\mu_1 + \mu_2)\}$.
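A short numerical check of this example in R; it compares the region found by evaluating the two likelihoods on a grid with the analytic boundaries:

x  <- seq(-3, 3, by = 0.001)
L1 <- dnorm(x, mean = 0, sd = 1)          # likelihood under Pi_1
L2 <- dnorm(x, mean = 1, sd = 1/2)        # likelihood under Pi_2
range(x[L1 <= L2])                        # numerical boundaries of R2
c((4 - sqrt(4 + 6 * log(2))) / 3,         # analytic boundaries of R1
  (4 + sqrt(4 + 6 * log(2))) / 3)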

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-13
2 Normal Distributions

Figure: Maximum likelihood rule for normal distributions (densities with allocation regions R1, R2, R1 along the x-axis). MVAdisnorm

Applied Multivariate Statistical Analysis
Discriminant Analysis 14-14

Theorem
(a) Suppose Πi = Np (µi , Σ). The ML rule allocates x to Πj ,
where j ∈ {1, . . . , J} is the value that minimizes the squared
Mahalanobis distance between x and µi :

δ 2 (x, µi ) = (x − µi )> Σ−1 (x − µi ) , i = 1, . . . , J .

(b) In the case of J = 2:

x ∈ R1 ⇐⇒ α> (x − µ) > 0 ,

where $\alpha = \Sigma^{-1}(\mu_1 - \mu_2)$ and $\mu = \tfrac{1}{2}(\mu_1 + \mu_2)$.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-15

Sketch of the proof:

Part (a) of the Theorem follows directly from comparison of the likelihoods. It says that x is allocated to Π1 if

$(x - \mu_1)^\top \Sigma^{-1} (x - \mu_1) < (x - \mu_2)^\top \Sigma^{-1} (x - \mu_2)$

Rearranging terms leads to

$-2\mu_1^\top \Sigma^{-1} x + 2\mu_2^\top \Sigma^{-1} x + \mu_1^\top \Sigma^{-1}\mu_1 - \mu_2^\top \Sigma^{-1}\mu_2 < 0$

This is equivalent to

$2(\mu_2 - \mu_1)^\top \Sigma^{-1} x + (\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 + \mu_2) < 0$

$(\mu_1 - \mu_2)^\top \Sigma^{-1} \left\{ x - \tfrac{1}{2}(\mu_1 + \mu_2) \right\} > 0 \;\Rightarrow\; \alpha^\top (x - \mu) > 0.$

Applied Multivariate Statistical Analysis
Discriminant Analysis 14-16

Bayes Discriminant Rule

x from Πj with prior probability πj .


Bayes Rule: allocate x to Πj if

$\pi_j f_j(x) = \max_i \pi_i f_i(x)$

Comparison of two discriminant rules?

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-17

Admissible Discriminant Rules


Probability of misallocating x to Πi , if x from Πj .
$p_{ij} = \int \phi_i(x) f_j(x)\,dx \quad \text{with} \quad \phi_j(x) = \begin{cases} 1 & \text{if } \pi_j f_j(x) = \max_i \pi_i f_i(x), \\ 0 & \text{else} \end{cases}$

A discriminant rule with probabilities $p_{ij}$ is as good as another discriminant rule with probabilities $p'_{ij}$ if

$p_{ii} \ge p'_{ii}$ for all i = 1, . . . , J,

and is better if

$p_{ii} > p'_{ii}$ for at least one i.
We call a discriminant rule admissible if there is no better one.
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-18

Probability of Misclassification for the ML rule

Suppose that Πi = Np(µi, Σ) and J = 2. Consider p12 = P(x ∈ R1 | Π2). We have

$p_{12} = P\{\alpha^\top (x - \mu) > 0 \mid \Pi_2\}$

If X comes from Π2, then $\alpha^\top(X - \mu) \sim N\left(-\tfrac{1}{2}\delta^2,\, \delta^2\right)$, where $\delta^2 = (\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2)$ is the squared Mahalanobis distance between the two populations. We obtain

$p_{12} = \Phi\left(-\tfrac{1}{2}\delta\right)$

Similarly, the probability to be classified into population 2 although x stems from Π1 is $p_{21} = \Phi\left(-\tfrac{1}{2}\delta\right)$.
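A one-line computation in R; the means and covariance below are assumed example parameters, not data from the text:

mu1 <- c(0, 0); mu2 <- c(2, 1)                    # hypothetical means
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)          # hypothetical common covariance
delta <- sqrt(drop(t(mu1 - mu2) %*% solve(Sigma) %*% (mu1 - mu2)))
pnorm(-delta / 2)                                 # p12 = p21 = Phi(-delta/2)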
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-19

Classification with different covariance matrices

Assume Σ1 ≠ Σ2. The minimum ECM rule depends on the ratio $f_1(x)/f_2(x)$:

$R_1 = \left\{ x : -\tfrac{1}{2} x^\top (\Sigma_1^{-1} - \Sigma_2^{-1}) x + (\mu_1^\top \Sigma_1^{-1} - \mu_2^\top \Sigma_2^{-1})\, x - k \ \ge\ \log\left[ \left(\frac{C(1|2)}{C(2|1)}\right) \left(\frac{\pi_2}{\pi_1}\right) \right] \right\}$

where $k = \tfrac{1}{2} \log\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac{1}{2} \left(\mu_1^\top \Sigma_1^{-1} \mu_1 - \mu_2^\top \Sigma_2^{-1} \mu_2\right)$.

The region Ri is defined by a quadratic function of x. This quadratic classification rule coincides with the linear rule of the case Σ1 = Σ2, since then the term $\tfrac{1}{2} x^\top (\Sigma_1^{-1} - \Sigma_2^{-1}) x$ disappears.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-20

Summary: Discriminant Analysis

Discriminant analysis is a set of methods for distinguishing between groups in data and allocating new observations into groups.
Suppose that the data come from populations Πj with
densities fj , j = 1, . . . , J. The maximum likelihood
discriminant rule (ML rule) allocates an observation x to that
population Πj which has the largest likelihood
Lj (x) = fj (x) = maxi fi (x).

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-21

Summary: Discriminant Analysis

Suppose we have prior probabilities πj for the populations Πj. The Bayes discriminant rule allocates an observation x to the population Πj that maximizes πi fi(x). All Bayes discriminant rules (incl. the ML rule) are admissible.
For the ML rule and J = 2 normal populations, the probabilities of misclassification are given by $p_{12} = p_{21} = \Phi\left(-\tfrac{1}{2}\delta\right)$, where δ is the Mahalanobis distance between the two populations.
Discriminant rules should minimize the expected cost of
misclassification (ECM).
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-22

Discrimination Rules in Practice

Xj ∼ Np(µj, Σ), j = 1, . . . , J, with nj observations in Πj
(x̄j, Sj) estimates (µj, Σ)

$S_u = \sum_{j=1}^{J} \frac{n_j S_j}{n - J}, \qquad n = \sum_{j=1}^{J} n_j.$

ML rule: allocate x to the Πj whose index j minimizes $(x - \bar{x}_j)^\top S_u^{-1} (x - \bar{x}_j)$ for j ∈ {1, . . . , J}.
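A minimal sketch in R of this empirical ML rule with the pooled estimate Su; the two simulated groups are hypothetical stand-ins for real data:

set.seed(1)
X1 <- matrix(rnorm(40, mean = 0), ncol = 2)          # n1 = 20 obs from group 1
X2 <- matrix(rnorm(40, mean = 2), ncol = 2)          # n2 = 20 obs from group 2
n1 <- nrow(X1); n2 <- nrow(X2); n <- n1 + n2
Su   <- (n1 * cov(X1) + n2 * cov(X2)) / (n - 2)      # pooled covariance S_u
Sinv <- solve(Su)
d2 <- function(x, m) drop(t(x - m) %*% Sinv %*% (x - m))  # squared Mahalanobis distance
x0 <- c(1, 1)                                        # observation to classify
which.min(c(d2(x0, colMeans(X1)), d2(x0, colMeans(X2))))  # allocated group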

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-23

Example
20 randomly chosen banknotes
Su pooled covariance estimate

$\hat{\alpha} = S_u^{-1}(\bar{x}_1 - \bar{x}_2) = (-12.18,\, 20.54,\, -19.22,\, -15.55,\, -13.06,\, 21.43)^\top$

$\bar{x} = \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2) = (214.79,\, 130.05,\, 129.92,\, 9.23,\, 10.48,\, 140.46)^\top$

Separating hyperplane: $\hat{\alpha}^\top (x - \bar{x}) = 0$


By applying the discriminant rule to the whole dataset, we obtain
1 misclassification for the forged bank notes and 0
misclassifications for the genuine bank notes.
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-24

Example
Allocation regions for J = 3 groups:

$h_{12}(x) = (\bar{x}_1 - \bar{x}_2)^\top S_u^{-1} \left\{ x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2) \right\}$
$h_{13}(x) = (\bar{x}_1 - \bar{x}_3)^\top S_u^{-1} \left\{ x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_3) \right\}$
$h_{23}(x) = (\bar{x}_2 - \bar{x}_3)^\top S_u^{-1} \left\{ x - \tfrac{1}{2}(\bar{x}_2 + \bar{x}_3) \right\}.$

The ML rule is to allocate x to

Π1 if h12(x) > 0 and h13(x) > 0
Π2 if h12(x) < 0 and h23(x) > 0
Π3 if h13(x) < 0 and h23(x) < 0.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-25

Probabilities of misclassification

Example
In the above classification problem for the Swiss bank notes we have the following situation:

                     predicted membership
                     genuine (Π1)   forged (Π2)
actual   Π1             100             0
         Π2               1            99

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-26

The apparent error rate (APER) is defined as the fraction of observations that are misclassified. The APER, expressed as a percentage, is

$\text{APER} = \frac{1}{200} \cdot 100\% = 0.5\%.$

For the calculation of the APER we use the observations twice: the first time to construct the classification rule and the second time to evaluate this rule. The APER = 0.5% might therefore be too optimistic. MVAaper

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-27

An approach that corrects for this bias is based on the holdout procedure.
1. Start with the first population Π1. Omit one observation and develop the classification rule based on the remaining n1 − 1, n2 observations.
2. Classify the "holdout" observation using the discrimination rule of Step 1.
3. Repeat Steps 1 and 2 until all the Π1 observations are classified. Count the number n′21 of misclassified observations.
4. Repeat Steps 1 through 3 for the population Π2. Count the number n′12 of misclassified observations.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-28

Estimates of the misclassification probabilities are then given by

$\hat{p}_{12} = \frac{n'_{12}}{n_2}, \qquad \hat{p}_{21} = \frac{n'_{21}}{n_1}.$

A more realistic estimator of the actual error rate (AER) is then given by

$\frac{n'_{12} + n'_{21}}{n_2 + n_1}.$
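A sketch in R of this leave-one-out estimate via MASS::lda with CV = TRUE (which holds out each observation in turn); the two-group data here are hypothetical:

library(MASS)
set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2), matrix(rnorm(100, 2), ncol = 2))
g <- factor(rep(1:2, each = 50))
cv <- lda(X, grouping = g, CV = TRUE)   # leave-one-out (holdout) classification
mean(cv$class != g)                     # estimate of the actual error rate (AER)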

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-29

Fisher's Linear Discrimination Function

Find a projection $a^\top x$ that provides a good separation, i.e., such that the ratio of the between-group-sum of squares to the within-group-sum of squares is maximal. Let Y = X a.
The within-sum-of-squares is

$\sum_{j=1}^{J} Y_j^\top \mathcal{H}_j Y_j = \sum_{j=1}^{J} a^\top X_j^\top \mathcal{H}_j X_j\, a = a^\top W a$

where Yj denotes the j-th submatrix of Y corresponding to observations of group j and $\mathcal{H}_j$ denotes the (nj × nj) centering matrix. The within-sum-of-squares measures the sum of variations within each group.
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-30

Example
Suppose there are J = 2 groups and $Y_j \in \mathbb{R}^{n_j}$.
Recall the centering matrix $\mathcal{H} = \mathcal{I}_n - n^{-1} 1_n 1_n^\top$ with $\mathcal{H}^2 = \mathcal{H}$ and calculate

$\mathcal{H}_j Y_j = (y_{j,1} - \bar{y}_j,\, y_{j,2} - \bar{y}_j,\, \cdots,\, y_{j,n_j} - \bar{y}_j)^\top$

to see that

$Y_j^\top \mathcal{H}_j Y_j = \sum_{i=1}^{n_j} (y_{j,i} - \bar{y}_j)^2,$

where in slight abuse of notation we denote by $y_{j,i}$ the observations in Yj.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-31
The between-sum-of-squares is

$\sum_{j=1}^{J} n_j (\bar{y}_j - \bar{y})^2 = \sum_{j=1}^{J} n_j \{a^\top (\bar{x}_j - \bar{x})\}^2 = a^\top B a.$

The total-sum-of-squares

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = Y^\top \mathcal{H} Y = a^\top X^\top \mathcal{H} X a = a^\top T a$

can now be decomposed as

total SS = within SS + between SS
$a^\top T a = a^\top W a + a^\top B a$

Fisher's idea was to select an a that maximizes the ratio

$\frac{a^\top B a}{a^\top W a}.$
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-32

Theorem
The vector a that maximizes $\frac{a^\top B a}{a^\top W a}$ is the eigenvector of $W^{-1} B$ that corresponds to the largest eigenvalue.

Discrimination rule
We classify x into the group j for which $a^\top \bar{x}_j$ is closest to $a^\top x$:

$x \to \Pi_j \quad \text{where} \quad j = \arg\min_i |a^\top (x - \bar{x}_i)|.$
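A sketch in R of this computation for two hypothetical simulated groups; the Fisher direction is the leading eigenvector of W⁻¹B:

set.seed(1)
X1 <- matrix(rnorm(60, 0), ncol = 3)                  # group 1 (20 x 3)
X2 <- matrix(rnorm(60, 1.5), ncol = 3)                # group 2 (20 x 3)
xbar <- colMeans(rbind(X1, X2))
W <- B <- matrix(0, 3, 3)
for (Xj in list(X1, X2)) {
  mj <- colMeans(Xj)
  W  <- W + crossprod(scale(Xj, center = mj, scale = FALSE))  # within SS
  B  <- B + nrow(Xj) * tcrossprod(mj - xbar)                  # between SS
}
a <- Re(eigen(solve(W) %*% B)$vectors[, 1])           # Fisher direction
x0 <- c(1, 1, 1)                                      # new observation
which.min(c(abs(sum(a * (x0 - colMeans(X1)))),        # |a'(x - xbar_i)|
            abs(sum(a * (x0 - colMeans(X2))))))       # allocated group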

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-33

Swiss Bank Notes

Figure: Densities of projections of genuine and forged bank notes by Fisher's discrimination function. MVAdisfbank

Applied Multivariate Statistical Analysis
Discriminant Analysis 14-34

Example (Swiss bank notes dataset)

Xg denotes the 100 observations for genuine banknotes, and Xf the 100 observations for the forged banknotes.
The "between-group-sum of squares" is defined as

$100 \left\{ (\bar{y}_g - \bar{y})^2 + (\bar{y}_f - \bar{y})^2 \right\} = a^\top B a$

for some matrix B, where $\bar{y} = a^\top \bar{x}$, and $\bar{y}_g$, $\bar{y}_f$ denote the means for the genuine and forged bank notes; $\bar{y} = \tfrac{1}{2}(\bar{y}_g + \bar{y}_f)$.
The "within-group-sum of squares" is

$\sum_{i=1}^{100} \{(y_g)_i - \bar{y}_g\}^2 + \sum_{i=1}^{100} \{(y_f)_i - \bar{y}_f\}^2 = a^\top W a. \qquad (24)$

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-35

Example
The resulting discriminant rule consists of allocating an
observation x0 to the genuine data if

a> (x0 − x) > 0,

with $a = W^{-1}(\bar{x}_g - \bar{x}_f)$, and allocating x0 to the forged data in the converse case. In our case we have

$a = (0.000,\, 0.029,\, -0.029,\, -0.039,\, -0.041,\, 0.054)^\top.$

Here we misclassify 1 genuine and no forged bank note.

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-36

Boston Housing
Example
Define groups according to the median value of houses X̃14: in group Π1 the value of X̃14 is greater than or equal to the median of X̃14, and in group Π2 the value of X̃14 is less than the median of X̃14. Apply the linear discriminant rule (excluding X̃4 and X̃14).

                   True
                   Π1     Π2
Predicted   Π1    216     40
            Π2     34    216
Table: APER= 0.146 for price of Boston houses. MVAdiscbh
Applied Multivariate Statistical Analysis
Discriminant Analysis 14-37

Boston Housing
Example
The APER is biased since we use the data twice. A more
appropriate measure of precision is the AER using the
leave-one-out technique.

                   True
                   Π1     Π2
Predicted   Π1    211     42
            Π2     39    214

Table: AER= 0.160 for price of Boston houses. MVAaerbh


Applied Multivariate Statistical Analysis
Discriminant Analysis 14-38

Boston Housing

Example
Now define, as in the cluster analysis chapter, the groups via higher quality of life and house, excluding X̃4.

                   True
                   Π1     Π2
Predicted   Π1    244     13
            Π2      7    242

Table: APER= 0.0395 for clusters of Boston houses. MVAdiscbh

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-39

Boston Housing

Example

                   True
                   Π1     Π2
Predicted   Π1    244     14
            Π2      7    241

Table: AER= 0.0415 for clusters of Boston houses. MVAaerbh

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-40

Boston Housing

Example

Figure: Discrimination scores for the two clusters created from the
Boston housing data. MVAdiscbh

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-41

Summary: Discrim. Rules in Practice

A discriminant rule is a separation of the sample space into sets Rj. An observation x is classified as coming from population Πj if it lies in Rj.
The expected cost of misclassification (ECM) for two
populations is given by ECM = C (2|1)p21 π1 + C (1|2)p12 π2 .
The ML rule is applied if the distributions in the populations
are known up to parameters, e.g. for normal distributions
Np (µj , Σ).

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-42

Summary: Discrim. Rules in Practice

The ML rule allocates x to the population that exhibits the smallest squared Mahalanobis distance

$\delta^2(x; \mu_i) = (x - \mu_i)^\top \Sigma^{-1} (x - \mu_i).$

The probability of misclassification is given by

$p_{12} = p_{21} = \Phi\left(-\tfrac{1}{2}\delta\right),$
where δ is the Mahalanobis distance between µ1 and µ2 .

Applied Multivariate Statistical Analysis


Discriminant Analysis 14-43

Summary: Discrim. Rules in Practice

Classification for different covariance structures in the two populations leads to quadratic discrimination rules.
A different approach is Fisher’s linear discrimination function
which tries to find a linear combination a> x that maximizes
the ratio of the ”between-sum-of-squares” and the
”within-sum-of-squares”. This rule turns out to be identical
with the ML rule in the case of J = 2 for normal populations.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-1

Correspondence Analysis

Categorical scales
Attitudes, opinions and demographic characteristics, e.g.
gender, race, social class
Public health, ecology, education, marketing
Quality control: how soft a certain fabric is to the touch, how
good a particular food product tastes, or how easy a worker
finds a certain task to be

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-2

Two-Way Contingency Table

Variable Z has I levels


Variable Y has J levels
This gives IJ combinations of levels of Z and Y

Count the responses (Z, Y) and display this information in a rectangular table which has I rows and J columns.
In each cell the number of subjects having that combination of Z and Y is given.

Applied Multivariate Statistical Analysis


Example

            Frankfurt   Berlin   Munich  |  Σ
Finance         4          0        2    |  6
Energy          0          1        1    |  2
HiTech          1          1        4    |  6
Σ               5          2        7    | 14

Joint distribution: πij = P(Z = i, Y = j) is the probability that Z is equal to i and at the same time Y is j.

Marginal distribution of Z : πi• - probability that Z is equal to i

Marginal distribution of Y : π•j - probability that Y is equal to j


Correspondence Analysis 15-4

Independence
The association between Z and Y is given by their joint
distribution, the conditional distribution of Z given Y, or the conditional
distribution of Y given Z.
Z and Y are independent if for all i and j:
πi|j = πij /π•j = πi• ,
πj|i = πji /πi• = π•j , or
πij = πi• π•j .
πij denotes the unknown true probabilities.
The sample relative frequencies are denoted by pij = xij /x•• , where
xij are the absolute frequencies and x•• is the sample size.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-5

Measures of Independence

Compare the response on two rows (probability that Z = 1):


1. difference of proportions π1i − π1h
2. relative risk π1i /π1h
3. odds ratio (π11 /π12 )/(π21 /π22 )
Independence implies that difference of proportions = 0, relative
risk = 1, and odds ratio = 1.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-6

Sampling Distributions

The tests are often (not always) identical for all types of sampling.

Poisson sampling
(everything is random)
Multinomial sampling
(total number of observed subjects is fixed)
Independent multinomial sampling
(number of subject in each row or column is fixed)

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-7

Maximum Likelihood Estimates

Depending on the sampling distribution, we obtain different likelihood functions (Poisson or multinomial distributions). The ML estimate of πij is given by pij = xij/x••, the relative frequency.
Under independence, the ML estimates of the cell probabilities are

$\hat{\pi}_{ij} = p_{i\bullet}\, p_{\bullet j} = (x_{i\bullet}\, x_{\bullet j})/x_{\bullet\bullet}^2.$

Using algebra, we get the Likelihood-Ratio Test of Independence.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-8

Ratio Test of Independence

The likelihood ratio statistic is equal to

$G = -2\log\Lambda = -2\log \frac{\prod_i \prod_j (x_{i\bullet}\, x_{\bullet j})^{x_{ij}}}{x_{\bullet\bullet}^{x_{\bullet\bullet}} \prod_i \prod_j x_{ij}^{x_{ij}}} = 2 \sum_{i=1}^{I} \sum_{j=1}^{J} x_{ij} \log(x_{ij}/E_{ij}),$

where

$E_{ij} \overset{\text{def}}{=} (x_{i\bullet}\, x_{\bullet j})/x_{\bullet\bullet} \qquad (25)$

Theory yields that −2 log Λ has, under the null hypothesis, an asymptotic χ² distribution with (I − 1)(J − 1) degrees of freedom.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-9

Pearson Chi-Square Test

Using the estimates of the expected frequencies Eij,

$t = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}. \qquad (26)$

Under the null hypothesis, the test statistic has a χ² distribution with (I − 1)(J − 1) degrees of freedom.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-10

Idea of the Proof:

Suppose that the xij are independent Poisson variables with E[xij] = eij.
The standardized $z_{ij} = (x_{ij} - e_{ij})/\sqrt{e_{ij}}$ are asymptotically N(0, 1); therefore $\sum\sum z_{ij}^2$ is asymptotically χ² with IJ − 1 degrees of freedom.
Replacing the eij by their estimates Eij, we obtain the Pearson χ² statistic t, see (26), and we lose (I − 1) + (J − 1) degrees of freedom.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-11

Example (Eye-Hair Example)


The “most classical example”.

EYE/HAIR black brown red blond SUM


d.brown 68 119 26 7 220
l.brown 15 54 14 10 93
green 5 29 14 16 64
blue 20 84 17 94 215
SUM 108 286 71 127 592

MVAcorrEyeHair

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-12

Example
Original table and values “expected under independence”.

[,1] [,2] [,3] [,4]


[1,] 68 119 26 7
[2,] 15 54 14 10
[3,] 5 29 14 16
[4,] 20 84 17 94

[,1] [,2] [,3] [,4]


[1,] 40.13514 106.28378 26.385135 47.19595
[2,] 16.96622 44.92905 11.153716 19.95101
[3,] 11.67568 30.91892 7.675676 13.72973
[4,] 39.22297 103.86824 25.785473 46.12331

MVAcorrEyeHair

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-13

Example
Contributions to Chi-Square statistic and its sum

> (E - X) ^ 2 / E
[,1] [,2] [,3] [,4]
[1,] 19.3459095 1.5214189 0.005621691 34.234171
[2,] 0.2278650 1.8313775 0.726334723 4.963290
[3,] 3.8168794 0.1190937 5.210886943 0.375399
[4,] 9.4210781 3.8004599 2.993334093 49.696722

> Chi2
[1] 138.2898

MVAcorrEyeHair
The Pearson Chi-Square statistic t has a χ²(9) distribution.
Critical value (α = 0.05) is 16.919.
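The same computation can be reproduced in R with the built-in chisq.test, using the eye/hair table from the slides:

X <- matrix(c(68, 119, 26,  7,
              15,  54, 14, 10,
               5,  29, 14, 16,
              20,  84, 17, 94), nrow = 4, byrow = TRUE)
E <- outer(rowSums(X), colSums(X)) / sum(X)   # expected counts under independence
sum((X - E)^2 / E)                            # Pearson statistic, approx. 138.29
chisq.test(X)                                 # same statistic, df = 9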

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-14

Example (Car data)


X3 : repair record 1978 (5 best, . . . , 1 worst)
X4 : repair record 1977
X13 : company headquarter (1 US, 2 Japan, 3 Europe)
Question of interest: is there some dependence between the
company headquarters and the repair record?

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-15

Example

> X3X4
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 0 0
[2,] 0 7 1 0 0
[3,] 2 2 19 5 0
[4,] 0 0 6 11 0
[5,] 0 1 1 4 5

> Chi2_X3X4
[1] 87.75024

MVAcorrCar
Critical value is 26.296 (α = 0.05)

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-16

Example

> X3X13
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 8 0 0
[3,] 26 0 2
[4,] 8 5 4
[5,] 2 6 3

> Chi2_X3X13
[1] 31.32032

MVAcorrCar
Critical value is 15.507 (α = 0.05)

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-17

Example

> X4X13
[,1] [,2] [,3]
[1,] 2 0 1
[2,] 10 0 1
[3,] 21 1 5
[4,] 13 5 2
[5,] 0 5 0

> Chi2_X4X13
[1] 33.60534

MVAcorrCar
Critical value is 15.507 (α = 0.05)

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-18

Correspondence Analysis

The tests of independence in a contingency table do not provide any information about the structure of the dependency in the table.
We can decide whether there is some dependency but it is
impossible to say “how the rows categories influence the
column categories”.
The aim of Correspondence Analysis is to
Investigate relationship (association) between two discrete
variables by deriving row and column indices from their
contingency table.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-19

Example
The French “baccalauréat” data: region (e.g. Ile-de-France) and
modality (e.g. Philosophy)
Question: Do students in certain regions prefer certain modalities
or vice versa?
Percentages of the eight modalities for the Lorraine region:

A B C D E F G H

20.5 7.6 15.3 19.6 3.4 14.5 18.9 0.2

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-20

Example
Percentage of the eight modalities for all regions

A B C D E F G H

22.6 10.7 16.2 22.8 2.6 9.7 15.2 0.2

Lorraine seems to over-represent the modalities E, F, G but under-represent the specializations A, B, C, D.
Can we develop an index for the regions so that this over-/underrepresentation is measured in a single number?
How can we weight the regions so that we can see in which
region certain modalities are preferred?
Applied Multivariate Statistical Analysis
Correspondence Analysis 15-21

Example
n types of companies and p locations.
Contingency Table

x11   x12   · · ·   x1p  |  x1•
x21   x22   · · ·   x2p  |  x2•
 ⋮     ⋮     ⋱      ⋮   |   ⋮
xn1   xn2   · · ·   xnp  |  xn•
x•1   x•2   · · ·   x•p  |  x••

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-22

Example
Suppose that n = 3, p = 3 and

            Frankfurt   Berlin   Munich  |  Σ
Finance         4          0        2    |  6
Energy          0          1        1    |  2
HiTech          1          1        4    |  6
Σ               5          2        7    | 14

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-23

Example

Location index:
$s_j \propto \sum_{i=1}^{n} r_i\, \frac{x_{ij}}{x_{\bullet j}}$ with (company) weight vector $r = (r_1, \ldots, r_n)^\top$

Company index:
$r_i^* \propto \sum_{j=1}^{p} s_j^*\, \frac{x_{ij}}{x_{i\bullet}}$ with (location) weight vector $s^* = (s_1^*, \ldots, s_p^*)^\top$

Simultaneously find $r = (r_1, \ldots, r_n)^\top$ and $s = (s_1, \ldots, s_p)^\top$ such that proximity (distance) between ri and sj indicates positive (negative) association between the i-th row and the j-th column.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-24

Summary: Correspondence Analysis

The aim of the correspondence analysis is to develop indices that show relations between variables in a contingency table.
The joint representation of these indices reveals relations
among the variables.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-25

χ2 Decomposition

χ²-test statistic for independence in a two-dimensional contingency table:

$t = \sum_{i=1}^{n} \sum_{j=1}^{p} (x_{ij} - E_{ij})^2 / E_{ij}$

where Eij, see (25), is the expected frequency in cell (i, j) under independence:

$E_{ij} = \frac{x_{i\bullet}\, x_{\bullet j}}{x_{\bullet\bullet}}$

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-26
Under the hypothesis of independence, t has a $\chi^2_{(n-1)(p-1)}$ distribution.
Departure from independence can be measured by the matrix C whose elements are defined as

$c_{ij} = (x_{ij} - E_{ij})/E_{ij}^{1/2} \qquad (27)$

Notation: $\mathcal{A} = \text{diag}(x_{i\bullet})$, $\mathcal{B} = \text{diag}(x_{\bullet j})$.
The marginal row and column frequencies are $a = \mathcal{A} 1_n$, $b = \mathcal{B} 1_p$.

$\mathcal{C}\sqrt{b} = 0, \qquad \mathcal{C}^\top \sqrt{a} = 0$

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-27
Singular Value Decomposition of C:

$\mathcal{C} = \Gamma \Lambda \Delta^\top, \qquad R = \text{rank}(\mathcal{C}) \le \min(n - 1,\, p - 1),$

$\Lambda = \text{diag}\left(\lambda_1^{1/2}, \ldots, \lambda_R^{1/2}\right), \quad \lambda_j \text{ the eigenvalues of } \mathcal{C}\mathcal{C}^\top.$

$c_{ij} = \sum_{k=1}^{R} \lambda_k^{1/2} \gamma_{ik} \delta_{jk}$

$\text{tr}(\mathcal{C}\mathcal{C}^\top) = \sum_{k=1}^{R} \lambda_k = \sum_i \sum_j c_{ij}^2 = t.$

The SVD decomposes the χ²-value, not the total variance.


Applied Multivariate Statistical Analysis
Correspondence Analysis 15-28

Recall the Chapter on "Decomposition of Data Matrices by Factors".

Duality relations:

$\delta_k = \lambda_k^{-1/2}\, \mathcal{C}^\top \gamma_k, \qquad \gamma_k = \lambda_k^{-1/2}\, \mathcal{C} \delta_k$

Projections on rows and columns:

$\mathcal{C}\delta_k = \lambda_k^{1/2} \gamma_k, \qquad \mathcal{C}^\top \gamma_k = \lambda_k^{1/2} \delta_k$

Thus, the eigenvectors δk and γk have almost the same properties as the indices rk and sk which we would like to obtain.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-29

Suppose that λ1 is dominant, so that

$c_{ij} \approx \lambda_1^{1/2} \gamma_{i1} \delta_{j1}.$

Then the matrix of "deviations from independence" can be well described by only one pair of eigenvectors.
Similarly as in PCA or Canonical Correlation Analysis, the
eigenvalues correspond to explained “variance”.
Often λ1 , λ2 are dominant ⇒
percentage of total χ2 explained by γ1 , γ2 and δ1 , δ2 is large.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-30

Define

$r_k = \mathcal{A}^{-1/2} \mathcal{C} \delta_k, \qquad s_k = \mathcal{B}^{-1/2} \mathcal{C}^\top \gamma_k$

and observe

$r_k = \frac{1}{\sqrt{\lambda_k}}\, \mathcal{A}^{-1/2} \mathcal{C} \mathcal{B}^{1/2} s_k, \qquad s_k = \frac{1}{\sqrt{\lambda_k}}\, \mathcal{B}^{-1/2} \mathcal{C}^\top \mathcal{A}^{1/2} r_k$

rk and sk are called row and column factors, respectively.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-31

Properties of the factors

$\bar{r}_k = \frac{1}{x_{\bullet\bullet}}\, r_k^\top a = 0, \qquad \bar{s}_k = \frac{1}{x_{\bullet\bullet}}\, s_k^\top b = 0$

and

$\text{Var}(r_k) = \frac{\lambda_k}{x_{\bullet\bullet}} = \text{Var}(s_k).$

$\lambda_k / \sum_i \lambda_i$ is the proportion of variance explained by factor k.
$C_a(i, r_k) = x_{i\bullet}\, r_{ki}^2 / \lambda_k$ is the contribution of row i to the variance of the (row) factor rk.
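A sketch in R of these computations "by hand" via the SVD of C, using the eye/hair table shown earlier in this chapter:

X <- matrix(c(68, 119, 26,  7,
              15,  54, 14, 10,
               5,  29, 14, 16,
              20,  84, 17, 94), nrow = 4, byrow = TRUE)
a <- rowSums(X); b <- colSums(X); n <- sum(X)
E <- outer(a, b) / n
C <- (X - E) / sqrt(E)                  # c_ij = (x_ij - E_ij)/sqrt(E_ij)
sv <- svd(C)
r1 <- sv$d[1] * sv$u[, 1] / sqrt(a)     # first row factor    r_1 = A^(-1/2) C delta_1
s1 <- sv$d[1] * sv$v[, 1] / sqrt(b)     # first column factor s_1 = B^(-1/2) C' gamma_1
sum(sv$d^2)                             # sum of lambda_k = Pearson chi-square t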
Applied Multivariate Statistical Analysis
Correspondence Analysis 15-32

Example
In Belgium, a survey was done to account for people who regularly
read newspapers. The answers were classified according to regions
of residence and language of the newspaper (Flemish, French or
both).
We have 10 regions: Antwerp, Western Flanders, Eastern Flanders,
Hainant, Liège, Limbourg, Luxembourg, Flemish-Brabant,
Wallon-Brabant, city of Brussels.
The language of newspaper is denoted by the first letter.
v: Flemish (Vlaams)
f: French (Francais)
b: both (beide)
Altogether, we have 15 newspapers.
Applied Multivariate Statistical Analysis
Correspondence Analysis 15-33

λj % variance cumulated %
183.40 0.653 0.653
43.75 0.156 0.809
25.21 0.090 0.898
11.74 0.042 0.940
8.04 0.029 0.969
4.68 0.017 0.985
2.13 0.008 0.993
1.20 0.004 0.997
0.82 0.003 1.000
0.00 0.000 1.000

So representations in two dimensions will be quite satisfactory (81% of the variance).

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-34

Ca (i, r1 ) Ca (i, r2 ) Ca (i, r3 )


va 0.0563 0.0008 0.0036
vb 0.1555 0.5567 0.0067
vc 0.0244 0.1179 0.0266
vd 0.1352 0.0952 0.0164
ve 0.0253 0.1193 0.0013
ff 0.0314 0.0183 0.0597
fg 0.0585 0.0162 0.0122
fh 0.1086 0.0024 0.0656
.. .. .. ..
. . . .
f0 0.0810 0.0188 0.0899
Total 1.0000 1.000 1.000
Absolute contributions for the row factor rk

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-35

Ca (j, s1 ) Ca (j, s2 ) Ca (j, s3 )


brw 0.0887 0.0210 0.2860
bxl 0.1259 0.0010 0.0960
anv 0.2999 0.4349 0.0029
brf 0.0064 0.2370 0.0090
foc 0.0729 0.1409 0.0033
for 0.0998 0.0023 0.0079
hai 0.1046 0.0012 0.3141
lig 0.1168 0.0355 0.1025
lim 0.0562 0.1162 0.0027
lux 0.0288 0.0101 0.1761
Total 1.000 1.000 1.000

Absolute contributions for the column factor


Applied Multivariate Statistical Analysis
Correspondence Analysis 15-36

Example
The tables show, for instance, the important role of Antwerp and the newspaper vb in determining the variance of both factors. Clearly the first axis expresses linguistic differences between the three parts of Belgium; the second axis shows a larger dispersion among the Flemish regions than among the French-speaking regions.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-37
Journal Data

Figure: Projection of rows (the 15 newspapers) and columns (the 10 regions). MVAcorrjourn

Applied Multivariate Statistical Analysis
Correspondence Analysis 15-38

Example (Interpretation)
High association between the regions and type of newspaper. In
particular vb (Gazet van Antwerp) is read in Antwerp (extremes in
the graph). The points on the left all belong to Flanders, whereas
those on the right all belong to Wallonia. Notice that the
Wallon-Brabant and the Flemish-Brabant are not so far from
Brussels. Brussels lies near the center, so it is not far from being an average, and it is also near the bilingual newspapers.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-39

Example
Apply correspondence analysis to the French baccalauréat data.
A: Philosophy, B: Economics and Social Sciences, C: Mathematics
and Physics, D: Mathematics and Natural Sciences,
E: Mathematics and Techniques, F: Industrial Techniques,
G: Economic Techniques, H: Computer Techniques.
The data were collected in 22 regions denoted by four-letter codes.
We have 202100 observations in a 22 × 8 contingency table.
We did the analysis two times (with and without Corsica) because
the graphics suggests that Corsica is an outlier.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-40

Baccalaureat Data

Figure: Correspondence analysis including Corsica. MVAcorrbac

Applied Multivariate Statistical Analysis
Correspondence Analysis 15-41

Baccalaureat Data

Figure: Correspondence analysis excluding Corsica. MVAcorrbac

Applied Multivariate Statistical Analysis
Correspondence Analysis 15-42

Example

eigenvalues λ % variances % Cumulative variance


2436.2 0.5605 0.561
1052.4 0.2421 0.803
341.8 0.0786 0.881
229.5 0.0528 0.934
152.2 0.0350 0.969
109.1 0.0251 0.994
25.0 0.0058 1.000
0.0 0.0000 1.000
Eigenvalues with Corsica.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-43

Example

eigenvalues λ % variances % Cumulative variance


2408.6 0.5874 0.587
909.5 0.2218 0.809
318.5 0.0766 0.887
195.9 0.0478 0.935
149.3 0.0304 0.971
96.1 0.0234 0.994
22.8 0.0056 1.000
0.0 0.0000 1.000
Eigenvalues without Corsica.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-44

Example
Region r1 r2 r3 Ca (i, r1 ) Ca (i, r2 ) Ca (i, r3 )
ILDF 0.1464 0.0677 0.0157 0.3839 0.2175 0.0333
CHAM -0.0603 -0.0410 -0.0187 0.0064 0.0078 0.0047
PICA 0.0323 -0.0258 -0.0318 0.0021 0.0036 0.0155
HNOR -0.0692 0.0287 0.1156 0.0096 0.0044 0.2035
CENT -0.0068 -0.0205 -0.0145 0.0001 0.0030 0.0043
BNOR -0.0271 -0.0762 0.0061 0.0014 0.0284 0.0005
BOUR -0.1921 0.0188 0.0578 0.0920 0.0023 0.0630
NOPC -0.1278 0.0863 -0.0570 0.0871 0.1052 0.1311
LORR -0.2084 0.0511 0.0467 0.1606 0.0256 0.0608
ALSA -0.2331 0.0838 0.0655 0.1283 0.0439 0.0767
FRAC -0.1304 -0.0368 -0.0444 0.0265 0.0056 0.0232
PAYL -0.0743 -0.0816 -0.0341 0.0232 0.0743 0.0370
BRET 0.0158 0.0249 -0.0469 0.0011 0.0070 0.0708
PCHA -0.0610 -0.1391 -0.0178 0.0085 0.1171 0.0054
AQUI 0.0368 -0.1183 0.0455 0.0055 0.1519 0.0643
MIDI 0.0208 -0.0567 0.0138 0.0018 0.0359 0.0061
LIMO -0.0540 0.0221 -0.0427 0.0033 0.0014 0.0154
RHOA -0.0225 0.0273 -0.0385 0.0042 0.0161 0.0918
AUVE 0.0290 -0.0139 -0.0554 0.0017 0.0010 0.0469
LARO 0.0290 -0.0862 -0.0177 0.0383 0.0595 0.0072
PROV 0.0469 -0.0717 0.0279 0.0142 0.0884 0.0383
Applied Multivariate Statistical Analysis
Correspondence Analysis 15-45

Example

s1 s2 s3 Ca (j, s1 ) Ca (j, s2 ) Ca (j, s3 )


A 0.0447 -0.0679 0.0367 0.0376 0.2292 0.1916
B 0.1389 0.0557 0.0011 0.1724 0.0735 0.0001
C 0.0940 0.0995 0.0079 0.1198 0.3556 0.0064
D 0.0227 -0.0495 -0.0530 0.0098 0.1237 0.4040
E -0.1932 0.0492 -0.1317 0.0825 0.0141 0.2900
F -0.2156 0.0862 0.0188 0.3793 0.1608 0.0219
G -0.1244 -0.0353 0.0279 0.1969 0.0421 0.0749
H -0.0945 0.0438 -0.0888 0.0017 0.0010 0.0112

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-46

Example (Interpretation)
The baccalauréats B on one side and F on the other side are most
strongly responsible for the variation on the first axis. The second
axis mostly characterizes an opposition between baccalauréats A
and C. Regarding the regions, Ile de France plays an important role
on each axis. On the first axis it is opposed to Lorraine and Alsace,
whereas on the second axis it is opposed to Poitou-Charentes and
Aquitaine.
On the right are the more classical baccalauréats and on the left
the more technical ones. The regions on the left have thus bigger
weights in the technical baccalauréats.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-47

Example
Note also that most of the southern regions of France are
concentrated in the lower part of the picture near the baccalauréat
A.
Finally, looking at the third axis, we see that it is dominated by the baccalauréat D (negative sign) and, to a lesser degree, by E (negative), as opposed to A (positive sign). The dominating regions are HNOR (positive sign), opposed to NOPC (negative sign). So, for instance, HNOR is particularly poor in baccalauréat D.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-48

Example
US crime data set: For one year (1985) we have the reported number of crimes in the 50 states of the US, classified according to 7 categories: murder, rape, robbery, assault, burglary, larceny and auto-theft.

λj % variance cumulated % variance


4399.0 0.4914 0.4914
2213.6 0.2473 0.7387
1382.4 0.1544 0.8932
870.7 0.0973 0.9904
51.0 0.0057 0.9961
34.8 0.0039 1.0000
0.0 0.0000 1.0000

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-49
US Crime Data

Figure: Projection of rows (the 50 states) and columns (the 7 crime categories). MVAcorrcrime

Applied Multivariate Statistical Analysis
Correspondence Analysis 15-50

Example (Interpretation)

It appears that the first axis is robbery (+) versus larceny (-)
and auto-theft (-) and that the second factor compares
assault (-) with auto-theft (+).
The dominating states for the first axis are the North-Eastern states MA (+) and NY (+) compared with the Western states WY (−) and ID (−). For the second axis the opposition is between the Northern states (MA (+) again and RI (+)) and the Southern states AL (−), MS (−) and AR (−).

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-51

Example (Type of companies)


Recall: Rows are Finance, Energy, Hitech, and columns are the
locations Frankfurt, Berlin, and Munich.

4 0 2 6
0 1 1 2
1 1 4 6
5 2 7 14

We want the row and column indices such that

$s_j \propto \sum_{i=1}^{n} r_i\, \frac{x_{ij}}{x_{\bullet j}} \qquad \text{and} \qquad r_i \propto \sum_{j=1}^{p} s_j\, \frac{x_{ij}}{x_{i\bullet}}.$
j=1

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-52

Example

Figure: Types of companies example (joint plot of the company types Finance, Energy, HiTech and the locations Frankfurt, Berlin, Munich).

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-53

Biplots
Biplots are a low-dimensional display of a data matrix X where the
rows and the columns are represented by points.

Example ((10 × 5) data matrix X)

Find 10 row points $q_i \in \mathbb{R}^k$, k < p, i = 1, . . . , 10 and 5 column points $t_j \in \mathbb{R}^k$, j = 1, . . . , 5, so that the 50 scalar products between the row and the column vectors closely approximate the 50 corresponding elements of the data matrix:

$x_{ij} = q_i^\top t_j + \varepsilon_{ij}$

What is the link between correspondence analysis and biplots?

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-54

Reconstitution formula
Recall (27) and check that

$x_{ij} = E_{ij} \left( 1 + \frac{\sum_{k=1}^{R} \lambda_k^{1/2} \gamma_{ik} \delta_{jk}}{\sqrt{x_{i\bullet}\, x_{\bullet j} / x_{\bullet\bullet}}} \right)$

From this, we obtain the differences between the row profiles and the average row profile:

$\frac{x_{ij}}{x_{i\bullet}} - \frac{x_{\bullet j}}{x_{\bullet\bullet}} = \frac{1}{x_{i\bullet}} \sqrt{\frac{x_{i\bullet}\, x_{\bullet j}}{x_{\bullet\bullet}}} \sum_{k=1}^{R} \lambda_k^{1/2} \gamma_{ik} \delta_{jk}$

A corresponding expression holds also for the column profiles.
Applied Multivariate Statistical Analysis


Correspondence Analysis 15-55

Now, if λ1 ≫ λ2 ≫ λ3 ≫ . . ., we can approximate these sums by K terms:

$\frac{x_{ij}}{x_{i\bullet}} - \frac{x_{\bullet j}}{x_{\bullet\bullet}} = \sum_{k=1}^{K} \frac{x_{\bullet j}}{\sqrt{\lambda_k\, x_{\bullet\bullet}}}\; s_{kj}\, r_{ki} + \varepsilon_{ij}$

$\frac{x_{ij}}{x_{\bullet j}} - \frac{x_{i\bullet}}{x_{\bullet\bullet}} = \sum_{k=1}^{K} \frac{x_{i\bullet}}{\sqrt{\lambda_k\, x_{\bullet\bullet}}}\; r_{ki}\, s_{kj} + \varepsilon'_{ij},$

where εij and ε′ij are the error terms.
This shows the differences between the row profiles and the average profile.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-56

Summary: Correspondence Analysis

Correspondence analysis is a factorial decomposition of contingency tables. The p-dimensional individuals and the n-dimensional variables can be graphically represented by projecting on spaces of smaller dimension.
Correspondence analysis provides a graphical display of the association measure $c_{ij}^2 = (x_{ij} - E_{ij})^2 / E_{ij}$.

Applied Multivariate Statistical Analysis


Correspondence Analysis 15-57

Summary: Correspondence Analysis

The practical computation consists of first computing a spectral decomposition of $\mathcal{A}^{-1}\mathcal{X}\mathcal{B}^{-1}\mathcal{X}^\top$ and $\mathcal{B}^{-1}\mathcal{X}^\top\mathcal{A}^{-1}\mathcal{X}$, which have the same first p eigenvalues. The graphical representation is obtained by plotting $\sqrt{\lambda_1}\, r_1$ vs. $\sqrt{\lambda_2}\, r_2$ and $\sqrt{\lambda_1}\, s_1$ vs. $\sqrt{\lambda_2}\, s_2$. Both plots may be displayed in the same graph, taking into account the appropriate orientation of the eigenvectors ri, sj.

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-1

Canonical Correlation Analysis

Most Interesting Linear Combinations

We have random vectors X ∈ Rq and Y ∈ Rp.
Linear combinations:

a> X and b > Y

Correlation of the linear combinations:

ρ(a, b) = ρa> X b> Y

We want to find a, b that maximize the correlation ρ(a, b)!


Harold Hotelling on BBI:

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-2

Suppose that we have two random vectors X and Y with the following means and covariance structure:

$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \left( \begin{pmatrix} \mu \\ \nu \end{pmatrix}, \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix} \right),$

i.e. $X \sim (\mu, \Sigma_{XX})$ with $\Sigma_{XX}$ of size q × q, $Y \sim (\nu, \Sigma_{YY})$ with $\Sigma_{YY}$ of size p × p, and $\text{Cov}(X, Y) = \Sigma_{XY}$.

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-3

The correlation between $a^\top X$ and $b^\top Y$ is

$\rho(a, b) = \frac{a^\top \Sigma_{XY}\, b}{(a^\top \Sigma_{XX}\, a)^{1/2} (b^\top \Sigma_{YY}\, b)^{1/2}}$

Note that ρ(ca, b) = ρ(a, b) for any c ∈ R.
Given this invariance of scale we can rescale the projections and thus equivalently maximize $a^\top \Sigma_{XY} b$ under the constraints $a^\top \Sigma_{XX} a = b^\top \Sigma_{YY} b = 1$.

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-4
Define

$\mathcal{K} = \Sigma_{XX}^{-1/2}\, \Sigma_{XY}\, \Sigma_{YY}^{-1/2} \qquad (q \times p)$

Singular value decomposition (SVD) of K:

$\mathcal{K} = \Gamma \Lambda \Delta^\top$

with Γ = (γ1, . . . , γk), Δ = (δ1, . . . , δk), $\Lambda = \text{diag}(\lambda_1^{1/2}, \ldots, \lambda_k^{1/2})$,
k = rank(K), λ1 ≥ · · · ≥ λk ≠ 0 the eigenvalues of $\mathcal{K}\mathcal{K}^\top$ and $\mathcal{K}^\top\mathcal{K}$,
γ1, . . . , γk the standardized eigenvectors of $\mathcal{K}\mathcal{K}^\top$,
δ1, . . . , δk the standardized eigenvectors of $\mathcal{K}^\top\mathcal{K}$.
Applied Multivariate Statistical Analysis
Canonical Correlation Analysis 16-5
Canonical correlation vectors

$a_i = \Sigma_{XX}^{-1/2} \gamma_i, \qquad b_i = \Sigma_{YY}^{-1/2} \delta_i$

Canonical variables

$\eta_i = a_i^\top X, \qquad \varphi_i = b_i^\top Y$

$\text{Cov}(\eta_i, \eta_j) = a_i^\top \Sigma_{XX}\, a_j = \gamma_i^\top \gamma_j = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases}$

Canonical correlation coefficients: $\lambda_1^{1/2}, \ldots, \lambda_k^{1/2}$
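A sketch in R of the empirical computation via the SVD of K̂; the data and the small helper msqrt_inv are hypothetical, defined here for illustration:

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)                  # hypothetical X in R^2
Y <- cbind(X %*% matrix(rnorm(4), 2), rnorm(100))  # hypothetical Y in R^3
msqrt_inv <- function(S) {                         # inverse matrix square root
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}
Sxx <- cov(X); Syy <- cov(Y); Sxy <- cov(X, Y)
K  <- msqrt_inv(Sxx) %*% Sxy %*% msqrt_inv(Syy)
sv <- svd(K)
sv$d                                    # canonical correlation coefficients
a1 <- msqrt_inv(Sxx) %*% sv$u[, 1]      # first canonical vectors
b1 <- msqrt_inv(Syy) %*% sv$v[, 1]
cor(X %*% a1, Y %*% b1)                 # equals sv$d[1] (up to sign)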

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-6

Theorem
Define $f_r = \max_{a,b}\, a^\top \Sigma_{XY}\, b$ under the constraints

$a^\top \Sigma_{XX}\, a = b^\top \Sigma_{YY}\, b = 1,$
$a_i^\top \Sigma_{XX}\, a = b_i^\top \Sigma_{YY}\, b = 0, \quad i = 1, \ldots, r - 1.$

Fix r, 1 ≤ r ≤ k. Then the maximum of ρ(a, b) is given by $f_r = \lambda_r^{1/2}$ and is attained when

$a = a_r = \Sigma_{XX}^{-1/2} \gamma_r \qquad \text{and} \qquad b = b_r = \Sigma_{YY}^{-1/2} \delta_r.$

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-7

Theorem
Let η and ϕ be the canonical variables, i.e., the components of the vector η are

$\eta_i = \left( \Sigma_{XX}^{-1/2} \gamma_i \right)^\top X,$

and the components of the vector ϕ are

$\varphi_i = \left( \Sigma_{YY}^{-1/2} \delta_i \right)^\top Y,$

for 1 ≤ i ≤ k. Then
$\text{Var}\begin{pmatrix} \eta \\ \varphi \end{pmatrix} = \begin{pmatrix} \mathcal{I} & \Lambda \\ \Lambda & \mathcal{I} \end{pmatrix},$

where $\Lambda = \text{diag}(\lambda_1^{1/2}, \ldots, \lambda_k^{1/2})$.
Applied Multivariate Statistical Analysis
Canonical Correlation Analysis 16-8

Summary: CC Analysis

Canonical correlation analysis aims to identify possible links between two (sub-)sets of variables X ∈ Rq and Y ∈ Rp. The idea is to find indices $a^\top X$ and $b^\top Y$ such that the correlation $\rho(a, b) = \rho_{a^\top X\, b^\top Y}$ is maximal.
The maximum correlation (under constraints) is found by $a_i = \Sigma_{XX}^{-1/2} \gamma_i$ and $b_i = \Sigma_{YY}^{-1/2} \delta_i$, where γi and δi denote the eigenvectors of $\mathcal{K}\mathcal{K}^\top$ and $\mathcal{K}^\top\mathcal{K}$, $\mathcal{K} = \Sigma_{XX}^{-1/2}\, \Sigma_{XY}\, \Sigma_{YY}^{-1/2}$.
The vectors ai and bi are called canonical correlation vectors.

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-9

Summary: CC Analysis

The indices $\eta_i = a_i^\top X$ and $\varphi_i = b_i^\top Y$ are called the canonical variables.
The values √λ1, . . . , √λk, which are the square roots of the nonzero eigenvalues of $\mathcal{K}\mathcal{K}^\top$ and $\mathcal{K}^\top\mathcal{K}$, are called the canonical correlation coefficients. The covariance between the canonical variables is $\text{Cov}(\eta_i, \varphi_i) = \sqrt{\lambda_i}$, i = 1, . . . , k.
The first canonical variables $\eta_1 = a_1^\top X$ and $\varphi_1 = b_1^\top Y$ have the largest possible covariance √λ1.
Canonical correlations are invariant w.r.t. linear
transformations of the original variables X and Y .
Applied Multivariate Statistical Analysis
Canonical Correlation Analysis 16-10

Canonical Correlations in Practice

In practice, the covariance matrices ΣXX, ΣXY, ΣYY are unknown. We have to estimate them by the sample covariance matrices SXX, SXY, SYY and carry out the analysis on the estimates.

Example (Car marks data)


We will investigate the association between the random vectors
(Price, Value stability)
and
(Economy, Service, Design, Sporty car, Safety, Easy handling)

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-11

Example (Car marks data)

         Price   Value |  Econ.  Serv.  Design  Sport.  Safety  Easy h.
S =
          1.41  −1.11  |  0.78  −0.71  −0.90   −1.04   −0.95    0.18
         −1.11   1.19  | −0.42   0.82   0.77    0.90    1.12    0.11
         ──────────────┼────────────────────────────────────────────────
          0.78  −0.42  |  0.75  −0.23  −0.45   −0.42   −0.28    0.28
         −0.71   0.82  | −0.23   0.66   0.52    0.57    0.85    0.14
         −0.90   0.77  | −0.45   0.52   0.72    0.77    0.68   −0.10
         −1.04   0.90  | −0.42   0.57   0.77    1.05    0.76   −0.15
         −0.95   1.12  | −0.28   0.85   0.68    0.76    1.26    0.22
          0.18   0.11  |  0.28   0.14  −0.10   −0.15    0.22    0.32

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-12

Example

$S_{XX} = \begin{pmatrix} 1.41 & -1.11 \\ -1.11 & 1.19 \end{pmatrix}$

$S_{XY} = \begin{pmatrix} 0.78 & -0.71 & -0.90 & -1.04 & -0.95 & 0.18 \\ -0.42 & 0.82 & 0.77 & 0.90 & 1.12 & 0.11 \end{pmatrix},$

$S_{YY} = \begin{pmatrix} 0.75 & -0.23 & -0.45 & -0.42 & -0.28 & 0.28 \\ -0.23 & 0.66 & 0.52 & 0.57 & 0.85 & 0.14 \\ -0.45 & 0.52 & 0.72 & 0.77 & 0.68 & -0.10 \\ -0.42 & 0.57 & 0.77 & 1.05 & 0.76 & -0.15 \\ -0.28 & 0.85 & 0.68 & 0.76 & 1.26 & 0.22 \\ 0.28 & 0.14 & -0.10 & -0.15 & 0.22 & 0.32 \end{pmatrix}.$

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-13

Example
Now we estimate $\mathcal{K} = \Sigma_{XX}^{-1/2}\, \Sigma_{XY}\, \Sigma_{YY}^{-1/2}$ by

$\hat{\mathcal{K}} = S_{XX}^{-1/2}\, S_{XY}\, S_{YY}^{-1/2}$

and perform a singular value decomposition of $\hat{\mathcal{K}}$,

$\hat{\mathcal{K}} = \mathcal{G}\mathcal{L}\mathcal{D}^\top = (g_1, g_2)\; \text{diag}(\ell_1^{1/2}, \ell_2^{1/2})\; (d_1, d_2)^\top$

We obtain as canonical correlation coefficients

$\ell_1^{1/2} = 0.98, \qquad \ell_2^{1/2} = 0.89.$

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-14

Example
The first canonical variables are

$\hat{\eta}_1 = \hat{a}_1^\top x = 1.602\, x_1 + 1.686\, x_2$
$\hat{\varphi}_1 = 0.568\, y_1 + 0.544\, y_2 - 0.012\, y_3 - 0.096\, y_4 - 0.014\, y_5 + 0.915\, y_6$

The canonical variable $\hat{\eta}_1$ may be interpreted as a price and value index (negatively weighting the non-depreciation of the car).

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-15

Example
Considering the corresponding canonical variable

$\hat{\varphi}_1 = 0.568\, y_1 + 0.544\, y_2 - 0.012\, y_3 - 0.096\, y_4 - 0.014\, y_5 + 0.915\, y_6$

It is mainly formed from the qualitative variables economy, service and easy handling with positive signs; design, safety and sportiness have smaller weights with negative signs.
These variables may therefore be interpreted as an
appreciation of the value of the car.
The sportiness, design and safety features are negatively
related to the price and value index.
Applied Multivariate Statistical Analysis
Canonical Correlation Analysis 16-16
Car Marks Data

Figure: The first canonical variables for the car marks data. MVAcancarm

Applied Multivariate Statistical Analysis
Canonical Correlation Analysis 16-17

Example
US crimes data, X ∈ R7 (murder, rape, robbery, assault, burglary,
larceny, autotheft)
US health data, Y ∈ R7 (accident, cardiovascular, cancer,
pulmonary, pneumonia, diabetes, liver)

Estimated matrix $\hat{\mathcal{K}}$ MVAcanus

0.23 -0.19 -0.19 -0.29 -0.22 -0.07 -0.10


0.20 -0.14 -0.17 0.04 0.12 -0.19 -0.08
-0.21 0.34 0.01 -0.22 -0.13 -0.15 0.31
0.38 -0.08 0.08 -0.11 -0.34 0.00 0.04
-0.13 -0.08 0.36 0.12 -0.10 -0.21 0.34
-0.25 -0.57 -0.08 0.28 0.06 -0.06 0.13
-0.20 -0.14 0.42 -0.35 0.12 0.24 0.14

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-18

Example
Singular value decomposition of $\hat{\mathcal{K}} = \hat{\Gamma} \hat{\Lambda} \hat{\Delta}^\top$

Estimated canonical correlation coefficients $r_i = \hat{\ell}_i^{1/2}$, i = 1, . . . , 7:

1 2 3 4 5 6 7
0.928 0.895 0.795 0.752 0.627 0.502 0.278

MVAcanus

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-19

Example

Estimated canonical correlation vectors ai , i = 1, . . . , 7

1 2 3 4 5 6 7
-0.173 -0.066 0.233 0.004 -0.269 0.000 0.275
-0.066 0.044 -0.012 -0.021 -0.014 0.201 -0.085
0.005 -0.006 -0.005 -0.007 -0.011 -0.002 -0.002
0.006 -0.002 0.001 -0.001 0.022 -0.014 -0.018
0.002 0.001 -0.001 -0.003 0.003 0.001 0.004
-0.001 0.001 0.000 0.000 -0.001 -0.002 -0.001
0.002 0.000 0.004 0.006 0.000 0.002 -0.001

MVAcanus

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-20

Example

Estimated canonical correlation vectors bj , j = 1, . . . , 7

1 2 3 4 5 6 7
-0.056 -0.039 0.048 0.016 0.056 0.038 -0.044
-0.008 -0.019 -0.011 -0.020 -0.014 -0.013 -0.012
0.014 0.008 0.034 0.028 0.049 0.052 0.061
-0.035 0.114 -0.078 -0.080 0.114 -0.084 0.022
0.063 0.075 -0.101 0.207 0.002 0.198 -0.140
0.036 -0.009 -0.010 0.282 0.059 -0.197 -0.264
0.215 -0.022 0.059 -0.169 -0.108 -0.003 -0.369

MVAcanus

Applied Multivariate Statistical Analysis


Canonical Correlation Analysis 16-21

Summary: CC’s in Practice

In practice we estimate ΣXX, ΣXY, ΣYY by the empirical covariances and compute estimates ℓi, gi, di for λi, γi, δi from the SVD of $\hat{\mathcal{K}} = S_{XX}^{-1/2}\, S_{XY}\, S_{YY}^{-1/2}$.
The sign of the coefficients of the canonical variables tells us
the influence of these variables.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-1

Multidimensional Scaling

MDS uses proximities between objects to produce a spatial representation of these items.
MDS does not start from the raw multivariate data matrix, X ,
but from a (n × n) dissimilarity or a distance matrix, D
MDS is called a data reduction technique because it considers
the problem of finding a set of points in low dimension that
represents the “configuration” of data in high dimension.
MDS-techniques are used to understand how people perceive
and evaluate all sorts of items.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-2

Aim of Multidimensional scaling

The primary purpose of all MDS-techniques is to uncover whatever structure or pattern may be present in the "data" and to represent it in a simple geometrical model or picture.
Mathematically: find a “configuration” of points in R2 that
“preserves” the distances of objects in Rp .
The classical solution

Definition
We say that a distance matrix D = (dij ) is Euclidean if for some
points x1 , . . . , xn ∈ Rp ; dij2 = (xi − xj )> (xi − xj ).

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-3

Metric MDS
Multidimensional scaling based on Euclidean proximities is usually
referred to as metric MDS, whereas the more popular non-metric
MDS is used when the proximities are measured on an ordinal
scale.
Example (Intercity Distances)
Consider road distances between six German towns.
MDS can recreate the map from the set of distances.
In real-life applications, the problems are exceedingly more complex
because there are usually errors in the data and the dimensionality
is rarely known in advance.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-4

Example (Intercity Distances)


The table of intercity distances (distance matrix):

Ber Dre Ham Kob Mun Ros

Berlin 0 214 279 610 596 237


Dresden 0 492 533 496 444
Hamburg 0 520 772 140
Koblenz 0 521 687
Munich 0 771
Rostock 0

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-5
Initial Configuration

Figure: Metric MDS solution for the inter-city road distances (axes: EAST-WEST and NORTH-SOUTH direction in km). MVAMDScity1

Applied Multivariate Statistical Analysis
Multidimensional Scaling 17-6
Map of German Cities

Figure: Metric MDS solution for the inter-city road distances after reflection and 90° rotation (axes: EAST-WEST and NORTH-SOUTH direction in km). MVAMDScity2

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-7

Example (Dissimilarity of Cars)


Consumers’ impressions of the dissimilarity of certain cars.
Audi 100 BMW 5 Citroen AX Ferrari ...

Audi 100 0 2.232 3.451 3.689 ...


BMW 5 2.232 0 5.513 3.167 ...
Citroen AX 3.451 5.513 0 6.202 ...
Ferrari 3.689 3.167 6.202 0 ...
.. .. .. .. .. ..
. . . . . .

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-8

Metric MDS


Figure: MDS solution on the car data. MVAmdscarm

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-9

Example
The dissimilarities in this table were in fact computed as Euclidean distances from the original car marks data on economy, price, security, . . .
Plotting the correlations between the MDS directions and the original variables, we see:
The first MDS direction is highly correlated with service(-),
value(-), design(-), sportiness(-), safety(-) and price(+). We
can interpret the first direction as the price direction since a
bad mark in price (“high price”) obviously corresponds with a
good mark, say, in sportiness (“very sportive”).
The second MDS direction is highly positively correlated with practicability.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-10

Correlations MDS/Variables


Figure: Correlations between the MDS directions and the variables. MVAmdscarm

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-11

How can we recognize whether a matrix D is Euclidean? ($d_{ij}^2 = (x_i - x_j)^\top (x_i - x_j)$)

Theorem
Define $\mathcal{A} = (a_{ij})$ with $a_{ij} = -\tfrac{1}{2} d_{ij}^2$, $\mathcal{B} = \mathcal{H}\mathcal{A}\mathcal{H}$, and centering matrix $\mathcal{H} = \mathcal{I}_n - n^{-1} 1_n 1_n^\top$.
Then the distance matrix D is Euclidean if and only if B is positive
semidefinite.
If D is the distance matrix of some data matrix X , then

B = HX X > H

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-12

MDS Solution

Let X be the coordinates of n points in Rp with x̄ = 0, and define $\mathcal{B} = \mathcal{X}\mathcal{X}^\top$, $b_{ij} = x_i^\top x_j$.
Then $d_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij}$ and $b_{ij} = a_{ij} - \bar{a}_{i\bullet} - \bar{a}_{\bullet j} + \bar{a}_{\bullet\bullet}$; hence $\mathcal{B} = \mathcal{H}\mathcal{A}\mathcal{H}$.
It is clear that B is symmetric, positive semidefinite and of rank p, i.e. $\mathcal{B} = \Gamma \Lambda \Gamma^\top$, where Λ is the diagonal matrix of the positive eigenvalues λ1 > · · · > λp with corresponding orthonormal eigenvectors γ1, . . . , γp.
Hence, the coordinate matrix X can be found as $\mathcal{X} = \Gamma \Lambda^{1/2}$.
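A sketch in R of this classical MDS construction, verified against the built-in cmdscale; the input points are hypothetical:

set.seed(1)
D <- as.matrix(dist(matrix(rnorm(20), ncol = 2)))   # distances of 10 points in R^2
n <- nrow(D)
A <- -0.5 * D^2
H <- diag(n) - matrix(1 / n, n, n)                  # centering matrix
B <- H %*% A %*% H
e <- eigen(B, symmetric = TRUE)
X <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2])) # X = Gamma Lambda^{1/2}
max(abs(as.matrix(dist(X)) - D))                    # ~0: distances reproduced
cmdscale(D, k = 2)                                  # built-in equivalent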

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-13

Similarity → Distance

Sometimes we have a matrix of similarities C and it is necessary to
“convert” it into a matrix of distances:

C → D,   d_ij = (c_ii − 2 c_ij + c_jj)^{1/2}

Theorem
If C is positive semidefinite, then the distance matrix D defined by the
above formula is Euclidean.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-14

Relation to Factorial Analysis


D = distance matrix of X. Consider the projection on a k-dimensional
subspace: X_1 = X L_1 is the projection of X on the column space of L_1,
with distances d_ij^{(1)}.
A measure of discrepancy (STRESS) between D and D_1 = (d_ij^{(1)}):

φ = Σ_{i,j=1}^n (d_ij − d_ij^{(1)})^2

Theorem
Among all projections X L1 , of X onto k-dimensional subspaces of
Rp the quantity φ is minimized when X is projected onto its first k
principal components.
Applied Multivariate Statistical Analysis
Multidimensional Scaling 17-15

Summary: MDS

MDS uses distances or similarities to project high-dimensional


data in a low-dimensional space.
Metric MDS is equivalent to principal component analysis: the
discrepancy φ is minimized by projecting onto the first principal
components.

Roger Newland Shepard on BBI:


Joseph Bernard Kruskal on BBI:

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-16

Nonmetric Multidimensional Scaling

Nonmetric MDS is based on a “loose” relationship between


dissimilarities and distances.
The distance is defined as an arbitrary monotone function of
the dissimilarities.
The nonmetric MDS is based on the rank order of the
dissimilarities.
The most common approach to determine distances and to
obtain the coordinates of the objects is the iterative
Shepard-Kruskal algorithm.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-17

Shepard-Kruskal algorithm

1. Calculate Euclidean distances from an arbitrarily chosen initial
configuration X (or use metric MDS to obtain the initial
coordinates).
2. Define new distances (disparities) so that they are a monotone
function of the original dissimilarities δ_ij (using isotonic
regression), d̂_ij = f(δ_ij).
3. Calculate a new configuration of the data which is more closely
related to the disparities obtained in step 2 (use a STRESS measure).
4. Check the change of STRESS; if it is not small enough, iterate the
algorithm.
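A hedged sketch of this procedure using scikit-learn's SMACOF-based
implementation (metric=False uses only the rank order of the
dissimilarities, in the spirit of Shepard-Kruskal; the matrix is the
four-car example discussed later):

import numpy as np
from sklearn.manifold import MDS

# Dissimilarities of four cars (Mercedes, Jaguar, Ferrari, VW)
delta = np.array([[0., 3., 2., 5.],
                  [3., 0., 1., 4.],
                  [2., 1., 0., 6.],
                  [5., 4., 6., 0.]])

mds = MDS(n_components=2, metric=False,        # nonmetric: rank order only
          dissimilarity='precomputed', random_state=0)
X = mds.fit_transform(delta)
print(X)            # 2-dimensional configuration
print(mds.stress_)  # final STRESS value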
Applied Multivariate Statistical Analysis
Multidimensional Scaling 17-18

Monotonic Regression

Figure: Ranks and distances. MVAMDSnonmstart

Applied Multivariate Statistical Analysis

Multidimensional Scaling 17-19

Pool-Adjacent-Violator-Algorithm

Figure: Pool-adjacent-violators algorithm. MVAMDSpooladj

Applied Multivariate Statistical Analysis
Multidimensional Scaling 17-20

Example
Dissimilarities δij of 4 objects based on the car marks data set.

j 1 2 3 4
i Mercedes Jaguar Ferrari VW
1 Mercedes -
2 Jaguar 3 -
3 Ferrari 2 1 -
4 VW 5 4 6 -

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-21

Example
Our aim is to find a p* = 2 dimensional representation via MDS.
Assume we choose as initial configuration from metric MDS X_0
(MVAnmdscar1):

i              x_i1   x_i2
1  Mercedes     3      2
2  Jaguar       2      7
3  Ferrari      1      3
4  VW          10      4

Figure: Initial configuration of the MDS of the car data. MVAnmdscar1

Applied Multivariate Statistical Analysis

Multidimensional Scaling 17-22

Example
The corresponding distances d_ij = {(x_i − x_j)^T (x_i − x_j)}^{1/2} are

i, j dij rank(dij ) δij


1,2 5.1 3 3
1,3 2.2 1 2
1,4 7.3 4 5
2,3 4.1 2 1
2,4 8.5 5 4
3,4 9.1 6 6

Applied Multivariate Statistical Analysis


Example

Dissimilarities and Distances

Figure: Scatterplot of dissimilarities against distances. MVAnmdscar2


A plot of the dissimilarities against the distances is not satisfactory,
since the ranking of the δ_ij did not result in a monotone relation of
the corresponding distances d_ij. We therefore apply the PAV algorithm.
MVAnmdscar2
Multidimensional Scaling 17-24

Example
The first violator of monotonicity is the second point (1, 3).
Therefore, average the distances d_13 and d_23 to get the disparities

d̂_13 = d̂_23 = (d_13 + d_23)/2 = (2.2 + 4.1)/2 = 3.15.

Apply the same procedure to the pair (2, 4) and (1, 4) to yield
d̂_24 = d̂_14 = 7.9. The plot of δ_ij versus the disparities d̂_ij
represents a monotone regression relationship.
In the initial configuration, the point 3 (Ferrari) could be moved to
reduce the distance to object 2 (Jaguar). However, this procedure
also alters the distance between objects 3 and 4.
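The PAV step can be reproduced with isotonic regression; a small sketch
(scikit-learn assumed available) that recovers the disparities of the
example:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Distances ordered by increasing dissimilarity delta_ij:
# pairs (2,3), (1,3), (1,2), (2,4), (1,4), (3,4)
d = np.array([4.1, 2.2, 5.1, 8.5, 7.3, 9.1])
rank = np.arange(1, 7)

disparities = IsotonicRegression().fit_transform(rank, d)
print(disparities)  # [3.15 3.15 5.1  7.9  7.9  9.1]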

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-25

In order to assess how well the derived configuration fits the given
dissimilarities, Kruskal suggests a measure called STRESS1 that is
given by

STRESS1 = { Σ_{i<j} (d_ij − d̂_ij)^2 / Σ_{i<j} d_ij^2 }^{1/2}.

An alternative measure is STRESS2, given by

STRESS2 = { Σ_{i<j} (d_ij − d̂_ij)^2 / Σ_{i<j} (d_ij − d̄)^2 }^{1/2},

where d̄ denotes the average distance.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-26

Example

(i, j) δij dij dbij (dij − dbij )2 dij2 (dij − d)2


(2,3) 1 4.1 3.15 0.9 16.8 3.8
(1,3) 2 2.2 3.15 0.9 4.8 14.8
(1,2) 3 5.1 5.1 0 26.0 0.9
(2,4) 4 8.5 7.9 0.4 72.3 6.0
(1,4) 5 7.3 7.9 0.4 53.3 1.6
(3,4) 6 9.1 9.1 0 82.8 9.3
Σ 36.3 2.6 256.0 36.4

The average distance is d̄ = 36.3/6 = 6.05. The STRESS measures are:

STRESS1 = (2.6/256)^{1/2} = 0.1,   STRESS2 = (2.6/36.4)^{1/2} = 0.27
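A quick numerical check of both formulas on the table above (values
agree with the slide up to rounding):

import numpy as np

d     = np.array([4.1, 2.2, 5.1, 8.5, 7.3, 9.1])    # distances d_ij
d_hat = np.array([3.15, 3.15, 5.1, 7.9, 7.9, 9.1])  # disparities

num = np.sum((d - d_hat)**2)                        # ~2.6
stress1 = np.sqrt(num / np.sum(d**2))               # ~0.10
stress2 = np.sqrt(num / np.sum((d - d.mean())**2))  # ~0.26 (0.27 rounded)
print(stress1, stress2)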

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-27

The aim is a point configuration that balances the effects of STRESS and
non-monotonicity. This is achieved by an iterative procedure defining the
new position of object i relative to object j by

x_il^NEW = x_il + α (1 − d̂_ij/d_ij)(x_jl − x_il),   l = 1, . . . , p*.

Here α denotes the step width of the iteration.

The configuration of object i is improved relative to object j.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-28

In order to obtain an overall improvement relative to all remaining
points one uses:

x_il^NEW = x_il + (α/(n − 1)) Σ_{j=1, j≠i}^n (1 − d̂_ij/d_ij)(x_jl − x_il),
l = 1, . . . , p*.

The choice of step width α is crucial. Kruskal proposes a starting


value of α = 0.2. The iteration is continued by a numerical
approximation procedure.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-29

In a fourth step, the evaluation phase, the STRESS measure is used to
evaluate whether its change as a result of the last iteration is
sufficiently small to terminate the procedure. At this stage the optimal
fit has been obtained for a given dimension. Hence, the whole procedure
needs to be carried out for several dimensions.

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-30

Example
Let us compute the new point configuration for i = 3 (Ferrari).
The initial coordinates are x_31 = 1, x_32 = 3. Applying the above
formula with α = 3 yields:

x_31^NEW = 1 + (3/(4 − 1)) Σ_{j=1, j≠3}^4 (1 − d̂_3j/d_3j)(x_j1 − 1)
         = 1 + (1 − 3.15/2.2)(3 − 1) + (1 − 3.15/4.1)(2 − 1)
             + (1 − 9.1/9.1)(10 − 1)
         = 1 − 0.86 + 0.23 + 0 = 0.37.

Similarly we obtain x_32^NEW = 4.36.
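A sketch verifying this update numerically (the distances d_3j are
recomputed exactly from the initial coordinates rather than from the
rounded values, so the result matches up to rounding):

import numpy as np

# Initial configuration: Mercedes, Jaguar, Ferrari, VW
X0 = np.array([[3.0, 2.0], [2.0, 7.0], [1.0, 3.0], [10.0, 4.0]])
i = 2                                  # Ferrari
d_hat = {0: 3.15, 1: 3.15, 3: 9.1}     # disparities d^_3j from the PAV step
alpha, n = 3.0, 4

x_new = X0[i].copy()
for j, dh in d_hat.items():
    d_ij = np.linalg.norm(X0[i] - X0[j])   # current distance d_3j
    x_new = x_new + alpha / (n - 1) * (1 - dh / d_ij) * (X0[j] - X0[i])
print(x_new)   # ~ (0.37, 4.35); the slides' rounded arithmetic gives (0.37, 4.36)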

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-31

Example
First Iteration for Ferrari

Figure: First iteration for Ferrari. MVAnmdscar3

Applied Multivariate Statistical Analysis


Multidimensional Scaling 17-32

Summary: Nonmetric MDS

Nonmetric MDS is based only upon the rank order of


dissimilarities.
The object of nonmetric MDS is to create a spatial
representation of the objects with low dimensionality.
A practical algorithm is given as:
1. Choose an initial configuration.
2. Normalize the configuration.
3. Find d_ij from the normalized configuration.
4. Fit d̂_ij, the disparities, by the PAV algorithm.
5. Find the new configuration X_{n+1} by using steepest descent.
6. Go to 2.
Applied Multivariate Statistical Analysis
Conjoint Measurement Analysis 18-1

Conjoint Measurement Analysis

Conjoint Measurement Analysis (CMA) plays an important


role in marketing. In the design of new products it is valuable
to know which components carry what kind of utility.
CMA is a method for attributing utilities to the components
(part worths) on the basis of ranks given to different
outcomes (stimuli). The overall utility is decomposed as a
sum of the utilities of the components.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-2

Example
A car producer plans to introduce a new car. The elements are
safety components (airbag component just for the driver or also for
the second front seat) and sporty note (leather steering wheel vs.
leather interior). There are 4 lines of cars.

car 1: basic safety equipment and low sportiness


car 2: basic safety equipment and high sportiness
car 3: high safety equipment and low sportiness
car 4: high safety equipment and high sportiness

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-3

Example
For the car producer it is important to rank these cars and to find
out customers’ attitudes. A tester may rank the cars as follows:

car 1 2 3 4
ranking 1 2 4 3

The elementary utilities here are the safety equipment and the
sportiness outfit. CMA aims at explaining the rank order given by
the test person as a function of these elementary utilities.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-4

Example
A margarine producer plans to create a new product and varies the
elements calories (low vs. high) and presentation (a plastic pot vs.
paper packed). One has in fact 4 products.

product 1 : low calories and plastic pot packing


product 2 : low calories and paper packed
product 3 : high calories and plastic pot packing
product 4 : high calories and paper packed

These 4 fictive products may now be ranked for example

Product 1 2 3 4
tester’s rank 3 4 1 2
Applied Multivariate Statistical Analysis
Conjoint Measurement Analysis 18-5

Aim

CMA aims to explain such a ranking by attributing


part-worths to different elements. The part-worths are the
utilities of the elementary components of the product.
In interpreting the part-worths one may discover that for a
test person one of the elements has a higher value than the others.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-6

Summary: Conjoint analysis

The conjoint analysis is used in the design of new products.


The conjoint measurement analysis tries to identify partworth
utilities that contribute to an overall utility.
The partworths enter additively into an overall utility.
The interpretation of the part-worths gives insight into the
perception and acceptance of the product.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-7

Design of Data Generation

A stimulus is defined as a combination of the different components.

The profile method asks for the utility of each stimulus. This
may be time consuming and tiring for a test person if there
are too many factors and factor levels.
The two factor method is a simplification and considers only
two factors simultaneously. It is also called trade-off analysis.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-8

Example
Add a product category (property) such as

X3 (use) = 1 (bread), 2 (cooking), 3 (universal)

to the margarine example ⇒ 3 × 2 × 2 = 12 stimuli.
Two factor method = consider only two factors simultaneously.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-9

Example
The trade–off matrices for the levels X1 , X2 , X3 for the margarine
example:

X3\X1   1  2      X3\X2   1  2      X1\X2   1  2
  1     ·  ·        1     ·  ·        1     ·  ·
  2     ·  ·        2     ·  ·        2     ·  ·
  3     ·  ·        3     ·  ·

Table: Trade-off matrices for margarine (rows = levels of the first
factor, columns = levels of the second).

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-10

Example
For the automobile example additional characteristics may be
engine power and the number of doors. These categories may be
coded as

X3 (power of engine) = 1 (50 kW), 2 (70 kW), 3 (90 kW)

and

X4 (doors) = 1 (2 doors), 2 (4 doors), 3 (5 doors)

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-11

Example
The trade-off matrices for the new car outfit are as follows
X4\X3   1  2  3      X4\X2   1  2      X4\X1   1  2
  1     ·  ·  ·        1     ·  ·        1     ·  ·
  2     ·  ·  ·        2     ·  ·        2     ·  ·
  3     ·  ·  ·        3     ·  ·        3     ·  ·

X3\X2   1  2      X3\X1   1  2      X2\X1   1  2
  1     ·  ·        1     ·  ·        1     ·  ·
  2     ·  ·        2     ·  ·        2     ·  ·
  3     ·  ·        3     ·  ·

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-12

The choice between the profile method and trade-off analysis should be
guided by the following aspects:
1. requirements on the test person
2. product perception
3. time consumption
The profile method offers the possibility of a complete product
perception.
With the profile method the number of stimuli rises exponentially with
the number of levels and properties. The time to complete a questionnaire
is therefore a factor in the choice of method.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-13

Summary: Design of data generation

A stimulus is a combination of different properties of a


product.
The design of a conjoint measurement analysis study is by
generation of a list of all factors (profile method) or by
trade–off matrices.
Trade–off matrices are used if there are too many factor levels.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-14

Estimation of Preference Orderings


The conjoint analysis uses an additive model of the form

Y_k = Σ_{j=1}^J Σ_{l=1}^{L_j} β_jl I(X_j = x_jl) + µ,   k = 1, . . . , K,
with Σ_{l=1}^{L_j} β_jl = 0 for all j.

X_j, j = 1, . . . , J    = factors (e.g. calories, paper, use)
x_jl, l = 1, . . . , L_j = levels of each factor X_j
µ                        = overall level (utility)
Y_k                      = observed preference for each stimulus

K = Π_{j=1}^J L_j .
Applied Multivariate Statistical Analysis
Conjoint Measurement Analysis 18-15

Example
X1 = use, X2 = calories
changed notation
x11 = 1, x12 = 2, x13 = 3, x21 = 1, x22 = 2 L1 = 3, L2 = 2

         X2
         1   2
     1   2   1
X1   2   3   4
     3   6   5

Table: Ranked products
Applied Multivariate Statistical Analysis
Conjoint Measurement Analysis 18-16

Example (cont’d)
Order the stimuli

Y1 = Utility (X1 = 1 ∧ X2 = 1)
Y2 = Utility (X1 = 1 ∧ X2 = 2)
Y3 = Utility (X1 = 2 ∧ X2 = 1)
Y4 = Utility (X1 = 2 ∧ X2 = 2)
Y5 = Utility (X1 = 3 ∧ X2 = 1)
Y6 = Utility (X1 = 3 ∧ X2 = 2)

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-17

Example (cont’d)
We obtain the decomposition

Y1 = β11 + β21 + µ
Y2 = β11 + β22 + µ
Y3 = β12 + β21 + µ
Y4 = β12 + β22 + µ
Y5 = β13 + β21 + µ
Y6 = β13 + β22 + µ

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-18

Metric Solution
In the above example the utilities are the ranks 1, 2, . . . , 6.
Hence µ = ȳ = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5.
ANOVA table:

                X2
            1       2     p̄_{x_1 l}   β_1l
       1    2       1        1.5        −2
X1     2    3       4        3.5         0
       3    6       5        5.5         2
p̄_{x_2 l} 3.66    3.33       3.5
β_2l      0.16   −0.16

The coefficients β_jl are computed as β_jl = p̄_{x_j l} − µ, where
p̄_{x_j l} denotes the average preference ordering over all stimuli with
factor X_j at level l.
Applied Multivariate Statistical Analysis
Conjoint Measurement Analysis 18-19

Note that Σ_{l=1}^{L_j} β_jl = 0, j = 1, . . . , J.

Ŷ_1 = β_11 + β_21 + µ = −2 + 0.16 + 3.5 = 1.66
Ŷ_4 = β_12 + β_22 + µ = 0 − 0.16 + 3.5 = 3.33.
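A minimal sketch of the metric solution: the part-worths are simply
row/column means of the rank table minus the overall mean:

import numpy as np

# Rank table Y_k: rows = levels of X1 (use), columns = levels of X2 (calories)
Y = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [6.0, 5.0]])

mu = Y.mean()                      # overall utility, 3.5
beta1 = Y.mean(axis=1) - mu        # part-worths of X1: [-2, 0, 2]
beta2 = Y.mean(axis=0) - mu        # part-worths of X2: [0.17, -0.17]

Y_hat = mu + beta1[:, None] + beta2[None, :]   # fitted utilities
print(beta1, beta2)
print(Y_hat)   # e.g. Y^_1 = mu + beta_11 + beta_21 = 1.67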

Conjoint analysis can be written in the form

Y = Xβ + ε

of a linear model (see Chapter 8)

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-20
Rewrite the β coefficients as β = (β_1, β_2, β_3, β_4)^T with

β_1 = µ + β_13 + β_22
β_2 = β_11 − β_13
β_3 = β_12 − β_13
β_4 = β_21 − β_22

Define the design matrix

     ( 1 1 0 1 )
     ( 1 1 0 0 )
X =  ( 1 0 1 1 )
     ( 1 0 1 0 )
     ( 1 0 0 1 )
     ( 1 0 0 0 )

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-21
The formula

Y_k = Σ_{j=1}^J Σ_{l=1}^{L_j} β_jl I(X_j = x_jl) + µ

then leads to the linear model

Y = X β + ε.

Design matrix for n persons (the matrix X stacked n times):

X = 1_n ⊗ X
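Equivalently, the part-worth estimation is a least squares problem; a
sketch with the design matrix from above (one tester, so the stacked
design matrix equals X):

import numpy as np

# Design matrix X from the slide and the observed rankings Y_1,...,Y_6
X = np.array([[1, 1, 0, 1],
              [1, 1, 0, 0],
              [1, 0, 1, 1],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
Y = np.array([2.0, 1.0, 3.0, 4.0, 6.0, 5.0])

beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # (mu+beta_13+beta_22, beta_11-beta_13,
             #  beta_12-beta_13, beta_21-beta_22)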

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-22

Nonmetric Solution

Sometimes utilities are not measured on a metric scale. Need to


estimate the coefficients from an adjusted set of estimated utilities.
More precisely, one uses the monotone ANOVA, Kruskal (1965).
First, one estimates the model with the ANOVA technique as
described above. Then one applies a monotone transformation
Zb = f (Yb ) to the estimated stimulus utilities.
Joseph Bernard Kruskal on BBI:

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-23

Example
For the car example the reported Yk values were
Y = (1, 3, 2, 6, 4, 5)> . The estimated values are:

Ŷ_1 = −1.5 − 1.16 + 3.5 = 0.84
Ŷ_2 = −1.5 + 1.16 + 3.5 = 3.16
Ŷ_3 = −0.5 − 1.16 + 3.5 = 1.84
Ŷ_4 = −0.5 + 1.16 + 3.5 = 4.16
Ŷ_5 = 1.5 − 1.16 + 3.5 = 3.84
Ŷ_6 = 1.5 + 1.16 + 3.5 = 6.16

We see that Ŷ_4 = 4.16 lies below Ŷ_6 = 6.16, and thus an inconsistency
in ranking the utilities occurred. A monotone transformation
Ẑ_k = f(Ŷ_k) is introduced.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-24
Car rankings

Figure: Plot of estimated preference orderings vs. revealed rankings and
PAV fit. MVAcarrankings

Applied Multivariate Statistical Analysis
Conjoint Measurement Analysis 18-25

A very simple procedure consists of averaging the “violators” Yb4


and Yb6 to obtain 5.16. The relationship is then monotone but the
model may now be violated.
This procedure is iterated until the STRESS measure

STRESS = Σ_{k=1}^K (Ẑ_k − Ŷ_k)^2 / Σ_{k=1}^K (Ŷ_k − Ŷ̄)^2        (28)

(Ŷ̄ = average of the Ŷ_k)

is minimized over β and the monotone transformation f . The


monotone transformation can be computed by the so called
pool-adjacent-violators (PAV) algorithm.

Applied Multivariate Statistical Analysis


Conjoint Measurement Analysis 18-26

Summary: Nonmetric Solution

The partworths are estimated via the least squares method.


The metric solution corresponds to an analysis of variance.
The nonmetric solution iterates between a monotone
regression curve fitting and determining the partworths by
ANOVA methodology.
The fitting of data to a monotone function is done via the
PAV algorithm.

Applied Multivariate Statistical Analysis


Applications in Finance 19-1

Portfolio Analysis
Risk of a portfolio
Portfolio of p assets; the price of asset j at time i is p_ij.
Return:

x_ij = (p_ij − p_{i−1,j}) / p_ij

X ∼ (µ, Σ)
Return of a portfolio: Q = c^T X with c^T 1_p = 1
Expected value of Q: c^T µ
Risk (squared volatility): (1/2) c^T Σ c

Applied Multivariate Statistical Analysis


Applications in Finance 19-2

Summary: Risk of a portfolio

Suppose that X is the matrix of returns from p assets in n time periods.
Assume the underlying distribution to be stationary, i.e., X ∼ (µ, Σ).
The (theoretical) return of the portfolio is a weighted sum of the
returns of the p assets, namely Q = c^T X.
The expected value of Q is c^T µ; the risk or volatility is
(1/2) c^T Σ c = (1/2) Var(c^T X), half of the variance of Q.

Applied Multivariate Statistical Analysis


Applications in Finance 19-3

Efficient Portfolio

Variance efficient portfolio


A variance efficient portfolio is one that keeps the risk minimal under
the constraint that the weights sum to 1, i.e., c^T 1_p = 1.
Find c that minimizes the Lagrangian

L = (1/2) c^T Σ c − λ (c^T 1_p − 1)

Harry Max Markowitz on BBI:

Applied Multivariate Statistical Analysis


Applications in Finance 19-4

Mean-variance efficient portfolio

A mean-variance efficient portfolio has minimal variance among all


portfolios with the same mean.
Find a vector of weights c such that the variance of the portfolio is
minimal subject to two constraints:
1. a certain, pre-specified mean return µ has to be achieved,
2. the weights have to sum to one.

Applied Multivariate Statistical Analysis


Applications in Finance 19-5

Mean-variance Efficient Portfolio

Mathematically speaking, we are dealing with an optimization


problem under two constraints.
The Lagrangian function for this problem is given by

L = c^T Σ c + λ_1 (µ̄ − c^T µ) + λ_2 (1 − c^T 1_p).

The first order condition for a minimum is

∂L/∂c = 2 Σ c − λ_1 µ − λ_2 1_p = 0.

Applied Multivariate Statistical Analysis


Applications in Finance 19-6

Example
Consider the returns from January 1978 to December 1987 of six
stocks traded on the New York stock exchange.
For each stock we have chosen the same scale on the vertical axis
(which gives the return of the stock). The return of some stocks,
such as Pan American Airways, Gerber, Texaco, and Delta Airlines
are more volatile than the returns of other stocks, such as IBM or
Consolidated Edison (Electric utilities).
We compare returns of two portfolios consisting from IBM and
PanAm.

Applied Multivariate Statistical Analysis


Applications in Finance 19-7
Figure: Returns of six firms (IBM, Consolidated Edison, PanAm, Gerber,
Delta Airlines, Texaco) from January 1978 to December 1987. MVAreturns
Applied Multivariate Statistical Analysis

Applications in Finance 19-8

Figure: Portfolio of IBM and PanAm assets, equal weights (0.500 IBM,
0.500 Pan Am) and efficient weights (0.896 IBM, 0.104 Pan Am). MVAportfol

The text windows on the right of the figure show the exact weights which
were used. We can clearly see that the returns of the portfolio with a
higher share of the IBM assets (which have a low variance) are much less
volatile.

Applied Multivariate Statistical Analysis
Applications in Finance 19-9

Nonexistence of a riskless asset

Assume invertibility of the covariance matrix Σ, i.e. the


nonexistence of a portfolio c with variance c > Σc = 0.
A riskless asset would have zero variance since it has fixed,
nonrandom returns. In this case Σ would not be positive
definite.

Under this assumption, we can find the variance efficient portfolio


in the following way:

Applied Multivariate Statistical Analysis


Applications in Finance 19-10

Nonexistence of a riskless asset – variance efficient portfolio

Differentiation of the Lagrangian L = (1/2) c^T Σ c − λ (c^T 1_p − 1)
with respect to c gives

Σ c = λ 1_p ,   c = λ Σ^{-1} 1_p .

Substituting back,

L = (1/2) λ^2 1_p^T Σ^{-1} 1_p − λ (λ 1_p^T Σ^{-1} 1_p − 1)
  = λ − (1/2) λ^2 1_p^T Σ^{-1} 1_p ,

and the constraint c^T 1_p = 1 yields λ = (1_p^T Σ^{-1} 1_p)^{-1}.

Applied Multivariate Statistical Analysis


Applications in Finance 19-11

Theorem
The variance efficient portfolio weights for returns X ∼ (µ, Σ) are

c_opt = Σ^{-1} 1_p / (1_p^T Σ^{-1} 1_p).
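In practice Σ is replaced by the empirical covariance S; a minimal sketch
with a hypothetical covariance matrix:

import numpy as np

# Hypothetical empirical covariance matrix S of three asset returns
S = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
ones = np.ones(S.shape[0])

w = np.linalg.solve(S, ones)    # S^{-1} 1_p
c_opt = w / (ones @ w)          # c_opt = S^{-1} 1_p / (1_p^T S^{-1} 1_p)
print(c_opt, c_opt.sum())       # weights sum to 1; low-variance assets dominate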

Applied Multivariate Statistical Analysis


Applications in Finance 19-12

Existence of a Riskless Asset


Σ is not invertible!
Adjust notation → return to riskless asset: r . (under absence
of arbitrage this is the interest rate)
Partition the vector of returns s.t. the last component is the
riskless asset.
The last equation of the system is
2 Cov(r , x) − λ1 r − λ2 = 0
covariance of the riskless asset r with any portfolio is zero.
λ2 = −r λ1 .

Applied Multivariate Statistical Analysis


Applications in Finance 19-13

Notation
c   = portfolio of risky assets
c_0 = weight of the riskless asset
c_0 = 1 − 1_p^T c   (there is only one riskless asset!)
Σ   = covariance of the risky assets (now invertible)
We see that

c = (λ_1 / 2) Σ^{-1} (µ − r 1_p),

the mean-variance efficient weight vector if there exists a riskless
asset. In the case of existence of a riskless asset, the mean-variance
efficient portfolio weights are given by

c = µ̄ Σ^{-1} (µ − r 1_p) / {µ^T Σ^{-1} (µ − r 1_p)}.
Applied Multivariate Statistical Analysis
Applications in Finance 19-14

Corollary
A portfolio of uncorrelated assets whose returns have equal variances
(Σ = σ^2 I_p) needs to be weighted equally:

c_opt = (1/p) 1_p .

Applied Multivariate Statistical Analysis


Applications in Finance 19-15

Corollary
A portfolio of correlated assets whose returns have equal variances, i.e.,

         ( 1  ρ  · · ·  ρ )
Σ = σ^2  ( ρ  1  · · ·  ρ )  ,   −1/(p − 1) < ρ < 1,
         ( ρ  ρ  · · ·  1 )

needs to be weighted equally.

Applied Multivariate Statistical Analysis


Applications in Finance 19-16

Corollary
A portfolio of uncorrelated assets with returns of different variances
(i.e., Σ = diag(σ_1^2, . . . , σ_p^2)) has the optimal weights

c_j = σ_j^{−2} / Σ_{j=1}^p σ_j^{−2} ,   j = 1, . . . , p.

Applied Multivariate Statistical Analysis


Applications in Finance 19-17

Corollary
A portfolio of assets with block diagonal returns covariance
Σ = diag(Σ_1, . . . , Σ_r) has the optimal weights c = (c_1, . . . , c_r)^T
where

c_j = Σ_j^{-1} 1 / (1^T Σ_j^{-1} 1),   j = 1, . . . , r.

Applied Multivariate Statistical Analysis


Applications in Finance 19-18

Summary: Efficient Portfolio

An efficient portfolio is one that keeps the risk minimal under the
constraint that a given mean return is achieved and that the weights sum
to 1, i.e., that minimizes
L = c^T Σ c + λ_1 (µ̄ − c^T µ) + λ_2 (1 − c^T 1_p).
If the riskless asset does not exist, the mean-variance efficient
portfolio weights are given by

c = Σ^{-1} 1_p / (1_p^T Σ^{-1} 1_p).

Applied Multivariate Statistical Analysis


Applications in Finance 19-19

Summary: Efficient Portfolio

In the case of existence of a riskless asset, the mean-variance efficient
portfolio weights are given by

c = µ̄ Σ^{-1} (µ − r 1_p) / {µ^T Σ^{-1} (µ − r 1_p)}.

The efficient weighting depends on the structure of the covariance matrix
Σ. Equal variances of the assets in the portfolio lead to equal weights,
different variances lead to weighting according to these variances.

Applied Multivariate Statistical Analysis


Applications in Finance 19-20

Efficient Portfolios in Practice

Figure: Portfolio of all six assets, equal weights (0.167 each) and
efficient weights (0.250 IBM, 0.004 Pan Am, 0.041 Delta, 0.509 Edison,
0.007 Gerber, 0.189 Texaco). MVAportfol

Hence the optimal weighting is

ĉ = S^{-1} 1_6 / (1_6^T S^{-1} 1_6)
  = (0.2504, 0.0039, 0.0409, 0.5087, 0.0072, 0.1890)^T .

As we can clearly see, the optimal weights are quite different from the
equal weights (c_j = 1/6).

Applied Multivariate Statistical Analysis
Applications in Finance 19-21

Summary: Efficient Portfolios in


Practice

Efficient portfolio weighting in practice consists of estimating


the covariance of the assets in the portfolio and then
computing efficient weights from this empirical covariance
matrix.
Note that this efficient weighting assumes stable covariances
between the assets over time.

Applied Multivariate Statistical Analysis


Applications in Finance 19-22

The CAPM

Let us assume that one of the assets has a return denoted by y_0 and that
this asset is uncorrelated with a mean-variance efficient portfolio (the
riskless asset with y_0 ≡ r may be such an asset).
Recall:
2 Σ c − λ_1 µ − λ_2 1_p = 0

(If Var(y_0) = 0, Σ is singular.)
Multiplying by c^T gives

2 c^T Σ c − λ_1 µ̄ = λ_2

Applied Multivariate Statistical Analysis


Applications in Finance 19-23

2 Σ c − λ_1 µ = 2 c^T Σ c 1_p − λ_1 µ̄ 1_p

µ = µ̄ 1_p + (2/λ_1)(Σ c − c^T Σ c 1_p)

For the asset that is uncorrelated with the portfolio:

y_0 = µ̄ − (2/λ_1) c^T Σ c
λ_1 = 2 c^T Σ c / (µ̄ − y_0)

Applied Multivariate Statistical Analysis


Applications in Finance 19-24

µ = µ̄ 1_p + ((µ̄ − y_0)/(c^T Σ c)) (Σ c − c^T Σ c 1_p)
µ = y_0 1_p + (Σ c / (c^T Σ c)) (µ̄ − y_0)
µ = y_0 1_p + β (µ̄ − y_0)

with

β ≡ Σ c / (c^T Σ c).
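A one-line computation of the betas for a given Σ and portfolio weights c
(hypothetical numbers):

import numpy as np

Sigma = np.array([[0.04, 0.01],     # hypothetical covariance of two assets
                  [0.01, 0.09]])
c = np.array([0.7, 0.3])            # weights of some efficient portfolio

beta = Sigma @ c / (c @ Sigma @ c)  # beta = Sigma c / (c^T Sigma c)
print(beta)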

Applied Multivariate Statistical Analysis


Applications in Finance 19-25

Summary: The CAPM

The weights of the mean-variance efficient portfolio satisfy
2 Σ c − λ_1 µ − λ_2 1_p = 0.
In the CAPM the mean of X depends on the riskless asset and the
prespecified mean µ̄ as follows: µ = r 1_p + β (µ̄ − r).

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-1

Simplicial Depth
Simplicial depth generalizes the notion of data depth. It allows us
to define a multivariate median and to visually present high
dimensional data in low dimension.
The mean and the mode can be easily extended to multivariate
random variables.
However, the median poses a problem, since in a multivariate sense we
cannot interpret the element-wise median

x_med,j = x_((n+1)/2),j                      if n is odd,
x_med,j = (x_(n/2),j + x_(n/2+1),j) / 2      otherwise,

as a point that is “most central”.
Applied Multivariate Statistical Analysis
Highly Interactive, Computationally Intensive Techniques 20-2

An equivalent definition of the median in one dimension is given by the
simplicial depth:
For each pair of datapoints x_k and x_l we generate a closed interval
with x_k and x_l as border points.
The median is the datapoint x_med which is enclosed in the maximum number
of intervals:

x_med = argmax_{x_i} #{(k, l) : x_i ∈ [x_k , x_l ]}.
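A brute-force sketch of this definition for a small hypothetical sample
(cubic cost, matching the computational remark that follows):

import numpy as np

x = np.array([1.0, 2.5, 3.0, 4.2, 7.0])   # hypothetical univariate sample
n = len(x)

# depth of x_i: number of closed intervals [x_k, x_l] (k <= l) containing x_i
depth = [sum(1 for k in range(n) for l in range(k, n)
             if min(x[k], x[l]) <= xi <= max(x[k], x[l]))
         for xi in x]
print(depth)                         # the middle point has maximal depth
print(x[int(np.argmax(depth))])      # 3.0, the median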

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-3

This definition involves a computationally intensive operation since


we generate n(n − 1)/2 intervals for n observations.
In two dimensions, the interval [xk , xl ] is replaced by a triangle
constructed from three different datapoints.
In three dimensions triangles become pyramids formed from 4
points and the median is that datapoint that lies in the maximum
number of pyramids.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-4

Simplicial depth example

Figure: Construction of simplicial depth.

Applied Multivariate Statistical Analysis

point   1   2   3   4   5   6
depth  10  10  12  14   8   8

Table: Simplicial depths for an artificial configuration of points.

Highly Interactive, Computationally Intensive Techniques 20-5

Simplicial depth

Figure: Distribution of points according to depth, with the deepest point
(the two-dimensional median) indicated as a big star in the center;
points with less depth are indicated via grey shades. MVAsimdepex

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-6

Summary: Simplicial Depth

The “depth” of a datapoint in one dimension can be


computed by counting all (closed) intervals of two datapoints
which contain the datapoint.
The “deepest” datapoint is the central point of the
distribution, the median.
The “depth” of a datapoint in arbitrary dimension p is defined
as the number of simplices (constructed from p + 1 points)
covering this point. It is called simplicial depth.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-7

Summary: Simplicial Depth

A multivariate extension of the median is to take the


“deepest” datapoint of the distribution.
In the bivariate case we count all triangles of datapoints which
contain the datapoint to compute its depth.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-8

Projection Pursuit

“Projection Pursuit” stands for a class of exploratory


projection techniques. This class contains methods designed
for analyzing high-dimensional data using low-dimensional
projections.
The idea of Exploratory Projection Pursuit (EPP) looks for
nonlinear structure contained in the data. The idea has been
applied to regression analysis, density estimation, classification
and discriminant analysis.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-9

Projection Pursuit Density Estimation

The problem in density estimation is to estimate the unknown


density function f (x) from observations xi ∈ Rp .
In projection pursuit density estimation we approximate the
unknown density by

f̂_m(x) = g_0(x) Π_{k=1}^m g_k(Λ_k^T x).

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-10

Summary: Projection Pursuit

Exploratory Projection Pursuit is a technique to find interesting
structure in high-dimensional data via low-dimensional projections.
Since the Gaussian distribution represents a standard situation, we
define the Gaussian distribution as the most uninteresting.
The search for interesting structures is done via a projection score
like the Friedman–Tukey index I_FT(α) = ∫ f^2. The parabolic
distribution has a low score. We maximize this score over all
projections.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-11

Summary: Projection Pursuit

The Jones–Gibson index maximizes

I_JG(α) = κ_3^2(α^T X) + κ_4^2(α^T X)/4

as a function of α.
The entropy index maximizes

I_E(α) = ∫ f log f,

where f is the density of α^T X.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-12

Summary: Projection Pursuit

In Projection Pursuit Regression the idea is to represent the


unknown function by a sum of non-parametric regression
functions on projections. The key problem is the choice of the
number of terms and often the interpretability.
In Projection Pursuit Density Estimation the idea is to
represent the unknown density by a product of non-parametric
density estimates on projections. The key problem is the
choice of the number of terms.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-13

Sliced Inverse Regression

Sliced Inverse Regression (SIR) is a dimension reduction method.


The idea is to find a smooth regression function that operates on a
variable set of projections.
Given a response variable Y and a (random) vector X ∈ R^p of explanatory
variables, SIR is based on the model

Y = m(β_1^T X, . . . , β_k^T X, ε),

where β_1, . . . , β_k are unknown projection vectors.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-14

The SIR Algorithm


The algorithm to estimate the EDR-directions via SIR is as follows:

Standardize x:
z_i = Σ̂^{-1/2} (x_i − x̄).
Divide the range of y_i into S non-overlapping intervals (slices) H_s,
s = 1, . . . , S. n_s denotes the number of observations within slice
H_s, and I_{H_s} the indicator function for this slice:

n_s = Σ_{i=1}^n I_{H_s}(y_i).

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-15

Compute the mean of z_i over all slices. This is a crude estimate m̂_1
of the inverse regression curve m_1:

z̄_s = (1/n_s) Σ_{i=1}^n z_i I_{H_s}(y_i).

Calculate the estimate for Cov{m_1(y)}:

V̂ = n^{-1} Σ_{s=1}^S n_s z̄_s z̄_s^T .

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-16

Identify the eigenvalues λ̂_i and eigenvectors η̂_i of V̂.
Transform the standardized EDR-directions η̂_i back to the original
scale. The estimates for the EDR-directions are given by

β̂_i = Σ̂^{-1/2} η̂_i .
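A compact sketch of the whole algorithm (slicing into groups of roughly
equal size; the data are simulated from a hypothetical single-index
model, not the MVAsirdata example):

import numpy as np

def sir(x, y, n_slices=10, k=2):
    n, p = x.shape
    # standardize: z_i = Sigma^{-1/2} (x_i - x_bar)
    sigma = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals**-0.5) @ evecs.T
    z = (x - x.mean(axis=0)) @ sigma_inv_sqrt
    # slice the range of y into n_slices groups
    slices = np.array_split(np.argsort(y), n_slices)
    # weighted covariance of the slice means of z
    V = np.zeros((p, p))
    for s in slices:
        zbar = z[s].mean(axis=0)
        V += len(s) * np.outer(zbar, zbar)
    V /= n
    # eigenvectors of V, transformed back to the original scale
    lam, eta = np.linalg.eigh(V)
    idx = np.argsort(lam)[::-1]
    return sigma_inv_sqrt @ eta[:, idx[:k]], lam[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X @ np.array([1.0, 1.0, 0.0]))**3 + 0.1 * rng.normal(size=400)
betas, lam = sir(X, y)
print(betas[:, 0] / np.linalg.norm(betas[:, 0]))  # ~ +-(0.71, 0.71, 0)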

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-17

True index vs Response

Figure: Plot of the true response versus the true first index. The
monotonic and the convex shapes can be clearly seen. MVAsirdata

Applied Multivariate Statistical Analysis

Highly Interactive, Computationally Intensive Techniques 20-18

True index vs Response

Figure: Plot of the true response versus the true second index. The
monotonic and the convex shapes can be clearly seen. MVAsirdata

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-19

Figure: SIR: The left plots show the response versus the estimated
EDR-directions. The upper right plot is a three-dimensional plot of the
first two directions and the response. The lower right plot shows the
eigenvalues λ̂_i (∗) and the cumulative sum (◦). MVAsirdata

       β̂_1     β̂_2     β̂_3
     −0.272   0.964  −0.001
      0.670   0.100   0.777
      0.690   0.244  −0.630

Table: SIR II: EDR-directions for simulated data.

Applied Multivariate Statistical Analysis
Highly Interactive, Computationally Intensive Techniques 20-20

SIR II Algorithm

1. Do steps 1 to 3 of the SIR algorithm.


2. Compute the slice covariance matrix V̂_s:

V̂_s = (n_s − 1)^{-1} ( Σ_{i=1}^n I_{H_s}(y_i) z_i z_i^T − n_s z̄_s z̄_s^T ).

3. Calculate the mean over all slice covariances:

V̄ = (1/n) Σ_{s=1}^S n_s V̂_s .

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-21

4. Compute an estimate for the overall mean distance:

V̂ = (1/n) Σ_{s=1}^S n_s (V̂_s − V̄)^2 = (1/n) Σ_{s=1}^S n_s V̂_s^2 − V̄^2 .

5. Identify the eigenvectors and eigenvalues of V̂ and scale back the
eigenvectors. This gives estimates for the SIR II EDR-directions:

β̂_i = Σ̂^{-1/2} η̂_i .

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-22

Figure: SIR II mainly sees the direction β_2. The left plots show the
response versus the estimated EDR-directions. The upper right plot is a
three-dimensional plot of the first two directions and the response. The
lower right plot shows the eigenvalues λ̂_i (∗) and the cumulative sum
(◦). MVAsir2data

In summary, SIR has found the direction which shows a strong relation
regarding the conditional expectation between β_1^T x and y, and SIR II
has found the direction where the conditional variance is varying,
namely β_2^T x.
The behavior of the two SIR algorithms is as expected. In addition, we
have seen that it is worthwhile to apply both versions of SIR. It is
possible to combine SIR and SIR II (Cook and Weisberg, 1991; Li, 1991;
Schott, 1994) directly, or to investigate higher conditional moments;
for the latter it seems to be difficult to obtain theoretical results.

Applied Multivariate Statistical Analysis
Highly Interactive, Computationally Intensive Techniques 20-23

Summary: Sliced Inverse Regression

SIR serves as dimension reduction tool for regression problems.


Inverse regression avoids the curse of dimensionality.
The dimension reduction can be conducted without estimation
of the regression function y = m(x) .
SIR searches for the effective dimension reduction (EDR) by
computing the inverse regression IR.
SIR II bases the EDR on computing the inverse conditional
variance.
SIR might miss EDR directions that are found by SIR II.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-24

Support Vector Machines (SVM)

Classification method
Nonparametric multivariate non-linear statistical technique
Applications: pattern recognition, medical diagnostics, text
classification, corporate bankruptcy analysis

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-25

Illustration

Example: company rating based on financial ratios, e.g.
functions of net income, total assets, interest payments
SVM – a nonlinear classification technique to produce the company's score
The company's score is transformed into a probability of default (PD)
and a rating

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-26

Illustration

Figure: Different linear classification functions (1) and (2) and a
non-linear one (3) in the linearly non-separable case.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-27

Loss

A nonlinear classifier function f may be described by a function class F
fixed a priori, e.g. the class of linear classifiers (hyperplanes).
Loss:

L(x, y) = (1/2) |f(x) − y| = 0, if the classification is correct,
                             1, if the classification is wrong.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-28

Expected and Empirical Risk

Expected risk – the expected value of the loss under the true probability
measure:

R(f) = ∫ (1/2) |f(x) − y| dF(x, y)

Empirical risk – the average value of the loss over the training set:

R̂(f) = (1/n) Σ_{i=1}^n (1/2) |f(x_i) − y_i|

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-29

VC bound

Vapnik-Chervonenkis (VC) bound – there is a function φ (monotone
increasing in the VC dimension h) so that for all f ∈ F, with probability
1 − η,

R(f) ≤ R̂(f) + φ( h/n, log(η)/n ).

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-30

Linearly Separable Case


Figure: Separating hyperplane and its margin in linearly separable case

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-31

Choose f ∈ F such that the margin (d_− + d_+) is maximal.
No-error separation – if all i = 1, 2, . . . , n satisfy

x_i^T w + b ≥ +1 for y_i = +1
x_i^T w + b ≤ −1 for y_i = −1

Both constraints are combined into

y_i (x_i^T w + b) − 1 ≥ 0,   i = 1, 2, . . . , n

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-32

The distance between each margin and the separating hyperplane is
d_+ = d_− = 1/||w||.
Maximizing the margin d_+ + d_− = 2/||w|| can be attained by minimizing
||w|| or ||w||^2.
Lagrangian for the primal problem:

L_P(w, b) = (1/2) ||w||^2 − Σ_{i=1}^n α_i {y_i (x_i^T w + b) − 1}

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-33

Karush-Kuhn-Tucker (KKT) first order optimality conditions:

∂L_P/∂w_k = 0 :  w_k − Σ_{i=1}^n α_i y_i x_ik = 0,   k = 1, . . . , d
∂L_P/∂b   = 0 :  Σ_{i=1}^n α_i y_i = 0

y_i (x_i^T w + b) − 1 ≥ 0,   i = 1, . . . , n
α_i ≥ 0
α_i {y_i (x_i^T w + b) − 1} = 0

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-34
The solution is w = Σ_{i=1}^n α_i y_i x_i , therefore

(1/2) ||w||^2 = (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j

− Σ_{i=1}^n α_i {y_i (x_i^T w + b) − 1}
  = − Σ_{i=1}^n α_i y_i x_i^T Σ_{j=1}^n α_j y_j x_j + Σ_{i=1}^n α_i
  = − Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j + Σ_{i=1}^n α_i

Lagrangian for the dual problem:

L_D(α) = Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-35

Primal and dual problems:

min_{w,b} L_P(w, b)

max_α L_D(α)   s.t. α_i ≥ 0,  Σ_{i=1}^n α_i y_i = 0

The optimization problem is convex, therefore the dual and primal
formulations give the same solution.
Support vectors – the points x_i for which y_i (x_i^T w + b) = 1 holds.
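In practice the dual is solved by a quadratic programming solver; a
sketch using scikit-learn's SVC on hypothetical separable data (a very
large C approximates the hard-margin case):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)),   # class y = -1
               rng.normal(+2, 0.5, (20, 2))])  # class y = +1
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel='linear', C=1e6).fit(X, y)    # large C ~ hard margin
print(clf.support_vectors_)                    # points with y_i(x_i^T w + b) = 1
w, b = clf.coef_[0], clf.intercept_[0]
print(2 / np.linalg.norm(w))                   # margin width 2/||w||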

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-36

Linearly Non-separable Case

Figure: Hyperplane and its margin in linearly non-separable case

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-37

Slack variables ξ_i represent the violation of strict separation:

x_i^T w + b ≥ +1 − ξ_i for y_i = +1,
x_i^T w + b ≤ −1 + ξ_i for y_i = −1,
ξ_i ≥ 0

The constraints are combined into

y_i (x_i^T w + b) ≥ 1 − ξ_i  and  ξ_i ≥ 0.

If ξ_i > 0, the objective function is

(1/2) ||w||^2 + C Σ_{i=1}^n ξ_i

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-38

Lagrange function for the primal problem:

L_P(w, b, ξ) = (1/2) ||w||^2 + C Σ_{i=1}^n ξ_i
               − Σ_{i=1}^n α_i {y_i (x_i^T w + b) − 1 + ξ_i}
               − Σ_{i=1}^n µ_i ξ_i ,

where αi ≥ 0 and µi ≥ 0 are Lagrange multipliers


Primal problem

min LP (w , b, ξ)
w ,b,ξ

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-39

First order conditions:

∂L_P/∂w_k = 0 :  w_k − Σ_{i=1}^n α_i y_i x_ik = 0
∂L_P/∂b   = 0 :  Σ_{i=1}^n α_i y_i = 0
∂L_P/∂ξ_i = 0 :  C − α_i − µ_i = 0

s.t.
α_i ≥ 0,  µ_i ≥ 0,  µ_i ξ_i = 0,
α_i {y_i (x_i^T w + b) − 1 + ξ_i} = 0

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-40
Since Σ_{i=1}^n α_i y_i b = 0, the primal problem translates into

L_D(α) = Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j
         + Σ_{i=1}^n ξ_i (C − α_i − µ_i)

The last term is 0, therefore the dual problem is

max_α L_D(α) = max_α { Σ_{i=1}^n α_i
                       − (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j },

s.t. 0 ≤ α_i ≤ C,   Σ_{i=1}^n α_i y_i = 0

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-41

Nonlinear Classification

Figure: Mapping the two-dimensional data space into a three-dimensional
feature space, R^2 → R^3 (left: data space; right: feature space). The
transformation Ψ(x_1, x_2) = (x_1^2, √2 x_1 x_2, x_2^2)^T corresponds to
K(x_i, x_j) = (x_i^T x_j)^2.
Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-42

A non-linear classifier maps data with a non-linear structure via a
function Ψ : R^p → H into a very high dimensional space H where the
classification rule is (almost) linear.
All the training vectors x_i appear in L_D only as scalar products
x_i^T x_j.
The nonlinear SVM transforms this scalar product into Ψ(x_i)^T Ψ(x_j).

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-43

Use the kernel trick to compute the scalar product via a kernel K.
If K exists such that K(x_i, x_j) = Ψ(x_i)^T Ψ(x_j), it can be used
without knowing Ψ explicitly.
K(x_i, x_j) is required to be positive definite, i.e. for any data set
x_1, . . . , x_n and any real numbers λ_1, . . . , λ_n, K must satisfy
(Mercer's theorem)

Σ_{i=1}^n Σ_{j=1}^n λ_i λ_j K(x_i, x_j) ≥ 0.

Optimization problem (with the same constraints as in the linearly
non-separable case):

L_D(α) = Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j K(x_i, x_j)   (29)

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-44

Kernel functions, e.g.:

K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2)) – the isotropic Gaussian
kernel with constant σ
K(x_i, x_j) = exp(−(x_i − x_j)^T r^{−2} Σ^{−1} (x_i − x_j) / 2) – the
stationary Gaussian kernel with an anisotropic radial basis, with
constant r and variance-covariance matrix Σ from the training set
K(x_i, x_j) = (x_i^T x_j + 1)^p – the polynomial kernel of degree p
K(x_i, x_j) = tanh(k x_i^T x_j − δ) – the hyperbolic tangent kernel with
constants k and δ

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-45

Simulated Data

The parameters of the SVM are the kernel parameters and the capacity C.
Using the stationary Gaussian kernel, for example, we
generate orange peel data and spiral data.
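A sketch generating orange-peel-like data and fitting a Gaussian-kernel
SVM (scikit-learn's gamma parameterizes the kernel width differently from
the slides' r, so the values are only indicative):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 2.0, (100, 2)),    # outer cloud, y = +1
               rng.normal(0.0, 0.5, (100, 2))])   # inner cloud, y = -1
y = np.array([+1] * 100 + [-1] * 100)

clf = SVC(kernel='rbf', gamma=2.0, C=0.1).fit(X, y)
print(clf.score(X, y))   # training accuracy of the nonlinear boundary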

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-46

Simulation: Orange Peel Data

Figure: SVM classification for “orange peel” data, n = 200,
n_−1 = n_+1 = 100, x_{+1,i} ∼ N((0, 0)^T, 2^2 I),
x_{−1,i} ∼ N((0, 0)^T, 0.5^2 I), with SVM parameters r = 0.5 and
C = 20/200. MVASVMorangepeel

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-47

Simulation: Spiral Data

Figure: SVM (r = 0.1, C = 10/200) for noisy spiral data (spread over 3π
radians); distance between spirals: 1, n_−1 = n_+1 = 100, n = 200.
Injected noise ε_i ∼ N(0, 0.1^2 I). MVASVMspiral

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-48

Scoring Companies

Creditreform database – solvent (y = +1) and insolvent (y = −1) companies
Period – 1996 to 2002
25 financial ratio variables
Pre-processing – replace outliers with Q_0.25 − 1.5 IQR and
Q_0.75 + 1.5 IQR
Example: randomly selected 50 solvent and 50 insolvent companies;
predictors (i) accounts payable turnover (x24) and (ii) the ratio of
operating income to total assets (x3)

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-49

Scoring Companies

Using the isotropic Gaussian kernel
Triangles – insolvent companies, circles – solvent companies
Colored background – score values f
Blue area – the higher the score, the greater the PD

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-50

Figure: SVM plot of the classifier function with σ = 100 and C = 1.
Percentage of misclassification is 0.43.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-51

Effect of parameters

The radial basis σ = 100 is too large; the SVM has trouble classifying.
Reducing σ to 2 and to 0.5 (C remains the same) gives better
classification.
Increasing C decreases the margin between the groups.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-52

Figure: SVM plot of the classifier function with σ = 2 and C = 1.
Percentage of misclassification is reduced to 0.27.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-53

Figure: SVM plot of the classifier function with σ = 0.5 and C = 1.
Percentage of misclassification is 0.22.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-54

Figure: SVM plot of the classifier function with C = 200 and σ = 2.
Percentage of misclassification is 0.10.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-55

Validation

Figure: Cumulative accuracy profile (CAP) curve.

CAP curves close to the random (diagonal) line offer little advantage
over a random assignment of risk scores, while those close to the ideal
curve display good predictive power. Mathematically, if y ∈ {0, 1}, the
Accuracy Ratio (AR) derived from the CAP curve is defined as

AR = ( ∫_0^1 y(x) dx − 1/2 ) / ( ∫_0^1 y_ideal(x) dx − 1/2 ).

Applied Multivariate Statistical Analysis
Highly Interactive, Computationally Intensive Techniques 20-56

The CAP curve simultaneously measures Type I and Type II errors.
The CAP curve represents the cumulative probability of default events
for different percentiles of the risk score scale.
Accuracy Ratio (AR) – the ratio of the area between the model CAP curve
and the random CAP curve to the area between the perfect CAP curve and
the random CAP curve.
AR is used for measuring and comparing the performance of credit risk
models.
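A hedged sketch computing the CAP curve and the AR from scores and
default indicators (hypothetical data; the ideal CAP area follows from
the default rate):

import numpy as np

def cap_ar(scores, defaults):
    """Accuracy ratio from risk scores (higher = riskier) and 0/1 flags."""
    order = np.argsort(-np.asarray(scores))               # riskiest first
    d = np.asarray(defaults, dtype=float)[order]
    y = np.concatenate(([0.0], np.cumsum(d) / d.sum()))   # CAP curve
    dx = 1.0 / (len(y) - 1)
    area_model = np.sum((y[1:] + y[:-1]) / 2) * dx - 0.5  # trapezoid rule
    area_ideal = (1.0 - d.mean() / 2) - 0.5               # ideal CAP curve
    return area_model / area_ideal

scores = np.array([0.9, 0.7, 0.6, 0.4, 0.3, 0.1])  # hypothetical firms
defaults = np.array([1, 1, 0, 1, 0, 0])
print(cap_ar(scores, defaults))  # 1 = perfect ordering, 0 = random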

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-57

Example: Mapping score to PD

Set three rating grades: safe (score f < −0.0115), neutral
(−0.0115 ≤ f ≤ 0.0115) and risky (f > 0.0115).
Calculate (i) the total number of companies and (ii) the number of
failing companies in each group.
The ratio of failing companies in a group to all companies in that group
gives the estimated probability of default, i.e. PD_safe = 0.24,
PD_neutral = 0.50 and PD_risky = 0.76.

Applied Multivariate Statistical Analysis


Highly Interactive, Computationally Intensive Techniques 20-58

Summary: Support Vector Machines

SVM classification is done by mapping the data into feature


space and finding a separating hyperplane there
The support vectors are determined via a quadratic
optimization problem
SVM produces highly nonlinear classification boundaries

Applied Multivariate Statistical Analysis
