Presentation B 6 Sep 2021

02409 Multivariate Statistics
Presentation 7 Sep 2020

Anders Nymark Christensen, Assoc. Prof. DTU Compute, 324/110
anym@dtu.dk
Jannick Jørgensen Lønver, Msc. Student

Laurits Fromberg, Msc. Student
Nikolaj Normann Holm, PhD Student
Today’s lecture
1. SAS revisited, Apodemus case & a bit about correlation

2. Moments of multivariate random variables
3. Estimation of parameters
4. Conditional distributions
5. Height-weight of children
6. Multiple correlation coefficient
7. Partial correlation coefficients
8. Case on pollution data
2 6. september 2021
Uni- and bivariate normal distributions
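The univariate random variable X is normally distributed, $X \in N(\mu, \sigma^2)$, if
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)$$
[Figure: normal density with $\mu-\sigma$, $\mu$ and $\mu+\sigma$ marked on the axis]
The bivariate random variable $\boldsymbol{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ is normally distributed, $\boldsymbol{X} \in N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if
$$f(\boldsymbol{x}) = \frac{1}{2\pi\sqrt{\det\boldsymbol{\Sigma}}} \exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)$$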
Marginal distributions and independence
Simultaneous distribution of 25000 corresponding values of height and weight
Marginal distribution of height
Are height and weight independent?
No, positively correlated
Covariance and correlation
Mean and dispersion matrix of two-dimensional random variable
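The formulas on this slide did not survive extraction. As a hedged reconstruction from the standard definitions (the symbols $\mu_i$, $\sigma_i^2$, $\sigma_{12}$ and $\rho$ are assumed to match the notation used in the rest of the lecture), the mean, the dispersion matrix and the correlation coefficient of a two-dimensional random variable can be written as:

```latex
\boldsymbol{\mu} = E(\boldsymbol{X}) = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\qquad
\boldsymbol{\Sigma} = D(\boldsymbol{X}) =
\begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix},
\qquad
\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}
```

With $\sigma_1 = 3$ and $\sigma_2 = 1$ the covariance is $\sigma_{12} = 3\rho$, which is the parametrisation used for the bivariate Gaussian plots later in the lecture.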
Correlation between weight of mouse and weight of its reproductive organs in 149 male Apodemus caught between 28 March and 13 April 1937, 1938 and 1939.
This time we have several values of the variable organ for each value of the variable weight.
This requires a different input statement.
The value "." means "observation missing".
data apodemus;
infile cards missover;
input weight organ @;
do until (organ=.);
output;
input organ @;
end;
cards;
16.0 1.6
17.5 0.7
18.5 1.0 1.4
19.0 1.7 1.8
19.5 1.2 1.3 1.5
20.0 1.5 1.7 1.8 1.9 2.0
20.5 1.1 1.2 1.3 1.4 1.4 1.4 1.6 2.1
21.0 1.5 1.8 1.8 2.2
21.5 1.6 1.7 1.8 1.8 2.1
22.0 1.5 1.5 1.6 1.7 1.7 1.8 1.9 1.9 2.2 2.3 2.7
22.5 1.3 1.6 1.6 1.7 1.7 1.8 1.8 1.9 2.0 2.0 2.0 2.0 2.0 2.1 2.1 2.1 2.1 2.2 2.3 2.3 2.4
23.0 1.3 1.7 1.9 2.0 2.0 2.0 2.0 2.2 2.3 2.3 2.4 2.4 2.6
23.5 1.8 1.9 2.0 2.2 2.2 2.2 2.3 2.4
24.0 1.8 1.8 1.9 2.3 2.3 2.3 2.4 2.4 2.4 2.4 2.6 3.0
24.5 1.8 1.8 1.9 2.0 2.0 2.0 2.1 2.2 2.2 2.3 2.3 2.3 2.5
25.0 1.4 1.8 1.9 2.0 2.4 2.4
25.5 1.1 1.8 1.9 2.0 2.0 2.0 2.1 2.5 2.6 2.7
26.0 2.0 2.4 2.6 2.8
26.5 2.1 2.2 2.4 2.4 2.4 2.4 2.6 3.0
27.0 2.2 2.3 2.4 2.6 2.9
27.5 2.2 2.4 2.4 2.4
28.0 2.6
29.0 2.4
29.5 2.7
;
run;
/*We put the data into a permanent data set home.apodemus.*/
data home.apodemus;
set apodemus;
run;
Title 'Mouse data, first 20 observations';
proc print data=home.apodemus (obs=20);
run;
From: Fig. 6 in H. P. Hacker and H. S. Pearson (1944): The Growth, Survival, Wandering and Variation of the Long-Tailed Field Mouse, Apodemus Sylvaticus. Biometrika, Vol. 33, No. 2 (Aug., 1944), pp. 136-162
Matrix of 2D-scatterplots
SAS code for producing a
matrix of 2-D scatterplots
for the variables considered,
here weight and organ.
In the diagonal we have the
histograms and fitted normal
pdf:
proc sgscatter
data=home.apodemus;
matrix weight organ/
ellipse=(type=predicted)
diagonal=(histogram normal);
run;
Univariate histogram/cdf and fitted Gaussian
Some output from:

proc univariate
data=home.apodemus;
histogram/normal;
cdf/normal;
var weight organ;
run;
Goodness-of-Fit Tests for Normal Distribution (two output panels, shown side by side on the slide)
                     First panel                        Second panel
Test                 Statistic    p Value               Statistic    p Value
Kolmogorov-Smirnov   0.07419799   Pr > D     0.044      0.07681607   Pr > D     0.030
Cramer-von Mises     0.11030575   Pr > W-Sq  0.085      0.14813973   Pr > W-Sq  0.025
Anderson-Darling     0.58514995   Pr > A-Sq  0.131      0.86813666   Pr > A-Sq  0.025
Weight of mouse and weight of its reproductive organs
Simple Statistics
Variable  N    Mean      Std Dev  Sum        Minimum   Maximum
weight    149  23.37584  2.37378  3483       16.00000  29.50000
organ     149  2.02148   0.41777  301.20000  0.70000   3.00000
Pearson Correlation Coefficients, N = 149
Prob > |r| under H0: Rho=0
        weight   organ
weight  1.00000  0.64963
                 <.0001
organ   0.64963  1.00000
        <.0001
proc corr data=sasuser.apodemus;

var weight organ;
run;
proc sgplot data=sasuser.apodemus;
scatter x=weight y=organ/markerattrs=
(symbol=CircleFilled color=RED
size=10);
ellipse x=weight y=organ/alpha=0.1 ;
run;
Test of correlation coefficient
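The test formula on this slide did not survive extraction. A minimal sketch in Python (used here only for illustration, not the course's SAS), assuming the usual t-test of H0: ρ = 0 under bivariate normality, where t = r·sqrt(n − 2)/sqrt(1 − r²) is compared with a t distribution on n − 2 degrees of freedom. The values r = 0.64963 and n = 149 are taken from the Proc Corr output for the Apodemus data; the function name is hypothetical.

```python
import math


def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0; refer to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Values from the Proc Corr output for the Apodemus data.
r, n = 0.64963, 149
t = corr_t_statistic(r, n)
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
```

A statistic of this size is in line with the p-value <.0001 that Proc Corr reports for these data.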
Correlation does not imply causation I
ρ=0.998
ρ=0.959
http://www.tylervigen.com/spurious-correlations
Correlation does not imply causation II
• Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and
gesture furtively while mouthing 'look over there'.
Moments of a random variable
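The mean is the center of mass of the probability distribution (here for a continuous X with density f):
$$\mu = E(X) = \int x f(x)\,dx$$
The variance measures the degree of variability around the mean:
$$V(X) = \sigma^2 = E\left[(X-\mu)^2\right] = E(X^2) - \mu^2$$
[Figure: three densities with $\mu_1 = 0$, $\sigma_1^2 = 1$; $\mu_2 = 0$, $\sigma_2^2 = 16$; $\mu_3 = 6$, $\sigma_3^2 = 1$]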
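Moments of multivariate random variables:
Mean, dispersion and covariance
$$E(\boldsymbol{X}) = \boldsymbol{\mu} = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}$$
$$D(\boldsymbol{X}) = \boldsymbol{\Sigma} = E\left[(\boldsymbol{X}-\boldsymbol{\mu})(\boldsymbol{X}-\boldsymbol{\mu})'\right] = \begin{pmatrix} V(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_p) \\ Cov(X_2, X_1) & V(X_2) & \cdots & Cov(X_2, X_p) \\ \vdots & \vdots & & \vdots \\ Cov(X_p, X_1) & Cov(X_p, X_2) & \cdots & V(X_p) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{pmatrix}$$
$$C(\boldsymbol{X}, \boldsymbol{Y}) = E\left[(\boldsymbol{X}-\boldsymbol{\mu})(\boldsymbol{Y}-\boldsymbol{\nu})'\right] = \begin{pmatrix} Cov(X_1, Y_1) & \cdots & Cov(X_1, Y_q) \\ \vdots & & \vdots \\ Cov(X_p, Y_1) & \cdots & Cov(X_p, Y_q) \end{pmatrix}$$
$V(X) \geq 0$ corresponds to $D(\boldsymbol{X})$ symmetric and positive semidefinite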
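Rules for computing moments of simple functions
Univariate
$E(a + bX) = a + bE(X)$
$E(X + Y) = E(X) + E(Y)$
$V(a + bX) = b^2 V(X)$
$V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)$
$\quad = V(X) + V(Y)$ if $X, Y$ independent
$Cov(X, X) = V(X)$
$Cov(aX, bY) = ab\,Cov(X, Y)$
$Cov(X + U, Y) = Cov(X, Y) + Cov(U, Y)$
$Cov(X, Y + V) = Cov(X, Y) + Cov(X, V)$
Multivariate
$E(\boldsymbol{A} + \boldsymbol{X}) = \boldsymbol{A} + E(\boldsymbol{X})$
$E(\boldsymbol{A}\boldsymbol{X}) = \boldsymbol{A}E(\boldsymbol{X})$
$E(\boldsymbol{X}\boldsymbol{B}) = E(\boldsymbol{X})\boldsymbol{B}$
$E(\boldsymbol{X} + \boldsymbol{Y}) = E(\boldsymbol{X}) + E(\boldsymbol{Y})$
$D(\boldsymbol{b} + \boldsymbol{X}) = D(\boldsymbol{X})$
$D(\boldsymbol{A}\boldsymbol{X}) = \boldsymbol{A}D(\boldsymbol{X})\boldsymbol{A}^T$
$D(\boldsymbol{X} + \boldsymbol{Y}) = D(\boldsymbol{X}) + D(\boldsymbol{Y}) + C(\boldsymbol{X}, \boldsymbol{Y}) + C(\boldsymbol{Y}, \boldsymbol{X})$
$\quad = D(\boldsymbol{X}) + D(\boldsymbol{Y})$ if $\boldsymbol{X}, \boldsymbol{Y}$ independent
$C(\boldsymbol{X}, \boldsymbol{X}) = D(\boldsymbol{X})$
$C(\boldsymbol{X}, \boldsymbol{Y}) = C(\boldsymbol{Y}, \boldsymbol{X})^T$
$C(\boldsymbol{A}\boldsymbol{X}, \boldsymbol{B}\boldsymbol{Y}) = \boldsymbol{A}C(\boldsymbol{X}, \boldsymbol{Y})\boldsymbol{B}^T$
$C(\boldsymbol{X} + \boldsymbol{U}, \boldsymbol{Y}) = C(\boldsymbol{X}, \boldsymbol{Y}) + C(\boldsymbol{U}, \boldsymbol{Y})$
$C(\boldsymbol{X}, \boldsymbol{Y} + \boldsymbol{V}) = C(\boldsymbol{X}, \boldsymbol{Y}) + C(\boldsymbol{X}, \boldsymbol{V})$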
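The rule D(AX) = A D(X) Aᵀ also holds exactly for empirical dispersion matrices, which makes it easy to check numerically. The following is a small Python sketch (Python rather than SAS, purely for illustration). The toy observations are the first five rows of the pollution data used later in the lecture, and the matrix A and all helper names are hypothetical choices.

```python
def mean_vector(rows):
    """Column means of a list of observation rows."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def dispersion(rows):
    """Empirical dispersion matrix with the n - 1 divisor."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Each row is one observation of a 2-dimensional X.
x_rows = [[2.0, 2.0], [5.0, 12.0], [15.0, 4.0], [16.0, 21.0], [16.0, 41.0]]
A = [[1.0, 1.0],    # first component of AX is the sum X1 + X2
     [1.0, -1.0]]   # second component is the difference X1 - X2

# Transform every observation: y = A x.
ax_rows = [[sum(A[i][k] * row[k] for k in range(2)) for i in range(2)]
           for row in x_rows]

lhs = dispersion(ax_rows)                                  # D(AX)
rhs = matmul(matmul(A, dispersion(x_rows)), transpose(A))  # A D(X) A^T
print(lhs)
print(rhs)
```

The two printed matrices agree up to rounding. The top-left entry is V(X1 + X2) = V(X1) + V(X2) + 2Cov(X1, X2), the univariate rule from the same table.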
Multivariate Normal or Gaussian Distribution
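p-dimensional random variable: $\boldsymbol{X} = (X_1, \dots, X_p)'$
Mean value or expectation: $E(\boldsymbol{X}) = \boldsymbol{\mu}$
Dispersion or variance-covariance matrix: $D(\boldsymbol{X}) = \boldsymbol{\Sigma}$
Frequency or density function (for positive definite $\boldsymbol{\Sigma}$):
$$f(\boldsymbol{x}) = \frac{1}{(2\pi)^{p/2}\sqrt{\det\boldsymbol{\Sigma}}} \exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)$$
Contour 'lines' are ellipsoids with main axes given by the eigenvectors of the dispersion matrix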
Estimation of parameters I
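The i'th observation: $\boldsymbol{X}_i = (X_{i1}, \dots, X_{ip})'$
The mean
The empirical variance-covariance or dispersion matrix
Observations collected in data matrix:
$$\mathbf{X} = \begin{pmatrix} \boldsymbol{X}_1' \\ \vdots \\ \boldsymbol{X}_n' \end{pmatrix} = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix}$$
Other formulas for mean and dispersion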
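The estimator formulas on this slide were lost in extraction. A hedged reconstruction, assuming the usual unbiased n − 1 divisor (consistent with "Covariance Matrix, DF = 19" for N = 20 in the SAS output later in the lecture) and writing $\mathbf{1}$ for a column of n ones, which is a symbol introduced here:

```latex
\bar{\boldsymbol{X}} = \frac{1}{n}\sum_{i=1}^{n} \boldsymbol{X}_i
                     = \frac{1}{n}\,\mathbf{X}'\mathbf{1},
\qquad
\boldsymbol{S} = \frac{1}{n-1}\sum_{i=1}^{n}
                 (\boldsymbol{X}_i - \bar{\boldsymbol{X}})(\boldsymbol{X}_i - \bar{\boldsymbol{X}})'
               = \frac{1}{n-1}\left(\mathbf{X}'\mathbf{X} - n\,\bar{\boldsymbol{X}}\bar{\boldsymbol{X}}'\right)
```

The second form of each estimator uses the data matrix $\mathbf{X}$ whose rows are the transposed observations, so that $\mathbf{X}'\mathbf{X} = \sum_i \boldsymbol{X}_i \boldsymbol{X}_i'$.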
Estimation of parameters II
Dispersion Matrix – positive semidefinite
• What does it mean?
• Consider exercise 1.1:
• $(X, Y, Z)'$ and $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$, what happens if $\rho < -0.5$?
Not possible! The positive semidefinite dispersion matrices are those that make sense!
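One way to see why ρ < −0.5 is impossible: for this Σ the variance of the sum X + Y + Z is 1ᵀΣ1 = 3 + 6ρ, which would be negative. Equivalently, the smallest eigenvalue 1 + 2ρ of the equicorrelation matrix drops below zero. A short Python sketch (for illustration only; the function names are hypothetical):

```python
def equicorrelation_eigenvalues(rho):
    """Eigenvalues of the 3x3 matrix with 1 on the diagonal and rho elsewhere."""
    return [1.0 + 2.0 * rho, 1.0 - rho, 1.0 - rho]


def variance_of_sum(rho):
    """V(X + Y + Z) = 1' Sigma 1, i.e. the sum of all entries of Sigma."""
    sigma = [[1.0, rho, rho],
             [rho, 1.0, rho],
             [rho, rho, 1.0]]
    return sum(sum(row) for row in sigma)


for rho in (-0.4, -0.5, -0.6):
    print(rho, min(equicorrelation_eigenvalues(rho)), variance_of_sum(rho))
```

At ρ = −0.5 the sum X + Y + Z has variance zero, and for any smaller ρ the "variance" would be negative, so no random variable can have that dispersion matrix.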
PDF for Bivariate Gaussian (rho=0 and rho=0.7)
The influence of the correlation coefficient I
Probability Density Functions for bivariate Gaussians with correlation coefficients $\rho = 0, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99$
Distribution: $N\left(\begin{pmatrix} 8 \\ 6 \end{pmatrix}, \begin{pmatrix} 9 & 3\rho \\ 3\rho & 1 \end{pmatrix}\right)$
The influence of the correlation coefficient II
PDFs from previous slide truncated at z=0.01
Contour plots of Gaussian pdfs for $\pm\rho = 0, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99$
Conditional distribution I
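For the two-dimensional normal distribution we get
$$E(X_1 \mid X_2 = x_2) = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}\,(x_2 - \mu_2)$$
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(x_1 - \mu_1)$$
[Figure: the two conditional-mean lines and the main axis of the contour ellipse]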
25,000 records of human heights and weights
Source: Statistics Online Computational Resource
http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html
The dataset below contains 25,000 records of human heights and weights. These data were obtained in
1993 by a Growth Survey of 25,000 children from birth to 18 years of age recruited from Maternal and Child
Health Centres (MCHC) and schools and were used to develop Hong Kong's current growth charts for
weight, height, weight-for-age, weight-for-height and body mass index (BMI).
SAS program that reads the 25000 records with corresponding values of Height and Weight.
The data is stored in the permanent dataset home.heiwei from where we may refer to it.
data home.heiwei;
input Index Height Weight;
datalines;
1 65.78331 112.9925
2 71.51521 136.4873
3 69.39874 153.0269
.
. More data lines
.
24998 64.69855 118.2655
24999 67.52918 132.2682
25000 68.87761 124.8742
;
run;
SAS program reading and analyzing Height-Weight data
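/* Work copy of the permanent data set */
data hw;
set home.heiwei;
run;
/* Histogram and cdf of weight with fitted normal distribution */
proc univariate data=hw;
histogram/normal;
cdf/normal;
var weight;
run;
/* Scatterplot of weight against height */
proc sgplot data=hw;
scatter x=height y=weight;
run;
/* Regression of weight on height; MAXPOINTS is raised to cover all 25000 points */
proc reg data=hw PLOTS(MAXPOINTS=25000);
model weight=height;
run;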
Scatterplot of 25000 values of (Height, Weight)
Weight Distribution Histogram, 25000 obs.
Goodness-of-Fit Tests for Normal Distribution
Test                 p Value
Kolmogorov-Smirnov   >0.150
Cramer-von Mises     0.180
Anderson-Darling     0.190
Chi-Square           0.447
Cumulative Empirical Weight Distribution, 25000 obs.
Fitted regression line, weight on height: w = −82.58 + 3.08h
Fitted regression line, height on weight: h = 57.57 + 0.082w
Given a height of 70 the predicted weight is: w = −82.58 + 3.08 · 70 = 133.02
Given a weight of 133.02 the predicted height is: h = 57.57 + 0.082 · 133.02 = 68.48
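Going from height 70 to weight 133.02 and back does not return 70, because the regression of weight on height and the regression of height on weight are two different lines. A small Python sketch (for illustration only) that repeats the arithmetic with the rounded coefficients printed on the slide. It also uses the fact that for least-squares lines the product of the two slopes equals r², so the slopes alone give a rough value of the correlation; the variable names are hypothetical.

```python
import math

# Coefficients as printed on the slide (rounded).
a_w, b_w = -82.58, 3.08    # weight predicted from height
a_h, b_h = 57.57, 0.082    # height predicted from weight

weight_hat = a_w + b_w * 70.0           # predicted weight at height 70
height_hat = a_h + b_h * weight_hat     # predicted height at that weight

# b_w = r * s_w / s_h and b_h = r * s_h / s_w, so b_w * b_h = r ** 2.
r_from_slopes = math.sqrt(b_w * b_h)

print(f"{weight_hat:.2f} {height_hat:.2f} {r_from_slopes:.2f}")
```

The predicted height falls back towards the mean height, and the implied correlation is only about 0.5, which is why the two lines differ so clearly.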
A random subsample of the data
253 observations
To improve the general view: a SAS program that picks 1% of the data randomly
ods graphics on;
Title 'Reduced dataset';
data hwred;
set hw;
x=ranuni(1234);
if x < 0.01 then output hwred;
run;
proc reg data=hwred;
model weight=height;
run;
ods graphics off;
Conditional distribution II
Let
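The statement that followed "Let" on this slide was lost in extraction. A standard form of the result, given here as a hedged reconstruction that is consistent with the worked ice-cream example later in the lecture (the partition notation $\boldsymbol{\Sigma}_{11}$, $\boldsymbol{\Sigma}_{12}$, $\boldsymbol{\Sigma}_{22}$ is an assumption, and $\boldsymbol{\Sigma}_{22}$ is assumed to be invertible):

```latex
\begin{pmatrix} \boldsymbol{X}_1 \\ \boldsymbol{X}_2 \end{pmatrix}
\in N\!\left(
  \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},
  \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\
                  \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}
\right)
\;\Longrightarrow\;
\boldsymbol{X}_1 \mid \boldsymbol{X}_2 = \boldsymbol{x}_2
\in N\!\left(
  \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2),\;
  \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}
\right)
```

In two dimensions the conditional mean reduces to $\mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$, the formula on the slide "Conditional distribution I".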
Multiple correlation coefficient I
Let
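The definition on this slide was also lost. A hedged reconstruction in the notation of the worked example that follows, where $\sigma_{11}$ is the variance of the single variable, $\boldsymbol{\Sigma}_{22}$ the dispersion matrix of the remaining variables and $\boldsymbol{\sigma}_{12}$ (a symbol assumed here) the vector of covariances between them:

```latex
\boldsymbol{\Sigma} =
\begin{pmatrix} \sigma_{11} & \boldsymbol{\sigma}_{12}' \\
                \boldsymbol{\sigma}_{12} & \boldsymbol{\Sigma}_{22} \end{pmatrix},
\qquad
\rho^2_{1|2}
= \frac{\boldsymbol{\sigma}_{12}'\,\boldsymbol{\Sigma}_{22}^{-1}\,\boldsymbol{\sigma}_{12}}{\sigma_{11}}
= 1 - \frac{\det\boldsymbol{\Sigma}}{\sigma_{11}\,\det\boldsymbol{\Sigma}_{22}}
```

The two expressions agree because $\det\boldsymbol{\Sigma} = \det\boldsymbol{\Sigma}_{22}\,\bigl(\sigma_{11} - \boldsymbol{\sigma}_{12}'\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{12}\bigr)$.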
Multiple correlation coefficient II
Example
We consider a three-dimensional random variable $(X, Y, Z)'$ with dispersion matrix
$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$$
Find the squared multiple correlation between X and $(Y, Z)^T$!
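Example – solution, using theorem 1.40
$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}, \quad \boldsymbol{\Sigma}_{22} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \quad \sigma_{11} = 1$$
$$\rho^2_{X|YZ} = 1 - \frac{\det\boldsymbol{\Sigma}}{\sigma_{11}\det\boldsymbol{\Sigma}_{22}}$$
$$= 1 - \frac{1\cdot1\cdot1 - \rho\cdot1\cdot\rho + \rho\cdot\rho\cdot\rho - \rho\cdot\rho\cdot1 + \rho\cdot\rho\cdot\rho - \rho\cdot\rho\cdot1}{1\cdot(1\cdot1 - \rho\cdot\rho)}$$
$$= 1 - \frac{1 - 3\rho^2 + 2\rho^3}{1 - \rho^2}$$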
Interpretation: The more correlated the variables are, the more we know about X given Y and Z.
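The final expression can be simplified one step further, which makes the interpretation easier to read off. As a sketch, using the factorisations $1 - 3\rho^2 + 2\rho^3 = (1-\rho)^2(1+2\rho)$ and $1 - \rho^2 = (1-\rho)(1+\rho)$, valid for $|\rho| < 1$:

```latex
\rho^2_{X|YZ}
= 1 - \frac{(1-\rho)^2(1+2\rho)}{(1-\rho)(1+\rho)}
= \frac{(1+\rho) - (1-\rho)(1+2\rho)}{1+\rho}
= \frac{2\rho^2}{1+\rho}
```

For example, $\rho = 0.5$ gives $\rho^2_{X|YZ} = 0.5/1.5 \approx 0.33$, and the value approaches 1 as $\rho \to 1$.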
Multiple correlation coefficient III
Partial correlation coefficient
The partial correlations between some variables given other variables are simply
correlations in the conditional distribution of the ’some’ variables given the ’other’
variables. It follows that
Example: Ice cream - I
We consider a two-dimensional random variable
$$\begin{pmatrix} D \\ I \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix},$$
where D is the number of drowning accidents, and I is the ice-cream sale.
Should ice-cream be prohibited? Is the ice-cream industry behind drownings to get more sales?
$$\begin{pmatrix} D \\ I \\ T \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$$
where T is the temperature
Find the correlation between D and I conditioned on T
From page 34 in the book
SOLUTION ON NEXT SLIDE – no need to write down
Example: Ice cream - II
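Example – solution, use from page 34 in book
$$\begin{pmatrix} D \\ I \\ T \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$$
$$\rho_{DI|T} = \frac{\rho_{DI} - \rho_{DT}\,\rho_{IT}}{\sqrt{(1 - \rho_{DT}^2)(1 - \rho_{IT}^2)}} = \frac{0.5 - 0.7 \cdot 0.7}{\sqrt{(1 - 0.7^2)(1 - 0.7^2)}} = 0.0196$$
There is very little correlation between drownings and ice cream sale, once we control for the temperature!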
Example: Ice cream - III
We consider a three-dimensional random variable
$$\begin{pmatrix} D \\ I \\ T \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$$
where D is the number of drowning accidents, I is the ice-cream sale and T the temperature.
Find the dispersion, when conditioned upon temperature
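$$D\left(\begin{pmatrix} D \\ I \end{pmatrix} \,\middle|\, T\right) = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \begin{pmatrix} 0.7 \\ 0.7 \end{pmatrix} 1^{-1} \begin{pmatrix} 0.7 & 0.7 \end{pmatrix}$$
$$= \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \begin{pmatrix} 0.49 & 0.49 \\ 0.49 & 0.49 \end{pmatrix} = \begin{pmatrix} 0.51 & 0.01 \\ 0.01 & 0.51 \end{pmatrix}$$
$$corr(D, I \mid T) = \frac{0.01}{\sqrt{0.51 \cdot 0.51}} = 0.0196$$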
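The calculation above can be checked numerically. The following Python sketch (for illustration only, not part of the course's SAS material) forms the conditional dispersion Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ for the ice-cream matrix and compares the resulting correlation with the closed-form partial correlation. The variable names are hypothetical.

```python
import math

# Dispersion matrix from the slide; order of variables: D, I, T.
sigma = [[1.0, 0.5, 0.7],
         [0.5, 1.0, 0.7],
         [0.7, 0.7, 1.0]]

# Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21, where Sigma_22 is the scalar V(T).
cond = [[sigma[i][j] - sigma[i][2] * sigma[2][j] / sigma[2][2]
         for j in range(2)] for i in range(2)]

# Correlation in the conditional distribution of (D, I) given T.
partial_corr = cond[0][1] / math.sqrt(cond[0][0] * cond[1][1])

# Closed-form partial correlation for comparison.
r_di, r_dt, r_it = sigma[0][1], sigma[0][2], sigma[1][2]
closed_form = (r_di - r_dt * r_it) / math.sqrt((1 - r_dt ** 2) * (1 - r_it ** 2))

print(cond)
print(round(partial_corr, 4), round(closed_form, 4))
```

Both routes give 0.0196, the value that Proc Princomp reports with the statement partial T.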
Example: Ice cream - IV
data ice (type=corr);
infile cards missover;
input _type_ $ _Name_ $ D I T;
cards;
N . 100 100 100
corr D 1
corr I 0.5 1
corr T 0.7 0.7 1
;
proc princomp data=ice;
partial T;
run;
Partial Correlation Matrix
     D       I
D    1.0000  0.0196
I    0.0196  1.0000
Cement strength–SAS program
If only the correlation matrix and not the full dataset is available, we must use input statements as shown.
Title "Correlations between cement measurements based on 51 samples";
data cement (type=corr);
infile cards missover;
input _type_ $ _Name_ $ C3S C3A BLAINE Strgth3 Strgth28;
cards;
N . 51 51 51 51 51
corr C3S 1
corr C3A -0.309 1
corr BLAINE 0.091 0.192 1
corr Strgth3 0.158 0.120 0.745 1
corr Strgth28 0.344 -0.166 0.320 0.464 1
;
Title "Correlation Matrix for the Cement Data";
proc print data=cement;
run;
Title "Eigenvalues of correlation matrix";
proc princomp data=cement;
run;
Title "Analysis conditioned on the fineness BLAINE";
proc princomp data=cement;
partial BLAINE;
run;
We cannot use Proc Corr to find the partial correlations if the correlation matrix is input as above.
Instead we must use Proc Princomp, which also computes eigenvalues and eigenvectors of the correlation matrix.
Cement strength-Conventional Wisdom
Cement strength
Correlation Matrix
C3S C3A BLAINE Strgth3 Strgth28
C3S 1.000 . . . .
C3A -0.309 1.000 . . .
BLAINE 0.091 0.192 1.000 . .
Strgth3 0.158 0.120 0.745 1.000 .
Strgth28 0.344 -0.166 0.320 0.464 1.000
Partial Correlation Matrix (BLAINE partialled out)
          C3S      C3A      BLAINE   Strgth3  Strgth28
C3S       1.0000
C3A       -.3340   1.0000
BLAINE
Strgth3   0.1358   -.0352            1.0000
Strgth28  0.3337   -.2446            0.3570   1.0000
Partial correlation coefficient II
SAS program –Pollution Data I
Data pollution;
input oecd hvols;
cards;
2 2
5 12
15 4
16 21
16 41
19 14
26 31
24 29
16 31
36 8
39 30
42 44
44 26
40 60
42 34
42 34
50 14
51 41
58 58
64 47
;
run;
proc corr data=pollution plots=scatter(nvar=2 alpha= .30 .50 .70) cov;
var oecd hvols;
run;
proc kde data=pollution;
bivar oecd hvols/plots=(histogram(tilt=45 rotate=145)
surface(tilt=45 rotate=145)
histsurface(tilt=45 rotate=145));
run;
SAS program –Pollution Data II
proc fcmp outlib=sasuser.funcs.trial;
function binor(x, y, r, sx, sy, mx, my);
/* bivariate normal density: the whole quadratic form is divided by 2*(1-r*r) */
z=exp(-(((x-mx)/sx)*((x-mx)/sx)-2*r*((x-mx)/sx)*((y-my)/sy)
+((y-my)/sy)*((y-my)/sy))/(2*(1-r*r)))
/(6.283185*sx*sy*sqrt(1-r*r));
return(z);
endsub;
run;
options cmplib=sasuser.funcs;
data pdfgauss;
r=0.62;sx=17.64;sy=16.70;mx=32.35;my=29.05;
do x=0 to 65 by 1;
do y=0 to 65 by 1;
z=binor(x, y, r, sx, sy, mx, my);
output pdfgauss;
end;
end;
run;
proc template;
define statgraph g3grid_surface;
begingraph;
entrytitle "Bivariate Gaussian Rho=0.62";
layout overlay3d/tilt=45 rotate= 145 cube= false;
surfaceplotparm x=x y=y z=z / surfacetype=fillgrid;
endlayout;
endgraph;
end;
run;
proc sgrender data=pdfgauss template=g3grid_surface;
run;
Output – pollution Data
Obs  oecd  hvols
1    2     2
2    5     12
3    15    4
4    16    21
5    16    41
6    19    14
7    26    31
8    24    29
9    16    31
10   36    8
11   39    30
12   42    44
13   44    26
14   40    60
15   42    34
16   42    34
17   50    14
18   51    41
19   58    58
20   64    47
Simple Statistics
Variable  N   Mean      Std Dev   Sum        Minimum  Maximum
oecd      20  32.35000  17.63750  647.00000  2.00000  64.00000
hvols     20  29.05000  16.70952  581.00000  2.00000  60.00000
Covariance Matrix, DF = 19
       oecd         hvols
oecd   311.0815789  181.7710526
hvols  181.7710526  279.2078947
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
       OECD              HVOLS
OECD   1                 0.61677 (0.0038)
HVOLS  0.61677 (0.0038)  1
Output – pollution Data II
[Figures: prediction ellipse (Proc corr); histogram and fitted bivariate Gaussian (Proc kde); bivariate Gaussian surface (Proc fcmp, Data pdfgauss, Proc template, Proc sgrender)]
Conditional Means and Main Ellipse Axis – Pollution Data
Exercises
1.3 is a 'pen and paper' exercise to explore multiple and partial correlation
2.4 is a real data case on hay production, that explores partial correlation
2.5 is a real data case on fitness data
Heiwei, height and weight data are under 'SAS material', if you want to try it out
Solutions are already uploaded
