
02409 Multivariate Statistics

Presentation 7 Sep 2020


Anders Nymark Christensen, Assoc. Prof. DTU Compute, 324/110
anym@dtu.dk

Jannick Jørgensen Lønver, MSc Student
Laurits Fromberg, MSc Student
Nikolaj Normann Holm, PhD Student
Today’s lecture

1. SAS revisited, Apodemus case & a bit about correlation
2. Moments of multivariate random variables
3. Estimation of parameters
4. Conditional distributions
5. Height-weight of children
6. Multiple correlation coefficient
7. Partial correlation coefficients
8. Case on pollution data

2 6. september 2021
Uni- and bivariate normal distributions
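The univariate random variable $X$ is normally distributed, $X \in N(\mu, \sigma^2)$, if

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) $$

[Figure: univariate normal density with $\mu-\sigma$, $\mu$ and $\mu+\sigma$ marked on the axis]

The bivariate random variable $\boldsymbol{X} = (X_1, X_2)^T$ is normally distributed, $\boldsymbol{X} \in N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if

$$ f(\boldsymbol{x}) = \frac{1}{2\pi\sqrt{\det\boldsymbol{\Sigma}}} \exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right) $$
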
Marginal distributions and independence

[Figure: simultaneous distribution of 25000 corresponding values of height and weight]
[Figure: marginal distribution of height]
Are height and weight independent?
No, they are positively correlated.
Covariance and correlation

Mean and dispersion matrix of two-dimensional random variable

Correlation between weight of mouse and weight of its reproductive organs
in 149 male Apodemus caught between 28 March and 13 April 1937, 1938 and 1939.

This time we have several values of the variable organ for each value of the variable weight.
This requires a different input statement.
The value "." means "observation missing".

data apodemus;
  infile cards missover;
  input weight organ @;
  do until (organ=.);
    output;
    input organ @;
  end;
cards;
16.0 1.6
17.5 0.7
18.5 1.0 1.4
19.0 1.7 1.8
19.5 1.2 1.3 1.5
20.0 1.5 1.7 1.8 1.9 2.0
20.5 1.1 1.2 1.3 1.4 1.4 1.4 1.6 2.1
21.0 1.5 1.8 1.8 2.2
21.5 1.6 1.7 1.8 1.8 2.1
22.0 1.5 1.5 1.6 1.7 1.7 1.8 1.9 1.9 2.2 2.3 2.7
22.5 1.3 1.6 1.6 1.7 1.7 1.8 1.8 1.9 2.0 2.0 2.0 2.0 2.0 2.1 2.1 2.1 2.1 2.2 2.3 2.3 2.4
23.0 1.3 1.7 1.9 2.0 2.0 2.0 2.0 2.2 2.3 2.3 2.4 2.4 2.6
23.5 1.8 1.9 2.0 2.2 2.2 2.2 2.3 2.4
24.0 1.8 1.8 1.9 2.3 2.3 2.3 2.4 2.4 2.4 2.4 2.6 3.0
24.5 1.8 1.8 1.9 2.0 2.0 2.0 2.1 2.2 2.2 2.3 2.3 2.3 2.5
25.0 1.4 1.8 1.9 2.0 2.4 2.4
25.5 1.1 1.8 1.9 2.0 2.0 2.0 2.1 2.5 2.6 2.7
26.0 2.0 2.4 2.6 2.8
26.5 2.1 2.2 2.4 2.4 2.4 2.4 2.6 3.0
27.0 2.2 2.3 2.4 2.6 2.9
27.5 2.2 2.4 2.4 2.4
28.0 2.6
29.0 2.4
29.5 2.7
;
run;

/* We put the data into a permanent data set home.apodemus. */
data home.apodemus;
  set apodemus;
run;

Title 'Mouse data, first 20 observations';
proc print data=home.apodemus (obs=20);
run;

From: Fig. 6 in H. P. Hacker and H. S. Pearson (1944): The Growth, Survival, Wandering and Variation of the Long-Tailed Field Mouse, Apodemus Sylvaticus. Biometrika, Vol. 33, No. 2 (Aug., 1944), pp. 136-162
Matrix of 2D-scatterplots
SAS code for producing a matrix of 2-D scatterplots for the variables considered, here weight and organ.
In the diagonal we have the histograms and fitted normal pdf:

proc sgscatter data=home.apodemus;
  matrix weight organ /
    ellipse=(type=predicted)
    diagonal=(histogram normal);
run;
Univariate histogram/cdf and fitted Gaussian

Some output from:

proc univariate data=home.apodemus;
  histogram/normal;
  cdf/normal;
  var weight organ;
run;

Goodness-of-Fit Tests for Normal Distribution (left output panel)
Test                 Statistic     p Value
Kolmogorov-Smirnov   0.07419799    Pr > D     0.044
Cramer-von Mises     0.11030575    Pr > W-Sq  0.085
Anderson-Darling     0.58514995    Pr > A-Sq  0.131

Goodness-of-Fit Tests for Normal Distribution (right output panel)
Test                 Statistic     p Value
Kolmogorov-Smirnov   0.07681607    Pr > D     0.030
Cramer-von Mises     0.14813973    Pr > W-Sq  0.025
Anderson-Darling     0.86813666    Pr > A-Sq  0.025
Weight of mouse and weight of its reproductive organs

Simple Statistics
Variable   N     Mean       Std Dev   Sum         Minimum    Maximum
weight     149   23.37584   2.37378   3483        16.00000   29.50000
organ      149   2.02148    0.41777   301.20000   0.70000    3.00000

Pearson Correlation Coefficients, N = 149
Prob > |r| under H0: Rho=0
           weight     organ
weight     1.00000    0.64963
                      <.0001
organ      0.64963    1.00000
           <.0001

proc corr data=sasuser.apodemus;
  var weight organ;
run;
proc sgplot data=sasuser.apodemus;
  scatter x=weight y=organ/markerattrs=(symbol=CircleFilled color=RED size=10);
  ellipse x=weight y=organ/alpha=0.1;
  ellipse x=weight y=organ/alpha=0.25;
  ellipse x=weight y=organ/alpha=0.50;
  ellipse x=weight y=organ/alpha=0.75;
run;

Test of correlation coefficient
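
The formulas on this slide did not survive extraction. A commonly used test of H0: ρ = 0 under bivariate normality is based on t = r·sqrt(n-2)/sqrt(1-r²), which follows a t distribution with n-2 degrees of freedom under H0. The sketch below (Python, standard library only; the function name is illustrative, and it is an assumption that the lecture uses exactly this form) applies it to the Apodemus values reported by proc corr.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Values taken from the proc corr output for the Apodemus data
r, n = 0.64963, 149
t = correlation_t_statistic(r, n)
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
```

A t value this far from zero is in line with the p value <.0001 printed by SAS.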

Correlation does not imply causation I

[Two example plots, with ρ=0.998 and ρ=0.959]

http://www.tylervigen.com/spurious-correlations

Correlation does not imply causation II

• Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and
gesture furtively while mouthing 'look over there'.
Moments of a random variable
The mean is the center of mass in a probability function:

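The variance measures the degree of variability around the mean:

$$ V(X) = \sigma^2 = E\left[(X-\mu)^2\right] = E(X^2) - \mu^2 $$

[Figure: three densities with $\mu_1 = 0$, $\mu_2 = 0$, $\mu_3 = 6$ and $\sigma_1^2 = 1$, $\sigma_2^2 = 16$, $\sigma_3^2 = 1$]
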
Moments of multivariate random variables:
Mean, dispersion and covariance

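$$ E(\boldsymbol{X}) = \boldsymbol{\mu} = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix} $$

$$ D(\boldsymbol{X}) = \boldsymbol{\Sigma} = E\left[(\boldsymbol{X}-\boldsymbol{\mu})(\boldsymbol{X}-\boldsymbol{\mu})^T\right] = \begin{pmatrix} V(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_p) \\ Cov(X_2, X_1) & V(X_2) & \cdots & Cov(X_2, X_p) \\ \vdots & \vdots & & \vdots \\ Cov(X_p, X_1) & Cov(X_p, X_2) & \cdots & V(X_p) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{pmatrix} $$

$$ C(\boldsymbol{X}, \boldsymbol{Y}) = E\left[(\boldsymbol{X}-\boldsymbol{\mu})(\boldsymbol{Y}-\boldsymbol{\nu})^T\right] = \begin{pmatrix} Cov(X_1, Y_1) & \cdots & Cov(X_1, Y_q) \\ \vdots & & \vdots \\ Cov(X_p, Y_1) & \cdots & Cov(X_p, Y_q) \end{pmatrix} $$

$V(X) \ge 0$ corresponds to $D(\boldsymbol{X})$ symmetric and positive semidefinite
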
Rules for computing moments of simple functions

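Univariate

$E(a + bX) = a + bE(X)$
$E(X + Y) = E(X) + E(Y)$
$V(a + bX) = b^2 V(X)$
$V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)$
$\qquad = V(X) + V(Y)$ if $X, Y$ independent
$Cov(X, X) = V(X)$
$Cov(aX, bY) = ab\,Cov(X, Y)$
$Cov(X + U, Y) = Cov(X, Y) + Cov(U, Y)$
$Cov(X, Y + V) = Cov(X, Y) + Cov(X, V)$

Multivariate

$E(\boldsymbol{A} + \boldsymbol{X}) = \boldsymbol{A} + E(\boldsymbol{X})$
$E(\boldsymbol{A}\boldsymbol{X}) = \boldsymbol{A}E(\boldsymbol{X})$
$E(\boldsymbol{X}\boldsymbol{B}) = E(\boldsymbol{X})\boldsymbol{B}$
$E(\boldsymbol{X} + \boldsymbol{Y}) = E(\boldsymbol{X}) + E(\boldsymbol{Y})$
$D(\boldsymbol{b} + \boldsymbol{X}) = D(\boldsymbol{X})$
$D(\boldsymbol{A}\boldsymbol{X}) = \boldsymbol{A}D(\boldsymbol{X})\boldsymbol{A}^T$
$D(\boldsymbol{X} + \boldsymbol{Y}) = D(\boldsymbol{X}) + D(\boldsymbol{Y}) + C(\boldsymbol{X}, \boldsymbol{Y}) + C(\boldsymbol{Y}, \boldsymbol{X})$
$\qquad = D(\boldsymbol{X}) + D(\boldsymbol{Y})$ if $\boldsymbol{X}, \boldsymbol{Y}$ independent
$C(\boldsymbol{X}, \boldsymbol{X}) = D(\boldsymbol{X})$
$C(\boldsymbol{X}, \boldsymbol{Y}) = C(\boldsymbol{Y}, \boldsymbol{X})^T$
$C(\boldsymbol{A}\boldsymbol{X}, \boldsymbol{B}\boldsymbol{Y}) = \boldsymbol{A}C(\boldsymbol{X}, \boldsymbol{Y})\boldsymbol{B}^T$
$C(\boldsymbol{X} + \boldsymbol{U}, \boldsymbol{Y}) = C(\boldsymbol{X}, \boldsymbol{Y}) + C(\boldsymbol{U}, \boldsymbol{Y})$
$C(\boldsymbol{X}, \boldsymbol{Y} + \boldsymbol{V}) = C(\boldsymbol{X}, \boldsymbol{Y}) + C(\boldsymbol{X}, \boldsymbol{V})$
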
Multivariate Normal or Gaussian Distribution

p-dimensional random variable

Mean value or expectation

Dispersion or variance-covariance matrix

Frequency or density function

Contour 'lines' are ellipsoids with main axes given by eigenvectors of dispersion matrix
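
The formulas belonging to the labels on this slide were images and are missing from the text. For reference, the standard form of the p-dimensional normal distribution is sketched below; it reduces to the bivariate density given earlier when p = 2. The notation $N_p$ and the assumption that $\boldsymbol{\Sigma}$ is positive definite (so that the inverse exists) are added here and are not taken from the slide.

```latex
\boldsymbol{X} = (X_1, \dots, X_p)^T \in N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
\qquad E(\boldsymbol{X}) = \boldsymbol{\mu},
\qquad D(\boldsymbol{X}) = \boldsymbol{\Sigma}

f(\boldsymbol{x}) = \frac{1}{(2\pi)^{p/2}\sqrt{\det\boldsymbol{\Sigma}}}
  \exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T
  \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)
```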
Estimation of parameters I

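The i'th observation
The mean
The empirical variance-covariance or dispersion matrix

Observations collected in data matrix

$$ \boldsymbol{X} = \begin{pmatrix} \boldsymbol{X}_1^T \\ \vdots \\ \boldsymbol{X}_n^T \end{pmatrix} = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix} $$

Other formulas for mean and dispersion
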
Estimation of parameters II

Dispersion Matrix – positive semidefinite
• What does it mean?

• Consider exercise 1.1:

• $(X, Y, Z)^T$ and $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$, what happens if $\rho < -0.5$?

Dispersion Matrix – positive semidefinite
• What does it mean?

• Consider exercise 2.1:

• $(X, Y, Z)^T$ and $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$, what happens if $\rho < -0.5$?

Not possible! The positive semidefinite dispersion matrices are those that make sense!
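
One way to see why ρ < -0.5 is impossible: with unit variances, V(X + Y + Z) = 3 + 6ρ, which would be negative, and the determinant 1 - 3ρ² + 2ρ³ of Σ turns negative as well. The following Python sketch (standard library only; the function names are illustrative) evaluates both quantities around the boundary.

```python
def equicorrelation_det(rho):
    """Determinant of the 3x3 matrix with ones on the diagonal and rho elsewhere."""
    return 1.0 - 3.0 * rho**2 + 2.0 * rho**3

def variance_of_sum(rho):
    """V(X + Y + Z) = 1' Sigma 1 = 3 + 6*rho when all variances are 1."""
    return 3.0 + 6.0 * rho

for rho in (-0.4, -0.5, -0.6):
    print(rho, equicorrelation_det(rho), variance_of_sum(rho))
```

Both quantities are positive at ρ = -0.4, zero at ρ = -0.5 and negative at ρ = -0.6, so no random variable can have such a dispersion matrix for ρ < -0.5.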
PDF for Bivariate Gaussian (rho=0 and rho=0.7)

The influence of the correlation coefficient I

Probability Density Functions for bivariate Gaussians with correlation coefficients $\rho = 0, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99$

Distribution: $N\left(\begin{pmatrix} 8 \\ 6 \end{pmatrix}, \begin{pmatrix} 9 & 3\rho \\ 3\rho & 1 \end{pmatrix}\right)$
The influence of the correlation coefficient II

PDFs from previous slide truncated at z=0.01

Contour plots of Gaussian pdfs for $\pm\rho = 0, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99$

Conditional distribution I

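For the two-dimensional normal distribution we get

$$ E(X_1 \mid X_2 = x_2) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2) $$

$$ E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1) $$

[Figure: the two conditional mean lines and the main axis of the contour ellipse]
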
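
As a small numerical illustration of the conditional mean formula, the sketch below plugs in the Apodemus summary statistics from the proc corr output (means, standard deviations and r = 0.64963). It is written in Python with the standard library only; the function name and the choice of a 25 g mouse are illustrative assumptions, and the sample values are used in place of the unknown population parameters.

```python
def conditional_mean(x_given, mu_target, mu_given, sd_target, sd_given, rho):
    """E(target | given = x_given) in a bivariate normal distribution."""
    return mu_target + rho * sd_target / sd_given * (x_given - mu_given)

# Sample statistics for the Apodemus data (weight, organ)
mu_weight, sd_weight = 23.37584, 2.37378
mu_organ, sd_organ = 2.02148, 0.41777
r = 0.64963

# Expected organ weight for a mouse weighing 25
organ_at_25 = conditional_mean(25.0, mu_organ, mu_weight, sd_organ, sd_weight, r)
print(round(organ_at_25, 3))
```

At the mean weight the conditional mean equals the mean organ weight, and each extra unit of body weight adds ρ·σ_organ/σ_weight to the expected organ weight.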
25,000 records of human heights and weights
Source: Statistics Online Computational Resource
http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html

The dataset below contains 25,000 records of human heights and weights. These data were obtained in
1993 by a Growth Survey of 25,000 children from birth to 18 years of age recruited from Maternal and Child
Health Centres (MCHC) and schools and were used to develop Hong Kong's current growth charts for
weight, height, weight-for-age, weight-for-height and body mass index (BMI).

SAS program that reads the 25000 records with corresponding values of Height and Weight.
The data is stored in the permanent dataset home.heiwei from where we may refer to it.

data home.heiwei;
  input Index Height Weight;
datalines;
1 65.78331 112.9925
2 71.51521 136.4873
3 69.39874 153.0269
.
. More data lines
.
24998 64.69855 118.2655
24999 67.52918 132.2682
25000 68.87761 124.8742
;
run;

SAS program reading and analyzing Height-Weight data

Data hw;
  set home.heiwei;
run;

proc univariate data=hw;
  histogram/normal;
  cdf/normal;
  var weight;
run;

proc sgplot data=hw;
  scatter x=height y=weight;
run;

proc reg data=hw PLOTS(MAXPOINTS=25000);
  model weight=height;
run;
Scatterplot of 25000 values of (Height, Weight)

Weight Distribution Histogram, 25000 obs.

Goodness-of-Fit Tests for Normal Distribution
Test                 p Value
Kolmogorov-Smirnov   >0.150
Cramer-von Mises     0.180
Anderson-Darling     0.190
Chi-Square           0.447

Cumulative Empirical Weight Distribution, 25000 obs.

Fitted regression line

Fitted regression line y = 57.57 + 0.082x

Given a height of 70 the weight is: $w = -82.58 + 3.08 \cdot 70 = 133.02$
Given a weight of 133.02 the height is: $h = 57.57 + 0.082 \cdot 133.02 = 68.48$

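
Predicting weight from height and then height from that weight does not return the starting height of 70, because the two least squares lines are different lines. For sample regression lines the product of the two slopes equals r², so the rounded slopes quoted on the slide also give a rough value of the correlation. A minimal Python sketch (standard library only; variable names are illustrative):

```python
import math

# Rounded coefficients quoted on the slide
intercept_w_on_h, slope_w_on_h = -82.58, 3.08   # weight regressed on height
intercept_h_on_w, slope_h_on_w = 57.57, 0.082   # height regressed on weight

weight_at_70 = intercept_w_on_h + slope_w_on_h * 70
height_back = intercept_h_on_w + slope_h_on_w * weight_at_70

# Product of the two slopes is r squared (up to rounding of the coefficients)
r = math.sqrt(slope_w_on_h * slope_h_on_w)
print(round(weight_at_70, 2), round(height_back, 2), round(r, 3))
```

The predicted height of about 68.48 lies between 70 and the bulk of the data, the familiar regression toward the mean, and the implied correlation is roughly 0.5.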
A random subsample of the data
253 observations
To improve the general view: a SAS program that picks 1% of the data randomly

ods graphics on;
Title 'Reduced dataset';
data hwred;
  set hw;
  x=ranuni(1234);
  if x < 0.01 then output hwred;
run;
proc reg data=hwred;
  model weight=height;
run;
ods graphics off;

Conditional distribution II

Let

Multiple correlation coefficient I

Let

Multiple correlation coefficient II

Example

We consider a three-dimensional random variable $(X, Y, Z)^T$ with dispersion matrix

$$ \boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix} $$

Find the squared multiple correlation between $X$ and $(Y, Z)^T$!

Example – solution, using theorem 1.40

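$$ \boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}, \qquad \boldsymbol{\Sigma}_{22} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \sigma_{11} = 1 $$

$$ \rho^2_{X|YZ} = 1 - \frac{\det\boldsymbol{\Sigma}}{\sigma_{11}\det\boldsymbol{\Sigma}_{22}} = 1 - \frac{1(1 \cdot 1 - \rho\cdot\rho) - \rho(\rho \cdot 1 - \rho\cdot\rho) + \rho(\rho\cdot\rho - \rho \cdot 1)}{1 \cdot (1 \cdot 1 - \rho\cdot\rho)} = 1 - \frac{1 - 3\rho^2 + 2\rho^3}{1 - \rho^2} $$

Interpretation: The more correlated the variables are, the more we know about X given Y and Z.
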
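
The expression for the squared multiple correlation can be simplified one step further, which makes the interpretation easier to read off. The factorisation below is a worked addition, not part of the slide; it holds for $-1/2 \le \rho < 1$, the range in which $\boldsymbol{\Sigma}$ is positive semidefinite and $\boldsymbol{\Sigma}_{22}$ is invertible.

```latex
1 - 3\rho^2 + 2\rho^3 = (1-\rho)^2(1+2\rho),
\qquad 1 - \rho^2 = (1-\rho)(1+\rho)

\rho^2_{X|YZ} = 1 - \frac{(1-\rho)(1+2\rho)}{1+\rho}
             = \frac{(1+\rho) - (1 + \rho - 2\rho^2)}{1+\rho}
             = \frac{2\rho^2}{1+\rho}
```

For example, ρ = 0 gives 0, ρ = 0.5 gives 1/3, and at the boundary ρ = -0.5 the value is 1, so X is then an exact linear function of Y and Z.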
Multiple correlation coefficient III

Partial correlation coefficient

The partial correlations between some variables given other variables are simply
correlations in the conditional distribution of the ’some’ variables given the ’other’
variables. It follows that

Example: Ice cream - I

We consider a two-dimensional random variable

$$ \begin{pmatrix} D \\ I \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, $$

where D is the number of drowning accidents, and I is the ice-cream sale.
Should ice-cream be prohibited? Is the ice-cream industry behind drownings to get more sales?

$$ \begin{pmatrix} D \\ I \\ T \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix} $$

where T is the temperature

Find the correlation between D and I conditioned on T

From page 34 in the book

SOLUTION ON NEXT SLIDE – no need to write down
Example: Ice cream - II

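Example – solution, using the formula from page 34 in the book

$$ \begin{pmatrix} D \\ I \\ T \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix} $$

$$ \rho_{DI|T} = \frac{\rho_{DI} - \rho_{DT}\,\rho_{IT}}{\sqrt{(1 - \rho_{DT}^2)(1 - \rho_{IT}^2)}} = \frac{0.5 - 0.7 \cdot 0.7}{\sqrt{(1 - 0.7^2)(1 - 0.7^2)}} = 0.0196 $$

There is very little correlation between drownings and ice cream sale, once we control for the temperature!
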
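
The partial correlation formula for three variables is easy to evaluate directly. The sketch below is a minimal Python version (standard library only; the function and argument names are illustrative) applied to the ice-cream numbers.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y in the conditional distribution given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))

# D = drownings, I = ice-cream sale, T = temperature
rho_DI_given_T = partial_correlation(r_xy=0.5, r_xz=0.7, r_yz=0.7)
print(round(rho_DI_given_T, 4))
```

If the conditioning variable is uncorrelated with both of the others, the function returns the ordinary correlation unchanged.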
Example: Ice cream - III

We consider a three-dimensional random variable

$$ \begin{pmatrix} D \\ I \\ T \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix} $$

where D is the number of drowning accidents, I is the ice-cream sale and T the temperature.

Find the dispersion, when conditioned upon temperature

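$$ D\left(\begin{pmatrix} D \\ I \end{pmatrix} \,\middle|\, T\right) = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \begin{pmatrix} 0.7 \\ 0.7 \end{pmatrix} 1^{-1} \begin{pmatrix} 0.7 & 0.7 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \begin{pmatrix} 0.49 & 0.49 \\ 0.49 & 0.49 \end{pmatrix} = \begin{pmatrix} 0.51 & 0.01 \\ 0.01 & 0.51 \end{pmatrix} $$

$$ corr(D, I \mid T) = \frac{0.01}{\sqrt{0.51 \cdot 0.51}} = 0.0196 $$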
Example: Ice cream - IV

data ice (type=corr);
  infile cards missover;
  input _type_ $ _Name_ $ D I T;
cards;
N . 100 100 100
corr D 1
corr I 0.5 1
corr T 0.7 0.7 1
;
proc princomp data=ice;
  partial T;
run;

Partial Correlation Matrix
     D        I
D    1.0000   0.0196
I    0.0196   1.0000

Cement strength–SAS program
If only the correlation matrix and not the full dataset is available, we must use input statements as shown.

Title "Correlations between cement measurements based on 51 samples";
data cement (type=corr);
  infile cards missover;
  input _type_ $ _Name_ $ C3S C3A BLAINE Strgth3 Strgth28;
cards;
N . 51 51 51 51 51
corr C3S 1
corr C3A -0.309 1
corr BLAINE 0.091 0.192 1
corr Strgth3 0.158 0.120 0.745 1
corr Strgth28 0.344 -0.166 0.320 0.464 1
;
Title "Correlation Matrix for the Cement Data";
proc print data=cement;
run;

We cannot use Proc Corr to find the partial correlations if the correlation matrix is input as above.
Instead we must use Proc Princomp, which also computes eigenvalues and eigenvectors of the correlation matrix.

Title "Eigenvalues of correlation matrix";
proc princomp data=cement;
run;
Title "Analysis conditioned on the fineness BLAINE";
proc princomp data=cement;
  partial BLAINE;
run;
Cement strength - Conventional Wisdom

Cement strength
Correlation Matrix
C3S C3A BLAINE Strgth3 Strgth28
C3S 1.000 . . . .
C3A -0.309 1.000 . . .
BLAINE 0.091 0.192 1.000 . .
Strgth3 0.158 0.120 0.745 1.000 .
Strgth28 0.344 -0.166 0.320 0.464 1.000

Partial Correlation Matrix


C3S C3A BLAINE Strgth3 Strgth28
C3S 1.0000
C3A -.3340 1.0000
BLAINE
Strgth3 0.1358 -.0352 1.0000
Strgth28 0.3337 -.2446 0.3570 1.0000
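
The partial correlation matrix above can be reproduced from the correlation matrix alone: subtract the part explained by BLAINE from every entry and rescale to unit diagonal. The Python sketch below (standard library only; the function and variable names are illustrative, and it handles a single conditioning variable only) does this for the cement correlations.

```python
import math

names = ["C3S", "C3A", "BLAINE", "Strgth3", "Strgth28"]
lower = [  # lower triangle of the correlation matrix, as in the SAS input
    [1.0],
    [-0.309, 1.0],
    [0.091, 0.192, 1.0],
    [0.158, 0.120, 0.745, 1.0],
    [0.344, -0.166, 0.320, 0.464, 1.0],
]
n = len(names)
# Fill in the full symmetric matrix from the lower triangle
R = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]

def partial_given(R, k):
    """Partial correlations of all other variables given variable k."""
    keep = [i for i in range(len(R)) if i != k]
    # Conditional dispersion: R_ij - R_ik * R_kj / R_kk
    cond = {(i, j): R[i][j] - R[i][k] * R[k][j] / R[k][k] for i in keep for j in keep}
    return {(i, j): cond[i, j] / math.sqrt(cond[i, i] * cond[j, j]) for (i, j) in cond}

partial = partial_given(R, names.index("BLAINE"))
for (i, j), value in sorted(partial.items()):
    if j < i:
        print(f"{names[i]:9s}{names[j]:9s}{value:8.4f}")
```

The correlation between the two strength measurements drops from 0.464 to about 0.357 once the fineness BLAINE is held fixed.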
Partial correlation coefficient II

SAS program –Pollution Data I

Data pollution;
  input oecd hvols;
cards;
2 2
5 12
15 4
16 21
16 41
19 14
26 31
24 29
16 31
36 8
39 30
42 44
44 26
40 60
42 34
42 34
50 14
51 41
58 58
64 47
;
run;

proc corr data=pollution
  plots=scatter(nvar=2 alpha= .30 .50 .70)
  cov;
  var oecd hvols;
run;

proc kde data=pollution;
  bivar oecd hvols/plots=histogram (tilt=45 rotate=145)
    surface(tilt=45 rotate=145)
    histsurface(tilt=45 rotate=145);
run;
SAS program –Pollution Data II

proc fcmp outlib=sasuser.funcs.trial;
  function binor(x, y, r, sx, sy, mx, my);
    /* standardized deviations from the means */
    a=(x-mx)/sx;
    b=(y-my)/sy;
    /* the whole quadratic form is divided by 2*(1-r*r) */
    z=exp(-(a*a-2*r*a*b+b*b)/(2*(1-r*r)))/(6.283185*sx*sy*sqrt(1-r*r));
    return(z);
  endsub;
run;
options cmplib=sasuser.funcs;
data pdfgauss;
  r=0.62; sx=17.64; sy=16.70; mx=32.35; my=29.05;
  do x=0 to 65 by 1;
    do y=0 to 65 by 1;
      z=binor(x, y, r, sx, sy, mx, my);
      output pdfgauss;
    end;
  end;
run;

proc template;
  define statgraph g3grid_surface;
    begingraph;
      entrytitle "Bivariate Gaussian Rho=0.62";
      layout overlay3d/tilt=45 rotate= 145 cube= false;
        surfaceplotparm x=x y=y z=z / surfacetype=fillgrid;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=pdfgauss template=g3grid_surface;
run;

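
The bivariate Gaussian density evaluated by binor can be checked outside SAS. The sketch below is a Python counterpart (standard library only; the function name binormal_pdf and the grid limits are illustrative choices) using the same parameter values, with a crude Riemann sum on a unit grid to confirm that the density integrates to approximately one.

```python
import math

def binormal_pdf(x, y, r, sx, sy, mx, my):
    """Bivariate normal density with correlation r, std devs sx, sy, means mx, my."""
    a = (x - mx) / sx
    b = (y - my) / sy
    q = (a * a - 2.0 * r * a * b + b * b) / (2.0 * (1.0 - r * r))
    return math.exp(-q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - r * r))

r, sx, sy, mx, my = 0.62, 17.64, 16.70, 32.35, 29.05

# Riemann sum on a grid with spacing 1, covering more than 6 std devs each way
total = sum(
    binormal_pdf(mx + i, my + j, r, sx, sy, mx, my)
    for i in range(-110, 111)
    for j in range(-110, 111)
)
peak = binormal_pdf(mx, my, r, sx, sy, mx, my)
print(round(total, 4), peak)
```

A sum close to 1 is a quick sanity check that the exponent and the normalising constant are consistent with each other.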
Output – pollution Data

Obs   oecd   hvols
1     2      2
2     5      12
3     15     4
4     16     21
5     16     41
6     19     14
7     26     31
8     24     29
9     16     31
10    36     8
11    39     30
12    42     44
13    44     26
14    40     60
15    42     34
16    42     34
17    50     14
18    51     41
19    58     58
20    64     47

Simple Statistics
Variable   N    Mean       Std Dev    Sum         Minimum   Maximum
oecd       20   32.35000   17.63750   647.00000   2.00000   64.00000
hvols      20   29.05000   16.70952   581.00000   2.00000   60.00000

Covariance Matrix, DF = 19
         oecd          hvols
oecd     311.0815789   181.7710526
hvols    181.7710526   279.2078947

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0
         OECD               HVOLS
OECD     1                  0.61677 (0.0038)
HVOLS    0.61677 (0.0038)   1
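
The simple statistics, the covariance matrix (with divisor n - 1 = 19, matching DF = 19) and the Pearson correlation can be recomputed from the 20 observations by hand. The following Python sketch (standard library only; helper names are illustrative) does so.

```python
import math

oecd = [2, 5, 15, 16, 16, 19, 26, 24, 16, 36, 39, 42, 44, 40, 42, 42, 50, 51, 58, 64]
hvols = [2, 12, 4, 21, 41, 14, 31, 29, 31, 8, 30, 44, 26, 60, 34, 34, 14, 41, 58, 47]

def mean(values):
    return sum(values) / len(values)

def covariance(u, v):
    """Empirical covariance with divisor n - 1."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

s_xx = covariance(oecd, oecd)
s_yy = covariance(hvols, hvols)
s_xy = covariance(oecd, hvols)
r = s_xy / math.sqrt(s_xx * s_yy)

print(mean(oecd), mean(hvols))
print(round(s_xx, 4), round(s_xy, 4), round(s_yy, 4))
print(round(r, 5))
```

The same three covariance entries, arranged as a symmetric 2 x 2 matrix, form the empirical dispersion matrix of (oecd, hvols).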
Output – pollution Data II
[Figures: prediction ellipse (Proc corr); histogram and fitted bivariate Gaussian (Proc kde); Gaussian pdf surface (Proc fcmp, Data pdfgauss, Proc template, Proc sgrender)]

Conditional Means and Main Ellipse Axis – Pollution Data

Exercises

1.3 is a 'pen and paper' exercise to explore multiple and partial correlation

2.4 is a real data case on hay production, that explores partial correlation

2.5 is a real data case on fitness data

Heiwei, height and weight data are under 'SAS material', if you want to try it out

Solutions are already uploaded

