Presentation B 6 Sep 2021
2 6. september 2021
Uni- and bivariate normal distributions
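The univariate random variable $X$ is normally distributed, $X \in N(\mu, \sigma^2)$, if
$$f(x) = \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)$$
[Figure: univariate normal pdf with $\mu-\sigma$, $\mu$ and $\mu+\sigma$ marked on the axis]
The bivariate random variable $\mathbf{X} = (X_1, X_2)^T$ is normally distributed, $\mathbf{X} \in N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if
$$f(\mathbf{x}) = \frac{1}{2\pi}\frac{1}{\sqrt{\det\boldsymbol{\Sigma}}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$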
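A minimal sketch of the two densities above in Python, assuming NumPy is available (the function names are illustrative, not from the slides). It checks one consequence of the formula: when $\boldsymbol{\Sigma}$ is diagonal, the bivariate density factorizes into the product of two univariate densities. The parameter values are arbitrary.

```python
import numpy as np

def bivariate_normal_pdf(x, mu, sigma):
    """Evaluate the N(mu, Sigma) density at the 2-dimensional point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu); solve() avoids forming the inverse
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * quad)

def univariate_normal_pdf(x, mu, sd):
    """Evaluate the N(mu, sd^2) density at x."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sd ** 2)) / (np.sqrt(2.0 * np.pi) * sd)

# Diagonal Sigma (zero correlation): the joint density is a product of marginals
mu = [8.0, 6.0]
sigma_diag = [[9.0, 0.0], [0.0, 1.0]]
joint = bivariate_normal_pdf([10.0, 5.5], mu, sigma_diag)
product = univariate_normal_pdf(10.0, 8.0, 3.0) * univariate_normal_pdf(5.5, 6.0, 1.0)
print(joint, product)
```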
Marginal distributions and independence
Correlation between
From: Fig. 6 in H. P. Hacker and H. S. Pearson (1944): The Growth, Survival, Wandering and Variation of the Long-Tailed Field Mouse, Apodemus Sylvaticus. Biometrika, Vol. 33, No. 2 (Aug., 1944), pp. 136-162
Matrix of 2D-scatterplots
SAS code for producing a matrix of 2-D scatterplots for the variables considered, here weight and organ. The diagonal shows the histograms with fitted normal pdfs:
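proc sgscatter data=home.apodemus;
   /* scatterplot matrix of weight and organ with prediction ellipses */
   matrix weight organ /
      ellipse=(type=predicted)
      diagonal=(histogram normal);   /* histogram and fitted normal pdf on the diagonal */
run;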
Univariate histogram/cdf and fitted Gaussian

Goodness-of-Fit Tests for Normal Distribution

                     Panel 1                           Panel 2
Test                 Statistic   p Value               Statistic   p Value
Kolmogorov-Smirnov   0.07419799  Pr > D     0.044      0.07681607  Pr > D     0.030
Cramér-von Mises     0.11030575  Pr > W-Sq  0.085      0.14813973  Pr > W-Sq  0.025
Anderson-Darling     0.58514995  Pr > A-Sq  0.131      0.86813666  Pr > A-Sq  0.025
Weight of mouse and weight of its reproductive organs
Simple Statistics
Variable   N     Mean       Std Dev   Sum        Minimum    Maximum
weight     149   23.37584   2.37378   3483       16.00000   29.50000
organ      149   2.02148    0.41777   301.20000  0.70000    3.00000

Pearson Correlation Coefficients, N = 149
Prob > |r| under H0: Rho=0
           weight     organ
weight     1.00000    0.64963
                      <.0001
organ      0.64963    1.00000
           <.0001
Test of correlation coefficient
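For a sample from a bivariate normal distribution, $H_0: \rho = 0$ is commonly tested with $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which under $H_0$ follows a t-distribution with $n-2$ degrees of freedom. Below is a minimal standard-library Python sketch applied to the mouse data above ($r = 0.64963$, $N = 149$). Because the standard library has no t-distribution, the p-value is approximated with Fisher's z-transformation; this approximation is a swapped-in technique, not the computation SAS performs.

```python
import math

def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0; t(n-2)-distributed under H0 for bivariate normal data."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def fisher_z_pvalue(r, n):
    """Approximate two-sided p-value: atanh(r) is roughly N(0, 1/(n-3)) under H0."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided standard normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))

r, n = 0.64963, 149  # weight vs. organ, from the SAS output above
t = corr_t_statistic(r, n)
p_approx = fisher_z_pvalue(r, n)
print(f"t = {t:.2f}, approximate p = {p_approx:.2e}")
```

With $t \approx 10.36$ on 147 degrees of freedom, both routes agree with the "<.0001" reported by SAS.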
Correlation does not imply causation I
[Figures: two examples of spurious correlations, with ρ = 0.998 and ρ = 0.959. Source: http://www.tylervigen.com/spurious-correlations]
Correlation does not imply causation II
• Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and
gesture furtively while mouthing 'look over there'.
Moments of a random variable
The mean is the center of mass of a probability function:
[Figure: three panels labeled μ1 = 0, μ2 = 0 and μ3 = 6]
Moments of multivariate random variables:
Mean, dispersion and covariance
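$$E(\mathbf{X}) = \boldsymbol{\mu} = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}$$
$$C(\mathbf{X}, \mathbf{Y}) = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{Y}-\boldsymbol{\nu})^T] = \begin{pmatrix} Cov(X_1, Y_1) & \cdots & Cov(X_1, Y_q) \\ \vdots & & \vdots \\ Cov(X_p, Y_1) & \cdots & Cov(X_p, Y_q) \end{pmatrix}$$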
Rules for computing moments of simple functions
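Univariate:
$$\begin{aligned}
E(a + bX) &= a + bE(X)\\
E(X + Y) &= E(X) + E(Y)\\
V(a + bX) &= b^2 V(X)\\
V(X + Y) &= V(X) + V(Y) + 2\,Cov(X, Y)\\
&= V(X) + V(Y) \quad \text{if } X, Y \text{ independent}\\
Cov(X, X) &= V(X)\\
Cov(aX, bY) &= ab\,Cov(X, Y)
\end{aligned}$$
Multivariate:
$$\begin{aligned}
E(\mathbf{A} + \mathbf{X}) &= \mathbf{A} + E(\mathbf{X})\\
E(\mathbf{A}\mathbf{X}) &= \mathbf{A}E(\mathbf{X})\\
E(\mathbf{X}\mathbf{B}) &= E(\mathbf{X})\mathbf{B}\\
E(\mathbf{X} + \mathbf{Y}) &= E(\mathbf{X}) + E(\mathbf{Y})\\
D(\mathbf{b} + \mathbf{X}) &= D(\mathbf{X})\\
D(\mathbf{A}\mathbf{X}) &= \mathbf{A}D(\mathbf{X})\mathbf{A}^T\\
D(\mathbf{X} + \mathbf{Y}) &= D(\mathbf{X}) + D(\mathbf{Y}) + C(\mathbf{X}, \mathbf{Y}) + C(\mathbf{Y}, \mathbf{X})\\
&= D(\mathbf{X}) + D(\mathbf{Y}) \quad \text{if } \mathbf{X}, \mathbf{Y} \text{ independent}\\
C(\mathbf{X}, \mathbf{X}) &= D(\mathbf{X})\\
C(\mathbf{X}, \mathbf{Y}) &= C(\mathbf{Y}, \mathbf{X})^T
\end{aligned}$$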
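Because these rules are linear, they also hold exactly (up to rounding) for empirical moments. A minimal NumPy sketch checking $V(a+bX) = b^2V(X)$ and $D(\mathbf{b}+\mathbf{A}\mathbf{X}) = \mathbf{A}D(\mathbf{X})\mathbf{A}^T$ on simulated data. The matrix $\mathbf{A}$, the vector $\mathbf{b}$ and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Univariate rule V(a + bX) = b^2 V(X), checked on the empirical variance
x = rng.normal(size=1000)
print(np.isclose(np.var(3.0 + 2.0 * x), 4.0 * np.var(x)))

# Multivariate rule D(b + AX) = A D(X) A^T; one 3-dimensional observation per row
X = rng.normal(size=(500, 3))
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
b = np.array([5.0, -2.0])
Y = X @ A.T + b                   # row i is b + A x_i

D_X = np.cov(X, rowvar=False)     # empirical dispersion matrix of X
D_Y = np.cov(Y, rowvar=False)     # empirical dispersion matrix of b + AX
print(np.allclose(D_Y, A @ D_X @ A.T))
```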
Multivariate Normal or Gaussian Distribution
[Figure: bivariate normal pdf annotated with "Dispersion or variance-covariance matrix" and "eigenvectors of dispersion matrix"]
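The contours of the Gaussian pdf are ellipses whose axes point along the eigenvectors of the dispersion matrix; the variance along each axis is the corresponding eigenvalue. A minimal NumPy sketch for the assumed example $\boldsymbol{\Sigma}$ with unit variances and $\rho = 0.7$, where the eigenvalues are $1 \pm \rho$ and the axes lie along $(1, 1)$ and $(1, -1)$.

```python
import numpy as np

rho = 0.7
sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# eigh is intended for symmetric matrices; eigenvalues are returned in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(sigma)
print(eigenvalues)    # variances along the ellipse axes: 1 - rho and 1 + rho
print(eigenvectors)   # columns are the axis directions (signs are arbitrary)
```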
Estimation of parameters I
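• The i'th observation
• The mean
• The empirical variance-covariance or dispersion matrix
Observations collected in the data matrix:
$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1^T \\ \vdots \\ \mathbf{X}_n^T \end{pmatrix} = \begin{pmatrix} X_{11} & \cdots & X_{1p} \\ \vdots & & \vdots \\ X_{n1} & \cdots & X_{np} \end{pmatrix}$$
Other formulas for mean and dispersion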
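A minimal NumPy sketch of estimating the mean vector and the empirical dispersion matrix from an $n \times p$ data matrix, with each row an observation $\mathbf{X}_i^T$. It assumes the unbiased divisor $n-1$ (some texts use $n$) and checks the direct sum-of-outer-products formula against the equivalent form $(\mathbf{X}^T\mathbf{X} - n\bar{\mathbf{x}}\bar{\mathbf{x}}^T)/(n-1)$. The three-row data matrix is made up for illustration.

```python
import numpy as np

def mean_and_dispersion(X):
    """Mean vector and empirical dispersion matrix (divisor n - 1) of an n x p data matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)                 # (1/n) * sum of the rows X_i
    centered = X - xbar                   # rows X_i - xbar
    S = centered.T @ centered / (n - 1)   # (1/(n-1)) * sum (X_i - xbar)(X_i - xbar)^T
    # Equivalent form: (X^T X - n xbar xbar^T) / (n - 1)
    S_alt = (X.T @ X - n * np.outer(xbar, xbar)) / (n - 1)
    assert np.allclose(S, S_alt)
    return xbar, S

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 0.0]])
xbar, S = mean_and_dispersion(X)
print(xbar)  # mean vector (3, 2)
print(S)     # equals [[4, -2], [-2, 4]]
```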
Estimation of parameters II
Dispersion Matrix – positive semidefinite
• What does it mean?
• $(X, Y, Z)^T$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$: what happens if $\rho < -0.5$?
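One way to see the answer numerically: the equicorrelation matrix has eigenvalues $1 + 2\rho$ (once) and $1 - \rho$ (twice), and $V(X+Y+Z) = \mathbf{1}^T\boldsymbol{\Sigma}\mathbf{1} = 3 + 6\rho$. For $\rho < -0.5$ both become negative, so $\boldsymbol{\Sigma}$ is not positive semidefinite and cannot be a dispersion matrix. A minimal NumPy sketch; the helper name `equicorrelation` and the chosen $\rho$ values are illustrative.

```python
import numpy as np

def equicorrelation(rho, p=3):
    """p x p matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

ones = np.ones(3)
for rho in (0.3, -0.4, -0.6):
    sigma = equicorrelation(rho)
    eigenvalues = np.linalg.eigvalsh(sigma)   # ascending order
    var_sum = ones @ sigma @ ones             # V(X + Y + Z) = 3 + 6 rho
    print(rho, eigenvalues.round(3), var_sum)
```

A "variance" of $-0.6$ for $X+Y+Z$ at $\rho = -0.6$ is impossible, which is exactly what positive semidefiniteness rules out.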
PDF for Bivariate Gaussian (rho=0 and rho=0.7)
The influence of the correlation coefficient I
Distribution: $N\left(\begin{pmatrix} 8 \\ 6 \end{pmatrix}, \begin{pmatrix} 9 & 3\rho \\ 3\rho & 1 \end{pmatrix}\right)$
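At $\mathbf{x} = \boldsymbol{\mu}$ the exponent in the pdf vanishes, so the peak height is $1/(2\pi\sqrt{\det\boldsymbol{\Sigma}})$ with $\det\boldsymbol{\Sigma} = 9(1-\rho^2)$ for this distribution. As $|\rho| \to 1$ the determinant shrinks, the mass concentrates near a line and the peak rises. A minimal Python sketch; the function name and the $\rho$ grid are illustrative.

```python
import math

def peak_density(rho):
    """Height of the N((8, 6), [[9, 3 rho], [3 rho, 1]]) pdf at its mean."""
    det_sigma = 9.0 * 1.0 - (3.0 * rho) ** 2   # = 9 (1 - rho^2)
    return 1.0 / (2.0 * math.pi * math.sqrt(det_sigma))

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  det = {9 * (1 - rho**2):6.3f}  peak = {peak_density(rho):.4f}")
```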
The influence of the correlation coefficient II
Contour plots of Gaussian pdfs for
± 𝝆 = 0, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99
Conditional distribution I
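we get
$$E(X_1 \mid X_2 = x_2) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$$
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$$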
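A minimal Python sketch of the two conditional means, using the parameters of $N\left((8, 6)^T, \begin{pmatrix} 9 & 3\rho \\ 3\rho & 1\end{pmatrix}\right)$ from the earlier slides ($\sigma_1 = 3$, $\sigma_2 = 1$) with an assumed $\rho = 0.5$. The function names are illustrative.

```python
def conditional_mean_x1(x2, mu1, mu2, sigma1, sigma2, rho):
    """E(X1 | X2 = x2) for a bivariate normal vector (X1, X2)."""
    return mu1 + rho * sigma1 / sigma2 * (x2 - mu2)

def conditional_mean_x2(x1, mu1, mu2, sigma1, sigma2, rho):
    """E(X2 | X1 = x1): the roles of the two variables are swapped."""
    return mu2 + rho * sigma2 / sigma1 * (x1 - mu1)

mu1, mu2, s1, s2, rho = 8.0, 6.0, 3.0, 1.0, 0.5
print(conditional_mean_x1(7.0, mu1, mu2, s1, s2, rho))   # 8 + 0.5 * 3 * (7 - 6) = 9.5
print(conditional_mean_x2(11.0, mu1, mu2, s1, s2, rho))  # 6 + 0.5 * (1/3) * (11 - 8) = 6.5
```

The slopes are $\rho\sigma_1/\sigma_2$ and $\rho\sigma_2/\sigma_1$, so the two regression lines coincide only when $|\rho| = 1$.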
25,000 records of human heights and weights
Source: Statistics Online Computational Resource (SOCR)
http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html
The dataset below contains 25,000 records of human heights and weights. These data were obtained in
1993 by a Growth Survey of 25,000 children from birth to 18 years of age recruited from Maternal and Child
Health Centres (MCHC) and schools and were used to develop Hong Kong's current growth charts for
weight, height, weight-for-age, weight-for-height and body mass index (BMI).
SAS program that reads the 25,000 records with corresponding values of Height and Weight. The data are stored in the permanent dataset home.heiwei, from where we may refer to them.

data home.heiwei;
   input Index Height Weight;
   datalines;
1 65.78331 112.9925
2 71.51521 136.4873
3 69.39874 153.0269
.
. (more data lines)
.
24998 64.69855 118.2655
24999 67.52918 132.2682
25000 68.87761 124.8742
;
run;
SAS program reading and analyzing Height-Weight data
data hw;
   set home.heiwei;   /* copy the permanent dataset into the work dataset hw */
run;
Weight Distribution Histogram, 25000 obs.
Cumulative Empirical Weight Distribution, 25000 obs.
Fitted regression line
Fitted regression line y = 57.57 + 0.082x
Given a height of 70, the weight is $w = -82.58 + 3.08 \cdot 70 = 133.02$.
Given a weight of 133.02, the height is $h = 57.57 + 0.082 \cdot 133.02 = 68.48$.
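The round trip above does not return to 70: regressing weight on height and then height on weight pulls the value toward the mean. A minimal Python sketch using the rounded coefficients from the slides. It also uses the least-squares fact that, for two lines fitted to the same data, the product of the slopes equals $r^2$, so these coefficients imply a correlation of roughly 0.50 (approximate, because the coefficients are rounded).

```python
def weight_given_height(h):
    """Fitted regression of weight on height (coefficients from the slides)."""
    return -82.58 + 3.08 * h

def height_given_weight(w):
    """Fitted regression of height on weight (coefficients from the slides)."""
    return 57.57 + 0.082 * w

w = weight_given_height(70.0)
h = height_given_weight(w)
print(round(w, 2), round(h, 2))   # 133.02 68.48

# Product of the two least-squares slopes equals r^2
r_implied = (3.08 * 0.082) ** 0.5
print(r_implied)
```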
A random subsample of the data
253 observations. To give a clearer overall view, a SAS program picks 1% of the data at random.
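The SAS program itself is not reproduced here. As an assumed illustration of one way to pick "1% of the data randomly", the Python sketch below keeps each record independently with probability 0.01 (Bernoulli sampling). The resulting sample size is random, with expectation 250 for 25,000 records, which is consistent with a subsample of 253 observations. The record list is a hypothetical stand-in containing only the Index values.

```python
import random

def bernoulli_subsample(records, fraction, seed=None):
    """Keep each record independently with probability `fraction`, preserving order."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]

# Hypothetical stand-in for the 25,000 records: just the Index values 1..25000
records = list(range(1, 25001))
subsample = bernoulli_subsample(records, 0.01, seed=2021)
print(len(subsample))   # close to, but generally not exactly, 250
```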
Conditional distribution II
Let
Multiple correlation coefficient I
Let
Multiple correlation coefficient II
Example
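$(X, Y, Z)^T$ with
$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$$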
Example – solution, using theorem 1.40
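$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}, \quad \boldsymbol{\Sigma}_{22} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \quad \sigma_{11} = 1$$
$$\begin{aligned}
\rho^2_{X|(Y,Z)} &= 1 - \frac{\det\boldsymbol{\Sigma}}{\sigma_{11}\det\boldsymbol{\Sigma}_{22}}\\
&= 1 - \frac{1\cdot(1\cdot 1 - \rho\cdot\rho) - \rho(\rho\cdot 1 - \rho\cdot\rho) + \rho(\rho\cdot\rho - 1\cdot\rho)}{1\cdot(1\cdot 1 - \rho\cdot\rho)}\\
&= 1 - \frac{1 - 3\rho^2 + 2\rho^3}{1 - \rho^2}
\end{aligned}$$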
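The result simplifies further: $1 - 3\rho^2 + 2\rho^3 = (1-\rho)^2(1+2\rho)$ and $1 - \rho^2 = (1-\rho)(1+\rho)$, so for $\rho \neq 1$ we get $\rho^2_{X|(Y,Z)} = 2\rho^2/(1+\rho)$. A minimal NumPy sketch checking the determinant formula against this closed form for a few admissible $\rho$ values (recall $\rho > -0.5$ is needed for a valid $\boldsymbol{\Sigma}$). The helper names are illustrative.

```python
import numpy as np

def equicorrelation(rho, p=3):
    """p x p matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def squared_multiple_correlation(sigma):
    """1 - det(Sigma) / (sigma_11 det(Sigma_22)) for the first variable given the rest."""
    sigma = np.asarray(sigma, dtype=float)
    return 1.0 - np.linalg.det(sigma) / (sigma[0, 0] * np.linalg.det(sigma[1:, 1:]))

for rho in (-0.3, 0.2, 0.5, 0.8):
    numeric = squared_multiple_correlation(equicorrelation(rho))
    closed_form = 2 * rho**2 / (1 + rho)
    print(rho, numeric, closed_form)
```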
Multiple correlation coefficient III
Partial correlation coefficient
The partial correlations between some variables given other variables are simply the correlations in the conditional distribution of the 'some' variables given the 'other' variables. It follows that
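Following this definition directly, a minimal NumPy sketch: compute the conditional dispersion matrix $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ of the 'some' variables given the 'other' variables, then rescale it to a correlation matrix. The function names are illustrative. It is applied to the dispersion matrix of $(D, I, T)^T$ from the ice-cream example, conditioning $(D, I)$ on $T$.

```python
import numpy as np

def conditional_dispersion(sigma, some, other):
    """Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21 for the 'some' indices given the 'other' indices."""
    sigma = np.asarray(sigma, dtype=float)
    s11 = sigma[np.ix_(some, some)]
    s12 = sigma[np.ix_(some, other)]
    s22 = sigma[np.ix_(other, other)]
    return s11 - s12 @ np.linalg.solve(s22, s12.T)

def partial_correlations(sigma, some, other):
    """Correlation matrix of the conditional distribution of 'some' given 'other'."""
    cond = conditional_dispersion(sigma, some, other)
    sd = np.sqrt(np.diag(cond))
    return cond / np.outer(sd, sd)

# Variables ordered (D, I, T)
sigma = np.array([[1.0, 0.5, 0.7],
                  [0.5, 1.0, 0.7],
                  [0.7, 0.7, 1.0]])
print(conditional_dispersion(sigma, [0, 1], [2]))   # [[0.51, 0.01], [0.01, 0.51]]
print(partial_correlations(sigma, [0, 1], [2]))     # off-diagonal 0.01 / 0.51
```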
Example: Ice cream - I
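$(D, I)^T$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$,
where D is the number of drowning accidents and I is the ice-cream sales.
Should ice cream be prohibited? Is the ice-cream industry behind the drownings to boost its sales?
Now include the temperature T: $(D, I, T)^T$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$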
Example: Ice cream - II
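With $(D, I, T)^T$ and $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$:
$$\rho_{DI|T} = \frac{\rho_{DI} - \rho_{DT}\,\rho_{IT}}{\sqrt{(1-\rho_{DT}^2)(1-\rho_{IT}^2)}} = \frac{0.5 - 0.7\cdot 0.7}{\sqrt{(1-0.7^2)(1-0.7^2)}} = 0.0196$$
There is very little correlation between drownings and ice-cream sales once we control for the temperature!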
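A minimal Python sketch of the one-control-variable formula above, cross-checked with a different but standard route: when conditioning on all remaining variables, the partial correlation of variables $i$ and $j$ equals $-p_{ij}/\sqrt{p_{ii}p_{jj}}$, where $\mathbf{P} = \boldsymbol{\Sigma}^{-1}$ is the precision matrix. The function names are illustrative.

```python
import math
import numpy as np

def partial_corr_one_control(r_xy, r_xz, r_yz):
    """rho_{XY|Z} = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_corr_from_precision(sigma):
    """Partial correlation of each pair given all remaining variables, via P = Sigma^{-1}."""
    P = np.linalg.inv(np.asarray(sigma, dtype=float))
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

sigma = np.array([[1.0, 0.5, 0.7],
                  [0.5, 1.0, 0.7],
                  [0.7, 0.7, 1.0]])   # order (D, I, T)
print(partial_corr_one_control(0.5, 0.7, 0.7))   # 0.01 / 0.51
print(partial_corr_from_precision(sigma)[0, 1])  # same value via the precision matrix
```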
Example: Ice cream - III
$(D, I, T)^T$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.5 & 0.7 \\ 0.5 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{pmatrix}$
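$$D\left(\begin{pmatrix} D \\ I \end{pmatrix} \,\middle|\, T\right) = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \begin{pmatrix} 0.7 \\ 0.7 \end{pmatrix} 1^{-1} \begin{pmatrix} 0.7 & 0.7 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \begin{pmatrix} 0.49 & 0.49 \\ 0.49 & 0.49 \end{pmatrix} = \begin{pmatrix} 0.51 & 0.01 \\ 0.01 & 0.51 \end{pmatrix}$$
$$\mathrm{corr}(D, I \mid T) = \frac{0.01}{\sqrt{0.51 \cdot 0.51}} = 0.0196$$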
Example: Ice cream - IV
Cement strength–SAS program
If only the correlation matrix and not the full dataset is available, we must use input statements as shown.

Title "Correlations between cement measurements based on 51 samples";
data cement (type=corr);
   infile cards missover;
   input _type_ $ _Name_ $ C3S C3A BLAINE Strgth3 Strgth28;
   cards;
N    .        51     51     51    51    51
corr C3S      1
corr C3A      -0.309 1
corr BLAINE   0.091  0.192  1
corr Strgth3  0.158  0.120  0.745 1
corr Strgth28 0.344  -0.166 0.320 0.464 1
;

Title "Correlation Matrix for the Cement Data";
proc print data=cement;
run;

Title "Eigenvalues of correlation matrix";
proc princomp data=cement;
run;

Title "Analysis conditioned on the fineness BLAINE";
proc princomp data=cement;
   partial BLAINE;
run;

We cannot use Proc Corr to find the partial correlations if the correlation matrix is input as above. Instead we must use Proc Princomp, which also computes eigenvalues and eigenvectors of the correlation matrix.
Cement strength - Conventional Wisdom
Cement strength
Correlation Matrix
C3S C3A BLAINE Strgth3 Strgth28
C3S 1.000 . . . .
C3A -0.309 1.000 . . .
BLAINE 0.091 0.192 1.000 . .
Strgth3 0.158 0.120 0.745 1.000 .
Strgth28 0.344 -0.166 0.320 0.464 1.000
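The PARTIAL BLAINE statement in the SAS program asks PROC PRINCOMP to work with correlations conditioned on BLAINE. A minimal NumPy sketch that computes such a partial correlation matrix from the correlation matrix above, as the correlation matrix of the conditional distribution given BLAINE. It is intended to correspond to what PROC PRINCOMP analyzes, but it has not been checked against the SAS output. For example, the partial correlation between Strgth3 and Strgth28 given BLAINE comes out at about 0.357, against 0.464 unconditionally.

```python
import numpy as np

names = ["C3S", "C3A", "BLAINE", "Strgth3", "Strgth28"]
# Lower triangle from the output above, mirrored into a full symmetric matrix
lower = np.array([
    [1.000,  0.0,    0.0,   0.0,   0.0],
    [-0.309, 1.000,  0.0,   0.0,   0.0],
    [0.091,  0.192,  1.000, 0.0,   0.0],
    [0.158,  0.120,  0.745, 1.000, 0.0],
    [0.344, -0.166,  0.320, 0.464, 1.000],
])
R = lower + lower.T - np.diag(np.diag(lower))

given = [names.index("BLAINE")]
rest = [i for i in range(len(names)) if i not in given]

# Conditional dispersion R_11 - R_12 R_22^{-1} R_21, then rescale to correlations
cond = R[np.ix_(rest, rest)] - R[np.ix_(rest, given)] @ np.linalg.solve(
    R[np.ix_(given, given)], R[np.ix_(given, rest)])
sd = np.sqrt(np.diag(cond))
partial = cond / np.outer(sd, sd)

for i, a in enumerate(rest):
    print(names[a].ljust(9), " ".join(f"{partial[i, j]:6.3f}" for j in range(len(rest))))
```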
SAS program - Pollution Data I

Data pollution;
   input oecd hvols;
   cards;
22
5 12
15 4
16 21
16 41
19 14
26 31
24 29
16 31
36 8
39 30
42 44
44 26
40 60
42 34
42 34
50 14
51 41
58 58
64 47
;
run;

proc corr data=pollution
      plots=scatter(nvar=2 alpha= .30 .50 .70)
      cov;
   var oecd hvols;
run;

proc kde data=pollution;
   bivar oecd hvols / plots=histogram (tilt=45 rotate=145)
                            surface(tilt=45 rotate=145)
                            histsurface(tilt=45 rotate=145);
run;
SAS program –Pollution Data II
Output – pollution Data
Conditional Means and Main Ellipse Axis – Pollution Data
Exercises