
Introduction to Normal Distribution

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics


University of Minnesota (Twin Cities)

Updated 17-Jan-2017

Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 1


Copyright

Copyright © 2017 by Nathaniel E. Helwig


Outline of Notes
1) Univariate Normal:
   Distribution form
   Standard normal
   Probability calculations
   Affine transformations
   Parameter estimation

2) Bivariate Normal:
   Distribution form
   Probability calculations
   Affine transformations
   Conditional distributions

3) Multivariate Normal:
   Distribution form
   Probability calculations
   Affine transformations
   Conditional distributions
   Parameter estimation

4) Sampling Distributions:
   Univariate case
   Multivariate case
Univariate Normal



Univariate Normal Distribution Form

Normal Density Function (Univariate)

Given a variable x ∈ R, the normal probability density function (pdf) is

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1}
$$

where
µ ∈ R is the mean
σ > 0 is the standard deviation (σ² is the variance)
e ≈ 2.71828 is the base of the natural logarithm

Write X ∼ N(µ, σ²) to denote that X follows a normal distribution.

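Equation (1) is straightforward to evaluate directly. The worked examples in these notes use R; as a cross-check, here is a minimal Python sketch (the function name `normal_pdf` is just illustrative):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)), as in Equation (1)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# The mode is at x = mu, where the density equals 1 / (sigma * sqrt(2 pi));
# for the standard normal this peak is about 0.3989.
print(normal_pdf(0.0))
```

The density is symmetric about µ, so `normal_pdf(mu + d, mu, sigma)` and `normal_pdf(mu - d, mu, sigma)` always agree.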


Univariate Normal Standard Normal

Standard Normal Distribution


If X ∼ N(0, 1), then X follows a standard normal distribution:
$$
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \tag{2}
$$

[Figure: the standard normal pdf f(x), a bell curve centered at 0, plotted for x from −4 to 4.]
Univariate Normal Probability Calculations

Probabilities and Distribution Functions

Probabilities relate to the area under the pdf:


$$
P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a) \tag{3}
$$

where

$$
F(x) = \int_{-\infty}^{x} f(u)\,du \tag{4}
$$

is the cumulative distribution function (cdf).

Note: F(x) = P(X ≤ x), which implies 0 ≤ F(x) ≤ 1.
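For the normal distribution the cdf in Equation (4) can be written in terms of the error function, Φ(x) = (1 + erf(x/√2))/2, so probabilities like Equation (3) can be checked without any statistics library. A sketch in Python (the notes themselves use R's pnorm; the function names here are illustrative):

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = P(X <= x) for X ~ N(mu, sigma^2), via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_prob(a, b, mu=0.0, sigma=1.0):
    # P(a <= X <= b) = F(b) - F(a), as in Equation (3)
    return norm_cdf(b, mu, sigma) - norm_cdf(a, mu, sigma)

# About 68.3% of the mass lies within one standard deviation of the mean.
print(round(norm_prob(-1.0, 1.0), 4))
```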


Univariate Normal Probability Calculations

Normal Probabilities

Helpful figure of normal probabilities:


[Figure: standard normal density with the area under the curve labeled by region: 34.1% in each of (µ − σ, µ) and (µ, µ + σ); 13.6% in each of (µ − 2σ, µ − σ) and (µ + σ, µ + 2σ); 2.1% in each of (µ − 3σ, µ − 2σ) and (µ + 2σ, µ + 3σ); 0.1% in each tail beyond µ ± 3σ.]

From http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg


Univariate Normal Probability Calculations

Normal Distribution Functions (Univariate)

Helpful figures of normal pdfs and cdfs:


[Figure: normal pdfs φ(x) (left) and cdfs Φ(x) (right) for (µ, σ²) = (0, 0.2), (0, 1.0), (0, 5.0), and (−2, 0.5), plotted for x from −5 to 5.]

From http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg and http://en.wikipedia.org/wiki/File:Normal_Distribution_CDF.svg

Note that the cdf has an elongated “S” shape, referred to as an ogive.


Univariate Normal Affine Transformations

Affine Transformations of Normal (Univariate)

Suppose that X ∼ N(µ, σ²) and a, b ∈ R with a ≠ 0.

If we define Y = aX + b, then Y ∼ N(aµ + b, a²σ²).


Suppose that X ∼ N(1, 2). Determine the distributions of. . .


Y = X + 3  =⇒  Y ∼ N(1(1) + 3, 1²(2)) ≡ N(4, 2)
Y = 2X + 3  =⇒  Y ∼ N(2(1) + 3, 2²(2)) ≡ N(5, 8)
Y = 3X + 2  =⇒  Y ∼ N(3(1) + 2, 3²(2)) ≡ N(5, 18)

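The affine rule is a one-line computation on the two parameters. A sketch reproducing the three examples above (the helper name `affine_normal` is illustrative):

```python
def affine_normal(mu, var, a, b):
    # If X ~ N(mu, var) and Y = aX + b with a != 0, then Y ~ N(a*mu + b, a^2 * var)
    return a * mu + b, a * a * var

# X ~ N(1, 2), matching the examples above
print(affine_normal(1, 2, 1, 3))  # Y = X + 3  -> (4, 2)
print(affine_normal(1, 2, 2, 3))  # Y = 2X + 3 -> (5, 8)
print(affine_normal(1, 2, 3, 2))  # Y = 3X + 2 -> (5, 18)
```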


Univariate Normal Parameter Estimation

Likelihood Function

Suppose that x = (x1, . . . , xn) is an iid sample of data from a normal distribution with mean µ and variance σ², i.e., the xi are iid N(µ, σ²).

The likelihood function for the parameters (given the data) has the form

$$
L(\mu, \sigma^2 \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
$$

and the log-likelihood function is given by

$$
LL(\mu, \sigma^2 \mid \mathbf{x}) = \sum_{i=1}^{n} \log f(x_i) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2
$$


Univariate Normal Parameter Estimation

Maximum Likelihood Estimate of the Mean

The MLE of the mean is the value of µ that minimizes

$$
\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}x_i^2 - 2n\bar{x}\mu + n\mu^2
$$

where x̄ = (1/n) ∑ xi is the sample mean.

Taking the derivative with respect to µ we find that

$$
\frac{\partial}{\partial\mu}\sum_{i=1}^{n}(x_i-\mu)^2 = -2n\bar{x} + 2n\mu,
$$

which equals zero when µ̂ = x̄, i.e., the sample mean x̄ is the MLE of the population mean µ.


Univariate Normal Parameter Estimation

Maximum Likelihood Estimate of the Variance


The MLE of the variance is the value of σ² that minimizes

$$
\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\hat{\mu})^2 = \frac{n}{2}\log(\sigma^2) + \frac{\sum_{i=1}^{n}x_i^2}{2\sigma^2} - \frac{n\bar{x}^2}{2\sigma^2}
$$

where x̄ = (1/n) ∑ xi is the sample mean.

Taking the derivative with respect to σ² we find that

$$
\frac{\partial}{\partial\sigma^2}\left[\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\hat{\mu})^2\right] = \frac{n}{2\sigma^2} - \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\hat{\mu})^2,
$$

which implies that the sample variance σ̂² = (1/n) ∑ (xi − x̄)² is the MLE of the population variance σ².
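Both MLEs can be computed directly from a sample; note that σ̂² divides by n, not n − 1. A sketch on a toy sample (the data values are made up purely for illustration):

```python
def normal_mle(xs):
    # MLEs from the derivations above: mu_hat is the sample mean,
    # and sigma2_hat uses the 1/n (not 1/(n-1)) denominator
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy data
print(normal_mle(xs))  # (5.0, 4.0) for this sample
```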
Bivariate Normal



Bivariate Normal Distribution Form

Normal Density Function (Bivariate)

Given two variables x, y ∈ R, the bivariate normal pdf is


$$
f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\} \tag{5}
$$

where
µx ∈ R and µy ∈ R are the marginal means
σx ∈ R⁺ and σy ∈ R⁺ are the marginal standard deviations
0 ≤ |ρ| < 1 is the correlation coefficient

X and Y are marginally normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²)
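Equation (5) can be coded directly. A sketch in Python (function names are illustrative), including a sanity check that with ρ = 0 the joint density factors into the product of the two marginal densities:

```python
import math

def bvn_pdf(x, y, mux=0.0, muy=0.0, sx=1.0, sy=1.0, rho=0.0):
    # Bivariate normal density, Equation (5)
    zx = (x - mux) / sx
    zy = (y - muy) / sy
    q = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))

def npdf(v, mu, s):
    # univariate normal density, Equation (1)
    return math.exp(-0.5 * ((v - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

# With rho = 0, f(x, y) = f(x) * f(y): X and Y are independent.
lhs = bvn_pdf(0.5, -0.3, sy=math.sqrt(2.0))
rhs = npdf(0.5, 0.0, 1.0) * npdf(-0.3, 0.0, math.sqrt(2.0))
print(abs(lhs - rhs) < 1e-12)
```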


Bivariate Normal Distribution Form

Example: µx = µy = 0, σx² = 1, σy² = 2, ρ = 0.6/√2

[Figure: the bivariate normal density f(x, y) for these parameters, shown as a 3D surface (left) and as a contour plot over x, y ∈ (−4, 4) (right); the density peaks near 0.12 at the origin.]

From http://en.wikipedia.org/wiki/File:MultivariateNormal.png
Bivariate Normal Distribution Form

Example: Different Means

µx = 0, µy = 0;  µx = 1, µy = 2;  µx = −1, µy = −1

[Figure: three contour plots of the bivariate normal density over x, y ∈ (−4, 4), one for each mean vector above; changing the means shifts the location of the density without changing its shape.]

Note: for all three plots σx² = 1, σy² = 2, and ρ = 0.6/√2.


Bivariate Normal Distribution Form

Example: Different Correlations

ρ = −0.6/√2;  ρ = 0;  ρ = 1.2/√2

[Figure: three contour plots of the bivariate normal density over x, y ∈ (−4, 4), one for each correlation above; the sign of ρ sets the orientation of the elliptical contours, and larger |ρ| makes them narrower.]

Note: for all three plots µx = µy = 0, σx² = 1, and σy² = 2.


Bivariate Normal Distribution Form

Example: Different Variances

σy = 1;  σy = 2;  σy = 2

[Figure: three contour plots of the bivariate normal density over x, y ∈ (−4, 4), one for each value of σy above.]

Note: for all three plots µx = µy = 0, σx² = 1, and ρ = 0.6/(σx σy).


Bivariate Normal Probability Calculations

Probabilities and Multiple Integration

Probabilities still relate to the area under the pdf:


$$
P(a_x \le X \le b_x \text{ and } a_y \le Y \le b_y) = \int_{a_x}^{b_x}\!\!\int_{a_y}^{b_y} f(x,y)\,dy\,dx \tag{6}
$$

where ∫∫ f(x, y) dy dx denotes the multiple integral of the pdf f(x, y).

Defining z = (x, y), we can still define the cdf:

$$
F(z) = P(X \le x \text{ and } Y \le y) = \int_{-\infty}^{x}\!\!\int_{-\infty}^{y} f(u,v)\,dv\,du \tag{7}
$$


Bivariate Normal Probability Calculations

Normal Distribution Functions (Bivariate)


Helpful figures of the bivariate normal pdf and cdf:

[Figure: contour plots of the bivariate normal pdf f(x, y) (left) and cdf F(x, y) (right) over x, y ∈ (−4, 4).]

Note: µx = µy = 0, σx² = 1, σy² = 2, and ρ = 0.6/√2.

Note that the cdf still has an ogive shape (now in two dimensions).
Bivariate Normal Affine Transformations

Affine Transformations of Normal (Bivariate)

Given z = (x, y)′, suppose that z ∼ N(µ, Σ) where

µ = (µx, µy)′ is the 2 × 1 mean vector

$$
\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \text{ is the } 2 \times 2 \text{ covariance matrix}
$$

Let

$$
\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \quad \text{with } \mathbf{A} \ne \mathbf{0}_{2\times 2}.
$$

If we define w = Az + b, then w ∼ N(Aµ + b, AΣA′).


Bivariate Normal Conditional Distributions

Conditional Normal (Bivariate)

The conditional distribution of a variable Y given X = x is

$$
f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} \tag{8}
$$

where
fXY(x, y) is the joint pdf of X and Y
fX(x) is the marginal pdf of X

In the bivariate normal case, we have that

$$
Y \mid X \sim N(\mu_*, \sigma_*^2) \tag{9}
$$

where µ* = µy + ρ(σy/σx)(x − µx) and σ*² = σy²(1 − ρ²).
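The conditional mean and variance in Equation (9) are simple to compute. A sketch in Python (the function name is illustrative, and the parameter values below are made up for the demonstration):

```python
def conditional_normal(x, mux, muy, sx, sy, rho):
    # Equation (9): Y | X = x ~ N(mu_star, sigma2_star)
    mu_star = muy + rho * (sy / sx) * (x - mux)
    sigma2_star = sy * sy * (1.0 - rho * rho)
    return mu_star, sigma2_star

# Observing X above its mean pulls the conditional mean of Y upward when rho > 0,
# and conditioning always shrinks the variance by the factor (1 - rho^2).
print(conditional_normal(x=1.0, mux=0.0, muy=0.0, sx=1.0, sy=2.0, rho=0.5))  # (1.0, 3.0)
```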


Bivariate Normal Conditional Distributions

Derivation of Conditional Normal


To prove Equation (9), simply write out the definition and simplify:

$$
\begin{aligned}
f_{Y|X}(y \mid X = x) &= \frac{f_{XY}(x,y)}{f_X(x)} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\} \Big/ \left(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}\right)}{\exp\left\{-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right\} \Big/ \left(\sigma_x\sqrt{2\pi}\right)} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right] + \frac{(x-\mu_x)^2}{2\sigma_x^2}\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[\rho^2\frac{\sigma_y^2}{\sigma_x^2}(x-\mu_x)^2 + (y-\mu_y)^2 - 2\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)(y-\mu_y)\right]\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}
\end{aligned}
$$

which completes the proof.


Bivariate Normal Conditional Distributions

Statistical Independence for Bivariate Normal


Two variables X and Y are statistically independent if

$$
f_{XY}(x,y) = f_X(x)\,f_Y(y) \tag{10}
$$

where fXY(x, y) is the joint pdf, and fX(x) and fY(y) are the marginal pdfs.

Note that if X and Y are independent, then

$$
f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{f_X(x)\,f_Y(y)}{f_X(x)} = f_Y(y) \tag{11}
$$

so conditioning on X = x does not change the distribution of Y.

If X and Y are bivariate normal, what is the necessary and sufficient condition for X and Y to be independent? Hint: see Equation (9).
Bivariate Normal Conditional Distributions

Example #1
A statistics class takes two exams X (Exam 1) and Y (Exam 2) where
the scores follow a bivariate normal distribution with parameters:
µx = 70 and µy = 60 are the marginal means
σx = 10 and σy = 15 are the marginal standard deviations
ρ = 0.6 is the correlation coefficient

Suppose we select a student at random. What is the probability that. . .


(a) the student scores over 75 on Exam 2?
(b) the student scores over 75 on Exam 2, given that the student
scored X = 80 on Exam 1?
(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?
(d) the student did better on Exam 1 than Exam 2?
(e) P(5X − 4Y > 150)?
Bivariate Normal Conditional Distributions

Example #1: Part (a)


Answer for 1(a):
Note that Y ∼ N(60, 15²), so the probability that the student scores over 75 on Exam 2 is

$$
P(Y > 75) = P\left(Z > \frac{75-60}{15}\right) = P(Z > 1) = 1 - P(Z < 1) = 1 - \Phi(1) = 1 - 0.8413447 = 0.1586553
$$

where $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz$ is the standard normal cdf (see the R code for use of pnorm to calculate this quantity).
Bivariate Normal Conditional Distributions

Example #1: Part (b)


Answer for 1(b):
Note that (Y | X = 80) ∼ N(µ*, σ*²) where

µ* = µY + ρ(σY/σX)(x − µX) = 60 + (0.6)(15/10)(80 − 70) = 69
σ*² = σY²(1 − ρ²) = 15²(1 − 0.6²) = 144

If a student scored X = 80 on Exam 1, the probability that the student scores over 75 on Exam 2 is

$$
P(Y > 75 \mid X = 80) = P\left(Z > \frac{75-69}{12}\right) = P(Z > 0.5) = 1 - \Phi(0.5) = 1 - 0.6914625 = 0.3085375
$$


Bivariate Normal Conditional Distributions

Example #1: Part (c)


Answer for 1(c):
Note that (X + Y) ∼ N(µ*, σ*²) where

µ* = µX + µY = 70 + 60 = 130
σ*² = σX² + σY² + 2ρσXσY = 10² + 15² + 2(0.6)(10)(15) = 505

The probability that the sum of Exam 1 and Exam 2 is above 150 is

$$
P(X + Y > 150) = P\left(Z > \frac{150-130}{\sqrt{505}}\right) = P(Z > 0.8899883) = 1 - \Phi(0.8899883) = 1 - 0.8132639 = 0.1867361
$$


Bivariate Normal Conditional Distributions

Example #1: Part (d)


Answer for 1(d):
Note that (X − Y) ∼ N(µ*, σ*²) where

µ* = µX − µY = 70 − 60 = 10
σ*² = σX² + σY² − 2ρσXσY = 10² + 15² − 2(0.6)(10)(15) = 145

The probability that the student did better on Exam 1 than Exam 2 is

$$
P(X > Y) = P(X - Y > 0) = P\left(Z > \frac{0-10}{\sqrt{145}}\right) = P(Z > -0.8304548) = 1 - \Phi(-0.8304548) = 1 - 0.2031408 = 0.7968592
$$


Bivariate Normal Conditional Distributions

Example #1: Part (e)


Answer for 1(e):
Note that (5X − 4Y) ∼ N(µ*, σ*²) where

µ* = 5µX − 4µY = 5(70) − 4(60) = 110
σ*² = 5²σX² + (−4)²σY² + 2(5)(−4)ρσXσY = 25(10²) + 16(15²) − 2(20)(0.6)(10)(15) = 2500

Thus, the needed probability can be obtained using

$$
P(5X - 4Y > 150) = P\left(Z > \frac{150-110}{\sqrt{2500}}\right) = P(Z > 0.8) = 1 - \Phi(0.8) = 1 - 0.7881446 = 0.2118554
$$


Bivariate Normal Conditional Distributions

Example #1: R Code

# Example 1a
> pnorm(1, lower=F)
[1] 0.1586553
> pnorm(75, mean=60, sd=15, lower=F)
[1] 0.1586553

# Example 1b
> pnorm(0.5, lower=F)
[1] 0.3085375
> pnorm(75, mean=69, sd=12, lower=F)
[1] 0.3085375

# Example 1c
> pnorm(20/sqrt(505), lower=F)
[1] 0.1867361
> pnorm(150, mean=130, sd=sqrt(505), lower=F)
[1] 0.1867361

# Example 1d
> pnorm(-10/sqrt(145), lower=F)
[1] 0.7968592
> pnorm(0, mean=10, sd=sqrt(145), lower=F)
[1] 0.7968592

# Example 1e
> pnorm(0.8, lower=F)
[1] 0.2118554
> pnorm(150, mean=110, sd=50, lower=F)
[1] 0.2118554


Multivariate Normal



Multivariate Normal Distribution Form

Normal Density Function (Multivariate)

Given x = (x1, . . . , xp)′ with xj ∈ R for all j, the multivariate normal pdf is

$$
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\} \tag{12}
$$

where
µ = (µ1, . . . , µp)′ is the p × 1 mean vector

$$
\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \text{ is the } p \times p \text{ covariance matrix}
$$

Write x ∼ N(µ, Σ) or x ∼ Np(µ, Σ) to denote that x is multivariate normal.


Multivariate Normal Distribution Form

Some Multivariate Normal Properties

The mean and covariance parameters have the following restrictions:


µj ∈ R for all j
σjj > 0 for all j
σij = ρij √(σii σjj), where ρij is the correlation between Xi and Xj
σij² ≤ σii σjj for any i, j ∈ {1, . . . , p} (Cauchy–Schwarz)

Σ is assumed to be positive definite so that Σ⁻¹ exists.

Marginals are normal: Xj ∼ N(µj, σjj) for all j ∈ {1, . . . , p}.


Multivariate Normal Probability Calculations

Multivariate Normal Probabilities

Probabilities still relate to the area under the pdf:


$$
P(a_j \le X_j \le b_j \ \forall j) = \int_{a_1}^{b_1}\!\!\cdots\!\int_{a_p}^{b_p} f(\mathbf{x})\,dx_p \cdots dx_1 \tag{13}
$$

where the right-hand side denotes the multiple integral of f(x).

We can still define the cdf of x = (x1, . . . , xp)′:

$$
F(\mathbf{x}) = P(X_j \le x_j \ \forall j) = \int_{-\infty}^{x_1}\!\!\cdots\!\int_{-\infty}^{x_p} f(\mathbf{u})\,du_p \cdots du_1 \tag{14}
$$


Multivariate Normal Affine Transformations

Affine Transformations of Normal (Multivariate)

Suppose that x = (x1 , . . . , xp )0 and that x ∼ N(µ, Σ) where


µ = {µj }p×1 is the mean vector
Σ = {σij }p×p is the covariance matrix

Let A = {aij}n×p and b = {bi}n×1 with A ≠ 0n×p.

If we define w = Ax + b, then w ∼ N(Aµ + b, AΣA0 ).

Note: linear combinations of normal variables are normally distributed.



Multivariate Normal Conditional Distributions

Multivariate Conditional Distributions

Given variables x = (x1 , . . . , xp )0 and y = (y1 , . . . , yq )0 , we have

$$
f_{Y|X}(\mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \frac{f_{XY}(\mathbf{x},\mathbf{y})}{f_X(\mathbf{x})} \tag{15}
$$

where
fY |X (y|X = x) is the conditional distribution of y given x
fXY (x, y) is the joint pdf of x and y
fX (x) is the marginal pdf of x



Multivariate Normal Conditional Distributions

Conditional Normal (Multivariate)


Suppose that z ∼ N(µ, Σ) where

z = (x′, y′)′ = (x1, . . . , xp, y1, . . . , yq)′
µ = (µx′, µy′)′, where µx is the mean vector of x and µy is the mean vector of y

$$
\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{xy}' & \boldsymbol{\Sigma}_{yy} \end{pmatrix}
$$

where Σxx is the p × p covariance matrix of x, Σyy is the q × q covariance matrix of y, and Σxy is the p × q covariance matrix of x and y.

In the multivariate normal case, we have that

$$
\mathbf{y} \mid \mathbf{x} \sim N(\boldsymbol{\mu}_*, \boldsymbol{\Sigma}_*) \tag{16}
$$

where µ* = µy + Σxy′ Σxx⁻¹ (x − µx) and Σ* = Σyy − Σxy′ Σxx⁻¹ Σxy.
Multivariate Normal Conditional Distributions

Statistical Independence for Multivariate Normal

Using Equation (16), we have that

y|x ∼ N(µ∗ , Σ∗ ) ≡ N(µy , Σyy ) (17)

if and only if Σxy = 0p×q (a matrix of zeros).

Note that Σxy = 0p×q implies that the p elements of x are uncorrelated
with the q elements of y.
For multivariate normal variables: uncorrelated → independent
For non-normal variables: uncorrelated does not imply independent


Multivariate Normal Conditional Distributions

Example #2
Each Delicious Candy Company store makes candy bars in three sizes:
regular (X1), fun size (X2), and big size (X3).

Assume the weights (in ounces) of the candy bars (X1, X2, X3) follow a multivariate normal distribution with parameters

$$
\boldsymbol{\mu} = \begin{pmatrix} 5 \\ 3 \\ 7 \end{pmatrix} \quad \text{and} \quad \boldsymbol{\Sigma} = \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & 2 \\ 0 & 2 & 9 \end{pmatrix}
$$

Suppose we select a store at random. What is the probability that. . .


(a) the weight of a regular candy bar is greater than 8 oz?
(b) the weight of a regular candy bar is greater than 8 oz, given that
the fun size bar weighs 1 oz and the big size bar weighs 10 oz?
(c) P(4X1 − 3X2 + 5X3 < 63)?
Multivariate Normal Conditional Distributions

Example #2: Part (a)

Answer for 2(a):
Note that X1 ∼ N(5, 4).

So, the probability that the regular bar is more than 8 oz is

$$
P(X_1 > 8) = P\left(Z > \frac{8-5}{2}\right) = P(Z > 1.5) = 1 - \Phi(1.5) = 1 - 0.9331928 = 0.0668072
$$


Multivariate Normal Conditional Distributions

Example #2: Part (b)


Answer for 2(b):
(X1 | X2 = 1, X3 = 10) is normally distributed; see Equation (16).

The conditional mean of (X1 | X2 = 1, X3 = 10) is given by

$$
\begin{aligned}
\mu_* &= \mu_{X_1} + \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{22}^{-1}(\tilde{\mathbf{x}} - \tilde{\boldsymbol{\mu}}) \\
&= 5 + \begin{pmatrix} -1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}^{-1} \begin{pmatrix} 1-3 \\ 10-7 \end{pmatrix} \\
&= 5 + \begin{pmatrix} -1 & 0 \end{pmatrix} \frac{1}{32}\begin{pmatrix} 9 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} -2 \\ 3 \end{pmatrix} \\
&= 5 + 24/32 \\
&= 5.75
\end{aligned}
$$


Multivariate Normal Conditional Distributions

Example #2: Part (b) continued

Answer for 2(b) continued:
The conditional variance of (X1 | X2 = 1, X3 = 10) is given by

$$
\begin{aligned}
\sigma_*^2 &= \sigma_{X_1}^2 - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{12} \\
&= 4 - \begin{pmatrix} -1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}^{-1} \begin{pmatrix} -1 \\ 0 \end{pmatrix} \\
&= 4 - \begin{pmatrix} -1 & 0 \end{pmatrix} \frac{1}{32}\begin{pmatrix} 9 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \end{pmatrix} \\
&= 4 - 9/32 \\
&= 3.71875
\end{aligned}
$$


Multivariate Normal Conditional Distributions

Example #2: Part (b) continued

Answer for 2(b) continued:
So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz, the probability that the regular bar is more than 8 oz is

$$
P(X_1 > 8 \mid X_2 = 1, X_3 = 10) = P\left(Z > \frac{8-5.75}{\sqrt{3.71875}}\right) = P(Z > 1.166767) = 1 - \Phi(1.166767) = 1 - 0.8783477 = 0.1216523
$$
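The 2 × 2 matrix algebra in part (b) is small enough to verify in a few lines of code. A Python cross-check of the R-based calculation above (the helper names inv2 and matvec2 are just for this illustration):

```python
def inv2(m):
    # inverse of a 2x2 matrix [[a, b], [c, d]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec2(m, v):
    # 2x2 matrix times length-2 vector
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

s12 = [-1.0, 0.0]                          # covariances of X1 with (X2, X3)
s22_inv = inv2([[4.0, 2.0], [2.0, 9.0]])   # inverse of the (X2, X3) covariance block
dev = [1.0 - 3.0, 10.0 - 7.0]              # observed (x2, x3) minus their means

mu_star = 5.0 + sum(a * b for a, b in zip(s12, matvec2(s22_inv, dev)))
sigma2_star = 4.0 - sum(a * b for a, b in zip(s12, matvec2(s22_inv, s12)))
print(mu_star, sigma2_star)  # 5.75 3.71875, matching the hand calculation
```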


Multivariate Normal Conditional Distributions

Example #2: Part (c)

Answer for 2(c):
(4X1 − 3X2 + 5X3) is normally distributed.

The expectation of (4X1 − 3X2 + 5X3) is given by

$$
\mu_* = 4\mu_{X_1} - 3\mu_{X_2} + 5\mu_{X_3} = 4(5) - 3(3) + 5(7) = 46
$$


Multivariate Normal Conditional Distributions

Example #2: Part (c) continued

Answer for 2(c) continued:
The variance of (4X1 − 3X2 + 5X3) is given by

$$
\begin{aligned}
\sigma_*^2 &= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \boldsymbol{\Sigma} \begin{pmatrix} 4 \\ -3 \\ 5 \end{pmatrix} \\
&= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & 2 \\ 0 & 2 & 9 \end{pmatrix} \begin{pmatrix} 4 \\ -3 \\ 5 \end{pmatrix} \\
&= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \begin{pmatrix} 19 \\ -6 \\ 39 \end{pmatrix} \\
&= 289
\end{aligned}
$$


Multivariate Normal Conditional Distributions

Example #2: Part (c) continued

Answer for 2(c) continued:
So, the needed probability can be obtained as

$$
P(4X_1 - 3X_2 + 5X_3 < 63) = P\left(Z < \frac{63-46}{\sqrt{289}}\right) = P(Z < 1) = \Phi(1) = 0.8413447
$$
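Part (c) can be checked end to end in plain Python: the mean is a′µ, the variance is the quadratic form a′Σa, and Φ can be written with the error function. A sketch:

```python
import math

mu = [5.0, 3.0, 7.0]
Sigma = [[4.0, -1.0, 0.0], [-1.0, 4.0, 2.0], [0.0, 2.0, 9.0]]
a = [4.0, -3.0, 5.0]  # coefficients of 4X1 - 3X2 + 5X3

mean = sum(ai * mi for ai, mi in zip(a, mu))                          # a' mu
Sigma_a = [sum(Sigma[i][j] * a[j] for j in range(3)) for i in range(3)]
var = sum(ai * si for ai, si in zip(a, Sigma_a))                      # a' Sigma a

z = (63.0 - mean) / math.sqrt(var)
prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z) via the error function
print(mean, var, round(prob, 7))  # 46.0 289.0 0.8413447
```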


Multivariate Normal Conditional Distributions

Example #2: R Code


# Example 2a
> pnorm(1.5,lower=F)
[1] 0.0668072
> pnorm(8,mean=5,sd=2,lower=F)
[1] 0.0668072

# Example 2b
> pnorm(2.25/sqrt(119/32),lower=F)
[1] 0.1216523
> pnorm(8,mean=5.75,sd=sqrt(119/32),lower=F)
[1] 0.1216523

# Example 2c
> pnorm(1)
[1] 0.8413447
> pnorm(63,mean=46,sd=17)
[1] 0.8413447
Multivariate Normal Parameter Estimation

Likelihood Function

Suppose that xi = (xi1, . . . , xip)′ is an iid sample from a normal distribution with mean vector µ and covariance matrix Σ, i.e., the xi are iid N(µ, Σ).

The likelihood function for the parameters (given the data) has the form

$$
L(\boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \mathbf{X}) = \prod_{i=1}^{n} f(\mathbf{x}_i) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu})\right\}
$$

and the log-likelihood function is given by

$$
LL(\boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \mathbf{X}) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log(|\boldsymbol{\Sigma}|) - \frac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu})
$$


Multivariate Normal Parameter Estimation

Maximum Likelihood Estimate of Mean Vector

The MLE of the mean vector is the value of µ that minimizes

$$
\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu}) = \sum_{i=1}^{n}\mathbf{x}_i'\boldsymbol{\Sigma}^{-1}\mathbf{x}_i - 2n\bar{\mathbf{x}}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + n\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}
$$

where x̄ = (1/n) ∑ xi is the sample mean vector.

Taking the derivative with respect to µ we find that

$$
\frac{\partial}{\partial\boldsymbol{\mu}}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu}) = -2n\boldsymbol{\Sigma}^{-1}\bar{\mathbf{x}} + 2n\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu},
$$

which equals zero when µ̂ = x̄. The sample mean vector x̄ is the MLE of the population mean vector µ.


Multivariate Normal Parameter Estimation

Maximum Likelihood Estimate of Covariance Matrix


The MLE of the covariance matrix is the value of Σ that minimizes

$$
-n\log(|\boldsymbol{\Sigma}^{-1}|) + \sum_{i=1}^{n}\mathrm{tr}\left\{\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})'\right\}
$$

where µ̂ = x̄ = (1/n) ∑ xi is the sample mean vector.

Taking the derivative with respect to Σ⁻¹ we find that

$$
\frac{\partial}{\partial\boldsymbol{\Sigma}^{-1}}\left[-n\log(|\boldsymbol{\Sigma}^{-1}|) + \sum_{i=1}^{n}\mathrm{tr}\left\{\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})'\right\}\right] = -n\boldsymbol{\Sigma} + \sum_{i=1}^{n}(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})',
$$

which equals zero when Σ̂ = (1/n) ∑ (xi − x̄)(xi − x̄)′, i.e., the sample covariance matrix Σ̂ is the MLE of the population covariance matrix Σ.
Sampling Distributions



Sampling Distributions Univariate Case

Univariate Sampling Distributions: x̄ and s²

In the univariate normal case, we have that

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \sim N(\mu, \sigma^2/n)
$$

$$
(n-1)s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 \sim \sigma^2\chi^2_{n-1}
$$

χ²_k denotes a chi-square variable with k degrees of freedom:

$$
\sigma^2\chi^2_k \equiv \sum_{i=1}^{k} z_i^2 \quad \text{where the } z_i \text{ are iid } N(0, \sigma^2)
$$


Sampling Distributions Multivariate Case

Multivariate Sampling Distributions: x̄ and S

In the multivariate normal case, we have that

$$
\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)
$$

$$
(n-1)\mathbf{S} = \sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})' \sim W_{n-1}(\boldsymbol{\Sigma})
$$

W_k(Σ) denotes a Wishart variable with k degrees of freedom:

$$
W_k(\boldsymbol{\Sigma}) \equiv \sum_{i=1}^{k}\mathbf{z}_i\mathbf{z}_i' \quad \text{where the } \mathbf{z}_i \text{ are iid } N(\mathbf{0}_p, \boldsymbol{\Sigma})
$$
