
Introduction to Normal Distribution

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics


University of Minnesota (Twin Cities)

Updated 17-Jan-2017

Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 1


Copyright

Copyright © 2017 by Nathaniel E. Helwig


Outline of Notes
1) Univariate Normal:
   Distribution form
   Standard normal
   Probability calculations
   Affine transformations
   Parameter estimation

2) Bivariate Normal:
   Distribution form
   Probability calculations
   Affine transformations
   Conditional distributions

3) Multivariate Normal:
   Distribution form
   Probability calculations
   Affine transformations
   Conditional distributions
   Parameter estimation

4) Sampling Distributions:
   Univariate case
   Multivariate case
Univariate Normal



Univariate Normal Distribution Form

Normal Density Function (Univariate)

Given a variable x ∈ R, the normal probability density function (pdf) is

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1}
$$

where
µ ∈ R is the mean
σ > 0 is the standard deviation (σ² is the variance)
e ≈ 2.71828 is the base of the natural logarithm

Write X ∼ N(µ, σ²) to denote that X follows a normal distribution.

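Equation (1) is straightforward to evaluate directly. The worked examples in these notes use R; as a cross-check, here is a minimal Python sketch (the function name `normal_pdf` is just illustrative):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)), as in Equation (1)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# The mode is at x = mu, where the density equals 1 / (sigma * sqrt(2 pi));
# for the standard normal this peak is about 0.3989.
print(normal_pdf(0.0))
```

The density is symmetric about µ, so `normal_pdf(mu + d, mu, sigma)` and `normal_pdf(mu - d, mu, sigma)` always agree.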


Univariate Normal Standard Normal

Standard Normal Distribution


If X ∼ N(0, 1), then X follows a standard normal distribution:
$$
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \tag{2}
$$

[Figure: the standard normal pdf f(x), a bell curve centered at 0, plotted for x from −4 to 4.]
Univariate Normal Probability Calculations

Probabilities and Distribution Functions

Probabilities relate to the area under the pdf:


$$
P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a) \tag{3}
$$

where

$$
F(x) = \int_{-\infty}^{x} f(u)\,du \tag{4}
$$

is the cumulative distribution function (cdf).

Note: F(x) = P(X ≤ x), which implies 0 ≤ F(x) ≤ 1.
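For the normal distribution the cdf in Equation (4) can be written in terms of the error function, Φ(x) = (1 + erf(x/√2))/2, so probabilities like Equation (3) can be checked without any statistics library. A sketch in Python (the notes themselves use R's pnorm; the function names here are illustrative):

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = P(X <= x) for X ~ N(mu, sigma^2), via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_prob(a, b, mu=0.0, sigma=1.0):
    # P(a <= X <= b) = F(b) - F(a), as in Equation (3)
    return norm_cdf(b, mu, sigma) - norm_cdf(a, mu, sigma)

# About 68.3% of the mass lies within one standard deviation of the mean.
print(round(norm_prob(-1.0, 1.0), 4))
```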


Univariate Normal Probability Calculations

Normal Probabilities

Helpful figure of normal probabilities:


[Figure: standard normal density with the area under the curve labeled by region: 34.1% in each of (µ − σ, µ) and (µ, µ + σ); 13.6% in each of (µ − 2σ, µ − σ) and (µ + σ, µ + 2σ); 2.1% in each of (µ − 3σ, µ − 2σ) and (µ + 2σ, µ + 3σ); 0.1% in each tail beyond µ ± 3σ.]

From http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg


Univariate Normal Probability Calculations

Normal Distribution Functions (Univariate)

Helpful figures of normal pdfs and cdfs:


[Figure: normal pdfs φ(x) (left) and cdfs Φ(x) (right) for (µ, σ²) = (0, 0.2), (0, 1.0), (0, 5.0), and (−2, 0.5), plotted for x from −5 to 5.]

From http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg and http://en.wikipedia.org/wiki/File:Normal_Distribution_CDF.svg

Note that the cdf has an elongated “S” shape, referred to as an ogive.


Univariate Normal Affine Transformations

Affine Transformations of Normal (Univariate)

Suppose that X ∼ N(µ, σ²) and a, b ∈ R with a ≠ 0.

If we define Y = aX + b, then Y ∼ N(aµ + b, a²σ²).


Suppose that X ∼ N(1, 2). Determine the distributions of. . .


Y = X + 3  =⇒  Y ∼ N(1(1) + 3, 1²(2)) ≡ N(4, 2)
Y = 2X + 3  =⇒  Y ∼ N(2(1) + 3, 2²(2)) ≡ N(5, 8)
Y = 3X + 2  =⇒  Y ∼ N(3(1) + 2, 3²(2)) ≡ N(5, 18)

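The affine rule is a one-line computation on the two parameters. A sketch reproducing the three examples above (the helper name `affine_normal` is illustrative):

```python
def affine_normal(mu, var, a, b):
    # If X ~ N(mu, var) and Y = aX + b with a != 0, then Y ~ N(a*mu + b, a^2 * var)
    return a * mu + b, a * a * var

# X ~ N(1, 2), matching the examples above
print(affine_normal(1, 2, 1, 3))  # Y = X + 3  -> (4, 2)
print(affine_normal(1, 2, 2, 3))  # Y = 2X + 3 -> (5, 8)
print(affine_normal(1, 2, 3, 2))  # Y = 3X + 2 -> (5, 18)
```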


Univariate Normal Parameter Estimation

Likelihood Function

Suppose that x = (x1, . . . , xn) is an iid sample of data from a normal distribution with mean µ and variance σ², i.e., the xi are iid N(µ, σ²).

The likelihood function for the parameters (given the data) has the form

$$
L(\mu, \sigma^2 \mid \mathbf{x}) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
$$

and the log-likelihood function is given by

$$
LL(\mu, \sigma^2 \mid \mathbf{x}) = \sum_{i=1}^{n} \log f(x_i) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2
$$


Univariate Normal Parameter Estimation

Maximum Likelihood Estimate of the Mean

The MLE of the mean is the value of µ that minimizes

$$
\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}x_i^2 - 2n\bar{x}\mu + n\mu^2
$$

where x̄ = (1/n) ∑ xi is the sample mean.

Taking the derivative with respect to µ we find that

$$
\frac{\partial}{\partial\mu}\sum_{i=1}^{n}(x_i-\mu)^2 = -2n\bar{x} + 2n\mu,
$$

which equals zero when µ̂ = x̄, i.e., the sample mean x̄ is the MLE of the population mean µ.


Univariate Normal Parameter Estimation

Maximum Likelihood Estimate of the Variance


The MLE of the variance is the value of σ² that minimizes

$$
\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\hat{\mu})^2 = \frac{n}{2}\log(\sigma^2) + \frac{\sum_{i=1}^{n}x_i^2}{2\sigma^2} - \frac{n\bar{x}^2}{2\sigma^2}
$$

where x̄ = (1/n) ∑ xi is the sample mean.

Taking the derivative with respect to σ² we find that

$$
\frac{\partial}{\partial\sigma^2}\left[\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\hat{\mu})^2\right] = \frac{n}{2\sigma^2} - \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\hat{\mu})^2,
$$

which implies that the sample variance σ̂² = (1/n) ∑ (xi − x̄)² is the MLE of the population variance σ².
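Both MLEs can be computed directly from a sample; note that σ̂² divides by n, not n − 1. A sketch on a toy sample (the data values are made up purely for illustration):

```python
def normal_mle(xs):
    # MLEs from the derivations above: mu_hat is the sample mean,
    # and sigma2_hat uses the 1/n (not 1/(n-1)) denominator
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy data
print(normal_mle(xs))  # (5.0, 4.0) for this sample
```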
Bivariate Normal



Bivariate Normal Distribution Form

Normal Density Function (Bivariate)

Given two variables x, y ∈ R, the bivariate normal pdf is


$$
f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\} \tag{5}
$$

where
µx ∈ R and µy ∈ R are the marginal means
σx ∈ R⁺ and σy ∈ R⁺ are the marginal standard deviations
0 ≤ |ρ| < 1 is the correlation coefficient

X and Y are marginally normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²)
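Equation (5) can be coded directly. A sketch in Python (function names are illustrative), including a sanity check that with ρ = 0 the joint density factors into the product of the two marginal densities:

```python
import math

def bvn_pdf(x, y, mux=0.0, muy=0.0, sx=1.0, sy=1.0, rho=0.0):
    # Bivariate normal density, Equation (5)
    zx = (x - mux) / sx
    zy = (y - muy) / sy
    q = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))

def npdf(v, mu, s):
    # univariate normal density, Equation (1)
    return math.exp(-0.5 * ((v - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

# With rho = 0, f(x, y) = f(x) * f(y): X and Y are independent.
lhs = bvn_pdf(0.5, -0.3, sy=math.sqrt(2.0))
rhs = npdf(0.5, 0.0, 1.0) * npdf(-0.3, 0.0, math.sqrt(2.0))
print(abs(lhs - rhs) < 1e-12)
```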


Bivariate Normal Distribution Form

Example: µx = µy = 0, σx² = 1, σy² = 2, ρ = 0.6/√2

[Figure: the bivariate normal density f(x, y) for these parameters, shown as a 3D surface (left) and as a contour plot over x, y ∈ (−4, 4) (right); the density peaks near 0.12 at the origin.]

From http://en.wikipedia.org/wiki/File:MultivariateNormal.png
Bivariate Normal Distribution Form

Example: Different Means

µx = 0, µy = 0;  µx = 1, µy = 2;  µx = −1, µy = −1

[Figure: three contour plots of the bivariate normal density over x, y ∈ (−4, 4), one for each mean vector above; changing the means shifts the location of the density without changing its shape.]

Note: for all three plots σx² = 1, σy² = 2, and ρ = 0.6/√2.


Bivariate Normal Distribution Form

Example: Different Correlations

ρ = −0.6/√2;  ρ = 0;  ρ = 1.2/√2

[Figure: three contour plots of the bivariate normal density over x, y ∈ (−4, 4), one for each correlation above; the sign of ρ sets the orientation of the elliptical contours, and larger |ρ| makes them narrower.]

Note: for all three plots µx = µy = 0, σx² = 1, and σy² = 2.


Bivariate Normal Distribution Form

Example: Different Variances

σy = 1;  σy = 2;  σy = 2

[Figure: three contour plots of the bivariate normal density over x, y ∈ (−4, 4), one for each value of σy above.]

Note: for all three plots µx = µy = 0, σx² = 1, and ρ = 0.6/(σx σy).


Bivariate Normal Probability Calculations

Probabilities and Multiple Integration

Probabilities still relate to the area under the pdf:


$$
P(a_x \le X \le b_x \text{ and } a_y \le Y \le b_y) = \int_{a_x}^{b_x}\!\!\int_{a_y}^{b_y} f(x,y)\,dy\,dx \tag{6}
$$

where ∫∫ f(x, y) dy dx denotes the multiple integral of the pdf f(x, y).

Defining z = (x, y), we can still define the cdf:

$$
F(z) = P(X \le x \text{ and } Y \le y) = \int_{-\infty}^{x}\!\!\int_{-\infty}^{y} f(u,v)\,dv\,du \tag{7}
$$


Bivariate Normal Probability Calculations

Normal Distribution Functions (Bivariate)


Helpful figures of the bivariate normal pdf and cdf:

[Figure: contour plots of the bivariate normal pdf f(x, y) (left) and cdf F(x, y) (right) over x, y ∈ (−4, 4).]

Note: µx = µy = 0, σx² = 1, σy² = 2, and ρ = 0.6/√2.

Note that the cdf still has an ogive shape (now in two dimensions).
Bivariate Normal Affine Transformations

Affine Transformations of Normal (Bivariate)

Given z = (x, y)′, suppose that z ∼ N(µ, Σ) where

µ = (µx, µy)′ is the 2 × 1 mean vector

$$
\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \text{ is the } 2 \times 2 \text{ covariance matrix}
$$

Let

$$
\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \quad \text{with } \mathbf{A} \ne \mathbf{0}_{2\times 2}.
$$

If we define w = Az + b, then w ∼ N(Aµ + b, AΣA′).


Bivariate Normal Conditional Distributions

Conditional Normal (Bivariate)

The conditional distribution of a variable Y given X = x is

$$
f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} \tag{8}
$$

where
fXY(x, y) is the joint pdf of X and Y
fX(x) is the marginal pdf of X

In the bivariate normal case, we have that

$$
Y \mid X \sim N(\mu_*, \sigma_*^2) \tag{9}
$$

where µ* = µy + ρ(σy/σx)(x − µx) and σ*² = σy²(1 − ρ²).
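The conditional mean and variance in Equation (9) are simple to compute. A sketch in Python (the function name is illustrative, and the parameter values below are made up for the demonstration):

```python
def conditional_normal(x, mux, muy, sx, sy, rho):
    # Equation (9): Y | X = x ~ N(mu_star, sigma2_star)
    mu_star = muy + rho * (sy / sx) * (x - mux)
    sigma2_star = sy * sy * (1.0 - rho * rho)
    return mu_star, sigma2_star

# Observing X above its mean pulls the conditional mean of Y upward when rho > 0,
# and conditioning always shrinks the variance by the factor (1 - rho^2).
print(conditional_normal(x=1.0, mux=0.0, muy=0.0, sx=1.0, sy=2.0, rho=0.5))  # (1.0, 3.0)
```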


Bivariate Normal Conditional Distributions

Derivation of Conditional Normal


To prove Equation (9), simply write out the definition and simplify:

$$
\begin{aligned}
f_{Y|X}(y \mid X = x) &= \frac{f_{XY}(x,y)}{f_X(x)} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\} \Big/ \left(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}\right)}{\exp\left\{-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right\} \Big/ \left(\sigma_x\sqrt{2\pi}\right)} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right] + \frac{(x-\mu_x)^2}{2\sigma_x^2}\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[\rho^2\frac{\sigma_y^2}{\sigma_x^2}(x-\mu_x)^2 + (y-\mu_y)^2 - 2\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)(y-\mu_y)\right]\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \\[4pt]
&= \frac{\exp\left\{-\frac{1}{2\sigma_y^2(1-\rho^2)}\left[y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2\right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}
\end{aligned}
$$

which completes the proof.


Bivariate Normal Conditional Distributions

Statistical Independence for Bivariate Normal


Two variables X and Y are statistically independent if

$$
f_{XY}(x,y) = f_X(x)\,f_Y(y) \tag{10}
$$

where fXY(x, y) is the joint pdf, and fX(x) and fY(y) are the marginal pdfs.

Note that if X and Y are independent, then

$$
f_{Y|X}(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{f_X(x)\,f_Y(y)}{f_X(x)} = f_Y(y) \tag{11}
$$

so conditioning on X = x does not change the distribution of Y.

If X and Y are bivariate normal, what is the necessary and sufficient condition for X and Y to be independent? Hint: see Equation (9).
Bivariate Normal Conditional Distributions

Example #1
A statistics class takes two exams X (Exam 1) and Y (Exam 2) where
the scores follow a bivariate normal distribution with parameters:
µx = 70 and µy = 60 are the marginal means
σx = 10 and σy = 15 are the marginal standard deviations
ρ = 0.6 is the correlation coefficient

Suppose we select a student at random. What is the probability that. . .


(a) the student scores over 75 on Exam 2?
(b) the student scores over 75 on Exam 2, given that the student
scored X = 80 on Exam 1?
(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?
(d) the student did better on Exam 1 than Exam 2?
(e) P(5X − 4Y > 150)?
Bivariate Normal Conditional Distributions

Example #1: Part (a)


Answer for 1(a):
Note that Y ∼ N(60, 15²), so the probability that the student scores over 75 on Exam 2 is

$$
P(Y > 75) = P\left(Z > \frac{75-60}{15}\right) = P(Z > 1) = 1 - P(Z < 1) = 1 - \Phi(1) = 1 - 0.8413447 = 0.1586553
$$

where $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz$ is the standard normal cdf (see the R code for use of pnorm to calculate this quantity).
Bivariate Normal Conditional Distributions

Example #1: Part (b)


Answer for 1(b):
Note that (Y | X = 80) ∼ N(µ*, σ*²) where

µ* = µY + ρ(σY/σX)(x − µX) = 60 + (0.6)(15/10)(80 − 70) = 69
σ*² = σY²(1 − ρ²) = 15²(1 − 0.6²) = 144

If a student scored X = 80 on Exam 1, the probability that the student scores over 75 on Exam 2 is

$$
P(Y > 75 \mid X = 80) = P\left(Z > \frac{75-69}{12}\right) = P(Z > 0.5) = 1 - \Phi(0.5) = 1 - 0.6914625 = 0.3085375
$$


Bivariate Normal Conditional Distributions

Example #1: Part (c)


Answer for 1(c):
Note that (X + Y) ∼ N(µ*, σ*²) where

µ* = µX + µY = 70 + 60 = 130
σ*² = σX² + σY² + 2ρσXσY = 10² + 15² + 2(0.6)(10)(15) = 505

The probability that the sum of Exam 1 and Exam 2 is above 150 is

$$
P(X + Y > 150) = P\left(Z > \frac{150-130}{\sqrt{505}}\right) = P(Z > 0.8899883) = 1 - \Phi(0.8899883) = 1 - 0.8132639 = 0.1867361
$$


Bivariate Normal Conditional Distributions

Example #1: Part (d)


Answer for 1(d):
Note that (X − Y) ∼ N(µ*, σ*²) where

µ* = µX − µY = 70 − 60 = 10
σ*² = σX² + σY² − 2ρσXσY = 10² + 15² − 2(0.6)(10)(15) = 145

The probability that the student did better on Exam 1 than Exam 2 is

$$
P(X > Y) = P(X - Y > 0) = P\left(Z > \frac{0-10}{\sqrt{145}}\right) = P(Z > -0.8304548) = 1 - \Phi(-0.8304548) = 1 - 0.2031408 = 0.7968592
$$


Bivariate Normal Conditional Distributions

Example #1: Part (e)


Answer for 1(e):
Note that (5X − 4Y) ∼ N(µ*, σ*²) where

µ* = 5µX − 4µY = 5(70) − 4(60) = 110
σ*² = 5²σX² + (−4)²σY² + 2(5)(−4)ρσXσY = 25(10²) + 16(15²) − 2(20)(0.6)(10)(15) = 2500

Thus, the needed probability can be obtained using

$$
P(5X - 4Y > 150) = P\left(Z > \frac{150-110}{\sqrt{2500}}\right) = P(Z > 0.8) = 1 - \Phi(0.8) = 1 - 0.7881446 = 0.2118554
$$


Bivariate Normal Conditional Distributions

Example #1: R Code

# Example 1a
> pnorm(1, lower=F)
[1] 0.1586553
> pnorm(75, mean=60, sd=15, lower=F)
[1] 0.1586553

# Example 1b
> pnorm(0.5, lower=F)
[1] 0.3085375
> pnorm(75, mean=69, sd=12, lower=F)
[1] 0.3085375

# Example 1c
> pnorm(20/sqrt(505), lower=F)
[1] 0.1867361
> pnorm(150, mean=130, sd=sqrt(505), lower=F)
[1] 0.1867361

# Example 1d
> pnorm(-10/sqrt(145), lower=F)
[1] 0.7968592
> pnorm(0, mean=10, sd=sqrt(145), lower=F)
[1] 0.7968592

# Example 1e
> pnorm(0.8, lower=F)
[1] 0.2118554
> pnorm(150, mean=110, sd=50, lower=F)
[1] 0.2118554


Multivariate Normal



Multivariate Normal Distribution Form

Normal Density Function (Multivariate)

Given x = (x1, . . . , xp)′ with xj ∈ R for all j, the multivariate normal pdf is

$$
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\} \tag{12}
$$

where
µ = (µ1, . . . , µp)′ is the p × 1 mean vector

$$
\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \text{ is the } p \times p \text{ covariance matrix}
$$

Write x ∼ N(µ, Σ) or x ∼ Np(µ, Σ) to denote that x is multivariate normal.


Multivariate Normal Distribution Form

Some Multivariate Normal Properties

The mean and covariance parameters have the following restrictions:


µj ∈ R for all j
σjj > 0 for all j
σij = ρij √(σii σjj), where ρij is the correlation between Xi and Xj
σij² ≤ σii σjj for any i, j ∈ {1, . . . , p} (Cauchy–Schwarz)

Σ is assumed to be positive definite so that Σ⁻¹ exists.

Marginals are normal: Xj ∼ N(µj, σjj) for all j ∈ {1, . . . , p}.


Multivariate Normal Probability Calculations

Multivariate Normal Probabilities

Probabilities still relate to the area under the pdf:


$$
P(a_j \le X_j \le b_j \ \forall j) = \int_{a_1}^{b_1}\!\!\cdots\!\int_{a_p}^{b_p} f(\mathbf{x})\,dx_p \cdots dx_1 \tag{13}
$$

where the right-hand side denotes the multiple integral of f(x).

We can still define the cdf of x = (x1, . . . , xp)′:

$$
F(\mathbf{x}) = P(X_j \le x_j \ \forall j) = \int_{-\infty}^{x_1}\!\!\cdots\!\int_{-\infty}^{x_p} f(\mathbf{u})\,du_p \cdots du_1 \tag{14}
$$


Multivariate Normal Affine Transformations

Affine Transformations of Normal (Multivariate)

Suppose that x = (x1 , . . . , xp )0 and that x ∼ N(µ, Σ) where


µ = {µj }p×1 is the mean vector
Σ = {σij }p×p is the covariance matrix

Let A = {aij}n×p and b = {bi}n×1 with A ≠ 0n×p.

If we define w = Ax + b, then w ∼ N(Aµ + b, AΣA0 ).

Note: linear combinations of normal variables are normally distributed.



Multivariate Normal Conditional Distributions

Multivariate Conditional Distributions

Given variables x = (x1 , . . . , xp )0 and y = (y1 , . . . , yq )0 , we have

$$
f_{Y|X}(\mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \frac{f_{XY}(\mathbf{x},\mathbf{y})}{f_X(\mathbf{x})} \tag{15}
$$

where
fY |X (y|X = x) is the conditional distribution of y given x
fXY (x, y) is the joint pdf of x and y
fX (x) is the marginal pdf of x



Multivariate Normal Conditional Distributions

Conditional Normal (Multivariate)


Suppose that z ∼ N(µ, Σ) where

z = (x′, y′)′ = (x1, . . . , xp, y1, . . . , yq)′
µ = (µx′, µy′)′, where µx is the mean vector of x and µy is the mean vector of y

$$
\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\ \boldsymbol{\Sigma}_{xy}' & \boldsymbol{\Sigma}_{yy} \end{pmatrix}
$$

where Σxx is the p × p covariance matrix of x, Σyy is the q × q covariance matrix of y, and Σxy is the p × q covariance matrix of x and y.

In the multivariate normal case, we have that

$$
\mathbf{y} \mid \mathbf{x} \sim N(\boldsymbol{\mu}_*, \boldsymbol{\Sigma}_*) \tag{16}
$$

where µ* = µy + Σxy′ Σxx⁻¹ (x − µx) and Σ* = Σyy − Σxy′ Σxx⁻¹ Σxy.
Multivariate Normal Conditional Distributions

Statistical Independence for Multivariate Normal

Using Equation (16), we have that

y|x ∼ N(µ∗ , Σ∗ ) ≡ N(µy , Σyy ) (17)

if and only if Σxy = 0p×q (a matrix of zeros).

Note that Σxy = 0p×q implies that the p elements of x are uncorrelated
with the q elements of y.
For multivariate normal variables: uncorrelated → independent
For non-normal variables: uncorrelated does not imply independent


Multivariate Normal Conditional Distributions

Example #2
Each Delicious Candy Company store makes candy bars in three sizes:
regular (X1), fun size (X2), and big size (X3).

Assume the weights (in ounces) of the candy bars (X1, X2, X3) follow a multivariate normal distribution with parameters

$$
\boldsymbol{\mu} = \begin{pmatrix} 5 \\ 3 \\ 7 \end{pmatrix} \quad \text{and} \quad \boldsymbol{\Sigma} = \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & 2 \\ 0 & 2 & 9 \end{pmatrix}
$$

Suppose we select a store at random. What is the probability that. . .


(a) the weight of a regular candy bar is greater than 8 oz?
(b) the weight of a regular candy bar is greater than 8 oz, given that
the fun size bar weighs 1 oz and the big size bar weighs 10 oz?
(c) P(4X1 − 3X2 + 5X3 < 63)?
Multivariate Normal Conditional Distributions

Example #2: Part (a)

Answer for 2(a):
Note that X1 ∼ N(5, 4).

So, the probability that the regular bar is more than 8 oz is

$$
P(X_1 > 8) = P\left(Z > \frac{8-5}{2}\right) = P(Z > 1.5) = 1 - \Phi(1.5) = 1 - 0.9331928 = 0.0668072
$$


Multivariate Normal Conditional Distributions

Example #2: Part (b)


Answer for 2(b):
(X1 | X2 = 1, X3 = 10) is normally distributed; see Equation (16).

The conditional mean of (X1 | X2 = 1, X3 = 10) is given by

$$
\begin{aligned}
\mu_* &= \mu_{X_1} + \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{22}^{-1}(\tilde{\mathbf{x}} - \tilde{\boldsymbol{\mu}}) \\
&= 5 + \begin{pmatrix} -1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}^{-1} \begin{pmatrix} 1-3 \\ 10-7 \end{pmatrix} \\
&= 5 + \begin{pmatrix} -1 & 0 \end{pmatrix} \frac{1}{32}\begin{pmatrix} 9 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} -2 \\ 3 \end{pmatrix} \\
&= 5 + 24/32 \\
&= 5.75
\end{aligned}
$$


Multivariate Normal Conditional Distributions

Example #2: Part (b) continued

Answer for 2(b) continued:
The conditional variance of (X1 | X2 = 1, X3 = 10) is given by

$$
\begin{aligned}
\sigma_*^2 &= \sigma_{X_1}^2 - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{12} \\
&= 4 - \begin{pmatrix} -1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}^{-1} \begin{pmatrix} -1 \\ 0 \end{pmatrix} \\
&= 4 - \begin{pmatrix} -1 & 0 \end{pmatrix} \frac{1}{32}\begin{pmatrix} 9 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \end{pmatrix} \\
&= 4 - 9/32 \\
&= 3.71875
\end{aligned}
$$


Multivariate Normal Conditional Distributions

Example #2: Part (b) continued

Answer for 2(b) continued:
So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz, the probability that the regular bar is more than 8 oz is

$$
P(X_1 > 8 \mid X_2 = 1, X_3 = 10) = P\left(Z > \frac{8-5.75}{\sqrt{3.71875}}\right) = P(Z > 1.166767) = 1 - \Phi(1.166767) = 1 - 0.8783477 = 0.1216523
$$
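The 2 × 2 matrix algebra in part (b) is small enough to verify in a few lines of code. A Python cross-check of the R-based calculation above (the helper names inv2 and matvec2 are just for this illustration):

```python
def inv2(m):
    # inverse of a 2x2 matrix [[a, b], [c, d]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec2(m, v):
    # 2x2 matrix times length-2 vector
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

s12 = [-1.0, 0.0]                          # covariances of X1 with (X2, X3)
s22_inv = inv2([[4.0, 2.0], [2.0, 9.0]])   # inverse of the (X2, X3) covariance block
dev = [1.0 - 3.0, 10.0 - 7.0]              # observed (x2, x3) minus their means

mu_star = 5.0 + sum(a * b for a, b in zip(s12, matvec2(s22_inv, dev)))
sigma2_star = 4.0 - sum(a * b for a, b in zip(s12, matvec2(s22_inv, s12)))
print(mu_star, sigma2_star)  # 5.75 3.71875, matching the hand calculation
```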


Multivariate Normal Conditional Distributions

Example #2: Part (c)

Answer for 2(c):
(4X1 − 3X2 + 5X3) is normally distributed.

The expectation of (4X1 − 3X2 + 5X3) is given by

$$
\mu_* = 4\mu_{X_1} - 3\mu_{X_2} + 5\mu_{X_3} = 4(5) - 3(3) + 5(7) = 46
$$


Multivariate Normal Conditional Distributions

Example #2: Part (c) continued

Answer for 2(c) continued:
The variance of (4X1 − 3X2 + 5X3) is given by

$$
\begin{aligned}
\sigma_*^2 &= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \boldsymbol{\Sigma} \begin{pmatrix} 4 \\ -3 \\ 5 \end{pmatrix} \\
&= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & 2 \\ 0 & 2 & 9 \end{pmatrix} \begin{pmatrix} 4 \\ -3 \\ 5 \end{pmatrix} \\
&= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \begin{pmatrix} 19 \\ -6 \\ 39 \end{pmatrix} \\
&= 289
\end{aligned}
$$


Multivariate Normal Conditional Distributions

Example #2: Part (c) continued

Answer for 2(c) continued:
So, the needed probability can be obtained as

$$
P(4X_1 - 3X_2 + 5X_3 < 63) = P\left(Z < \frac{63-46}{\sqrt{289}}\right) = P(Z < 1) = \Phi(1) = 0.8413447
$$
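Part (c) can be checked end to end in plain Python: the mean is a′µ, the variance is the quadratic form a′Σa, and Φ can be written with the error function. A sketch:

```python
import math

mu = [5.0, 3.0, 7.0]
Sigma = [[4.0, -1.0, 0.0], [-1.0, 4.0, 2.0], [0.0, 2.0, 9.0]]
a = [4.0, -3.0, 5.0]  # coefficients of 4X1 - 3X2 + 5X3

mean = sum(ai * mi for ai, mi in zip(a, mu))                          # a' mu
Sigma_a = [sum(Sigma[i][j] * a[j] for j in range(3)) for i in range(3)]
var = sum(ai * si for ai, si in zip(a, Sigma_a))                      # a' Sigma a

z = (63.0 - mean) / math.sqrt(var)
prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z) via the error function
print(mean, var, round(prob, 7))  # 46.0 289.0 0.8413447
```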


Multivariate Normal Conditional Distributions

Example #2: R Code


# Example 2a
> pnorm(1.5,lower=F)
[1] 0.0668072
> pnorm(8,mean=5,sd=2,lower=F)
[1] 0.0668072

# Example 2b
> pnorm(2.25/sqrt(119/32),lower=F)
[1] 0.1216523
> pnorm(8,mean=5.75,sd=sqrt(119/32),lower=F)
[1] 0.1216523

# Example 2c
> pnorm(1)
[1] 0.8413447
> pnorm(63,mean=46,sd=17)
[1] 0.8413447
Multivariate Normal Parameter Estimation

Likelihood Function

Suppose that xi = (xi1, . . . , xip)′ is an iid sample from a normal distribution with mean vector µ and covariance matrix Σ, i.e., the xi are iid N(µ, Σ).

The likelihood function for the parameters (given the data) has the form

$$
L(\boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \mathbf{X}) = \prod_{i=1}^{n} f(\mathbf{x}_i) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu})\right\}
$$

and the log-likelihood function is given by

$$
LL(\boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \mathbf{X}) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log(|\boldsymbol{\Sigma}|) - \frac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu})
$$


Multivariate Normal Parameter Estimation

Maximum Likelihood Estimate of Mean Vector

The MLE of the mean vector is the value of µ that minimizes

$$
\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu}) = \sum_{i=1}^{n}\mathbf{x}_i'\boldsymbol{\Sigma}^{-1}\mathbf{x}_i - 2n\bar{\mathbf{x}}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + n\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}
$$

where x̄ = (1/n) ∑ xi is the sample mean vector.

Taking the derivative with respect to µ we find that

$$
\frac{\partial}{\partial\boldsymbol{\mu}}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\boldsymbol{\mu}) = -2n\boldsymbol{\Sigma}^{-1}\bar{\mathbf{x}} + 2n\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu},
$$

which equals zero when µ̂ = x̄. The sample mean vector x̄ is the MLE of the population mean vector µ.


Multivariate Normal Parameter Estimation

Maximum Likelihood Estimate of Covariance Matrix


The MLE of the covariance matrix is the value of Σ that minimizes

$$
-n\log(|\boldsymbol{\Sigma}^{-1}|) + \sum_{i=1}^{n}\mathrm{tr}\left\{\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})'\right\}
$$

where µ̂ = x̄ = (1/n) ∑ xi is the sample mean vector.

Taking the derivative with respect to Σ⁻¹ we find that

$$
\frac{\partial}{\partial\boldsymbol{\Sigma}^{-1}}\left[-n\log(|\boldsymbol{\Sigma}^{-1}|) + \sum_{i=1}^{n}\mathrm{tr}\left\{\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})'\right\}\right] = -n\boldsymbol{\Sigma} + \sum_{i=1}^{n}(\mathbf{x}_i-\hat{\boldsymbol{\mu}})(\mathbf{x}_i-\hat{\boldsymbol{\mu}})',
$$

which equals zero when Σ̂ = (1/n) ∑ (xi − x̄)(xi − x̄)′, i.e., the sample covariance matrix Σ̂ is the MLE of the population covariance matrix Σ.
Sampling Distributions



Sampling Distributions Univariate Case

Univariate Sampling Distributions: x̄ and s²

In the univariate normal case, we have that

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \sim N(\mu, \sigma^2/n)
$$

$$
(n-1)s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 \sim \sigma^2\chi^2_{n-1}
$$

χ²_k denotes a chi-square variable with k degrees of freedom:

$$
\sigma^2\chi^2_k \equiv \sum_{i=1}^{k} z_i^2 \quad \text{where the } z_i \text{ are iid } N(0, \sigma^2)
$$


Sampling Distributions Multivariate Case

Multivariate Sampling Distributions: x̄ and S

In the multivariate normal case, we have that

$$
\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)
$$

$$
(n-1)\mathbf{S} = \sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})' \sim W_{n-1}(\boldsymbol{\Sigma})
$$

W_k(Σ) denotes a Wishart variable with k degrees of freedom:

$$
W_k(\boldsymbol{\Sigma}) \equiv \sum_{i=1}^{k}\mathbf{z}_i\mathbf{z}_i' \quad \text{where the } \mathbf{z}_i \text{ are iid } N(\mathbf{0}_p, \boldsymbol{\Sigma})
$$
