
V: Discrete and continuous distributions

A modern crash course in intermediate Statistics and Probability

Paul Rognon

Barcelona School of Economics


Universitat Pompeu Fabra
Universitat Politècnica de Catalunya

Discrete distributions
Bernoulli distribution

The Bernoulli distribution models an experiment with a single binary
outcome (e.g. success or failure, heads or tails). We say that $X$ has a
Bernoulli distribution with parameter $p \in [0, 1]$, and write $X \sim \text{Bern}(p)$, if

$f_X(x) = p^x (1 - p)^{1 - x}$ for $x = 0, 1$

Related distribution and model

• Logistic regression for a binary outcome is a generalized linear model for a response with a Bernoulli distribution.
• If $X \sim \text{Bern}(p)$, then $2X - 1$ has a Rademacher distribution, a distribution that occurs in machine learning theory (see the sketch below).
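
A minimal Python sketch (numpy assumed; the seed and $p = 0.3$ are arbitrary) sampling Bernoulli draws and the induced Rademacher variables:

import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(n=1, p=p, size=10_000)  # Bern(p) draws, coded 0/1
r = 2 * x - 1                            # Rademacher draws, values in {-1, +1}
print(x.mean(), r.mean())                # approx p and 2p - 1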

Binomial distribution
The binomial distribution models an experiment where we count the
number of successes in $n$ independent Bernoulli experiments, all with the
same success probability $p$. We say that $Y$ has a binomial distribution,
and write $Y \sim \text{Bin}(n, p)$. It takes values $y = 0, 1, \dots, n$
with probability

$f_Y(y) = \binom{n}{y} p^y (1 - p)^{n - y}$


Properties
• If $Y_1 \sim \text{Bin}(n_1, p)$, $Y_2 \sim \text{Bin}(n_2, p)$ and they are independent, then $Y_1 + Y_2 \sim \text{Bin}(n_1 + n_2, p)$.
• If $X_i$, $i = 1, \dots, n$, are i.i.d. $\text{Bern}(p)$, then $\sum_i X_i \sim \text{Bin}(n, p)$ (checked numerically below).

Indeed, $P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i} = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i}$,
so $P(\sum_i X_i = k) = \binom{n}{k} p^k (1 - p)^{n - k}$.

The number of ways to choose $k$ out of $n$ objects is $\binom{n}{k} := \frac{n!}{(n - k)!\, k!}$.
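
A small simulation check of the second property (numpy and scipy assumed; $n$ and $p$ are arbitrary):

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n, p = 10, 0.4
y = rng.binomial(1, p, size=(100_000, n)).sum(axis=1)  # sums of n Bernoulli draws
emp = np.bincount(y, minlength=n + 1) / len(y)         # empirical frequencies
print(np.round(emp, 3))
print(np.round(binom.pmf(np.arange(n + 1), n, p), 3))  # Bin(n, p) pmf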
Poisson distribution
The Poisson distribution is used to model counts of rare random events: shark
attacks, big meteors hitting the Earth, etc. We say that $X$ has a Poisson
distribution with parameter $\lambda > 0$ if

$f_X(x) = e^{-\lambda} \frac{\lambda^x}{x!}$ for $x = 0, 1, 2, \dots$

Here the support $X(\Omega)$ is discrete but countably infinite.

Related model and properties

• The mean and variance are both equal to $\lambda$. This is a strong limitation in modelling; alternatives are the negative binomial distribution or adjustments for overdispersion.
• If $X_1 \sim \text{Pois}(\lambda_1)$ and $X_2 \sim \text{Pois}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \text{Pois}(\lambda_1 + \lambda_2)$ (see the sketch below).
• The Poisson process, a stochastic process that counts the number of occurrences of an event up to time $t$.
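
A minimal simulation check of the additivity property (numpy assumed; the rates are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
x1 = rng.poisson(lam=1.5, size=100_000)
x2 = rng.poisson(lam=2.5, size=100_000)
s = x1 + x2               # should behave like Pois(1.5 + 2.5)
print(s.mean(), s.var())  # both approx 4.0, since mean = variance = lambda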

Multinomial distribution

The multinomial distribution is a multivariate extension of the binomial


distribution. It models an experiment where $n$ independent trials with a
finite number $k$ of possible outcomes (larger than 2) are run. We say that
$X$ has a multinomial distribution with parameters $(n, p)$ where
$\sum_{j=1}^k p_j = 1$ if:

$f_X(x_1, \dots, x_k) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$ where $\sum_{j=1}^k x_j = n$

Related model and properties


• When $n = 1$, it is called the categorical distribution.
• It frequently appears in clustering and dimension-reduction models.
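
A short sampling sketch (numpy assumed; $n$ and $p$ are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
n, p = 20, [0.2, 0.5, 0.3]
x = rng.multinomial(n, p, size=100_000)  # each row sums to n
print(x.mean(axis=0))                    # approx n * p = [4, 10, 6]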

Continuous distributions
Normal (Gaussian) distribution
We say that $X$ has a Gaussian distribution with mean $\mu \in \mathbb{R}$ and
variance $\sigma^2 > 0$, denoted by $N(\mu, \sigma^2)$, if its density function is

$f_X(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left(-\frac{1}{2\sigma^2} (x - \mu)^2\right)$ for $x \in \mathbb{R}$.
The Gaussian distribution approximates many real phenomena: see
Galton’s board, central limit theorem, etc.
Basic properties
• If $X \sim N(\mu, \sigma^2)$ then $\frac{X - \mu}{\sigma} \sim N(0, 1)$ with CDF $\Phi$, so, in particular (see the example below),

$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$

• If $X_i \sim N(\mu_i, \sigma_i^2)$ are independent, then $\sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$.

Useful exercise: compute the integral $\int_{-\infty}^{\infty} e^{-x^2/2}\, dx$ (use polar coordinates).
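
For instance, computing $P(a \le X \le b)$ through $\Phi$ with scipy (the values of $\mu$, $\sigma$, $a$, $b$ are arbitrary):

from scipy.stats import norm

mu, sigma, a, b = 1.0, 2.0, 0.0, 3.0
prob = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(prob)  # equals norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)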
Gamma and Beta distributions
They are based on the gamma function:
$\Gamma(z) = \int_0^\infty x^{z - 1} e^{-x}\, dx$, $z > 0$, and $\Gamma(n) = (n - 1)!$ for $n \in \mathbb{N}^\star$

We say that X has a Gamma distribution with parameters α and β,


denoted by X ∼ Gamma(α, β), if

$f_X(x) = \frac{1}{\beta^\alpha \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}$, $x > 0$, where $\alpha, \beta > 0$

We say that X has a Beta distribution with parameters α and β, denoted


by X ∼ Beta(α, β), if

$f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$, $x \in (0, 1)$, where $\alpha, \beta > 0$

These and related distributions frequently appear in Bayesian statistics.
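
One standard reason is conjugacy: a $\text{Beta}(\alpha, \beta)$ prior combined with $s$ successes in $n$ Bernoulli trials yields a $\text{Beta}(\alpha + s, \beta + n - s)$ posterior. A minimal scipy sketch, with arbitrary prior and data values:

from scipy.stats import beta

alpha0, beta0 = 2.0, 2.0  # arbitrary Beta(2, 2) prior
n, s = 10, 7              # 7 successes in 10 Bernoulli trials
posterior = beta(alpha0 + s, beta0 + n - s)
print(posterior.mean())   # posterior mean of the success probability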


Other continuous distributions
The uniform distribution over [a, b], U(a, b)
The density function is $f_X(x) = \frac{1}{b - a}\, \mathbb{1}_{[a,b]}(x)$.

Exponential distribution, Exp(λ)


The density function is $f_X(x) = \lambda e^{-\lambda x}$ for $\lambda > 0$, $x > 0$.

The exponential distribution is used to model waiting times between
occurrences in a Poisson process. It is a special case of the gamma
distribution: it is a $\text{Gamma}(1, \frac{1}{\lambda})$.

Chi-square distribution, $\chi^2_p$

If $Z_1, \dots, Z_p$ are independent $N(0, 1)$ then

$X = Z_1^2 + Z_2^2 + \cdots + Z_p^2 \sim \chi^2_p.$

The natural number $p$ is called the degrees of freedom. It is also a
special case of the gamma distribution: it is a $\text{Gamma}(\frac{p}{2}, 2)$.
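
A quick simulation of the chi-square construction above (numpy assumed): the mean and variance should be close to $p$ and $2p$.

import numpy as np

rng = np.random.default_rng(4)
p = 5
z = rng.standard_normal(size=(100_000, p))
x = (z ** 2).sum(axis=1)  # sums of p squared N(0, 1) draws
print(x.mean(), x.var())  # approx p = 5 and 2p = 10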
Multivariate Gaussian: definition
Standard multivariate normal distribution
Let Z1 , . . . , Zp be independent identically distributed (iid) N(0, 1)
variables. Their joint distribution is:
$f(z_1, \dots, z_p) = \prod_{i=1}^p \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z_i^2\right) = \frac{1}{(2\pi)^{p/2}} \exp\left(-\frac{1}{2} z^T z\right).$

Let $Z = (Z_1, \dots, Z_p)$. $Z$ is a random vector of $p$ standard normal
random variables with mean vector

$\mu = 0_p$

and covariance matrix

$\Sigma = I_p$

We say $Z$ has a standard multivariate normal distribution and write:

$Z \sim N_p(0_p, I_p)$
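
A minimal numpy sketch drawing from $N_p(0_p, I_p)$ and checking the first two moments:

import numpy as np

rng = np.random.default_rng(5)
p = 3
z = rng.standard_normal(size=(100_000, p))  # rows are draws from N_p(0_p, I_p)
print(np.round(z.mean(axis=0), 2))          # approx 0_p
print(np.round(np.cov(z.T), 2))             # approx I_p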

[Figure: standard multivariate normal distribution]
Bivariate normal distribution

We now define a case without independence for two variables. Let
$\mu \in \mathbb{R}^2$ and $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$ positive definite.
We say the vector $X = (X_1, X_2)$ has a bivariate normal distribution and
write $X \sim N_2(\mu, \Sigma)$, if:

$f(x_1, x_2) = \frac{1}{2\pi\, \sigma_1 \sigma_2 (1 - \rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho \left(\frac{x_1 - \mu_1}{\sigma_1}\right) \left(\frac{x_2 - \mu_2}{\sigma_2}\right) \right] \right\}$
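
A quick numerical check of this density against scipy's implementation (the parameter values are arbitrary):

import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, s1, s2, rho = 0.0, 2.0, 1.0, 1.5, 0.5
Sigma = np.array([[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]])
x1, x2 = 0.3, 1.0
z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
f = np.exp(-(z1**2 + z2**2 - 2*rho*z1*z2) / (2*(1 - rho**2))) \
    / (2*np.pi*s1*s2*np.sqrt(1 - rho**2))
print(f, multivariate_normal([mu1, mu2], Sigma).pdf([x1, x2]))  # the two should match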

[Figure: contours of the bivariate normal distribution when ρ = 0]

[Figure: contours of the bivariate normal distribution when ρ ≈ 0.5]

[Figure: contours of the bivariate normal distribution when ρ → 1]
Exercise

1. Show that $\prod_{i=1}^p \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2} z_i^2) = \frac{1}{(2\pi)^{p/2}} \exp(-\frac{1}{2} z^T z)$.

2. Let $X \sim N_2(\mu, \Sigma)$ with $\mu = (\mu_1, \mu_2)$ and $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$.
Show that $f(x_1, x_2) = \frac{1}{(2\pi)^{p/2}} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$ (here $p = 2$).

3. Find a sufficient and necessary condition on $\rho$ for $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$ to be positive definite.

4. Why are the contours of the bivariate normal ellipses? What are the principal axes of the ellipses?

Multivariate normal distribution (general case)

Let $\mu \in \mathbb{R}^p$ and $\Sigma$ a symmetric positive definite $p \times p$ matrix. We say
the vector $X = (X_1, X_2, \dots, X_p)$ has a (non-degenerate) multivariate
normal distribution and write $X \sim N_p(\mu, \Sigma)$ when it has density:

$f(x) = \frac{1}{(2\pi)^{p/2}} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right).$

Its characteristic function is:

$\varphi_X(t) = \exp(i t^T \mu) \exp\left(-\frac{1}{2} t^T \Sigma t\right), \quad \forall t$

Its moment generating function is:

$M_X(t) = \exp(t^T \mu) \exp\left(\frac{1}{2} t^T \Sigma t\right)$
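
A Monte Carlo sanity check of the moment generating function (numpy assumed; $\mu$, $\Sigma$ and $t$ are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
t = np.array([0.2, 0.1])
x = rng.multivariate_normal(mu, Sigma, size=200_000)
print(np.exp(x @ t).mean())                  # Monte Carlo estimate of M_X(t)
print(np.exp(t @ mu + 0.5 * t @ Sigma @ t))  # closed-form value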

Multivariate Gaussian: linear transformations
Linear transformations
The multivariate normal distribution is closed under linear transformations.
This is a defining property of the multivariate normal distribution:

If $X \sim N_p(\mu, \Sigma)$, then for any full row rank $A \in \mathbb{R}^{m \times p}$ ($m \le p$), $AX \sim N_m(A\mu, A\Sigma A^T)$

Corollary
If $\Sigma$ is positive definite, then there exist $V$ orthogonal and $\Lambda$ diagonal such
that $\Sigma = V \Lambda V^T$. We define $\Sigma^{1/2} = V \Lambda^{1/2} V^T$ and
$\Sigma^{-1/2} = V \Lambda^{-1/2} V^T$.

• If $Z \sim N_p(0_p, I_p)$ and $X = \mu + \Sigma^{1/2} Z$ then $X \sim N_p(\mu, \Sigma)$.

• If $X \sim N_p(\mu, \Sigma)$ then $\Sigma^{-1/2} (X - \mu) \sim N_p(0_p, I_p)$ (see the sketch below).
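
A numpy sketch of both directions, computing the symmetric square root from the eigendecomposition (the variable names are ours):

import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
lam, V = np.linalg.eigh(Sigma)            # Sigma = V diag(lam) V^T
S_half = V @ np.diag(np.sqrt(lam)) @ V.T  # Sigma^{1/2}
z = rng.standard_normal(size=(100_000, 2))
x = mu + z @ S_half                       # coloring: X = mu + Sigma^{1/2} Z
print(np.round(np.cov(x.T), 2))           # approx Sigma
w = (x - mu) @ np.linalg.inv(S_half)      # whitening: Sigma^{-1/2} (X - mu)
print(np.round(np.cov(w.T), 2))           # approx I_2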

Exercise
Let $X \sim N_p(\mu, \Sigma)$; what is the distribution of $(X - \mu)^T \Sigma^{-1} (X - \mu)$?

Example: change of variables, Cholesky decomposition and the multivariate normal distribution
Suppose that $X$ has a standard multivariate normal distribution. Let $\Sigma$
be a symmetric positive definite matrix and let $\Sigma = L^T L$ be its Cholesky
decomposition. What is the distribution of $Y = L^T X$?

Since $x = L^{-T} y$, then $|J(y)| = \det(L^{-T}) = 1 / \det(L)$ and

$f_X(L^{-T} y) = \frac{1}{(2\pi)^{p/2}} \exp\left\{-\frac{1}{2} (L^{-T} y)^T (L^{-T} y)\right\} = \frac{1}{(2\pi)^{p/2}} \exp\left\{-\frac{1}{2} y^T \Sigma^{-1} y\right\}$

Noting that $\det(L) = (\det \Sigma)^{1/2}$, we get

$f_Y(y) = f_X(L^{-T} y)\, |J(y)| = \frac{1}{(2\pi)^{p/2}} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2} y^T \Sigma^{-1} y\right).$

That is, $Y \sim N_p(0, \Sigma)$.
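
This is the standard way to draw correlated Gaussians in practice. A numpy sketch (note that np.linalg.cholesky returns a lower-triangular $C$ with $\Sigma = C C^T$, so $C$ plays the role of $L^T$ here):

import numpy as np

rng = np.random.default_rng(8)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
C = np.linalg.cholesky(Sigma)  # lower triangular, Sigma = C C^T
z = rng.standard_normal(size=(100_000, 2))
y = z @ C.T                    # each row is C z, so cov(y) approx Sigma
print(np.round(np.cov(y.T), 2))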
Multivariate Gaussian: marginal and conditional distributions and independence
Marginal and conditional distributions
The multivariate normal distribution is closed under marginalization and
conditioning. Split $X$ into two blocks $X = (X_A, X_B)$. Denote:

$\mu = (\mu_A, \mu_B)$ and $\Sigma = \begin{pmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{pmatrix}$

Marginal distribution

$X_A \sim N_{|A|}(\mu_A, \Sigma_{AA})$
$X_B \sim N_{|B|}(\mu_B, \Sigma_{BB})$

where $|A|$ and $|B|$ are the dimensions of the vectors $X_A$ and $X_B$.

Conditional distribution

$X_A \mid X_B = x_B \sim N_{|A|}\left(\mu_A + \Sigma_{AB} \Sigma_{BB}^{-1} (x_B - \mu_B),\ \Sigma_{AA} - \Sigma_{AB} \Sigma_{BB}^{-1} \Sigma_{BA}\right)$

$X_B \mid X_A = x_A \sim N_{|B|}\left(\mu_B + \Sigma_{BA} \Sigma_{AA}^{-1} (x_A - \mu_A),\ \Sigma_{BB} - \Sigma_{BA} \Sigma_{AA}^{-1} \Sigma_{AB}\right)$
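
A numpy sketch of the conditional formulas (the helper function and its name are ours, not part of the slides):

import numpy as np

def conditional_gaussian(mu, Sigma, idx_a, idx_b, x_b):
    # Parameters of X_A | X_B = x_b when X ~ N(mu, Sigma).
    K = Sigma[np.ix_(idx_a, idx_b)] @ np.linalg.inv(Sigma[np.ix_(idx_b, idx_b)])
    mean = mu[idx_a] + K @ (x_b - mu[idx_b])
    cov = Sigma[np.ix_(idx_a, idx_a)] - K @ Sigma[np.ix_(idx_b, idx_a)]
    return mean, cov

mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])
print(conditional_gaussian(mu, Sigma, [0], [1], np.array([1.0])))
# mean = 0 + 0.7 * (1 - 2) = -0.7, variance = 1 - 0.7**2 = 0.51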

Independence
The covariance matrix Σ of a multivariate normal vector and its inverse
K = Σ−1 encode independence relations. K is called the precision or
concentration matrix.
Pairwise independence

$X_i \perp\!\!\!\perp X_j \iff \Sigma_{ij} = 0$

Conditional independence

$X_i \perp\!\!\!\perp X_j \mid X_{\text{rest}} \iff \Sigma_{ij} = \Sigma_{iR} \Sigma_{RR}^{-1} \Sigma_{Rj} \iff (\Sigma^{-1})_{ij} = 0$

where $R$ denotes the indices of the remaining components.
The conditional independence properties of the precision matrix give rise
to an entire family of models called Gaussian graphical models.
Block matrix inversion:

$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad M^{-1} = \begin{pmatrix} (M/D)^{-1} & -(M/D)^{-1} B D^{-1} \\ -D^{-1} C (M/D)^{-1} & D^{-1} + D^{-1} C (M/D)^{-1} B D^{-1} \end{pmatrix}$

where $M/D := A - B D^{-1} C$ and $M/A := D - C A^{-1} B$ are called respectively
the Schur complements of block $D$ and of block $A$.
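
A numpy illustration (the covariance matrix is an arbitrary example): a zero entry in $\Sigma$ signals marginal independence, while a zero entry in $K = \Sigma^{-1}$ signals conditional independence given the rest.

import numpy as np

# Sigma[0, 2] = 0: X_0 and X_2 are marginally independent.
Sigma = np.array([[2.0, 0.8, 0.0],
                  [0.8, 1.5, 0.4],
                  [0.0, 0.4, 1.0]])
K = np.linalg.inv(Sigma)  # precision matrix
print(np.round(K, 3))     # a zero K[i, j] would mean X_i indep. of X_j given the rest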
Exercise

 
1. Consider a bivariate normal with $\mu = (0, 2)$ and $\Sigma = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}$.
Find $E[X_1 \mid X_2]$ and $\text{var}(X_1 \mid X_2)$.

2. Consider the covariance matrix $\Sigma = \begin{pmatrix} 1.98 & -1.40 & -0.14 \\ -1.40 & 2.00 & 0.20 \\ -0.14 & 0.20 & 1.02 \end{pmatrix}$ of a
Gaussian vector $X$. Are there components of $X$ that are independent? Are
there components of $X$ that are conditionally independent?

Wishart distribution
Can we define a distribution over the set of all $p \times p$ symmetric positive
definite matrices? Yes, in the Gaussian case.

Let $X_1, \dots, X_n \overset{\text{iid}}{\sim} N_p(0_p, \Sigma)$, then

$Y := n S_n = \sum_{i=1}^n X_i X_i^T$ has a Wishart distribution $W_p(\Sigma, n)$

Denote $K = \Sigma^{-1}$. Then the density of the Wishart distribution is

$f(Y) = \frac{(\det K)^{n/2}}{2^{np/2}\, \Gamma_p(n/2)} (\det Y)^{\frac{n - p - 1}{2}} e^{-\frac{1}{2} \text{trace}(K Y)},$

which is well defined for any real $n > p - 1$.

We have $E(Y) = n\Sigma$.
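
A simulation check of $E(Y) = n\Sigma$ (numpy assumed; scipy.stats.wishart also provides this distribution directly):

import numpy as np

rng = np.random.default_rng(9)
n = 8
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = rng.multivariate_normal(np.zeros(2), Sigma, size=(50_000, n))
Y = np.einsum('kni,knj->kij', x, x)  # 50,000 draws of sum_i X_i X_i^T
print(np.round(Y.mean(axis=0), 2))   # approx n * Sigma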
