Paul Rognon
Discrete distributions
Bernoulli distribution
A Bernoulli random variable X takes the value 1 (success) with probability p and the value 0 (failure) with probability 1 − p; we write X ∼ Bern(p).
Binomial distribution
The binomial distribution models an experiment where we count the
number of successes in n independent Bernoulli experiments, all with
the same success probability p. We say that Y has a binomial
distribution, and write Y ∼ Bin(n, p). It takes values y = 0, 1, . . . , n
with probability

f_Y(y) = \binom{n}{y} p^y (1 - p)^{n-y}
Properties
• If Y1 ∼ Bin(n1, p), Y2 ∼ Bin(n2, p) and are independent, then
Y1 + Y2 ∼ Bin(n1 + n2, p).
• If ∀i = 1 . . . n, the Xi are i.i.d. Bern(p), then \sum_i X_i ∼ Bin(n, p).
Indeed:

P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i},

so P(\sum_i X_i = k) = \binom{n}{k} p^k (1 - p)^{n-k}.
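The sum-of-Bernoullis property can be checked empirically; a minimal Python sketch (the parameter values n = 5, p = 0.3 and the helper names are illustrative choices, not from the slides):

```python
import math
import random

def binom_pmf(n, k, p):
    # Binomial pmf: C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def simulate_sum_of_bernoullis(n, p, reps=20000, seed=0):
    # Empirical distribution of X_1 + ... + X_n with X_i i.i.d. Bern(p)
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(reps):
        s = sum(1 for _ in range(n) if rng.random() < p)
        counts[s] += 1
    return [c / reps for c in counts]

n, p = 5, 0.3
empirical = simulate_sum_of_bernoullis(n, p)
theoretical = [binom_pmf(n, k, p) for k in range(n + 1)]
```

The empirical frequencies should match the Bin(5, 0.3) pmf up to Monte Carlo error.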
Multinomial distribution
f_X(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k}, \quad \text{where } \sum_{j=1}^{k} x_j = n
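A direct implementation of this pmf (the function name is ours); with k = 2 the multinomial reduces to the binomial Bin(n, p):

```python
import math

def multinomial_pmf(xs, ps):
    # f(x_1, ..., x_k) = n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, with sum x_j = n
    n = sum(xs)
    coef = math.factorial(n)
    for x in xs:
        coef //= math.factorial(x)   # sequential exact integer division
    prob = coef
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

# Sanity check: k = 2 agrees with the binomial pmf C(5, 3) 0.3^3 0.7^2
p_multi = multinomial_pmf([3, 2], [0.3, 0.7])
p_binom = math.comb(5, 3) * 0.3**3 * 0.7**2
```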
Continuous distributions
Normal (Gaussian) distribution
We say that X has a Gaussian distribution with mean µ ∈ R and
variance σ 2 > 0, denoted by N(µ, σ 2 ), if its density function is
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) \quad \text{for } x \in \mathbb{R}.
The Gaussian distribution approximates many real phenomena: see
Galton’s board, central limit theorem, etc.
Basic properties
• If X ∼ N(µ, σ^2), then (X − µ)/σ ∼ N(0, 1) with CDF Φ, so, in particular,

P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)
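Φ has no closed form, but it can be evaluated through the error function, available in Python's standard math module; a minimal sketch of the interval-probability identity (function names are ours):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(a, b, mu, sigma):
    # P(a <= X <= b) = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

# About 68% of the mass lies within one standard deviation of the mean
p_one_sd = normal_interval_prob(-1.0, 1.0, 0.0, 1.0)
```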
Useful exercise: compute the integral \int_{-\infty}^{\infty} e^{-x^2/2} \, dx (use polar coordinates).
Gamma and Beta distributions
They are based on the gamma function:
\Gamma(z) = \int_0^{\infty} x^{z-1} e^{-x} \, dx, \quad z > 0, \qquad \text{and} \quad \Gamma(n) = (n - 1)! \ \text{ for } n \in \mathbb{N}^{\star}
Gamma distribution, Gamma(α, β):

f_X(x) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta}, \quad x > 0, \quad \text{where } \alpha, \beta > 0

Beta distribution, Beta(α, β):

f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in (0, 1), \quad \text{where } \alpha, \beta > 0
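Both densities are normalized by gamma-function constants; a quick numerical sanity check using math.gamma (the parameter values 2.5 and 3.5 and the helper names are arbitrary illustrative choices):

```python
import math

def beta_pdf(x, a, b):
    # Beta(a, b) density on (0, 1); normalizing constant uses the gamma function
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def midpoint_integral(f, lo, hi, steps=100_000):
    # Simple midpoint-rule quadrature
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

gamma5 = math.gamma(5)   # Gamma(5) = 4! = 24
total = midpoint_integral(lambda x: beta_pdf(x, 2.5, 3.5), 0.0, 1.0)  # should be ~1
```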
Standard multivariate normal distribution

Z ∼ N_p(0_p, I_p), i.e. µ = 0_p and the covariance matrix is the identity I_p.
Bivariate normal distribution
f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 (1 - \rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 \right] \right\}
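When ρ = 0 the bracketed term separates, so the joint density factorizes into the product of the two marginal normal densities; a numerical check (the evaluation point and parameter values are arbitrary, and the function names are ours):

```python
import math

def norm_pdf(x, mu, sigma):
    # Univariate N(mu, sigma^2) density
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def bivariate_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    # Bivariate normal density as given above
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))

# With rho = 0 the joint density equals the product of the marginals
f_joint = bivariate_normal_pdf(0.3, -1.2, 0.0, 0.5, 1.0, 2.0, 0.0)
f_prod = norm_pdf(0.3, 0.0, 1.0) * norm_pdf(-1.2, 0.5, 2.0)
```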
Contours of bivariate normal distribution when ρ = 0
Contours of bivariate normal distribution when ρ ≈ 0.5
Contours of bivariate normal distribution when ρ → 1
Exercise
1. Show that \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z_i^2\right) = \frac{1}{(2\pi)^{p/2}} \exp\left(-\frac{1}{2} z^T z\right).

2. Let X ∼ N_2(µ, Σ) with µ = (µ_1, µ_2) and \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.
Show that

f(x_1, x_2) = \frac{1}{(2\pi)^{p/2}} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)

3. Find a sufficient and necessary condition on ρ for
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} to be positive definite.

4. Why are the contours of the bivariate normal ellipses? What are the
principal axes of the ellipses?
Multivariate normal distribution (general case)
Multivariate Gaussian: linear
transformations
Linear transformations
The multivariate normal distribution is closed under linear transformations:
if X ∼ N_p(µ, Σ), A is a q × p matrix and b ∈ R^q, then
AX + b ∼ N_q(Aµ + b, AΣA^T). This is in fact a defining property of the
multivariate normal distribution.
Corollary
If Σ is positive definite, then there exist V orthogonal and Λ diagonal with
positive entries such that Σ = V ΛV^T (spectral decomposition). We define
Σ^{1/2} = V Λ^{1/2} V^T and Σ^{−1/2} = V Λ^{−1/2} V^T.
Exercise
Let X ∼ N_p(µ, Σ). What is the distribution of (X − µ)^T Σ^{−1} (X − µ)?
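A simulation sketch for exploring the exercise empirically (the 2 × 2 Σ below and all helper names are arbitrary illustrative choices): since E[(X − µ)^T Σ^{−1}(X − µ)] = trace(Σ^{−1}Σ) = p, the sample mean of the quadratic form should be close to p = 2.

```python
import math
import random

rng = random.Random(1)

# Hypothetical example with p = 2: Sigma = [[a, b], [b, d]]
a, b, d = 2.0, 0.6, 1.0
det = a * d - b * b
inv = ((d / det, -b / det), (-b / det, a / det))   # Sigma^{-1} for a 2x2 matrix

# Sample X - mu = L z with L the lower Cholesky factor of Sigma, z standard normal
l11 = math.sqrt(a)
l21 = b / l11
l22 = math.sqrt(d - l21 * l21)

def quad_form():
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y1 = l11 * z1                 # first coordinate of X - mu
    y2 = l21 * z1 + l22 * z2      # second coordinate of X - mu
    return (inv[0][0] * y1 + inv[0][1] * y2) * y1 + (inv[1][0] * y1 + inv[1][1] * y2) * y2

reps = 20000
mean_q = sum(quad_form() for _ in range(reps)) / reps   # should be close to p = 2
```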
Example: Change of variables, Cholesky decomposition
and multivariate normal distribution
Suppose that X has a standard multivariate normal distribution. Let Σ
be a symmetric positive definite matrix and let Σ = L^T L be its Cholesky
decomposition. What is the distribution of Y = L^T X?
Since Y is a linear transformation of a Gaussian vector, it is Gaussian with
mean L^T 0_p = 0_p and covariance L^T I_p L = Σ; that is, Y ∼ N_p(0_p, Σ).
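The claim Y ∼ N(0, Σ) can be verified by simulation; a sketch with a hand-rolled 2 × 2 triangular factor (the target Σ and all names are illustrative choices): the sample covariance of the transformed draws should approach Σ.

```python
import math
import random

rng = random.Random(7)

# Hypothetical target covariance Sigma = [[s11, s12], [s12, s22]]
s11, s12, s22 = 1.5, -0.9, 2.0

# Lower-triangular G with Sigma = G G^T (2x2 Cholesky by hand); G plays the role of L^T
g11 = math.sqrt(s11)
g21 = s12 / g11
g22 = math.sqrt(s22 - g21 * g21)

def sample_y():
    # X standard bivariate normal, Y = G X ~ N(0, Sigma)
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return (g11 * x1, g21 * x1 + g22 * x2)

reps = 40000
ys = [sample_y() for _ in range(reps)]
c11 = sum(y1 * y1 for y1, _ in ys) / reps   # should approach s11
c12 = sum(y1 * y2 for y1, y2 in ys) / reps  # should approach s12
c22 = sum(y2 * y2 for _, y2 in ys) / reps   # should approach s22
```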
Multivariate Gaussian: marginal
and conditional distributions and
independence
Marginal and conditional distributions
The multivariate normal distribution is closed under marginalization and
conditioning. Split X into two blocks X = (XA , XB ). Denote:
µ = (µ_A, µ_B) \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{pmatrix}

Marginal distribution

X_A ∼ N(µ_A, \Sigma_{AA})

Conditional distribution

X_A \mid X_B = x_B ∼ N\left(µ_A + \Sigma_{AB}\Sigma_{BB}^{-1}(x_B - µ_B), \; \Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA}\right)
Independence
The covariance matrix Σ of a multivariate normal vector and its inverse
K = Σ−1 encode independence relations. K is called the precision or
concentration matrix.
Pairwise independence

X_i ⊥⊥ X_j ⇔ \Sigma_{ij} = 0

Conditional independence

X_i ⊥⊥ X_j \mid X_{rest} ⇔ \Sigma_{ij} = \Sigma_{iR}\Sigma_{RR}^{-1}\Sigma_{Rj} ⇔ (\Sigma^{-1})_{ij} = 0

where R denotes the indices of the remaining components.
The conditional independence properties of the precision matrix give rise
to an entire family of models called Gaussian graphical models.
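A small sketch of reading independences off Σ and K = Σ^{−1} (the matrices below are hypothetical examples, and mat_inv is our own Gauss–Jordan helper): with K tridiagonal, K_13 = 0, so X1 ⊥⊥ X3 given X2, yet Σ_13 ≠ 0, so X1 and X3 are not marginally independent.

```python
def mat_inv(m):
    # Gauss-Jordan inverse of a small square matrix given as lists of lists
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

# Hypothetical precision matrix with K_13 = 0 (conditional independence of X1 and X3)
K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
Sigma = mat_inv(K)        # covariance: Sigma_13 != 0, so no marginal independence
K_back = mat_inv(Sigma)   # inverting back recovers the zero pattern of K
```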
Block matrix inversion: with M/D := A − BD^{−1}C (the Schur complement of D in M),

M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \quad \Rightarrow \quad M^{-1} = \begin{pmatrix} (M/D)^{-1} & -(M/D)^{-1} B D^{-1} \\ -D^{-1} C (M/D)^{-1} & D^{-1} + D^{-1} C (M/D)^{-1} B D^{-1} \end{pmatrix}
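With 1 × 1 blocks the Schur-complement formula can be checked against the direct 2 × 2 inverse; a sketch (all block values are arbitrary):

```python
# Scalar blocks (1x1), so M = [[A, B], [C, D]] is a plain 2x2 matrix
A, B, C, D = 3.0, 1.0, 2.0, 4.0
schur = A - B * (1.0 / D) * C            # M/D = A - B D^{-1} C

# Entries of M^{-1} from the block formula
inv11 = 1.0 / schur
inv12 = -(1.0 / schur) * B * (1.0 / D)
inv21 = -(1.0 / D) * C * (1.0 / schur)
inv22 = 1.0 / D + (1.0 / D) * C * (1.0 / schur) * B * (1.0 / D)

# Direct 2x2 inverse for comparison: (1/det) [[D, -B], [-C, A]]
det = A * D - B * C
direct = (D / det, -B / det, -C / det, A / det)
```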
Exercise

1. Consider a bivariate normal with µ = (0, 2) and \Sigma = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}.
Find E[X_1 | X_2] and var(X_1 | X_2).

2. Consider the covariance matrix \Sigma = \begin{pmatrix} 1.98 & -1.40 & -0.14 \\ -1.40 & 2.00 & 0.20 \\ -0.14 & 0.20 & 1.02 \end{pmatrix} of a
Gaussian vector X. Are there components of X that are independent? Are
there components of X that are conditionally independent?
Wishart distribution
Can we define a distribution over the set of all p × p symmetric positive
definite matrices? Yes, in the Gaussian case.
Let X_1, \ldots, X_n \overset{iid}{\sim} N_p(0_p, \Sigma). Then

Y := nS_n = \sum_{i=1}^{n} X_i X_i^T \ \text{ has Wishart distribution } W_p(\Sigma, n)
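Since E[X_i X_i^T] = Σ, the Wishart mean is E[Y] = nΣ; a simulation sketch (the 2 × 2 Σ, the value of n, and the helper names are illustrative choices, not from the slides):

```python
import math
import random

rng = random.Random(3)

# Hypothetical Sigma = [[s11, s12], [s12, s22]], p = 2
s11, s12, s22 = 1.0, 0.5, 2.0
g11 = math.sqrt(s11)            # lower Cholesky factor of Sigma, by hand
g21 = s12 / g11
g22 = math.sqrt(s22 - g21 * g21)

def sample_wishart(n):
    # One draw Y = sum_i X_i X_i^T with X_i i.i.d. N_2(0, Sigma); returns (Y11, Y12, Y22)
    y11 = y12 = y22 = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1, x2 = g11 * z1, g21 * z1 + g22 * z2
        y11 += x1 * x1
        y12 += x1 * x2
        y22 += x2 * x2
    return y11, y12, y22

n, reps = 10, 5000
draws = [sample_wishart(n) for _ in range(reps)]
m11 = sum(d[0] for d in draws) / reps   # should approach n * s11 = 10
m12 = sum(d[1] for d in draws) / reps   # should approach n * s12 = 5
m22 = sum(d[2] for d in draws) / reps   # should approach n * s22 = 20
```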