Cheat Sheet for Probability Theory
Ingmar Land
1 Scalar-valued Random Variables
Consider two real-valued random variables (RVs) X and Y with the individual probability distributions $p_X(x)$ and $p_Y(y)$, and the joint distribution $p_{X,Y}(x, y)$. The probability distributions are probability mass functions (pmf) if the random variables take discrete values, and they are probability density functions (pdf) if the random variables are continuous. Some authors use $f(\cdot)$ instead of $p(\cdot)$, especially for continuous RVs.
In the following, the RVs are assumed to be continuous. (For discrete RVs, the integrals simply have to be replaced by sums.)
• Marginal distributions:
$$p_X(x) = \int p_{X,Y}(x, y) \, dy \qquad p_Y(y) = \int p_{X,Y}(x, y) \, dx$$
• Conditional distributions:
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} \qquad p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}$$
for $p_Y(y) \neq 0$ and $p_X(x) \neq 0$, respectively.
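As a discrete sketch of the two definitions above (the joint pmf below is a made-up illustration, not from the text), marginalization is a sum over the other variable and conditioning is a ratio:

```python
# Toy discrete joint pmf illustrating marginalization and conditioning;
# sums play the role of the integrals in the continuous case.
p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginals: sum the joint pmf over the other variable.
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

# Conditional p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y), for p_Y(y) != 0.
def p_x_given_y(x, y):
    return p_xy[(x, y)] / p_y[y]

# p_x[0] = 0.1 + 0.3 = 0.4;  p_x_given_y(0, 1) = 0.3 / 0.7
```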
• Bayes’ rule:
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{\int p_{X,Y}(x', y) \, dx'} \qquad p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{\int p_{X,Y}(x, y') \, dy'}$$
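As a discrete sketch of Bayes’ rule (the numbers are hypothetical), conditioning amounts to normalizing a slice of the joint distribution by its sum, which plays the role of the integral in the denominator:

```python
# Bayes' rule as normalization: for a fixed y0, the conditional
# p_{X|Y}(x|y0) is the joint slice p_{X,Y}(x, y0) divided by its sum over x'.
joint_slice = {0: 0.10, 1: 0.25, 2: 0.15}    # p_{X,Y}(x, y0), made-up values
norm = sum(joint_slice.values())             # plays the role of p_Y(y0)
posterior = {x: p / norm for x, p in joint_slice.items()}
# posterior[1] = 0.25 / 0.5 = 0.5; the posterior sums to 1 by construction
```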
• Expected values (expectations):
$$E[g_1(X)] := \int g_1(x) \, p_X(x) \, dx \qquad E[g_2(Y)] := \int g_2(y) \, p_Y(y) \, dy$$
$$E[g_3(X, Y)] := \iint g_3(x, y) \, p_{X,Y}(x, y) \, dx \, dy$$
for any functions $g_1(\cdot)$, $g_2(\cdot)$, $g_3(\cdot, \cdot)$.
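In the discrete case the integral becomes a probability-weighted sum; a minimal sketch (the pmf is a made-up illustration):

```python
# E[g(X)] as a probability-weighted sum, the discrete analogue of the
# integral definition above.
pmf_x = {-1: 0.2, 0: 0.5, 2: 0.3}

def expect(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())

m1 = expect(lambda x: x, pmf_x)       # -1*0.2 + 0*0.5 + 2*0.3 = 0.4
m2 = expect(lambda x: x * x, pmf_x)   #  1*0.2 + 0*0.5 + 4*0.3 = 1.4
```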
Ingmar Land, October 8, 2005 2
• Some special expected values:
– Means (mean values):
$$\mu_X := E[X] = \int x \, p_X(x) \, dx \qquad \mu_Y := E[Y] = \int y \, p_Y(y) \, dy$$
– Variances:
$$\sigma_X^2 \equiv \Sigma_{XX} := E[(X - \mu_X)^2] = \int (x - \mu_X)^2 \, p_X(x) \, dx$$
$$\sigma_Y^2 \equiv \Sigma_{YY} := E[(Y - \mu_Y)^2] = \int (y - \mu_Y)^2 \, p_Y(y) \, dy$$
Remark: The variance measures the “width” of a distribution. A small variance
means that most of the probability mass is concentrated around the mean value.
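A quick discrete sketch of the remark (the pmf is illustrative): the mass sits close to the mean, so the variance is small:

```python
# Variance as E[(X - mu_X)^2] on a made-up pmf concentrated around its mean.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * p for x, p in pmf.items())               # 1.0
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # 0.5
```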
– Covariance:
$$\sigma_{XY} \equiv \Sigma_{XY} := E[(X - \mu_X)(Y - \mu_Y)] = \iint (x - \mu_X)(y - \mu_Y) \, p_{X,Y}(x, y) \, dx \, dy$$
Remark: The covariance measures how “related” two RVs are. Two independent RVs have covariance zero.
– Correlation coefficient:
$$\rho_{XY} := \frac{\sigma_{XY}}{\sigma_X \, \sigma_Y}$$
– Relations:
$$E[X^2] = \Sigma_{XX} + \mu_X^2 \qquad E[Y^2] = \Sigma_{YY} + \mu_Y^2$$
$$E[X \cdot Y] = \Sigma_{XY} + \mu_X \cdot \mu_Y$$
– Proof of last relation:
$$\begin{aligned}
E[XY] &= E\big[ ((X - \mu_X) + \mu_X)((Y - \mu_Y) + \mu_Y) \big] \\
&= E[(X - \mu_X)(Y - \mu_Y)] + E[(X - \mu_X)\mu_Y] + E[\mu_X(Y - \mu_Y)] + E[\mu_X \mu_Y] \\
&= \Sigma_{XY} + (E[X] - \mu_X)\mu_Y + (E[Y] - \mu_Y)\mu_X + \mu_X \mu_Y \\
&= \Sigma_{XY} + \mu_X \mu_Y
\end{aligned}$$
This method of proof is typical.
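The relations above can be verified numerically on any small joint pmf; a sketch with made-up numbers (the identities hold exactly, so only rounding error remains):

```python
# Numerical check of E[X^2] = Sigma_XX + mu_X^2 and
# E[XY] = Sigma_XY + mu_X * mu_Y on a small illustrative joint pmf.
p_xy = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 2): 0.4}

def E(g):
    return sum(g(x, y) * p for (x, y), p in p_xy.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))

ok1 = abs(E(lambda x, y: x * x) - (var_x + mu_x ** 2)) < 1e-9
ok2 = abs(E(lambda x, y: x * y) - (cov + mu_x * mu_y)) < 1e-9
```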
• Conditional expectations:
$$E[g(X) | Y = y] := \int g(x) \, p_{X|Y}(x|y) \, dx$$
Law of total expectation:
$$\begin{aligned}
E\big[ E[g(X)|Y] \big] &= \int E[g(X)|Y = y] \, p_Y(y) \, dy \\
&= \int \left( \int g(x) \, p_{X|Y}(x|y) \, dx \right) p_Y(y) \, dy \\
&= \int g(x) \left( \int p_{X|Y}(x|y) \, p_Y(y) \, dy \right) dx \\
&= \int g(x) \, p_X(x) \, dx \\
&= E[g(X)]
\end{aligned}$$
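The derivation can be checked numerically on a toy joint pmf (all numbers below are illustrative assumptions):

```python
# Numerical check of the law of total expectation E[E[g(X)|Y]] = E[g(X)].
p_xy = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}
p_y = {0: 0.4, 1: 0.6}                     # marginal of Y
g = lambda x: x * x

def cond_exp(y):
    """Inner expectation E[g(X) | Y = y]."""
    return sum(g(x) * p / p_y[y] for (x, yy), p in p_xy.items() if yy == y)

outer = sum(cond_exp(y) * p for y, p in p_y.items())   # E[E[g(X)|Y]]
direct = sum(g(x) * p for (x, _), p in p_xy.items())   # E[g(X)]
# outer == direct up to rounding
```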
• Special conditional expectations:
– Conditional mean:
$$\mu_{X|Y=y} := E[X | Y = y] = \int x \, p_{X|Y}(x|y) \, dx$$
– Conditional variance:
$$\Sigma_{XX|Y=y} := E\big[ (X - \mu_{X|Y=y})^2 \,\big|\, Y = y \big] = \int (x - \mu_{X|Y=y})^2 \, p_{X|Y}(x|y) \, dx$$
– Relation:
$$E[X^2 | Y = y] = \Sigma_{XX|Y=y} + \mu_{X|Y=y}^2$$
• Sum of two random variables: Let Z := X + Y with X and Y independent; then
$$p_Z(z) = p_X(z) * p_Y(z)$$
where $*$ denotes convolution. The proof uses characteristic functions.
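In the discrete case the convolution is a finite sum; a sketch with made-up pmfs:

```python
# For independent discrete X and Y, the pmf of Z = X + Y is the (discrete)
# convolution of the individual pmfs.
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.25, 1: 0.75}

p_z = {}
for x, px in p_x.items():
    for y, py in p_y.items():
        p_z[x + y] = p_z.get(x + y, 0.0) + px * py

# p_z == {0: 0.125, 1: 0.5, 2: 0.375}
```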
• The RVs are called independent if
$$p_{X,Y}(x, y) = p_X(x) \cdot p_Y(y).$$
This condition is equivalent to the condition that
$$E[g_1(X) \cdot g_2(Y)] = E[g_1(X)] \cdot E[g_2(Y)]$$
for all (!) functions $g_1(\cdot)$ and $g_2(\cdot)$.
• The RVs are called uncorrelated if
$$\sigma_{XY} \equiv \Sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = 0.$$
Remark: If RVs are independent, they are also uncorrelated. The reverse implication holds for jointly Gaussian RVs (see below), but not in general.
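A standard counterexample (assumed here, not from the text) shows that uncorrelated does not imply independent:

```python
# X uniform on {-1, 0, 1} and Y = X^2 are uncorrelated but clearly dependent.
p_x = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
mu_x = sum(x * p for x, p in p_x.items())        # 0 by symmetry
mu_y = sum(x * x * p for x, p in p_x.items())    # E[X^2] = 2/3
cov = sum((x - mu_x) * (x * x - mu_y) * p for x, p in p_x.items())
# cov == 0, yet Y is a deterministic function of X
```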
• Two RVs X and Y are called orthogonal if E[XY ] = 0.
Remark: The RVs with finite energy, $E[X^2] < \infty$, form a vector space with scalar product $\langle X, Y \rangle = E[XY]$ and norm $\|X\| = \sqrt{E[X^2]}$. (This is used in MMSE estimation.)
These relations for scalar-valued RVs are generalized to vector-valued RVs in the following.
2 Vector-valued Random Variables
Consider two real-valued, vector-valued random variables (RVs)
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$$
with the individual probability distributions $p_X(x)$ and $p_Y(y)$, and the joint distribution $p_{X,Y}(x, y)$. (The following considerations can be generalized to longer vectors, of course.) The probability distributions are probability mass functions (pmf) if the random variables take discrete values, and they are probability density functions (pdf) if the random variables are continuous. Some authors use $f(\cdot)$ instead of $p(\cdot)$, especially for continuous RVs.
In the following, the RVs are assumed to be continuous. (For discrete RVs, the integrals simply have to be replaced by sums.)
Remark: The following matrix notation may seem cumbersome at first glance, but it turns out to be quite handy and convenient (once you get used to it).
• Marginal distributions, conditional distributions, Bayes’ rule, expected values work
as in the scalar case.
• Some special expected values:
– Mean vector (vector of mean values):
$$\mu_X := E[X] = E\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} E[X_1] \\ E[X_2] \end{bmatrix} = \begin{bmatrix} \mu_{X_1} \\ \mu_{X_2} \end{bmatrix}$$
– Covariance matrix (autocovariance matrix):
$$\Sigma_{XX} := E\big[ (X - \mu_X)(X - \mu_X)^T \big] = E\left[ \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix} \begin{bmatrix} X_1 - \mu_1 & X_2 - \mu_2 \end{bmatrix} \right]$$
$$= \begin{bmatrix} E[(X_1 - \mu_1)(X_1 - \mu_1)] & E[(X_1 - \mu_1)(X_2 - \mu_2)] \\ E[(X_2 - \mu_2)(X_1 - \mu_1)] & E[(X_2 - \mu_2)(X_2 - \mu_2)] \end{bmatrix} = \begin{bmatrix} \Sigma_{X_1 X_1} & \Sigma_{X_1 X_2} \\ \Sigma_{X_2 X_1} & \Sigma_{X_2 X_2} \end{bmatrix}$$
– Covariance matrix (cross-covariance matrix):
$$\Sigma_{XY} := E\big[ (X - \mu_X)(Y - \mu_Y)^T \big] = \begin{bmatrix} \Sigma_{X_1 Y_1} & \Sigma_{X_1 Y_2} \\ \Sigma_{X_2 Y_1} & \Sigma_{X_2 Y_2} \end{bmatrix}$$
Remark: This matrix contains the covariance of each element of the ﬁrst vector
with each element of the second vector.
– Relations:
$$E[XX^T] = \Sigma_{XX} + \mu_X \mu_X^T \qquad E[XY^T] = \Sigma_{XY} + \mu_X \mu_Y^T$$
Remark: This result is not too surprising when you know the result for the
scalar case.
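These vector-valued quantities can be estimated from samples; a standard-library sketch (the independent Gaussian components, seed, and sample size are illustrative assumptions):

```python
# Sample-based estimate of the mean vector and 2x2 autocovariance matrix.
import random

random.seed(0)
n = 50_000
samples = [(random.gauss(1.0, 2.0), random.gauss(-1.0, 1.0)) for _ in range(n)]

mu = [sum(s[i] for s in samples) / n for i in range(2)]
sigma = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
          for j in range(2)]
         for i in range(2)]

# Expect mu ~ [1, -1] and sigma ~ [[4, 0], [0, 1]] up to sampling error,
# since the two components are independent here.
```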
3 Gaussian Random Variables
• A Gaussian RV X with mean $\mu_X$ and variance $\sigma_X^2$ is a continuous random variable with a Gaussian pdf, i.e., with
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \cdot \exp\left( -\frac{(x - \mu_X)^2}{2\sigma_X^2} \right)$$
The often-used symbolic notation
$$X \sim \mathcal{N}(\mu_X, \sigma_X^2)$$
may be read as: X is (distributed) Gaussian with mean $\mu_X$ and variance $\sigma_X^2$.
• A Gaussian distribution with mean zero and variance one is called a normal distribution:
$$p(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}}.$$
Some authors use the term “normal distribution” interchangeably with “Gaussian distribution”, so use the term “normal distribution” with caution.
The integral of a normal pdf cannot be solved in closed form and is therefore often expressed using the Q-function
$$Q(z) := \frac{1}{\sqrt{2\pi}} \int_z^{\infty} e^{-\frac{x^2}{2}} \, dx$$
Notice the limits of the integral. The integral of any Gaussian pdf can also be expressed using the Q-function, of course.
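In numerical work the Q-function is commonly evaluated via the complementary error function, $Q(z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$; a minimal Python sketch:

```python
# Q(z) = P(X > z) for a standard normal X, via math.erfc from the
# standard library.
from math import erfc, sqrt

def Q(z):
    return 0.5 * erfc(z / sqrt(2))

# Q(0) = 0.5: half the probability mass of a normal RV lies above its mean.
```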
• A vector-valued RV X with mean $\mu_X$ and covariance matrix $\Sigma_{XX}$,
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad \mu_X = \begin{bmatrix} \mu_{X_1} \\ \mu_{X_2} \end{bmatrix}, \qquad \Sigma_{XX} = \begin{bmatrix} \Sigma_{X_1 X_1} & \Sigma_{X_1 X_2} \\ \Sigma_{X_2 X_1} & \Sigma_{X_2 X_2} \end{bmatrix},$$
is called Gaussian if its components have a jointly Gaussian pdf,
$$p_X(x) = \frac{1}{(\sqrt{2\pi})^2 \, |\Sigma_{XX}|^{1/2}} \cdot \exp\left( -\frac{1}{2} (x - \mu_X)^T \Sigma_{XX}^{-1} (x - \mu_X) \right).$$
(There is nothing to understand, this is just a definition.) The marginal distributions and the conditional distributions of a Gaussian vector are again Gaussian distributions. (But this can be proved.)
The corresponding symbolic notation is
$$X \sim \mathcal{N}(\mu_X, \Sigma_{XX})$$
This can be generalized to longer vectors, of course.
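As a sketch, the two-dimensional Gaussian pdf can be evaluated directly with the standard library (the concrete mean vector and covariance matrix are illustrative):

```python
# Direct evaluation of the bivariate Gaussian pdf (2x2 case, stdlib only).
from math import exp, pi, sqrt

def gauss2_pdf(x, mu, S):
    # Invert the 2x2 covariance matrix S by hand.
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[ S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det,  S[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return exp(-0.5 * quad) / (2 * pi * sqrt(det))

# At the mean the exponent vanishes, so the value is 1 / (2*pi*sqrt(|Sigma|)).
```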
• Gaussian RVs are completely described by mean and covariance. Therefore, if jointly Gaussian RVs are uncorrelated, they are also independent. This conclusion may not be drawn for general, non-Gaussian RVs. (Cf. above.)