Cheat Sheet for Probability Theory
Ingmar Land
1 Scalar-valued Random Variables
Consider two real-valued random variables (RVs) X and Y with the individual probability distributions p_X(x) and p_Y(y), and the joint distribution p_{X,Y}(x, y). The probability distributions are probability mass functions (pmf) if the random variables take discrete values, and they are probability density functions (pdf) if the random variables are continuous. Some authors use f(·) instead of p(·), especially for continuous RVs.
In the following, the RVs are assumed to be continuous. (For discrete RVs, the integrals simply have to be replaced by sums.)
• Marginal distributions:

    p_X(x) = ∫ p_{X,Y}(x, y) dy        p_Y(y) = ∫ p_{X,Y}(x, y) dx
• Conditional distributions:

    p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)        p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x)

  for p_X(x) ≠ 0 and p_Y(y) ≠ 0.
• Bayes’ rule:

    p_{X|Y}(x|y) = p_{X,Y}(x, y) / ∫ p_{X,Y}(x′, y) dx′        p_{Y|X}(y|x) = p_{X,Y}(x, y) / ∫ p_{X,Y}(x, y′) dy′
• Expected values (expectations):

    E[g_1(X)] := ∫ g_1(x) p_X(x) dx
    E[g_2(Y)] := ∫ g_2(y) p_Y(y) dy
    E[g_3(X, Y)] := ∫∫ g_3(x, y) p_{X,Y}(x, y) dx dy

  for any functions g_1(·), g_2(·), g_3(·, ·).
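  To make the definitions concrete, a small sketch (again an addition) that evaluates E[g(X)] by numerical integration; the exponential density p_X(x) = e^(−x) on x ≥ 0 and g(x) = x^2 are arbitrary choices.

    import numpy as np
    from scipy.integrate import quad

    p_X = lambda x: np.exp(-x)   # assumed pdf: Exp(1) on [0, inf)
    g = lambda x: x**2           # any function g(.)

    # E[g(X)] = integral of g(x) p_X(x) dx over the support of X
    E_g, _ = quad(lambda x: g(x) * p_X(x), 0, np.inf)
    print(E_g)                   # ~2.0, since E[X^2] = 2 for Exp(1)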
• Some special expected values:
  – Means (mean values):

      µ_X := E[X] = ∫ x p_X(x) dx        µ_Y := E[Y] = ∫ y p_Y(y) dy
  – Variances:

      σ_X^2 ≡ Σ_XX := E[(X − µ_X)^2] = ∫ (x − µ_X)^2 p_X(x) dx
      σ_Y^2 ≡ Σ_YY := E[(Y − µ_Y)^2] = ∫ (y − µ_Y)^2 p_Y(y) dy

    Remark: The variance measures the “width” of a distribution. A small variance means that most of the probability mass is concentrated around the mean value.
  – Covariance:

      σ_XY ≡ Σ_XY := E[(X − µ_X)(Y − µ_Y)] = ∫∫ (x − µ_X)(y − µ_Y) p_{X,Y}(x, y) dx dy

    Remark: The covariance measures how “related” two RVs are. Two independent RVs have covariance zero.
  – Correlation coefficient:

      ρ_XY := σ_XY / (σ_X σ_Y)
  – Relations:

      E[X^2] = Σ_XX + µ_X^2        E[Y^2] = Σ_YY + µ_Y^2        E[X·Y] = Σ_XY + µ_X·µ_Y
  – Proof of the last relation:

      E[XY] = E[((X − µ_X) + µ_X)((Y − µ_Y) + µ_Y)]
            = E[(X − µ_X)(Y − µ_Y)] + E[X − µ_X] µ_Y + µ_X E[Y − µ_Y] + µ_X µ_Y
            = Σ_XY + (E[X] − µ_X) µ_Y + (E[Y] − µ_Y) µ_X + µ_X µ_Y
            = Σ_XY + µ_X µ_Y

    since E[X] = µ_X and E[Y] = µ_Y, so the two middle terms vanish. This method of proof is typical.
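    The relations above are also easy to verify by simulation; a sketch with correlated Gaussian samples (the mean and covariance values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])   # Sigma_XY = 0.8
    x, y = rng.multivariate_normal(mu, Sigma, size=1_000_000).T

    # E[X*Y] approaches Sigma_XY + mu_X * mu_Y = 0.8 + (1.0)(-2.0) = -1.2
    print(np.mean(x * y))
    # rho_XY = Sigma_XY / (sigma_X * sigma_Y) = 0.8 / sqrt(2) ~ 0.566
    print(np.corrcoef(x, y)[0, 1])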
• Conditional expectations:

    E[g(X)|Y = y] := ∫ g(x) p_{X|Y}(x|y) dx

  Law of total expectation:

    E[E[g(X)|Y]] = ∫ E[g(X)|Y = y] p_Y(y) dy
                 = ∫ (∫ g(x) p_{X|Y}(x|y) dx) p_Y(y) dy
                 = ∫ g(x) (∫ p_{X|Y}(x|y) p_Y(y) dy) dx
                 = ∫ g(x) p_X(x) dx
                 = E[g(X)]
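  A sampling sketch (an addition) of the law of total expectation, with an assumed hierarchical model Y ∼ N(0, 1), X|Y = y ∼ N(y, 1), and g(x) = x^2 (so E[g(X)|Y = y] = y^2 + 1 in closed form):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000
    y = rng.normal(0.0, 1.0, n)   # Y ~ N(0, 1)
    x = rng.normal(y, 1.0)        # X | Y = y  ~  N(y, 1)

    # Averaging the inner conditional expectation over Y must match E[g(X)]:
    inner = y**2 + 1.0            # E[X^2 | Y = y] for this model
    print(np.mean(inner), np.mean(x**2))   # both ~ 2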
• Special conditional expectations:
  – Conditional mean:

      µ_{X|Y=y} := E[X|Y = y] = ∫ x p_{X|Y}(x|y) dx

  – Conditional variance:

      Σ_{XX|Y=y} := E[(X − µ_{X|Y=y})^2 | Y = y] = ∫ (x − µ_{X|Y=y})^2 p_{X|Y}(x|y) dx

  – Relation:

      E[X^2|Y = y] = Σ_{XX|Y=y} + µ_{X|Y=y}^2
• Sum of two independent random variables: Let X and Y be independent and Z := X + Y; then

    p_Z(z) = p_X(z) ∗ p_Y(z)

  where ∗ denotes convolution. (Independence is required here.) The proof uses the characteristic functions.
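  A numerical sketch (an addition) of the convolution formula, with X and Y independent and Uniform(0, 1), whose sum is known to have the triangular density on [0, 2]:

    import numpy as np

    dz = 0.001
    z = np.arange(0.0, 2.0, dz)
    p = lambda t: ((t >= 0) & (t <= 1)).astype(float)   # Uniform(0,1) pdf

    # p_Z = p_X * p_Y (convolution); the factor dz approximates the integral
    p_Z = np.convolve(p(z), p(z))[:len(z)] * dz

    tri = np.where(z <= 1.0, z, 2.0 - z)   # exact triangular density
    print(np.max(np.abs(p_Z - tri)))       # small discretization error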
• The RVs are called independent if

    p_{X,Y}(x, y) = p_X(x) · p_Y(y).

  This condition is equivalent to the condition that

    E[g_1(X) · g_2(Y)] = E[g_1(X)] · E[g_2(Y)]

  for all (!) functions g_1(·) and g_2(·).
• The RVs are called uncorrelated if

    σ_XY ≡ Σ_XY = E[(X − µ_X)(Y − µ_Y)] = 0.

  Remark: If RVs are independent, they are also uncorrelated. The reverse does not hold in general; for Gaussian RVs, however, it does (see below).
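  The classic counterexample (added here for illustration) showing that uncorrelated does not imply independent outside the Gaussian case: X ∼ N(0, 1) and Y = X^2 are clearly dependent, yet their covariance E[X^3] − E[X]E[X^2] = 0 vanishes by symmetry.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=1_000_000)
    y = x**2                      # fully determined by X, hence dependent

    print(np.cov(x, y)[0, 1])     # ~0: uncorrelated, yet not independent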
• Two RVs X and Y are called orthogonal if E[XY] = 0.

  Remark: The RVs with finite energy, E[X^2] < ∞, form a vector space with scalar product ⟨X, Y⟩ = E[XY] and norm ‖X‖ = √E[X^2]. (This is used in MMSE estimation.)

These relations for scalar-valued RVs are generalized to vector-valued RVs in the following.
2 Vector-valued Random Variables
Consider two real-valued vector-valued random variables (RVs)

    X = (X_1, X_2)^T,    Y = (Y_1, Y_2)^T

with the individual probability distributions p_X(x) and p_Y(y), and the joint distribution p_{X,Y}(x, y). (The following considerations can be generalized to longer vectors, of course.) The probability distributions are probability mass functions (pmf) if the random variables take discrete values, and they are probability density functions (pdf) if the random variables are continuous. Some authors use f(·) instead of p(·), especially for continuous RVs.
In the following, the RVs are assumed to be continuous. (For discrete RVs, the integrals simply have to be replaced by sums.)
Remark: The following matrix notation may seem cumbersome at first glance, but it turns out to be quite handy and convenient (once you get used to it).
• Marginal distributions, conditional distributions, Bayes’ rule, and expected values work as in the scalar case.
• Some special expected values:
  – Mean vector (vector of mean values):

      µ_X := E[X] = E[(X_1, X_2)^T] = (E[X_1], E[X_2])^T = (µ_{X_1}, µ_{X_2})^T
  – Covariance matrix (autocovariance matrix):

      Σ_XX := E[(X − µ_X)(X − µ_X)^T]

            = [ E[(X_1 − µ_1)(X_1 − µ_1)]   E[(X_1 − µ_1)(X_2 − µ_2)] ]
              [ E[(X_2 − µ_2)(X_1 − µ_1)]   E[(X_2 − µ_2)(X_2 − µ_2)] ]

            = [ Σ_{X_1X_1}   Σ_{X_1X_2} ]
              [ Σ_{X_2X_1}   Σ_{X_2X_2} ]
  – Covariance matrix (cross-covariance matrix):

      Σ_XY := E[(X − µ_X)(Y − µ_Y)^T] = [ Σ_{X_1Y_1}   Σ_{X_1Y_2} ]
                                        [ Σ_{X_2Y_1}   Σ_{X_2Y_2} ]

    Remark: This matrix contains the covariance of each element of the first vector with each element of the second vector.
  – Relations:

      E[XX^T] = Σ_XX + µ_X µ_X^T        E[XY^T] = Σ_XY + µ_X µ_Y^T

    Remark: This result is not too surprising when you know the result for the scalar case.
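    A numpy sketch (an addition) checking E[XX^T] = Σ_XX + µ_X µ_X^T by sampling; the mean and covariance values are arbitrary:

    import numpy as np

    rng = np.random.default_rng(3)
    mu = np.array([1.0, 2.0])
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 0.5]])
    X = rng.multivariate_normal(mu, Sigma, size=1_000_000)

    E_XXt = X.T @ X / len(X)            # sample estimate of E[X X^T]
    print(E_XXt)
    print(Sigma + np.outer(mu, mu))     # should match up to sampling noise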
3 Gaussian Random Variables
• A Gaussian RV X with mean µ_X and variance σ_X^2 is a continuous random variable with a Gaussian pdf, i.e., with

    p_X(x) = 1/√(2πσ_X^2) · exp(−(x − µ_X)^2 / (2σ_X^2))

  The often-used symbolic notation

    X ∼ N(µ_X, σ_X^2)

  may be read as: X is (distributed) Gaussian with mean µ_X and variance σ_X^2.
• A Gaussian distribution with mean zero and variance one is called a normal distribution:

    p(x) = (1/√(2π)) e^(−x^2/2).
  Some authors use the term “normal distribution” interchangeably with “Gaussian distribution”. So, use the term “normal distribution” with caution.
  The integral of a normal pdf has no closed-form expression and is therefore often expressed using the Q-function

    Q(z) := (1/√(2π)) ∫_z^∞ e^(−x^2/2) dx

  Notice the limits of the integral. The integral of any Gaussian pdf can also be expressed using the Q-function, of course.
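  The Q-function is the tail probability of the standard normal, so it can be evaluated numerically even without a closed form; a sketch (an addition) using scipy, where norm.sf is the survival function and Q(z) = erfc(z/√2)/2 is an equivalent identity:

    import numpy as np
    from scipy.stats import norm
    from scipy.special import erfc

    def Q(z):
        # Q(z) = P(X > z) for X ~ N(0, 1)
        return norm.sf(z)

    z = 2.0
    print(Q(z), 0.5 * erfc(z / np.sqrt(2.0)))   # both ~0.02275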
• A vector-valued RV X with mean µ_X and covariance matrix Σ_XX,

    X = (X_1, X_2)^T,    µ_X = (µ_{X_1}, µ_{X_2})^T,    Σ_XX = [ Σ_{X_1X_1}   Σ_{X_1X_2} ]
                                                               [ Σ_{X_2X_1}   Σ_{X_2X_2} ],

  is called Gaussian if its components have a jointly Gaussian pdf

    p_X(x) = 1/((√(2π))^2 |Σ_XX|^(1/2)) · exp(−(1/2)(x − µ_X)^T Σ_XX^(−1) (x − µ_X)).

  (There is nothing to understand, this is just a definition.) The marginal distributions and the conditional distributions of a Gaussian vector are again Gaussian distributions. (But this can be proved.)
  The corresponding symbolic notation is

    X ∼ N(µ_X, Σ_XX)

  This can be generalized to longer vectors, of course.
• Gaussian RVs are completely described by mean and covariance. Therefore, if they are uncorrelated, they are also independent. This implication holds for Gaussian RVs but not in general (cf. above).
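  A closing sketch (an addition) that evaluates the bivariate Gaussian pdf both via the formula above and via scipy, and illustrates the last point: with a diagonal Σ_XX (uncorrelated components) the joint pdf factors into the two marginal pdfs, i.e. the components are independent.

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    mu = np.array([0.0, 1.0])
    Sigma = np.diag([2.0, 0.5])   # diagonal: X_1, X_2 uncorrelated
    x = np.array([0.3, 0.7])

    # pdf from the formula: ((2 pi)^2 |Sigma|)^(-1/2) exp(-(x-mu)^T Sigma^(-1) (x-mu) / 2)
    d = x - mu
    pdf_manual = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) \
        / np.sqrt((2.0 * np.pi) ** 2 * np.linalg.det(Sigma))
    print(pdf_manual, multivariate_normal(mu, Sigma).pdf(x))

    # For diagonal Sigma the joint pdf equals the product of the marginals:
    print(norm(0.0, np.sqrt(2.0)).pdf(0.3) * norm(1.0, np.sqrt(0.5)).pdf(0.7))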