Cheat Sheet for Probability Theory
Ingmar Land
1 Scalar-valued Random Variables

Consider two real-valued random variables (RVs) X and Y with the individual probability
distributions p_X(x) and p_Y(y), and the joint distribution p_{X,Y}(x, y). The probability
distributions are probability mass functions (pmf) if the random variables take discrete
values, and probability density functions (pdf) if the random variables are continuous.
Some authors use f(·) instead of p(·), especially for continuous RVs.

In the following, the RVs are assumed to be continuous. (For discrete RVs, the integrals
simply have to be replaced by sums.)
• Marginal distributions:

    p_X(x) = ∫ p_{X,Y}(x, y) dy        p_Y(y) = ∫ p_{X,Y}(x, y) dx
• Conditional distributions:

    p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)        p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x)

  for p_X(x) ≠ 0 and p_Y(y) ≠ 0
• Bayes’ rule:

    p_{X|Y}(x|y) = p_{X,Y}(x, y) / ∫ p_{X,Y}(x′, y) dx′
    p_{Y|X}(y|x) = p_{X,Y}(x, y) / ∫ p_{X,Y}(x, y′) dy′
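For discrete RVs the marginal, conditional, and Bayes formulas above can be checked numerically. A minimal NumPy sketch, assuming an arbitrary illustrative 2×2 joint pmf (the array `p_xy` and all names are not from the cheat sheet):

```python
import numpy as np

# Arbitrary 2x2 joint pmf p_{X,Y}(x, y); rows index x, columns index y.
p_xy = np.array([[0.1, 0.2],
                 [0.3, 0.4]])

# Marginals: sum out the other variable.
p_x = p_xy.sum(axis=1)          # p_X(x)
p_y = p_xy.sum(axis=0)          # p_Y(y)

# Conditional p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y); each column sums to 1.
p_x_given_y = p_xy / p_y

# Bayes' rule: the denominator is the joint summed (integrated) over x.
bayes = p_xy / p_xy.sum(axis=0)

assert np.allclose(p_x_given_y, bayes)
assert np.allclose(p_x_given_y.sum(axis=0), 1.0)
```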
• Expected values (expectations):

    E[g_1(X)]    := ∫ g_1(x) p_X(x) dx
    E[g_2(Y)]    := ∫ g_2(y) p_Y(y) dy
    E[g_3(X, Y)] := ∫∫ g_3(x, y) p_{X,Y}(x, y) dx dy

  for any functions g_1(·), g_2(·), g_3(·, ·)
Ingmar Land, October 8, 2005
• Some special expected values:

  – Means (mean values):

      µ_X := E[X] = ∫ x p_X(x) dx        µ_Y := E[Y] = ∫ y p_Y(y) dy

  – Variances:

      σ_X² ≡ Σ_{XX} := E[(X − µ_X)²] = ∫ (x − µ_X)² p_X(x) dx
      σ_Y² ≡ Σ_{YY} := E[(Y − µ_Y)²] = ∫ (y − µ_Y)² p_Y(y) dy
Remark: The variance measures the “width” of a distribution. A small variance
means that most of the probability mass is concentrated around the mean value.
  – Covariance:

      σ_{XY} ≡ Σ_{XY} := E[(X − µ_X)(Y − µ_Y)] = ∫∫ (x − µ_X)(y − µ_Y) p_{X,Y}(x, y) dx dy

    Remark: The covariance measures how “related” two RVs are. Two independent
    RVs have covariance zero.
  – Correlation coefficient:

      ρ_{XY} := σ_{XY} / (σ_X σ_Y)
  – Relations:

      E[X²]    = Σ_{XX} + µ_X²
      E[Y²]    = Σ_{YY} + µ_Y²
      E[X · Y] = Σ_{XY} + µ_X · µ_Y
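These relations also hold exactly for sample moments (with the sample mean in place of µ), so they can be verified directly. A small NumPy sketch with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=200_000)   # samples with mean ~2, std ~1.5
y = 0.5 * x + rng.normal(size=x.size)    # correlated with x

mu_x, mu_y = x.mean(), y.mean()
var_x = ((x - mu_x) ** 2).mean()         # sample Sigma_XX
cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # sample Sigma_XY

# E[X^2]  = Sigma_XX + mu_X^2
assert np.isclose((x ** 2).mean(), var_x + mu_x ** 2)
# E[X.Y]  = Sigma_XY + mu_X . mu_Y
assert np.isclose((x * y).mean(), cov_xy + mu_x * mu_y)
```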
  – Proof of the last relation:

      E[XY] = E[((X − µ_X) + µ_X)((Y − µ_Y) + µ_Y)]
            = E[(X − µ_X)(Y − µ_Y)] + E[(X − µ_X)] µ_Y + µ_X E[(Y − µ_Y)] + µ_X µ_Y
            = Σ_{XY} + (E[X] − µ_X) µ_Y + (E[Y] − µ_Y) µ_X + µ_X µ_Y
            = Σ_{XY} + µ_X µ_Y

    since E[X] − µ_X = 0 and E[Y] − µ_Y = 0. This method of proof is typical.
• Conditional expectations:

    E[g(X) | Y = y] := ∫ g(x) p_{X|Y}(x|y) dx

  Law of total expectation:

    E[ E[g(X) | Y] ] = ∫ E[g(X) | Y = y] p_Y(y) dy
                     = ∫∫ g(x) p_{X|Y}(x|y) dx p_Y(y) dy
                     = ∫ g(x) ∫ p_{X|Y}(x|y) p_Y(y) dy dx
                     = ∫ g(x) p_X(x) dx
                     = E[g(X)]
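The law of total expectation can be traced step by step in the discrete case. A minimal sketch, assuming an arbitrary 2×2 joint pmf and the illustrative choice g(x) = x²:

```python
import numpy as np

p_xy = np.array([[0.10, 0.25],
                 [0.30, 0.35]])          # joint pmf, rows = x, cols = y
xs = np.array([0.0, 1.0])                # values taken by X
g = xs ** 2                              # g(X); any function works

p_y = p_xy.sum(axis=0)                   # p_Y(y)
p_x = p_xy.sum(axis=1)                   # p_X(x)
p_x_given_y = p_xy / p_y                 # columns: p_{X|Y}(x|y)

# Inner expectation E[g(X) | Y = y] for each y, then average over p_Y.
inner = (g[:, None] * p_x_given_y).sum(axis=0)
lhs = (inner * p_y).sum()                # E[ E[g(X) | Y] ]

rhs = (g * p_x).sum()                    # E[g(X)] computed directly
assert np.isclose(lhs, rhs)
```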
• Special conditional expectations:

  – Conditional mean:

      µ_{X|Y=y} := E[X | Y = y] = ∫ x p_{X|Y}(x|y) dx

  – Conditional variance:

      Σ_{XX|Y=y} := E[(X − µ_{X|Y=y})² | Y = y] = ∫ (x − µ_{X|Y=y})² p_{X|Y}(x|y) dx

  – Relation:

      E[X² | Y = y] = Σ_{XX|Y=y} + µ_{X|Y=y}²
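This decomposition holds separately for each value of y, which a small discrete NumPy sketch can confirm (the joint pmf below is an arbitrary illustrative choice):

```python
import numpy as np

p_xy = np.array([[0.1, 0.2],
                 [0.3, 0.4]])            # rows = x in {0, 1}, cols = y
xs = np.array([0.0, 1.0])

p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y                 # columns: p_{X|Y}(x|y)

mu_cond = (xs[:, None] * p_x_given_y).sum(axis=0)           # E[X | Y=y]
var_cond = (((xs[:, None] - mu_cond) ** 2)
            * p_x_given_y).sum(axis=0)                      # Sigma_{XX|Y=y}
ex2_cond = ((xs[:, None] ** 2) * p_x_given_y).sum(axis=0)   # E[X^2 | Y=y]

# E[X^2 | Y=y] = Sigma_{XX|Y=y} + mu_{X|Y=y}^2, for every y.
assert np.allclose(ex2_cond, var_cond + mu_cond ** 2)
```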
• Sum of two independent random variables: Let Z := X + Y with X and Y independent;
  then

    p_Z(z) = p_X(z) ∗ p_Y(z)

  where ∗ denotes convolution. The proof uses the characteristic functions.
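For discrete pmfs the convolution can be computed with `numpy.convolve`. A minimal sketch comparing it against brute-force enumeration of all (x, y) pairs, using arbitrary illustrative pmfs on {0, 1, 2}:

```python
import numpy as np

# Two independent RVs supported on {0, 1, 2}.
p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.6, 0.1, 0.3])

# pmf of Z = X + Y is the (full) convolution of the individual pmfs.
p_z = np.convolve(p_x, p_y)

# Brute force: accumulate P(X=i) P(Y=j) into bin z = i + j.
p_z_direct = np.zeros(5)
for i, px in enumerate(p_x):
    for j, py in enumerate(p_y):
        p_z_direct[i + j] += px * py

assert np.allclose(p_z, p_z_direct)
assert np.isclose(p_z.sum(), 1.0)
```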
• The RVs are called independent if

    p_{X,Y}(x, y) = p_X(x) · p_Y(y).

  This condition is equivalent to the condition that

    E[g_1(X) · g_2(Y)] = E[g_1(X)] · E[g_2(Y)]

  for all (!) functions g_1(·) and g_2(·).
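A quick discrete check of this equivalence in one direction: if the joint pmf factors, the expectation of any product g_1(X) · g_2(Y) factors as well. The particular g_1, g_2 below are arbitrary illustrative choices:

```python
import numpy as np

# Product-form joint pmf, so X and Y are independent by construction.
p_x = np.array([0.3, 0.7])
p_y = np.array([0.4, 0.6])
p_xy = np.outer(p_x, p_y)

xs = np.array([1.0, 2.0])   # values of X
ys = np.array([-1.0, 3.0])  # values of Y
g1 = xs ** 2                # illustrative g1(x)
g2 = np.exp(ys)             # illustrative g2(y)

lhs = (np.outer(g1, g2) * p_xy).sum()          # E[g1(X) g2(Y)]
rhs = (g1 * p_x).sum() * (g2 * p_y).sum()      # E[g1(X)] E[g2(Y)]
assert np.isclose(lhs, rhs)
```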
• The RVs are called uncorrelated if

    σ_{XY} ≡ Σ_{XY} = E[(X − µ_X)(Y − µ_Y)] = 0.

  Remark: If RVs are independent, they are also uncorrelated. The reverse does not
  hold in general, but it does for jointly Gaussian RVs (see below).
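A standard counterexample for the converse (not from the cheat sheet itself): X uniform on {−1, 0, 1} and Y = X² are uncorrelated but clearly dependent:

```python
import numpy as np

# X uniform on {-1, 0, 1}, Y = X^2.
xs = np.array([-1.0, 0.0, 1.0])
p = np.array([1/3, 1/3, 1/3])
ys = xs ** 2

mu_x = (xs * p).sum()                        # 0
mu_y = (ys * p).sum()                        # 2/3
cov = ((xs - mu_x) * (ys - mu_y) * p).sum()
assert np.isclose(cov, 0.0)                  # uncorrelated ...

# ... but not independent: P(X=1, Y=0) = 0, while P(X=1) P(Y=0) = 1/9.
p_joint = 0.0                                # X = 1 forces Y = 1
assert p_joint != (1/3) * (1/3)
```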
• Two RVs X and Y are called orthogonal if E[XY] = 0.

  Remark: The RVs with finite energy, E[X²] < ∞, form a vector space with scalar
  product ⟨X, Y⟩ = E[XY] and norm ‖X‖ = √(E[X²]). (This is used in MMSE
  estimation.)

These relations for scalar-valued RVs are generalized to vector-valued RVs in the
following.
2 Vector-valued Random Variables

Consider two real-valued, vector-valued random variables (RVs)

    X = [X_1, X_2]^T,    Y = [Y_1, Y_2]^T

with the individual probability distributions p_X(x) and p_Y(y), and the joint distribution
p_{X,Y}(x, y). (The following considerations can be generalized to longer vectors, of
course.) The probability distributions are probability mass functions (pmf) if the random
variables take discrete values, and probability density functions (pdf) if the random
variables are continuous. Some authors use f(·) instead of p(·), especially for continuous
RVs.

In the following, the RVs are assumed to be continuous. (For discrete RVs, the integrals
simply have to be replaced by sums.)

Remark: The following matrix notation may seem cumbersome at first glance, but it
turns out to be quite handy and convenient (once you get used to it).

• Marginal distributions, conditional distributions, Bayes’ rule, and expected values
  work as in the scalar case.
• Some special expected values:

  – Mean vector (vector of mean values):

      µ_X := E[X] = E[ [X_1, X_2]^T ] = [E[X_1], E[X_2]]^T = [µ_{X_1}, µ_{X_2}]^T
  – Covariance matrix (auto-covariance matrix):

      Σ_{XX} := E[(X − µ_X)(X − µ_X)^T]

              = E[ [X_1 − µ_1, X_2 − µ_2]^T [X_1 − µ_1, X_2 − µ_2] ]

              = [ E[(X_1 − µ_1)(X_1 − µ_1)]   E[(X_1 − µ_1)(X_2 − µ_2)] ]
                [ E[(X_2 − µ_2)(X_1 − µ_1)]   E[(X_2 − µ_2)(X_2 − µ_2)] ]

              = [ Σ_{X_1 X_1}   Σ_{X_1 X_2} ]
                [ Σ_{X_2 X_1}   Σ_{X_2 X_2} ]
  – Covariance matrix (cross-covariance matrix):

      Σ_{XY} := E[(X − µ_X)(Y − µ_Y)^T] = [ Σ_{X_1 Y_1}   Σ_{X_1 Y_2} ]
                                          [ Σ_{X_2 Y_1}   Σ_{X_2 Y_2} ]

    Remark: This matrix contains the covariance of each element of the first vector
    with each element of the second vector.
  – Relations:

      E[X X^T] = Σ_{XX} + µ_X µ_X^T
      E[X Y^T] = Σ_{XY} + µ_X µ_Y^T

    Remark: This result is not too surprising when you know the result for the
    scalar case.
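The first relation holds exactly for sample moments as well (with the sample mean vector in place of µ_X), which makes a direct numerical check possible. A NumPy sketch with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D samples with a nonzero mean (arbitrary illustrative mixing).
x = rng.normal(size=(100_000, 2)) @ np.array([[1.0, 0.5],
                                              [0.0, 1.0]]) + np.array([1.0, -2.0])

mu = x.mean(axis=0)                       # sample mean vector
xc = x - mu
sigma_xx = xc.T @ xc / x.shape[0]         # sample auto-covariance matrix

# E[X X^T] = Sigma_XX + mu mu^T (exact identity for sample moments).
exxT = x.T @ x / x.shape[0]
assert np.allclose(exxT, sigma_xx + np.outer(mu, mu))
```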
3 Gaussian Random Variables

• A Gaussian RV X with mean µ_X and variance σ_X² is a continuous random variable
  with a Gaussian pdf, i.e., with

    p_X(x) = 1/√(2πσ_X²) · exp( −(x − µ_X)² / (2σ_X²) )

  The often-used symbolic notation

    X ∼ N(µ_X, σ_X²)

  may be read as: X is (distributed) Gaussian with mean µ_X and variance σ_X².
• A Gaussian distribution with mean zero and variance one is called a normal
  distribution:

    p(x) = 1/√(2π) · e^(−x²/2).
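A minimal sketch of the pdf formula, with two sanity checks on the normal case (the function name is an illustrative choice, not from the cheat sheet):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Gaussian pdf with mean mu and variance sigma2."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Sanity checks on the normal distribution (mu = 0, sigma2 = 1).
xs = np.linspace(-8.0, 8.0, 100_001)
pdf = gaussian_pdf(xs, 0.0, 1.0)
dx = xs[1] - xs[0]
assert np.isclose(pdf.sum() * dx, 1.0)                    # integrates to ~1
assert np.isclose(pdf[50_000], 1.0 / np.sqrt(2 * np.pi))  # peak value at x = 0
```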
  Some authors use the term “normal distribution” interchangeably with “Gaussian
  distribution”. So, use the term “normal distribution” with caution.

  The integral of a normal pdf cannot be expressed in closed form and is therefore
  often written using the Q-function

    Q(z) := 1/√(2π) ∫_z^∞ e^(−x²/2) dx

  Notice the limits of the integral. The integral of any Gaussian pdf can also be
  expressed using the Q-function, of course.
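In code, Q(z) is usually computed via the complementary error function, using the identity Q(z) = ½ erfc(z/√2). A sketch with a numerical cross-check of the tail integral (the helper names are illustrative):

```python
import math

def Q(z):
    """Tail probability of the standard normal:
    Q(z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def Q_numeric(z, n=200_000):
    """Midpoint-rule approximation of the tail integral on [z, 8];
    the mass beyond 8 is negligible."""
    dx = (8.0 - z) / n
    s = sum(math.exp(-(z + (k + 0.5) * dx) ** 2 / 2) for k in range(n))
    return s * dx / math.sqrt(2 * math.pi)

assert abs(Q(0.0) - 0.5) < 1e-12          # half the mass lies above the mean
assert abs(Q(1.0) - Q_numeric(1.0)) < 1e-6
```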
• A vector-valued RV X with mean µ_X and covariance matrix Σ_{XX},

    X = [X_1, X_2]^T,    µ_X = [µ_{X_1}, µ_{X_2}]^T,    Σ_{XX} = [ Σ_{X_1 X_1}   Σ_{X_1 X_2} ]
                                                                 [ Σ_{X_2 X_1}   Σ_{X_2 X_2} ],

  is called Gaussian if its components have a jointly Gaussian pdf

    p_X(x) = 1/( (√(2π))² |Σ_{XX}|^(1/2) ) · exp( −(1/2) (x − µ_X)^T Σ_{XX}^(−1) (x − µ_X) ).

  (There is nothing to understand, this is just a definition.) The marginal
  distributions and the conditional distributions of a Gaussian vector are again
  Gaussian distributions. (But this can be proved.)
  The corresponding symbolic notation is

    X ∼ N(µ_X, Σ_{XX})

  This can be generalized to longer vectors, of course.

• Gaussian RVs are completely described by mean and covariance. Therefore, if they
  are uncorrelated, they are also independent. This holds only for Gaussian RVs.
  (Cf. above.)
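The jointly Gaussian pdf (with the inverse covariance matrix in the exponent) can be sanity-checked for a diagonal Σ_{XX}: the components are then uncorrelated, hence independent, and the joint pdf must factor into two scalar Gaussian pdfs. A sketch with illustrative numbers:

```python
import numpy as np

def gaussian_pdf_2d(x, mu, sigma):
    """Jointly Gaussian pdf for a length-2 vector; note the inverse of
    the covariance matrix inside the exponent."""
    d = x - mu
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d)

def pdf_1d(v, m, s2):
    """Scalar Gaussian pdf with mean m and variance s2."""
    return np.exp(-(v - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

# Diagonal covariance: the joint pdf factors into the two marginals.
mu = np.array([1.0, -1.0])
sigma = np.diag([2.0, 0.5])
x = np.array([0.3, 0.7])

joint = gaussian_pdf_2d(x, mu, sigma)
product = pdf_1d(x[0], mu[0], 2.0) * pdf_1d(x[1], mu[1], 0.5)
assert np.isclose(joint, product)
```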