Introduction
These notes reflect the material in Chapter 2 of the textbook. We will cover these topics during the period from August 24 to 31.
3. For any finite or countable collection, $\{A_k : k = 1, 2, \ldots\}$, the union is also an event:
$$A = \bigcup_k A_k \in \mathcal{A}.$$
1.2 Probability
The probability is assigned to events $A \in \mathcal{A}$ in such a way that the following conditions are satisfied.
1. $P[A] \geq 0$ for every event $A \in \mathcal{A}$.
2. $P[\Omega] = 1$.
3. For any finite or countable collection of pairwise disjoint events, $\{A_k : k = 1, 2, \ldots\}$, the probability of their union equals the sum of the probabilities:
$$P\Big[\bigcup_k A_k\Big] = \sum_k P[A_k].$$
In particular, if two events, $A$ and $B$, are disjoint, which means that $A \cap B = \emptyset$, then
$$P[A \cup B] = P[A] + P[B].$$
$$P[A \mid B] = P[A \mid B'],$$
which means that, whether $B$ occurs or does not, the conditional probability of $A$ does not depend on it. Using the definition of conditional probability, this can be rephrased in a symmetric form:
$$P[A \cap B] = P[A] \cdot P[B].$$
1.4 Partitions and Bayes' Theorems
A finite or countable collection of pairwise disjoint events,
$$\{B_k : k = 1, 2, \ldots\},$$
is called a partition of $\Omega$ if $\bigcup_k B_k = \Omega$.
Using the rules of probability and the definitions presented above, we conclude that for any event $A$ the following statement is valid.
Theorem.
$$P[A] = \sum_k P[B_k] \cdot P[A \mid B_k] \tag{2}$$
The probabilities assigned to the events $B_k$ in the partition are called prior. The other theorem, discovered by Bayes, allows us to introduce the posterior probabilities as follows.
Theorem. For any event $A$ with $P[A] > 0$ and for any partition $\{B_k : k = 1, 2, \ldots\}$, the conditional probabilities of the events in the partition can be evaluated as
$$P[B_k \mid A] = \frac{P[B_k] \cdot P[A \mid B_k]}{\sum_j P[B_j] \cdot P[A \mid B_j]}. \tag{3}$$
The probabilities described in this formula are posterior: given that event $A$ occurred, we recalculate the probabilities of the partition according to the observed event.
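As a quick numerical illustration of (2) and (3), the following Python sketch computes $P[A]$ and the posterior probabilities for a three-event partition; the priors and conditional probabilities are made-up example numbers, not taken from the text.

```python
# Numerical illustration of the total probability rule (2) and Bayes' formula (3).
# The priors and conditional probabilities below are made-up example numbers.

priors = [0.5, 0.3, 0.2]          # P[B_k] for a partition {B_1, B_2, B_3}
likelihoods = [0.9, 0.5, 0.1]     # P[A | B_k]

# Total probability rule (2): P[A] = sum_k P[B_k] * P[A | B_k]
p_a = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' formula (3): posterior P[B_k | A] = P[B_k] * P[A | B_k] / P[A]
posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]

print(f"P[A] = {p_a:.4f}")                               # 0.5*0.9 + 0.3*0.5 + 0.2*0.1 = 0.62
print("posteriors:", [round(p, 4) for p in posteriors])  # they sum to 1
```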
2 Joint, Marginal, and Conditional Distributions
Consider a pair of random variables, $(X, Y)$. Their joint distribution is explored in this section.
If both variables are discrete, their joint probability function, $f^{X,Y}(x, y) = P[X = x, Y = y]$, satisfies two conditions.
1. $f^{X,Y}(x, y) \geq 0$, and $f^{X,Y}(x, y) > 0$ at a finite or countable set of points on the plane.
2. $\sum_x \sum_y f^{X,Y}(x, y) = 1.$
Notice that this double sum formally is taken over the entire plane, $\mathbb{R}^2$.
Using (2), it is possible to introduce conditional distributions as follows:
$$f^{X|Y}(x \mid y) = P[X = x \mid Y = y] = \frac{f^{X,Y}(x, y)}{f^Y(y)}, \tag{4}$$
provided the denominator is positive. If $f^Y(y) = 0$, we set the conditional probability function to zero.
Similarly, under the same convention about a zero denominator,
$$f^{Y|X}(y \mid x) = P[Y = y \mid X = x] = \frac{f^{X,Y}(x, y)}{f^X(x)}.$$
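The following Python sketch illustrates these definitions on a small joint p.m.f. stored as a dictionary; the table values are arbitrary examples.

```python
# A minimal sketch: recovering marginal and conditional p.m.f.s from a joint
# p.m.f. stored as a dictionary {(x, y): probability}. The numbers are made up.

joint = {(0, 0): 0.10, (0, 1): 0.20,
         (1, 0): 0.30, (1, 1): 0.40}   # f^{X,Y}(x, y); the values sum to 1

def marginal_y(y):
    """f^Y(y) = sum over x of f^{X,Y}(x, y)."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def conditional_x_given_y(x, y):
    """f^{X|Y}(x|y) = f^{X,Y}(x, y) / f^Y(y); zero when f^Y(y) = 0."""
    fy = marginal_y(y)
    return joint.get((x, y), 0.0) / fy if fy > 0 else 0.0

print(conditional_x_given_y(0, 1))  # 0.20 / 0.60 = 1/3
print(conditional_x_given_y(1, 1))  # 0.40 / 0.60 = 2/3
```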
The conditional densities follow the same scenario step by step; hence the distribution of $X$, given $Y = y$, is described by the formula
$$f^{X|Y}(x \mid y) = \frac{f^{X,Y}(x, y)}{f^Y(y)}. \tag{5}$$
Regardless of the distribution type, the rule "marginal times conditional" is valid:
$$f^{X,Y}(x, y) = f^Y(y) \cdot f^{X|Y}(x \mid y) = f^X(x) \cdot f^{Y|X}(y \mid x).$$
Remark. Conditional distributions, discrete or continuous, will be used when we want to proceed with conditional expectations. Formally, equations (4) and (5) look alike, yet they apply to different types of variables.
3 Mixed Case
A mixed situation occurs when the two components, now denoted as $(X, Y = N)$, are such that $X$ is continuous and $N$ is discrete. Assume that $Y = N$ takes integer values.
The joint distribution of $(X, N)$ can be described in terms of the probability function $f^{X,N}(x, n) \geq 0$, which combines the properties of a density and a mass function: it sums over $n$ and integrates over $x$ to one,
$$\sum_n \int_{-\infty}^{\infty} f^{X,N}(x, n) \, dx = 1.$$
The conditional density of $(X \mid N = n)$ is
$$f^{X|N}(x \mid n) = \frac{f^{X,N}(x, n)}{f^N(n)},$$
for $x \in \mathbb{R}$. Notice that in this case the given value, $N = n$, can be viewed as a parameter of this distribution.
Similarly,
$$f^{N|X}(n \mid x) = \frac{f^{X,N}(x, n)}{f^X(x)},$$
for integer $n$, while the given value of $X = x$ is viewed as a parameter of this distribution.
Similar to (4) and (5), restoring the initial notation, $Y = N$, we conclude that
$$f^{X|Y}(x \mid y) = \frac{f^{X,Y}(x, y)}{f^Y(y)}. \tag{6}$$
Remarks
• Equations (4), (5) and the just mentioned (6) all look similar to each other, despite the differences
in types of random variables involved.
4 Conditional Expectations
When the variables $(X, Y)$ are considered as a single pair, their joint distribution can be either determined explicitly, via $f^{X,Y}(x, y)$, or implicitly, using the marginal for one and the conditional for the other. In all these situations, equations (4), (5), and (6) help us proceed with conditional and marginal expectations.
4.1 Purely Discrete Case
We begin with the conditional expectation of $Y$, given $X = x$, which will help establish the important general double expectation rule. Conditioning on $X$ results in
$$E[Y \mid X = x] = \sum_y y \cdot f^{Y|X}(y \mid x) = \frac{\sum_y y \cdot f^{X,Y}(x, y)}{\sum_y f^{X,Y}(x, y)}.$$
Similarly, if $T = T(X, Y)$ is a transformed variable, its conditional expectation is
$$E[T \mid X = x] = \sum_y T(x, y) \cdot f^{Y|X}(y \mid x). \tag{7}$$
In other words, given $X = x$, the conditional expectation of $T$ coincides with that of $T(x, Y)$ with respect to the distribution of $(Y \mid X = x)$. In particular, the conditional variance of $(Y \mid X = x)$ is
$$\mathrm{Var}[Y \mid X = x] = E[Y^2 \mid X = x] - (E[Y \mid X = x])^2.$$
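A short Python sketch of the discrete formulas above, computing $E[Y \mid X = x]$ and $\mathrm{Var}[Y \mid X = x]$ directly from a joint p.m.f. table; the table values are made up for illustration.

```python
# Conditional mean and variance from a joint p.m.f. table (made-up numbers).

joint = {(0, 0): 0.10, (0, 1): 0.20,
         (1, 0): 0.30, (1, 2): 0.40}   # f^{X,Y}(x, y)

def cond_moment(x, power=1):
    """E[Y^power | X = x] = sum_y y^power f(x,y) / sum_y f(x,y)."""
    num = sum((y ** power) * p for (xx, y), p in joint.items() if xx == x)
    den = sum(p for (xx, y), p in joint.items() if xx == x)
    return num / den

e_y = cond_moment(1)                   # E[Y | X = 1]
var_y = cond_moment(1, 2) - e_y ** 2   # Var[Y | X = 1] = E[Y^2|X=1] - (E[Y|X=1])^2
print(e_y, var_y)                      # E ≈ 1.143, Var ≈ 0.980
```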
4.2 Purely Continuous Case
Again, $(X, Y)$ are of the same nature, and the considerations are presented for conditioning on $X$ only. Let $f(x, y)$ denote the joint density function. Recall the identity (5); the conditional expectation of $(Y \mid X = x)$ is
$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y \cdot f^{Y|X}(y \mid x) \, dy = \frac{\int_{-\infty}^{\infty} y \cdot f^{X,Y}(x, y) \, dy}{\int_{-\infty}^{\infty} f^{X,Y}(x, y) \, dy}.$$
Similarly, if $T = T(X, Y)$ is a transformed variable, its conditional expectation is
$$E[T \mid X = x] = \int_{-\infty}^{\infty} T(x, y) \cdot f^{Y|X}(y \mid x) \, dy = \frac{\int_{-\infty}^{\infty} T(x, y) \cdot f(x, y) \, dy}{\int_{-\infty}^{\infty} f(x, y) \, dy}.$$
Again, the interpretation is similar to the purely discrete case and leads to
$$E[T(X, Y) \mid X = x] = E[T(x, Y) \mid X = x]. \tag{8}$$
As before, the conditional variance of $(Y \mid X = x)$ is
$$\mathrm{Var}[Y \mid X = x] = E[Y^2 \mid X = x] - (E[Y \mid X = x])^2.$$
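The ratio-of-integrals formula can be approximated numerically. The sketch below uses the made-up joint density $f(x, y) = x + y$ on the unit square (a valid density there) and a simple Riemann-sum approximation; it is an illustration, not part of the notes' derivations.

```python
# Numerical sketch of E[Y | X = x] as a ratio of integrals over y.
# The joint density f(x, y) = x + y on the unit square is a made-up example.
import numpy as np

def joint_density(x, y):
    return x + y  # integrates to 1 over 0 < x, y < 1

def cond_expectation(x, n=100_000):
    y = np.linspace(0.0, 1.0, n)
    fxy = joint_density(x, y)
    # Ratio of Riemann sums approximates the ratio of integrals
    # (the common grid spacing dy cancels).
    return float(np.sum(y * fxy) / np.sum(fxy))

# Exact value: (x/2 + 1/3) / (x + 1/2); at x = 0.5 this is 7/12 ≈ 0.5833
print(cond_expectation(0.5))
```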
4.3 Mixed Case
4.3.1 Conditioning on N
Given $N = n$, the distribution of $(X \mid N = n)$ has the density
$$f^{X|N}(x \mid n) = \frac{f^{X,N}(x, n)}{f^N(n)},$$
which was mentioned as (6). Thus, given $N = n$, the (conditional) expected value of $X$ is
$$E[X \mid N = n] = \int_{-\infty}^{\infty} x \cdot f^{X|N}(x \mid n) \, dx = \frac{\int_{-\infty}^{\infty} x \cdot f^{X,N}(x, n) \, dx}{\int_{-\infty}^{\infty} f^{X,N}(x, n) \, dx}.$$
4.3.2 Conditioning on X
In this case, the conditional distribution of $(N \mid X = x)$ is needed, so the evaluation of conditional expectations is performed in terms of
$$f^{N|X}(n \mid x) = P[N = n \mid X = x] = \frac{f^{X,N}(x, n)}{f^X(x)}.$$
Writing $p(x, n) = f^{X,N}(x, n)$ for brevity, the conditional expectation of a transformed variable $T = T(X, N)$ is
$$E[T \mid X = x] = \frac{\sum_n T(x, n) \cdot p(x, n)}{\sum_n p(x, n)}.$$
5 Examples for Self-Studies
5.1 Poisson and Conditional Binomial
A discrete random variable $N$ is Poisson distributed with intensity (or rate) parameter $\lambda > 0$; therefore,
$$f^N(n) = P[N = n] = \frac{\lambda^n}{n!} \cdot e^{-\lambda},$$
where $n = 0, 1, \ldots$. Given $N = n$, the random variable $X \mid N = n$ has the binomial distribution, $\mathrm{Bin}[n, q]$, where the parameter $q > 0$ is specified. Assume that when $N = 0$, the value of $X$ is $0$ with probability 1. Also notice that $X$ can take values $0 \leq x \leq n$.
Marginal Distribution of X. To determine $P[X = x]$ one needs to evaluate the following sum:
$$P[X = x] = E\big[P[X = x \mid N]\big] = \sum_{n \geq x} \frac{\lambda^n}{n!} \cdot e^{-\lambda} \cdot \frac{n!}{x! \, (n - x)!} \cdot q^x \cdot (1 - q)^{n - x}.$$
Since the sum is taken over $n \geq x$, substitute the summation variable by $m = n - x \geq 0$. After carrying out common factors that do not depend on the summation variable, this equation results in
$$P[X = x] = \frac{(\lambda q)^x}{x!} \cdot e^{-\lambda} \cdot \sum_{m \geq 0} \frac{(\lambda (1 - q))^m}{m!} = \frac{(\lambda q)^x}{x!} \cdot e^{-\lambda q},$$
so that $X \sim \mathrm{Poisson}[\lambda q]$.
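This marginal result (Poisson thinning) is easy to check by simulation; the sketch below uses arbitrary test values of $\lambda$ and $q$.

```python
# Simulation check: if N ~ Poisson(lam) and (X | N = n) ~ Bin[n, q],
# then X ~ Poisson(lam * q). Parameter values are arbitrary test choices.
import numpy as np

rng = np.random.default_rng(0)
lam, q, trials = 5.0, 0.3, 200_000

n = rng.poisson(lam, size=trials)   # N ~ Poisson(lam)
x = rng.binomial(n, q)              # X | N = n ~ Bin[n, q]

print(x.mean(), x.var())            # both should be close to lam*q = 1.5
```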
Conditional Distribution of (N|X). Notice that the joint p.m.f. of the pair $(X, N)$ is the product of the marginal function $f^N(n)$ and the conditional p.m.f. $f^{X|N}(x \mid n)$, which after simplifications results in
$$p(x, n) = f^N(n) \cdot f^{X|N}(x \mid n) = \frac{\lambda^n}{n!} \cdot e^{-\lambda} \cdot \frac{n!}{x! \, (n - x)!} \cdot q^x \cdot (1 - q)^{n - x},$$
where the integer values are $n \geq 0$ and $0 \leq x \leq n$. Given $X = x$, use the same observation that $N$ can vary from $x$ to $\infty$; therefore, its conditional distribution is characterized as follows:
$$f^{N|X}(n \mid x) = \frac{p(x, n)}{P[X = x]} = \frac{(\lambda (1 - q))^{n - x}}{(n - x)!} \cdot e^{-\lambda (1 - q)},$$
valid for $n \geq x$. This can be interpreted in terms of the variable $W = N - X$, which conditionally, given $X = x$, has the Poisson distribution with rate equal to $\lambda (1 - q)$. In other words, the distribution of $(W = N - X \mid X = x)$ does not depend on $x$.
Conditional Expectation and Variance of (N|X = x). First notice that conditioning on $N$ results in
$$E[X \mid N = n] = n \cdot q,$$
because $(X \mid N = n)$ has the binomial distribution, $\mathrm{Bin}[n, q]$. However, as was shown before, $(W = N - X \mid X = x)$ has the Poisson distribution with rate equal to $\lambda (1 - q)$, which allows us to immediately find the conditional expectation as follows:
$$E[N \mid X = x] = x + E[W \mid X = x] = x + \lambda (1 - q).$$
The conditional variance of $(N - x \mid X = x)$ equals its mean, which is $\lambda(1 - q)$. Finally, the conditional variance of $(N \mid X = x)$ is the same, that is, $\lambda(1 - q)$.
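These conditional moments can also be checked empirically by filtering simulated pairs on $X = x$; the values of $\lambda$, $q$, and $x$ below are arbitrary test choices.

```python
# Simulation check of E[N | X = x] = x + lam*(1 - q) and
# Var[N | X = x] = lam*(1 - q), conditioning by filtering on X = x0.
import numpy as np

rng = np.random.default_rng(1)
lam, q, x0, trials = 5.0, 0.3, 2, 400_000

n = rng.poisson(lam, size=trials)
x = rng.binomial(n, q)

n_given_x = n[x == x0]     # empirical distribution of (N | X = x0)
print(n_given_x.mean())    # ≈ x0 + lam*(1 - q) = 2 + 3.5 = 5.5
print(n_given_x.var())     # ≈ lam*(1 - q) = 3.5
```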
Assume now that $X \sim \mathrm{Poisson}[\lambda]$ and $Y \sim \mathrm{Poisson}[\mu]$ are independent, and set $Z = X + Y$.
Joint Distribution of (X, Z). Notice that $Z = X + Y \geq X$ due to the fact that $Y \geq 0$. Therefore the joint p.m.f. of $(X, Z)$ is positive only for $z \geq x$. Also, for $z \geq x$,
$$P[(X = x) \cap (Z = z)] = P[(X = x) \cap (Y = z - x)] = P[X = x] \cdot P[Y = z - x],$$
so that
$$f^{X,Z}(x, z) = \Big[\frac{\lambda^x}{x!} \cdot e^{-\lambda}\Big] \cdot \Big[\frac{\mu^{z - x}}{(z - x)!} \cdot e^{-\mu}\Big].$$
Conditional Distribution of (X|Z). Recall that $Z = X + Y \sim \mathrm{Poisson}[\lambda + \mu]$. Division of the joint p.m.f. by the p.m.f. of $Z$ results in
$$f^{X|Z}(x \mid z) = \frac{z!}{x! \, (z - x)!} \cdot \Big(\frac{\lambda}{\lambda + \mu}\Big)^x \cdot \Big(\frac{\mu}{\lambda + \mu}\Big)^{z - x},$$
which means that conditionally $(X \mid Z = z)$ has the binomial distribution, $\mathrm{Bin}[z, q]$, with
$$q = \frac{\lambda}{\lambda + \mu}.$$
Conditional Covariance between (X, Y). Recall that the covariance is defined as
$$\mathrm{Cov}[X, Y] = E[X \cdot Y] - E[X] \cdot E[Y].$$
Conditioning on $Z = z$ can be handled similarly: since $(X \mid Z = z)$ has the binomial distribution, its conditional expectation is $z \cdot q$, and similarly, $E[Y \mid Z = z] = z \cdot (1 - q)$. To find the conditional expectation of the product, $(X \cdot Y \mid Z = z)$, notice that given $Z = z$, it is the same as for the product $X(z - X)$. Thus we obtain:
$$E[X \cdot Y \mid Z = z] = z^2 \cdot q(1 - q) - z \cdot q(1 - q) = z(z - 1) \, q(1 - q).$$
Finally, the covariance is
$$\mathrm{Cov}[(X, Y) \mid Z = z] = z(z - 1) \, q(1 - q) - zq \cdot z(1 - q) = -z \, q(1 - q).$$
Alternatively, notice that $\mathrm{Var}[Z \mid Z = z] = 0$, and using the identity valid for the variance of a sum, conclude that
$$\mathrm{Cov}[(X, Y) \mid Z = z] = -\frac{1}{2} \cdot \big(\mathrm{Var}[X \mid Z = z] + \mathrm{Var}[Y \mid Z = z]\big) = -z \, q(1 - q).$$
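A simulation sketch of the last two results, conditioning on $Z = z$ by filtering; $\lambda$, $\mu$, and $z$ are arbitrary test values.

```python
# Simulation check: with X ~ Poisson(lam), Y ~ Poisson(mu) independent and
# Z = X + Y, (X | Z = z) should be Bin[z, q] with q = lam/(lam + mu), and
# Cov[(X, Y) | Z = z] should equal -z*q*(1 - q).
import numpy as np

rng = np.random.default_rng(2)
lam, mu, z0, trials = 2.0, 3.0, 6, 500_000

x = rng.poisson(lam, size=trials)
y = rng.poisson(mu, size=trials)
mask = (x + y) == z0                # condition on Z = z0

q = lam / (lam + mu)
print(x[mask].mean())               # ≈ z0*q = 2.4
xc = x[mask] - x[mask].mean()
yc = y[mask] - y[mask].mean()
print((xc * yc).mean())             # ≈ -z0*q*(1 - q) = -1.44
```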
5.3 Gamma-Distributions
Assume first that (X, Y ) are independent with densities
$$f^X(x) = \frac{1}{\Gamma(a)} \cdot x^{a-1} \cdot e^{-x} \quad \text{and} \quad f^Y(y) = \frac{1}{\Gamma(b)} \cdot y^{b-1} \cdot e^{-y},$$
which means that X ∼ Gamma [a, 1] and Y ∼ Gamma [b, 1], where a > 0 and b > 0 are given shape
parameters.
Distribution of the Sum. Since $(X, Y)$ are independent, their joint density is simply the product of the marginal ones:
$$f^{X,Y}(x, y) = \frac{1}{\Gamma(a) \cdot \Gamma(b)} \cdot x^{a-1} \cdot y^{b-1} \cdot e^{-(x + y)}$$
for any $x > 0$, $y > 0$, and zero elsewhere. The density of $Z = X + Y$ was derived in the Probability Theory course and led to $Z \sim \mathrm{Gamma}[a + b, 1]$, or
$$f^Z(z) = \frac{1}{\Gamma(a + b)} \cdot z^{a+b-1} \cdot e^{-z}.$$
For those willing to validate it, use the substitution
$$u = \frac{x}{z}, \qquad 1 - u = \frac{y}{z},$$
and integrate the joint density $f^{X,Y}(x, y)$ in $x$ from $0$ to $z = x + y > 0$.
Conditional Distribution of (X|Z = z). The distribution of $(X \mid Z = z)$ is obtained as the ratio
$$f^{X|Z}(x \mid z) = \frac{f^{X,Y}(x, z - x)}{f^Z(z)}.$$
In terms of the rescaled variable $T = X/Z$, this gives
$$f^{T|Z}(t \mid z) = \frac{\Gamma(a + b)}{\Gamma(a) \cdot \Gamma(b)} \cdot t^{a-1} \cdot (1 - t)^{b-1}$$
for $0 < t < 1$. Since it does not depend on $z$, the variables $T$ and $Z$ are independent.
Conditional Distribution of (Z|X = x). Since $Z - X = Y$ is independent of $X$, the variable $(Z - x \mid X = x)$ has the same distribution as $Y$, which is $\mathrm{Gamma}[b, 1]$.
More generally, if $X \sim \mathrm{Gamma}[a, \lambda]$ and $Y \sim \mathrm{Gamma}[b, \lambda]$ are independent, then the rescaled variables
$$X_1 = \lambda \cdot X \quad \text{and} \quad Y_1 = \lambda \cdot Y$$
are still independent, each of them having the scale parameter equal to 1. Therefore, $Z = X + Y \sim \mathrm{Gamma}[a + b, \lambda]$. Also,
$$T = \frac{X}{X + Y} = \frac{X_1}{X_1 + Y_1} \sim \mathrm{Beta}[a, b],$$
and $T$ is independent of $Z$.
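A quick simulation sketch of these three facts; the shape parameters below are arbitrary test values.

```python
# Simulation check: for independent X ~ Gamma[a, 1] and Y ~ Gamma[b, 1],
# Z = X + Y ~ Gamma[a+b, 1] and T = X/(X+Y) ~ Beta[a, b], with T independent of Z.
import numpy as np

rng = np.random.default_rng(3)
a, b, trials = 2.0, 5.0, 500_000

x = rng.gamma(a, 1.0, size=trials)
y = rng.gamma(b, 1.0, size=trials)
z, t = x + y, x / (x + y)

print(z.mean(), z.var())        # both ≈ a + b = 7 (moments of Gamma[7, 1])
print(t.mean())                 # ≈ a/(a + b) = 2/7 ≈ 0.2857 (Beta mean)
print(np.corrcoef(t, z)[0, 1])  # ≈ 0, consistent with independence
```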
5.5 Two Exponentially Distributed Independent Variables
If $X$ and $Y$ are independent with a common exponential density,
$$f(x) = \lambda \cdot e^{-\lambda x}, \qquad x > 0,$$
then, referring to the previous case with $a = b = 1$, the following conclusions can be made.
1. $Z = X + Y \sim \mathrm{Gamma}[2, \lambda]$, having the density
$$f^Z(z) = \lambda^2 \cdot z \cdot e^{-\lambda z}.$$
2. $T = X/(X + Y) \sim \mathrm{Beta}[1, 1]$, that is, $T$ is uniform on $(0, 1)$ and independent of $Z$.
Assume now that $X \sim \mathrm{Exp}[\lambda]$ and that, given $X = x$, the discrete variable $N$ has the Poisson distribution with rate $x$.
Conditional Distribution of (X|N = n). Conditionally, given $N = n$, the variable $X$ has the p.d.f. equal to the ratio
$$f^{X|N}(x \mid n) = \frac{1}{P[N = n]} \cdot \Big[\frac{x^n}{n!} \cdot e^{-x}\Big] \cdot \lambda \, e^{-\lambda x} = \frac{(\lambda + 1)^{n+1}}{n!} \cdot x^n \cdot e^{-(\lambda + 1) x},$$
which is $\mathrm{Gamma}[n + 1, \lambda + 1]$.
Expectation and Variance. The conditional expectation of $(X \mid N = n)$ is
$$E[X \mid N = n] = \frac{n + 1}{\lambda + 1}.$$
The conditional variance of $(X \mid N = n)$ is
$$\mathrm{Var}[X \mid N = n] = \frac{n + 1}{(\lambda + 1)^2}.$$
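These posterior moments can be verified by simulating the mixed model and filtering on $N = n$; $\lambda$ and $n$ below are arbitrary test values.

```python
# Simulation check of the mixed model: X ~ Exp(lam) and, given X = x,
# N ~ Poisson(x). Then (X | N = n) ~ Gamma[n+1, lam+1].
import numpy as np

rng = np.random.default_rng(4)
lam, n0, trials = 1.5, 3, 500_000

x = rng.exponential(1.0 / lam, size=trials)  # numpy parametrizes Exp by scale = 1/lam
n = rng.poisson(x)                           # N | X = x ~ Poisson(x)

x_given_n = x[n == n0]
print(x_given_n.mean())   # ≈ (n0 + 1)/(lam + 1) = 4/2.5 = 1.6
print(x_given_n.var())    # ≈ (n0 + 1)/(lam + 1)^2 = 4/6.25 = 0.64
```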
Assume next that, given $Q = q$, the count $N$ has the geometric distribution, $P[N = n \mid Q = q] = (1 - q)^n \cdot q$ for $n = 0, 1, \ldots$.
Marginal Distribution of N. Since $Q$ is uniformly distributed over the unit interval, its density is $f^Q(q) = 1$ for $0 < q < 1$ and zero elsewhere. So the marginal p.m.f. of $N$ is
$$f^N(n) = P[N = n] = \int_0^1 (1 - q)^n \cdot q \, dq = \frac{1}{(n + 1)(n + 2)}.$$
Expectation and Variance. The conditional expectation and variance of $(Q \mid N = n)$ can be found directly by using the facts about the Beta distribution, since $(Q \mid N = n) \sim \mathrm{Beta}[2, n + 1]$. Thus the expectation is
$$E[Q \mid N = n] = (n + 1)(n + 2) \int_0^1 q^2 (1 - q)^n \, dq = \frac{2}{2 + n + 1} = \frac{2}{n + 3}.$$
The variance can be found as
$$\mathrm{Var}[Q \mid N = n] = \frac{2 \, (n + 1)}{(n + 3)^2 \, (n + 4)}.$$
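A simulation sketch of this Beta posterior, again conditioning by filtering; the value of $n$ is an arbitrary test choice. Note that numpy's geometric generator counts trials up to the first success, so the number of failures is obtained by subtracting one.

```python
# Simulation check: Q ~ Uniform(0, 1) and, given Q = q, N counts failures
# before the first success, P[N = n | Q = q] = (1-q)^n * q.
# Then (Q | N = n) ~ Beta[2, n+1].
import numpy as np

rng = np.random.default_rng(5)
n0, trials = 2, 500_000

q = rng.uniform(0.0, 1.0, size=trials)
n = rng.geometric(q) - 1        # shift: numpy's geometric counts trials, not failures

q_given_n = q[n == n0]
print(q_given_n.mean())         # ≈ 2/(n0 + 3) = 0.4
print(q_given_n.var())          # ≈ 2*(n0 + 1)/((n0 + 3)^2 * (n0 + 4)) = 0.04
```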
5.9 Random Sums
By now, we can slightly simplify our common notation. Notice that when two random variables $(X, Y)$ are considered, the conditional distribution and moments of $T = T(X, Y)$, given $X = x$ or $Y = y$, are simply transformations of the variable that defines the condition. A common situation arises when the variables $N$ and $\{X_k : k \geq 1\}$ are independent and the $X$-values share the same distribution with finite first and second moments.
Assume that
$$\mu = E[X_k] \quad \text{and} \quad \sigma^2 = \mathrm{Var}[X_k]$$
are finite. Also make a similar assumption about the moments of $N$, so that
$$\nu = E[N] \quad \text{and} \quad \tau^2 = \mathrm{Var}[N]$$
are finite. Consider the random sum
$$S = \sum_{k=1}^{N} X_k,$$
so its randomness is due to the two possible reasons: that in the $X$-values and that in the value of $N$. When $N = 0$ with a positive probability, set $S = 0$ by default.
5.9.1 Expectation of a Random Sum
Conditioning on $N = n$ fixes the number of terms, so
$$E[S \mid N = n] = n \cdot \mu, \qquad \text{or} \qquad E[S \mid N] = N \cdot \mu. \tag{11}$$
By the double expectation rule, it follows that $E[S] = E\big[E[S \mid N]\big] = \mu \cdot \nu$.
5.9.2 Variance of a Random Sum
In the Probability Theory course, for any two variables, say $(T, W)$, the fundamental identity was established:
$$\mathrm{Var}[T] = E\big[\mathrm{Var}[T \mid W]\big] + \mathrm{Var}\big[E[T \mid W]\big].$$
Applying it with $T = S$ and $W = N$, and using $E[S \mid N] = N \cdot \mu$ and $\mathrm{Var}[S \mid N] = N \cdot \sigma^2$, we conclude that
$$\mathrm{Var}[S] = \sigma^2 \cdot \nu + \mu^2 \cdot \tau^2.$$
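A simulation sketch of both random-sum formulas. Here the terms $X_k$ are $\mathrm{Exp}(1)$ (so $\mu = \sigma^2 = 1$) and $N \sim \mathrm{Poisson}[4]$ (so $\nu = \tau^2 = 4$); both are arbitrary test choices.

```python
# Simulation check of E[S] = mu*nu and Var[S] = sigma^2*nu + mu^2*tau^2
# for S = X_1 + ... + X_N with N independent of the X's.
import numpy as np

rng = np.random.default_rng(6)
nu, trials = 4.0, 100_000

n = rng.poisson(nu, size=trials)
# Each S is a sum of a random number of Exp(1) terms (empty sum gives 0).
s = np.array([rng.exponential(1.0, size=k).sum() for k in n])

print(s.mean())   # ≈ mu*nu = 4
print(s.var())    # ≈ sigma^2*nu + mu^2*tau^2 = 4 + 4 = 8
```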
5.10 Exercises
5.10.1 Random Poisson Sum
In addition to the common assumptions, suppose that $N \sim \mathrm{Poisson}[\lambda]$ and consider
$$S = \sum_{k=1}^{N} X_k.$$
Show that
$$E[S] = \mu \cdot \lambda \quad \text{and} \quad \mathrm{Var}[S] = (\sigma^2 + \mu^2) \cdot \lambda.$$
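A quick numerical check of the claimed answers (not a substitute for the proof): with uniform terms $X_k \sim \mathrm{Uniform}(0, 2)$ we have $\mu = 1$ and $\sigma^2 = 1/3$; $\lambda = 3$ is an arbitrary test value.

```python
# Numerical check for the Random Poisson Sum exercise.
import numpy as np

rng = np.random.default_rng(7)
lam, trials = 3.0, 100_000

n = rng.poisson(lam, size=trials)
s = np.array([rng.uniform(0.0, 2.0, size=k).sum() for k in n])

print(s.mean())   # ≈ mu*lam = 3
print(s.var())    # ≈ (sigma^2 + mu^2)*lam = (1/3 + 1)*3 = 4
```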