Introduction

1 Basic Concepts
1.1 Sample Space
1.2 Classical Probability
1.3 Combinatorial Analysis
1.4 Stirling's Formula

3 Random Variables
3.1 Expectation
3.2 Variance
3.3 Indicator Function
3.4 Inclusion-Exclusion Formula
3.5 Independence

4 Inequalities
4.1 Jensen's Inequality
4.2 Cauchy-Schwarz Inequality
4.3 Markov's Inequality
4.4 Chebyshev's Inequality
4.5 Law of Large Numbers

5 Generating Functions
5.1 Combinatorial Applications
5.2 Conditional Expectation
5.3 Properties of Conditional Expectation
5.4 Branching Processes
5.5 Random Walks
These notes are based on the course “Probability” given by Prof. F.P. Kelly in Cam-
bridge in the Lent Term 1996. This typed version of the notes is totally unconnected
with Prof. Kelly.
Other sets of notes are available for different courses. At the time of typing these
courses were:
Probability Discrete Mathematics
Analysis Further Analysis
Methods Quantum Mechanics
Fluid Dynamics 1 Quadratic Mathematics
Geometry Dynamics of D.E.’s
Foundations of QM Electrodynamics
Methods of Math. Phys Fluid Dynamics 2
Waves (etc.) Statistical Physics
General Relativity Dynamical Systems
Combinatorics Bifurcations in Nonlinear Convection
Chapter 1
Basic Concepts
1.2 Classical Probability

If all outcomes in a finite sample space Ω are equally likely, the probability of an event A ⊂ Ω is

P(A) = |A|/|Ω|.
Example. Choose r digits from a table of random numbers. Find the probability that, for 0 ≤ k ≤ 9,

1. no digit exceeds k;
2. k is the greatest digit drawn.

Solution. Let

A_k = {(a_1, ..., a_r) : 0 ≤ a_i ≤ k, i = 1, ..., r}.

Now |A_k| = (k + 1)^r, so that P(A_k) = ((k + 1)/10)^r. Let B_k be the event that k is the greatest digit drawn. Then B_k = A_k \ A_{k−1}. Also A_{k−1} ⊂ A_k, so that |B_k| = (k + 1)^r − k^r. Thus P(B_k) = ((k + 1)^r − k^r)/10^r.
1.4 Stirling's Formula

Let h_n = log(n!/(n^{n+1/2} e^{−n})). Then

1/(12n²) − 1/(12n³) ≤ h_n − h_{n+1} ≤ 1/(12n²) + 1/(6n³).

For n ≥ 2, 0 ≤ h_n − h_{n+1} ≤ 1/n². Thus h_n is a decreasing sequence, and

h_2 − h_{n+1} = Σ_{r=2}^n (h_r − h_{r+1}) ≤ Σ_{r=2}^∞ 1/r².

Therefore h_n is bounded below and decreasing, so it is convergent. Let the limit be A. We have obtained

n! ∼ e^A n^{n+1/2} e^{−n}.
We need a trick to find A. Let I_r = ∫_0^{π/2} sin^r θ dθ. We obtain the recurrence I_r = ((r − 1)/r) I_{r−2} by integrating by parts. Therefore

I_{2n} = ((2n)!/(2^n n!)²)(π/2) and I_{2n+1} = (2^n n!)²/(2n + 1)!.

Now I_n is decreasing, so

1 ≤ I_{2n}/I_{2n+1} ≤ I_{2n−1}/I_{2n+1} = 1 + 1/(2n) → 1.

Substituting n! ∼ e^A n^{n+1/2} e^{−n} into I_{2n}/I_{2n+1} → 1 gives e^A = √(2π), and hence Stirling's formula n! ∼ √(2πn) n^n e^{−n}. (The bounds on h_n − h_{n+1} above are obtained by playing silly buggers with log(1 + 1/n).)
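As a quick numerical sanity check of Stirling's formula (an illustrative sketch, not part of the original notes), one can compare n! with √(2πn)(n/e)^n:

```python
import math

# Compare n! with Stirling's approximation sqrt(2*pi*n) * (n/e)^n.
# The ratio tends to 1, with relative error of order 1/(12n).
for n in [1, 2, 5, 10, 50, 100]:
    exact = math.factorial(n)
    approx = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(f"n={n:4d}  ratio={exact / approx:.8f}")
```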
Chapter 2

The Axiomatic Approach

The probability P satisfies the axioms:

1. 0 ≤ P(A) ≤ 1 for A ⊂ Ω,
2. P(Ω) = 1,
3. for disjoint events A_1, A_2, ..., P(⋃_i A_i) = Σ_i P(A_i).

If Ω is countable, attach to each outcome ω_i a number p_i ≥ 0 with Σ_i p_i = 1, and set P(A) = Σ_{i:ω_i∈A} p_i; it is easy to see that this function satisfies the axioms. The numbers p_1, p_2, ... are called a probability distribution. If Ω is finite with n elements, and if p_1 = p_2 = ··· = p_n = 1/n, we recover the classical definition of probability.
Another example would be to let Ω = {0, 1, ...} and attach to outcome r the probability p_r = e^{−λ} λ^r/r! for some λ > 0. This is a distribution (as may easily be verified), and is called the Poisson distribution with parameter λ.
Theorem 2.1. For events A, B ⊂ Ω,

1. P(A^c) = 1 − P(A),
2. P(∅) = 0,
3. if A ⊂ B then P(A) ≤ P(B),
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof. Note that Ω = A∪Ac , and A∩Ac = ∅. Thus 1 = P(Ω) = P(A)+P(Ac ). Now
we can use this to obtain P(∅) = 1 − P(∅c ) = 0. If A ⊂ B, write B = A ∪ (B ∩ Ac ),
so that P(B) = P(A) + P(B ∩ Ac ) ≥ P(A). Finally, write A ∪ B = A ∪ (B ∩ Ac )
and B = (B ∩ A) ∪ (B ∩ Ac ). Then P(A ∪ B) = P(A) + P(B ∩ Ac ) and P(B) =
P(B ∩ A) + P(B ∩ Ac ), which gives the result.
Theorem 2.2 (Boole’s Inequality). For any A1 , A2 , · · · ⊂ Ω,
P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i) and P(⋃_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
Proof. Let B_1 = A_1 and then inductively let B_i = A_i \ ⋃_{k=1}^{i−1} B_k. Thus the B_i are disjoint and ⋃_i B_i = ⋃_i A_i. Therefore

P(⋃_i A_i) = P(⋃_i B_i) = Σ_i P(B_i) ≤ Σ_i P(A_i), as B_i ⊂ A_i.
Theorem 2.3 (Inclusion-Exclusion Formula).

P(⋃_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i1<i2} P(A_{i1} ∩ A_{i2}) + ··· + (−1)^{n+1} P(A_1 ∩ ··· ∩ A_n).

Proof. We know that P(A_1 ∪ A_2) = P(A_1) + P(A_2) − P(A_1 ∩ A_2), so the result is true for n = 2. We also have

P(A_1 ∪ ··· ∪ A_n) = P(A_1 ∪ ··· ∪ A_{n−1}) + P(A_n) − P((A_1 ∪ ··· ∪ A_{n−1}) ∩ A_n).

But by distributivity,

P(⋃_{i=1}^n A_i) = P(⋃_{i=1}^{n−1} A_i) + P(A_n) − P(⋃_{i=1}^{n−1} (A_i ∩ A_n)),

and the result follows by induction.
Proof. The result is true for n = 2. If true for n − 1, then it is true for n and 1 ≤ r ≤ n − 1 by the inductive step above, which expresses an n-fold union in terms of two (n − 1)-fold unions. It is true for r = n by the inclusion-exclusion formula.
Example (Derangements). After a dinner, the n guests take coats at random from a
pile. Find the probability that at least one guest has the right coat.
Solution. Let A_k be the event that guest k has his own coat. We want P(⋃_{i=1}^n A_i). Now

P(A_{i1} ∩ ··· ∩ A_{ir}) = (n − r)!/n!,

by counting the number of ways of matching guests and coats after i_1, ..., i_r have taken theirs. Thus

Σ_{i1<···<ir} P(A_{i1} ∩ ··· ∩ A_{ir}) = C(n, r)(n − r)!/n! = 1/r!,

and so by the inclusion-exclusion formula

P(⋃_{i=1}^n A_i) = 1 − 1/2! + 1/3! − ··· + (−1)^{n+1}/n! → 1 − e^{−1} as n → ∞.
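The limit 1 − 1/e ≈ 0.632 is easy to check by simulation; here is a short illustrative Python sketch (the function name is ours):

```python
import math
import random

def at_least_one_match(n: int) -> bool:
    """Shuffle n coats; return True if some guest k receives coat k."""
    coats = list(range(n))
    random.shuffle(coats)
    return any(coats[k] == k for k in range(n))

n, trials = 10, 100_000
estimate = sum(at_least_one_match(n) for _ in range(trials)) / trials
print(f"simulated: {estimate:.4f}   1 - 1/e = {1 - math.exp(-1):.4f}")
```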
2.2 Independence
Definition 2.1. Two events A and B are said to be independent if P(A ∩ B) = P(A) P(B).
Event            Probability
A1               18/36 = 1/2
A2               as above, 1/2
A3               (6 × 3)/36 = 1/2
A1 ∩ A2          (3 × 3)/36 = 1/4
A1 ∩ A3          (3 × 3)/36 = 1/4
A1 ∩ A2 ∩ A3     0
which is why they are called "independent" experiments. The obvious generalisation to n experiments can be made, but for an infinite sequence of experiments we mean a sample space Ω_1 × Ω_2 × ··· satisfying the appropriate formula for all n ∈ N. You might like to find the probability that n independent tosses of a biased coin, with probability p of heads, result in a total of r heads.
2.3 Distributions
The binomial distribution with parameters n and p, 0 ≤ p ≤ 1, has Ω = {0, ..., n} and probabilities p_i = C(n, i) p^i (1 − p)^{n−i}.
Theorem (Poisson approximation). If n → ∞ and p → 0 with np = λ held fixed, then C(n, r) p^r (1 − p)^{n−r} → e^{−λ} λ^r/r!.

Proof.

C(n, r) p^r (1 − p)^{n−r} = [n(n − 1)···(n − r + 1)/r!] p^r (1 − p)^{n−r}
= (n/n)((n − 1)/n)···((n − r + 1)/n) · ((np)^r/r!) · (1 − p)^{n−r}
= ∏_{i=1}^r ((n − i + 1)/n) · (λ^r/r!) · (1 − λ/n)^n (1 − λ/n)^{−r}
→ 1 × (λ^r/r!) × e^{−λ} × 1
= e^{−λ} λ^r/r!.
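The convergence is already visible for moderate n; a small Python sketch (not part of the notes) compares the two probability mass functions:

```python
import math

# Binomial(n, lam/n) probabilities approach Poisson(lam) as n grows.
lam, n = 2.0, 1000
p = lam / n
for r in range(6):
    binom = math.comb(n, r) * p**r * (1 - p) ** (n - r)
    poisson = math.exp(-lam) * lam**r / math.factorial(r)
    print(f"r={r}  binomial={binom:.6f}  poisson={poisson:.6f}")
```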
Theorem (Law of total probability). If B_1, B_2, ... is a partition of Ω with P(B_i) > 0, then

P(A) = Σ_i P(A|B_i) P(B_i).

Proof.

Σ_i P(A|B_i) P(B_i) = Σ_i P(A ∩ B_i) = P(⋃_i (A ∩ B_i)) = P(A), as required.
Example (Gambler’s Ruin). A fair coin is tossed repeatedly. At each toss a gambler
wins £1 if a head shows and loses £1 if tails. He continues playing until his capital
reaches m or he goes broke. Find px , the probability that he goes broke if his initial
capital is £x.
Solution. Let A be the event that he goes broke before reaching £m, and let H or
T be the outcome of the first toss. We condition on the first toss to get P(A) =
P(A|H) P(H) + P(A|T ) P(T ). But P(A|H) = px+1 and P(A|T ) = px−1 . Thus
we obtain the recurrence
px+1 − px = px − px−1 .
Note that p_x is linear in x, with p_0 = 1, p_m = 0. Thus p_x = 1 − x/m.
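The answer 1 − x/m is easy to confirm by simulation; an illustrative sketch (function name and parameters are ours):

```python
import random

def ruin_probability(x: int, m: int, trials: int = 100_000) -> float:
    """Estimate the probability a fair-coin gambler starting with x
    goes broke before reaching capital m."""
    broke = 0
    for _ in range(trials):
        capital = x
        while 0 < capital < m:
            capital += random.choice((1, -1))
        broke += capital == 0
    return broke / trials

x, m = 3, 10
print(f"simulated: {ruin_probability(x, m):.4f}   1 - x/m = {1 - x/m:.4f}")
```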
Theorem (Bayes' Formula). If B_1, B_2, ... is a partition of Ω, then

P(B_i|A) = P(A|B_i) P(B_i) / Σ_j P(A|B_j) P(B_j).

Proof.

P(B_i|A) = P(A ∩ B_i)/P(A) = P(A|B_i) P(B_i) / Σ_j P(A|B_j) P(B_j),

using the law of total probability.
Chapter 3

Random Variables
Example. If Ω = {(i, j), 1 ≤ i, j ≤ t}, then we can define random variables X and
Y by X(i, j) = i + j and Y(i, j) = max{i, j}.
Let RX be the image of Ω under X. When the range is finite or countable then the
random variable is said to be discrete.
We write P(X = x_i) for Σ_{ω:X(ω)=x_i} p_ω, and for B ⊂ R,

P(X ∈ B) = Σ_{x∈B} P(X = x).

Then (P(X = x), x ∈ R_X) is called the distribution of X.
3.1 Expectation
Definition 3.2. The expectation of a random variable X is the number

E[X] = Σ_{ω∈Ω} p_ω X(ω).
Note that

E[X] = Σ_{ω∈Ω} p_ω X(ω) = Σ_{x∈R_X} Σ_{ω:X(ω)=x} p_ω X(ω) = Σ_{x∈R_X} x Σ_{ω:X(ω)=x} p_ω = Σ_{x∈R_X} x P(X = x).

The expectation is undefined if both

Σ_{x∈R_X, x≥0} x P(X = x) = ∞ and Σ_{x∈R_X, x<0} x P(X = x) = −∞.
Example. If P(X = r) = e^{−λ} λ^r/r!, then E[X] = λ.

Solution.

E[X] = Σ_{r=0}^∞ r e^{−λ} λ^r/r! = λ e^{−λ} Σ_{r=1}^∞ λ^{r−1}/(r − 1)! = λ e^{−λ} e^λ = λ.
Example. If P(X = r) = C(n, r) p^r (1 − p)^{n−r}, then E[X] = np.
Solution.

E[X] = Σ_{r=0}^n r C(n, r) p^r (1 − p)^{n−r}
= Σ_{r=0}^n r [n!/(r!(n − r)!)] p^r (1 − p)^{n−r}
= n Σ_{r=1}^n [(n − 1)!/((r − 1)!(n − r)!)] p^r (1 − p)^{n−r}
= np Σ_{r=1}^n [(n − 1)!/((r − 1)!(n − r)!)] p^{r−1} (1 − p)^{n−r}
= np Σ_{r=0}^{n−1} [(n − 1)!/(r!(n − 1 − r)!)] p^r (1 − p)^{n−1−r}
= np Σ_{r=0}^{n−1} C(n − 1, r) p^r (1 − p)^{n−1−r}
= np.
If f : R → R, then f(X) is the random variable defined by f(X)(ω) = f(X(ω)).

Example. If a, b and c are constants, then a + bX and (X − c)² are random variables, defined by (a + bX)(ω) = a + bX(ω) and ((X − c)²)(ω) = (X(ω) − c)².
Proof. 1. X ≥ 0 means X(ω) ≥ 0 for all ω ∈ Ω, so

E[X] = Σ_{ω∈Ω} p_ω X(ω) ≥ 0.

2. If there exists ω ∈ Ω with p_ω > 0 and X(ω) > 0 then E[X] > 0; so if X ≥ 0 and E[X] = 0 then P(X = 0) = 1.

3.

E[a + bX] = Σ_{ω∈Ω} (a + bX(ω)) p_ω = a Σ_{ω∈Ω} p_ω + b Σ_{ω∈Ω} p_ω X(ω) = a + bE[X].

4. Trivial.

5. Similarly, E[X + Y] = E[X] + E[Y], and more generally E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i], by induction:
Proof.

E[Σ_{i=1}^n X_i] = E[Σ_{i=1}^{n−1} X_i + X_n] = E[Σ_{i=1}^{n−1} X_i] + E[X_n],

and the result follows.
3.2 Variance

For a random variable X, the variance is

Var X = E[(X − E[X])²] = σ².

The standard deviation is √(Var X).
(ii) Var(a + bX) = b² Var X.

Proof.

Var(a + bX) = E[(a + bX − a − bE[X])²] = b² E[(X − E[X])²] = b² Var X.
(iii) Var X = E[X²] − (E[X])².

Proof.

E[(X − E[X])²] = E[X² − 2XE[X] + (E[X])²] = E[X²] − 2E[X]E[X] + (E[X])² = E[X²] − (E[X])².
Example. If X has a geometric distribution, P(X = r) = pq^r for r = 0, 1, ..., then E[X] = q/p and E[X(X − 1)] = 2q²/p², so

Var X = 2q²/p² + q/p − q²/p² = q/p².
Linear Regression

Proof.

Var(X + Y) = E[(X + Y)²] − (E[X] + E[Y])².
3.3 Indicator Function

The indicator function I[A] of an event A ⊂ Ω is the random variable

I[A](ω) = 1 if ω ∈ A; 0 if ω ∉ A.  (3.1)
1. E[I[A]] = P(A), since

E[I[A]] = Σ_{ω∈Ω} p_ω I[A](ω) = Σ_{ω∈A} p_ω = P(A).

2. I[A^c] = 1 − I[A].

3. I[A ∩ B] = I[A] I[B].

4. I[A ∪ B] = I[A] + I[B] − I[A] I[B].
Example. n couples (n ≥ 2) are arranged randomly around a table such that males and females alternate. Let N be the number of husbands sitting next to their wives. Calculate E[N] and Var N.

Solution. Write N = Σ_{i=1}^n I[A_i], where A_i is the event that couple i are together. Then P(A_i) = 2/n, so

E[N] = E[Σ_{i=1}^n I[A_i]] = Σ_{i=1}^n E[I[A_i]] = n × 2/n = 2.

For the second moment,

E[N²] = E[(Σ_{i=1}^n I[A_i])²] = Σ_{i=1}^n E[I[A_i]²] + 2 Σ_{i<j} E[I[A_i] I[A_j]].

Now E[I[A_i]²] = E[I[A_i]] = 2/n, and

E[I[A_1] I[A_2]] = E[I[A_1 ∩ A_2]] = P(A_1 ∩ A_2) = P(A_1) P(A_2|A_1)
= (2/n)[(1/(n − 1))(1/(n − 1)) + ((n − 2)/(n − 1))(2/(n − 1))].

Hence

Var N = E[N²] − (E[N])² = (2/(n − 1))(1 + 2(n − 2)) − 2 = 2(n − 2)/(n − 1).
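Both moments can be checked by simulation. In the illustrative sketch below (not from the notes), husbands are fixed in the even seats and the wives are randomly permuted among the odd seats; wife i is next to her husband exactly when she occupies odd slot i or slot (i − 1) mod n:

```python
import random

def seat_and_count(n: int) -> int:
    """Husbands sit at even seats in order; wives are a random
    permutation of the odd seats.  Wife i is beside husband i
    iff she is in odd slot i or slot (i - 1) mod n."""
    wives = list(range(n))
    random.shuffle(wives)                 # wives[k] sits in odd slot k
    slot_of = {w: k for k, w in enumerate(wives)}
    return sum(slot_of[i] in (i, (i - 1) % n) for i in range(n))

n, trials = 8, 50_000
samples = [seat_and_count(n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(f"E[N] ~ {mean:.3f} (exact 2),  Var N ~ {var:.3f} "
      f"(exact {2 * (n - 2) / (n - 1):.3f})")
```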
3.4 Inclusion-Exclusion Formula

(⋃_{i=1}^N A_i)^c = ⋂_{i=1}^N A_i^c,

so

I[⋃_{i=1}^N A_i] = 1 − I[⋂_{i=1}^N A_i^c]
= 1 − ∏_{i=1}^N I[A_i^c]
= 1 − ∏_{i=1}^N (1 − I[A_i])
= Σ_{i=1}^N I[A_i] − Σ_{i1<i2} I[A_{i1}] I[A_{i2}] + ··· + (−1)^{j+1} Σ_{i1<···<ij} I[A_{i1}]···I[A_{ij}] + ···

Take expectations:

P(⋃_{i=1}^N A_i) = E[I[⋃_{i=1}^N A_i]]
= Σ_{i=1}^N P(A_i) − Σ_{i1<i2} P(A_{i1} ∩ A_{i2}) + ··· + (−1)^{j+1} Σ_{i1<···<ij} P(A_{i1} ∩ A_{i2} ∩ ··· ∩ A_{ij}) + ···
3.5 Independence
Definition 3.5. Discrete random variables X_1, ..., X_n are independent if and only if for any x_1, ..., x_n,

P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = ∏_{i=1}^n P(X_i = x_i).
Theorem 3.6 (Preservation of independence). If X_1, ..., X_n are independent random variables and f_1, ..., f_n are functions R → R, then f_1(X_1), ..., f_n(X_n) are independent.

Proof.

P(f_1(X_1) = y_1, ..., f_n(X_n) = y_n) = Σ_{x_1:f_1(x_1)=y_1, ..., x_n:f_n(x_n)=y_n} P(X_1 = x_1, ..., X_n = x_n)
= ∏_{i=1}^n Σ_{x_i:f_i(x_i)=y_i} P(X_i = x_i)
= ∏_{i=1}^n P(f_i(X_i) = y_i).
NOTE that E[Σ X_i] = Σ E[X_i] without requiring independence.
Theorem 3.7. If X_1, ..., X_n are independent random variables and f_1, ..., f_n are functions R → R, then

E[∏_{i=1}^n f_i(X_i)] = ∏_{i=1}^n E[f_i(X_i)].
Theorem 3.8. If X_1, ..., X_n are independent random variables, then

Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var X_i.
Proof.

Var(Σ_{i=1}^n X_i) = E[(Σ_{i=1}^n X_i)²] − (E[Σ_{i=1}^n X_i])²
= E[Σ_i X_i² + Σ_{i≠j} X_i X_j] − (Σ_{i=1}^n E[X_i])²
= Σ_i E[X_i²] + Σ_{i≠j} E[X_i X_j] − Σ_i (E[X_i])² − Σ_{i≠j} E[X_i] E[X_j]
= Σ_i (E[X_i²] − (E[X_i])²)   (since E[X_i X_j] = E[X_i] E[X_j] for i ≠ j)
= Σ_i Var X_i.
Corollary. If X_1, ..., X_n are independent, identically distributed random variables, then

Var((1/n) Σ_{i=1}^n X_i) = (1/n) Var X_1.

Proof.

Var((1/n) Σ X_i) = (1/n²) Var(Σ X_i) = (1/n²) Σ Var X_i = (1/n) Var X_1.
Example. Suppose we wish to estimate two quantities a and b. Measuring each directly gives observations A and B with

E[A] = a, Var A = σ²,
E[B] = b, Var B = σ².

Alternatively, measure the sum and the difference, obtaining observations X and Y with

E[X] = a + b, Var X = σ²,
E[Y] = a − b, Var Y = σ².

Then

E[(X + Y)/2] = a, Var((X + Y)/2) = σ²/2,
E[(X − Y)/2] = b, Var((X − Y)/2) = σ²/2.

So this is better: the same number of observations gives estimators with half the variance.
Example (Non-standard dice). You choose one die, then I choose one. Around a cycle of suitably numbered dice, each die beats the previous one with probability greater than 1/2, so whichever die you choose, I can choose one that beats it.
Chapter 4

Inequalities

4.1 Jensen's Inequality
If f is strictly convex, n ≥ 3 and not all the x_i are equal, then we may assume that not all of x_2, ..., x_n are equal. Writing p_i' = p_i/(1 − p_1), i = 2, ..., n, we then have

(1 − p_1) Σ_{i=2}^n p_i' f(x_i) ≥ (1 − p_1) f(Σ_{i=2}^n p_i' x_i).  [2]

For strictness: since f is strictly convex, equality holds in [1], and hence in [2], if and only if x_1 = x_2 = ··· = x_n.
Alternative proof: for each y ∈ (a, b) there is a supporting line, i.e. there are α_y, β_y with

f(x) ≥ α_y + β_y x, x ∈ (a, b), and f(y) = α_y + β_y y.

If f is differentiable at y, then this linear function is the tangent f(y) + (x − y)f'(y). Taking y = E[X], so that f(E[X]) = α + βE[X], for any random variable X taking values in (a, b),

E[f(X)] ≥ E[α + βX] = α + βE[X] = f(E[X]).
4.2 Cauchy-Schwarz Inequality

Theorem 4.2. For random variables X and Y,

(E[XY])² ≤ E[X²] E[Y²].

Proof. Let Z = aX − bY. Then

0 ≤ E[Z²] = E[(aX − bY)²] = a² E[X²] − 2ab E[XY] + b² E[Y²].

This is a quadratic in a with at most one real root, and therefore has discriminant ≤ 0. Taking b ≠ 0,

(E[XY])² ≤ E[X²] E[Y²].

Corollary.

|Corr(X, Y)| ≤ 1.
4.3 Markov's Inequality

Theorem. For any random variable X and any a > 0,

P(|X| ≥ a) ≤ E[|X|]/a.

Proof. Let A = {|X| ≥ a}. Then |X| ≥ aI[A]. Taking expectations,

E[|X|] ≥ aP(A) = aP(|X| ≥ a).
4.4 Chebyshev's Inequality

Theorem. For any random variable X and any ε > 0,

P(|X| ≥ ε) ≤ E[X²]/ε².

Proof. For all x,

I[|x| ≥ ε] ≤ x²/ε².

Then I[|X| ≥ ε] ≤ X²/ε², and taking expectations,

P(|X| ≥ ε) ≤ E[X²]/ε².
Note
1. The result is "distribution free": no assumption is made about the distribution of X (other than E[X²] < ∞).
2. It is the best possible inequality, in the following sense. Take ε > 0, c > 0 and let
X = +ε with probability c/(2ε²),
X = −ε with probability c/(2ε²),
X = 0 with probability 1 − c/ε².

Then P(|X| ≥ ε) = c/ε² and E[X²] = c, so that

P(|X| ≥ ε) = c/ε² = E[X²]/ε².
3. Applying the inequality to X − µ, where µ = E[X],

P(|X − µ| ≥ ε) ≤ Var X/ε².
4.5 Law of Large Numbers

Theorem (Weak law of large numbers). Let X_1, X_2, ... be independent, identically distributed random variables with mean µ and variance σ² < ∞, and let S_n = X_1 + ··· + X_n. Then

∀ε > 0, P(|S_n/n − µ| ≥ ε) → 0 as n → ∞.

Proof. By Chebyshev's inequality,

P(|S_n/n − µ| ≥ ε) ≤ E[(S_n/n − µ)²]/ε²
= E[(S_n − nµ)²]/(n²ε²)   (properties of expectation)
= Var S_n/(n²ε²)   (since E[S_n] = nµ).

But Var S_n = nσ², and thus

P(|S_n/n − µ| ≥ ε) ≤ nσ²/(n²ε²) = σ²/(nε²) → 0.
Example. A_1, A_2, ... are independent events, each with probability p. Let X_i = I[A_i]. Then

S_n/n = (number of times A occurs)/(number of trials),

and µ = E[I[A_i]] = P(A_i) = p. The theorem states that

P(|S_n/n − p| ≥ ε) → 0 as n → ∞,

which recovers the intuitive definition of probability.
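The convergence of relative frequencies is easy to watch numerically; a small illustrative Python sketch (the bias p = 0.3 is an arbitrary choice of ours):

```python
import random

# Relative frequency of "success" for a biased coin (p = 0.3)
# settles near p as the number of trials grows.
p = 0.3
heads = 0
for n in range(1, 100_001):
    heads += random.random() < p
    if n in (10, 100, 1000, 10_000, 100_000):
        print(f"n={n:6d}  S_n/n = {heads / n:.4f}")
```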
Example. A random sample of size n is a sequence X_1, X_2, ..., X_n of independent, identically distributed random variables ("n observations"). The quantity

X̄ = (Σ_{i=1}^n X_i)/n

is called the SAMPLE MEAN. The theorem states that, provided the variance of the X_i is finite, the probability that the sample mean differs from the mean of the distribution by more than ε approaches 0 as n → ∞.
We have shown the weak law of large numbers. Why "weak"? There exists a strong form of the law of large numbers:

P(S_n/n → µ as n → ∞) = 1.

This is NOT the same as the weak form. What does this mean? Each ω ∈ Ω determines

S_n/n, n = 1, 2, ...,

as a sequence of real numbers; hence it either tends to µ or it doesn't, and the strong law asserts

P({ω : S_n(ω)/n → µ as n → ∞}) = 1.
Chapter 5
Generating Functions
In this chapter, assume that X is a random variable taking values in the range 0, 1, 2, . . ..
Let p_r = P(X = r), r = 0, 1, 2, ....
Definition 5.1. The Probability Generating Function (p.g.f) of the random variable X, or of the distribution p_r, r = 0, 1, 2, ..., is

p(z) = E[z^X] = Σ_{r=0}^∞ z^r P(X = r) = Σ_{r=0}^∞ p_r z^r.

This p(z) is a polynomial or a power series. If a power series, then it is convergent for |z| ≤ 1 by comparison with a geometric series:

|p(z)| ≤ Σ_r p_r |z|^r ≤ Σ_r p_r = 1.
Example. For a fair die, p_r = 1/6, r = 1, ..., 6, and

p(z) = E[z^X] = (1/6)(z + z² + ··· + z⁶) = (z/6)(1 − z⁶)/(1 − z).
Theorem 5.1. The distribution of X is uniquely determined by the p.g.f p(z).
Proof. We know that we can differentiate p(z) term by term for |z| ≤ 1:

p'(z) = p_1 + 2p_2 z + 3p_3 z² + ···,

so p'(0) = p_1, and p(0) = p_0. In general, p_r = p^{(r)}(0)/r!, so the p.g.f determines the distribution.
Theorem 5.2 (Abel's Lemma).

E[X] = lim_{z→1} p'(z).

Proof.

p'(z) = Σ_{r=1}^∞ r p_r z^{r−1}, |z| < 1.

For z ∈ (0, 1), p'(z) is a non-decreasing function of z and is bounded above by E[X] = Σ_{r=1}^∞ r p_r. On the other hand, for every N,

lim_{z→1} Σ_{r=1}^∞ r p_r z^{r−1} ≥ lim_{z→1} Σ_{r=1}^N r p_r z^{r−1} = Σ_{r=1}^N r p_r.

This is true for all N, and so E[X] = lim_{z→1} p'(z).
Recall that p(z) = (z/6)(1 − z⁶)/(1 − z) for the fair die; differentiating and letting z → 1 gives E[X] = 7/2.
Theorem 5.3.

E[X(X − 1)] = lim_{z→1} p''(z).

Proof.

p''(z) = Σ_{r=2}^∞ r(r − 1) p_r z^{r−2}.

The proof is now the same as for Abel's Lemma.
Theorem 5.4. Suppose that X_1, X_2, ..., X_n are independent random variables with p.g.f's p_1(z), p_2(z), ..., p_n(z). Then the p.g.f of

X_1 + X_2 + ··· + X_n

is

p_1(z) p_2(z) ··· p_n(z).

Proof.

E[z^{X_1+X_2+···+X_n}] = E[z^{X_1} z^{X_2} ··· z^{X_n}] = E[z^{X_1}] E[z^{X_2}] ··· E[z^{X_n}], by independence.
Example. If X has a Poisson distribution with parameter λ, then p(z) = E[z^X] = e^{−λ(1−z)} and

E[X(X − 1)] = p''(1) = λ².

Thus

Var X = E[X²] − (E[X])² = E[X(X − 1)] + E[X] − (E[X])² = λ² + λ − λ² = λ.
Example. Suppose that Y has a Poisson Distribution with parameter µ. If X and Y are
independent then:
E[z^{X+Y}] = E[z^X] E[z^Y] = e^{−λ(1−z)} e^{−µ(1−z)} = e^{−(λ+µ)(1−z)}.
But this is the p.g.f of a Poisson random variable with parameter λ + µ. By uniqueness
(first theorem of the p.g.f) this must be the distribution for X + Y
Example. X has a binomial distribution:

P(X = r) = C(n, r) p^r (1 − p)^{n−r}, r = 0, 1, ..., n.

E[z^X] = Σ_{r=0}^n C(n, r) (pz)^r (1 − p)^{n−r} = (pz + 1 − p)^n.
5.1 Combinatorial Applications

Example (Fibonacci numbers). Let

f_n = f_{n−1} + f_{n−2}, f_0 = f_1 = 1,

and let

F(z) = Σ_{n=0}^∞ f_n z^n.

Since f_n z^n = f_{n−1} z^n + f_{n−2} z^n for n ≥ 2,

Σ_{n=2}^∞ f_n z^n = Σ_{n=2}^∞ f_{n−1} z^n + Σ_{n=2}^∞ f_{n−2} z^n
F(z) − f_0 − z f_1 = z(F(z) − f_0) + z² F(z)
F(z)(1 − z − z²) = f_0(1 − z) + z f_1 = 1 − z + z = 1.

Since f_0 = f_1 = 1, F(z) = 1/(1 − z − z²).
Let

α_1 = (1 + √5)/2, α_2 = (1 − √5)/2

(the roots of x² = x + 1, so that 1 − z − z² = (1 − α_1 z)(1 − α_2 z)). Then

F(z) = 1/((1 − α_1 z)(1 − α_2 z))
= (1/(α_1 − α_2)) [α_1/(1 − α_1 z) − α_2/(1 − α_2 z)]
= (1/(α_1 − α_2)) (α_1 Σ_{n=0}^∞ α_1^n z^n − α_2 Σ_{n=0}^∞ α_2^n z^n).

Thus f_n = (α_1^{n+1} − α_2^{n+1})/(α_1 − α_2).
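The closed form can be checked against the recurrence directly; an illustrative Python sketch (not from the notes):

```python
# Check the closed form f_n = (a1^(n+1) - a2^(n+1)) / (a1 - a2)
# against the recurrence f_n = f_{n-1} + f_{n-2}, f_0 = f_1 = 1.
from math import sqrt

a1, a2 = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2

def fib_closed(n: int) -> int:
    return round((a1 ** (n + 1) - a2 ** (n + 1)) / (a1 - a2))

f = [1, 1]
for n in range(2, 15):
    f.append(f[-1] + f[-2])
assert all(fib_closed(n) == f[n] for n in range(15))
print([fib_closed(n) for n in range(10)])
```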
5.2 Conditional Expectation

The joint distribution of discrete random variables X and Y is P(X = x, Y = y), from which

P(X = x) = Σ_y P(X = x, Y = y);

this is often called the marginal distribution for X. The conditional distribution for X given Y = y is

P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y).

The conditional expectation of X given Y = y is E[X|Y = y] = Σ_x x P(X = x|Y = y), and E[X|Y] is the random variable which takes the value E[X|Y = y] on the event {Y = y}. Thus E[X|Y] : Ω → R.
Example. Let X_1, ..., X_n be independent indicator random variables with P(X_i = 1) = p, and let

Y = X_1 + X_2 + ··· + X_n.

Then

P(X_1 = 1|Y = r) = P(X_1 = 1, Y = r)/P(Y = r)
= P(X_1 = 1, X_2 + ··· + X_n = r − 1)/P(Y = r)
= P(X_1 = 1) P(X_2 + ··· + X_n = r − 1)/P(Y = r)
= [p C(n − 1, r − 1) p^{r−1} (1 − p)^{n−r}] / [C(n, r) p^r (1 − p)^{n−r}]
= C(n − 1, r − 1)/C(n, r)
= r/n.
Theorem 5.5. E[E[X|Y]] = E[X].

Proof.

E[E[X|Y]] = Σ_{y∈R_Y} P(Y = y) E[X|Y = y]
= Σ_y P(Y = y) Σ_{x∈R_X} x P(X = x|Y = y)
= Σ_y Σ_x x P(X = x, Y = y)
= E[X].

In particular, if X and Y are independent then E[X|Y] = E[X].
Theorem 5.6 (Random sums). Let X_1, X_2, ... be independent, identically distributed random variables with p.g.f p(z), and let N be a random variable, independent of the X_i, with p.g.f h(z). Then the p.g.f of

X_1 + X_2 + ··· + X_N

is

E[z^{X_1+···+X_N}] = E[E[z^{X_1+···+X_N}|N]]
= Σ_{n=0}^∞ P(N = n) E[z^{X_1+···+X_n}|N = n]
= Σ_{n=0}^∞ P(N = n) (p(z))^n
= h(p(z)).

Hence

E[X_1 + ··· + X_N] = (d/dz) h(p(z))|_{z=1} = h'(1) p'(1) = E[N] E[X_1].

Exercise: calculate (d²/dz²) h(p(z)) and hence Var(X_1 + ··· + X_N).
5.4 Branching Processes

Let X_n be the size of the nth generation of a branching process, with X_0 = 1, in which each individual independently has offspring in the next generation according to a fixed distribution with p.g.f F(z). Let

F_n(z) = E[z^{X_n}].

Then F_1(z) = F(z), the probability generating function of the offspring distribution.

Theorem 5.7.

F_{n+1}(z) = F_n(F(z)) = F(F(···(F(z))···)),

i.e. F_n is the n-fold iterate of F.
Proof.

F_{n+1}(z) = E[z^{X_{n+1}}] = E[E[z^{X_{n+1}}|X_n]]
= Σ_{k=0}^∞ P(X_n = k) E[z^{X_{n+1}}|X_n = k]
= Σ_{k=0}^∞ P(X_n = k) E[z^{Y_1+Y_2+···+Y_k}]   (Y_i the offspring counts of the k individuals)
= Σ_{k=0}^∞ P(X_n = k) E[z^{Y_1}] ··· E[z^{Y_k}]
= Σ_{k=0}^∞ P(X_n = k) (F(z))^k
= F_n(F(z)).
Theorem 5.8. Let m and σ² be the mean and variance of the offspring distribution. Then E[X_n] = m^n and

Var X_n = σ² m^{n−1}(1 + m + ··· + m^{n−1}).

Proof. One can prove this by calculating F_n'(z) and F_n''(z). Alternatively, conditioning on X_{n−1},

E[(X_n − mX_{n−1})²] = E[E[(X_n − mX_{n−1})²|X_{n−1}]] = E[X_{n−1} σ²] = σ² m^{n−1}.

Thus

E[X_n²] − 2m E[X_n X_{n−1}] + m² E[X_{n−1}²] = σ² m^{n−1},

and since E[X_n X_{n−1}] = E[X_{n−1} E[X_n|X_{n−1}]] = m E[X_{n−1}²], this gives E[X_n²] = m² E[X_{n−1}²] + σ² m^{n−1}. Now calculate

Var X_n = E[X_n²] − (E[X_n])²
= m² E[X_{n−1}²] + σ² m^{n−1} − m² (E[X_{n−1}])²
= m² Var X_{n−1} + σ² m^{n−1}
= m⁴ Var X_{n−2} + σ² (m^{n−1} + m^n)
= m^{2(n−1)} Var X_1 + σ² (m^{n−1} + m^n + ··· + m^{2n−3})
= σ² m^{n−1} (1 + m + ··· + m^{n−1}).
Let

A_n = {X_n = 0} = {extinction occurs by generation n},

and let

A = ⋃_{n=1}^∞ A_n = the event that extinction ever occurs.

Note that A_1 ⊂ A_2 ⊂ ···, and define

A = lim_{n→∞} A_n = ⋃_{n=1}^∞ A_n.

Define B_n for n ≥ 1 by

B_1 = A_1,
B_n = A_n ∩ (⋃_{i=1}^{n−1} A_i)^c = A_n ∩ A_{n−1}^c.
Thus (the B_n being disjoint with union A)

P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n).

Let q = P(extinction ever occurs). Since P(A_n) = P(X_n = 0) = F_n(0),

q = lim_{n→∞} F_n(0).

Also

F(q) = F(lim_{n→∞} F_n(0)) = lim_{n→∞} F(F_n(0))   (since F is continuous)
= lim_{n→∞} F_{n+1}(0).

Thus F(q) = q.
Alternative derivation:

q = Σ_k P(X_1 = k) P(extinction|X_1 = k) = Σ_k P(X_1 = k) q^k = F(q).
Theorem 5.9. The probability of extinction, q, is the smallest positive root of the equation F(q) = q. Writing m for the mean of the offspring distribution: if m ≤ 1 then q = 1, while if m > 1 then q < 1.

Proof.

F(1) = 1, m = Σ_k k f_k = lim_{z→1} F'(z),

and

F''(z) = Σ_{j=2}^∞ j(j − 1) f_j z^{j−2} ≥ 0 in 0 ≤ z ≤ 1,

so F is convex there (strictly, since f_0 + f_1 < 1). Also F(0) = f_0 ≥ 0. Thus if m ≤ 1 there does not exist a q ∈ (0, 1) with F(q) = q. If m > 1, let α be the smallest positive root of F(z) = z; then

F(0) ≤ F(α) = α,
F(F(0)) ≤ F(α) = α,
...
F_n(0) ≤ α ∀n ≥ 1.

Hence q = lim_{n→∞} F_n(0) ≤ α, and q = α since q is itself a root of F(z) = z.
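The proof suggests an algorithm: iterate q ← F(q) from 0, which converges to the smallest root. A minimal sketch (the Poisson(m) offspring law, with F(z) = e^{−m(1−z)}, is an assumed example, not from the notes):

```python
from math import exp

def extinction_prob(m: float, iterations: int = 200) -> float:
    """Iterate q_{n+1} = F(q_n) from q_0 = 0 for Poisson(m) offspring,
    converging to the smallest root of F(q) = q."""
    q = 0.0
    for _ in range(iterations):
        q = exp(-m * (1.0 - q))
    return q

for m in (0.8, 1.0, 1.5, 2.0):
    print(f"m={m}: q ~ {extinction_prob(m):.4f}")
```

As the theorem predicts, the printed q is (close to) 1 for m ≤ 1 and strictly less than 1 for m > 1.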
5.5 Random Walks

Let

S_n = S_0 + X_1 + X_2 + ··· + X_n, where usually S_0 = 0.

We shall assume

X_n = 1 with probability p; −1 with probability q.  (5.2)

This is a simple random walk. If p = q = 1/2 then the random walk is called symmetric.

Example (Gambler's Ruin). You have an initial fortune of A and I have an initial fortune of B. We toss coins repeatedly; I win with probability p and you win with probability q. What is the probability that I bankrupt you before you bankrupt me?
Let p_z be the probability that the random walk hits a before it hits 0, starting from z, and let q_z be the probability that it hits 0 before a, starting from z. After the first step the gambler's fortune is either z + 1 or z − 1, with probability p and q respectively. From the law of total probability,

p_z = p p_{z+1} + q p_{z−1}, 0 < z < a, with p_0 = 0, p_a = 1,

and so

p_z = (1 − (q/p)^z)/(1 − (q/p)^a) if p ≠ q, or p_z = z/a if p = q.

Similarly,

q_z = P(hits 0 before a) = ((q/p)^a − (q/p)^z)/((q/p)^a − 1) if p ≠ q, or q_z = (a − z)/a if p = q.

Thus q_z + p_z = 1 and so, as we expected, the game ends with probability one.
What happens as a → ∞?

{path hits 0 ever} = ⋃_{a=z+1}^∞ {path hits 0 before it hits a},

so P(hits 0 ever) = lim_{a→∞} q_z, which equals (q/p)^z if p > q, and 1 if p ≤ q.
Let D_z be the expected duration of the game, starting from z. Conditioning on the first step,

D_z = E[E[duration | first step]]
= p(1 + D_{z+1}) + q(1 + D_{z−1})
= 1 + pD_{z+1} + qD_{z−1}.

For a particular solution try D_z = Cz:

Cz = Cp(z + 1) + Cq(z − 1) + 1, giving C = 1/(q − p) for p ≠ q.

For p = q a particular solution is −z², so D_z = −z² + A + Bz; the boundary conditions D_0 = D_a = 0 give

D_z = z(a − z), p = q.
Now let U_z(s) = Σ_n u_{z,n} s^n, where u_{z,n} is the probability that the walk, started from z, is absorbed at 0 at time n; then U_z(s) = ps U_{z+1}(s) + qs U_{z−1}(s), with U_0(s) = 1 and U_a(s) = 0. A solution of the form U_z(s) = (λ(s))^z must satisfy

λ(s) = ps(λ(s))² + qs.

This has two roots,

λ_1(s), λ_2(s) = (1 ± √(1 − 4pqs²))/(2ps).

Every solution is of the form

U_z(s) = A(s)(λ_1(s))^z + B(s)(λ_2(s))^z,

and the boundary conditions give

U_z(s) = [(λ_1(s))^a (λ_2(s))^z − (λ_1(s))^z (λ_2(s))^a] / [(λ_1(s))^a − (λ_2(s))^a].

But λ_1(s)λ_2(s) = q/p (recall the quadratic), so

U_z(s) = (q/p)^z [(λ_1(s))^{a−z} − (λ_2(s))^{a−z}] / [(λ_1(s))^a − (λ_2(s))^a].

The same method gives the generating function for absorption probabilities at the other barrier. The generating function for the duration of the game is the sum of these two generating functions.
Chapter 6

Continuous Random Variables

In this chapter we drop the assumption that Ω is finite or countable. Assume instead that we are given a probability P defined on suitable subsets of Ω. For example, spin a pointer, and let ω ∈ Ω give the position at which it stops, with Ω = {ω : 0 ≤ ω < 2π}. Let

P(ω ∈ [0, θ]) = θ/(2π)  (0 ≤ θ ≤ 2π).
Definition 6.1. A continuous random variable X is a function X : Ω → R for which

P(a ≤ X(ω) ≤ b) = ∫_a^b f(x) dx,

where f is a non-negative function with total integral 1, called the probability density function (pdf) of X
in this case. Intuition about probability density functions is based on the approximate relation

P(X ∈ [x, x + δx]) = ∫_x^{x+δx} f(z) dz ≈ f(x) δx.
Proofs, however, more often use the distribution function

F(x) = P(X ≤ x),

which is increasing in x.
In either case this is known as the exponential distribution with parameter λ: F(x) = 1 − e^{−λx}, f(x) = λe^{−λx} for x ≥ 0. If X has this distribution then

P(X ≥ x + z|X ≥ z) = P(X ≥ x + z)/P(X ≥ z) = e^{−λ(x+z)}/e^{−λz} = e^{−λx} = P(X ≥ x),

the memoryless property of the exponential distribution.
Theorem 6.1. If X is a continuous random variable with pdf f(x), and h(x) is a continuous, strictly increasing function with h^{−1}(x) differentiable, then h(X) is a continuous random variable with pdf

f_h(x) = f(h^{−1}(x)) (d/dx) h^{−1}(x).

Proof. P(h(X) ≤ x) = P(X ≤ h^{−1}(x)) = F(h^{−1}(x)), so

(d/dx) P(h(X) ≤ x) = f(h^{−1}(x)) (d/dx) h^{−1}(x),

and h(X) is a continuous random variable with pdf f_h as claimed. Note: it is usually easier to repeat this proof than to remember the result.
More generally:

Theorem 6.2. Let U ∼ U[0, 1]. For any continuous distribution function F, the random variable X defined by X = F^{−1}(U) has distribution function F.

Proof.

P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x), since U ∼ U[0, 1].
Remark. This gives a way to simulate a random variable with any given distribution. In the discrete case, with P(X = x_i) = p_i, i = 0, 1, ..., take U ∼ U[0, 1] and let

X = x_j if Σ_{i=0}^{j−1} p_i ≤ U < Σ_{i=0}^j p_i.
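A minimal sketch of Theorem 6.2 in use (the exponential distribution is an assumed example; F^{−1}(u) = −log(1 − u)/λ inverts F(x) = 1 − e^{−λx}):

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse-transform sampling: F^{-1}(U) for F(x) = 1 - e^{-lam x}."""
    u = random.random()
    return -math.log(1.0 - u) / lam

lam = 2.0
samples = [sample_exponential(lam) for _ in range(100_000)]
print(f"sample mean ~ {sum(samples) / len(samples):.4f}  (expected {1 / lam})")
```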
6.1 Jointly Distributed Random Variables

Let F(x, y) = P(X ≤ x, Y ≤ y) be the joint distribution function of X and Y. Then

F_X(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞) = lim_{y→∞} F(x, y),

and similarly F_Y(y) = F(∞, y).
The random variables X_1, ..., X_n are jointly continuous if

P(X_1 ≤ x_1, ..., X_n ≤ x_n) = ∫_{−∞}^{x_1} ··· ∫_{−∞}^{x_n} f(y_1, ..., y_n) dy_1 ··· dy_n

for some function f, called the joint probability density function, satisfying the obvious conditions:

1. f(x_1, ..., x_n) ≥ 0,

2. ∫···∫_{R^n} f(x_1, ..., x_n) dx_1 ··· dx_n = 1.
Example (n = 2).

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du,

and so

f(x, y) = ∂²F(x, y)/∂x∂y,
provided this is defined at (x, y).

Theorem 6.3. If X and Y are jointly continuous random variables, then they are individually continuous, and

f_X(x) = ∫_{−∞}^∞ f(x, y) dy

is the pdf of X. Jointly continuous random variables X_1, ..., X_n are independent if and only if

f(x_1, ..., x_n) = ∏_{i=1}^n f_{X_i}(x_i),

where f_{X_i}(x_i) are the pdf's of the individual random variables.
Example. Two points X and Y are tossed at random and independently onto a line segment of length L. What is the probability that |X − Y| ≤ l?

Solution. The joint density is

f(x, y) = 1/L², (x, y) ∈ [0, L]².

With A = {(x, y) : |x − y| ≤ l}, the desired probability is

∫∫_A f(x, y) dx dy = (area of A)/L² = (L² − 2·(1/2)(L − l)²)/L² = (2Ll − l²)/L².
Example (Buffon's Needle). A needle of length l is tossed at random onto a floor ruled with parallel lines a distance L apart, where l ≤ L. What is the probability that the needle intersects a line?

Solution. Let θ ∈ [0, π) be the angle between the needle and the parallel lines, and let x be the distance from the bottom of the needle to the line closest to it. It is reasonable to suppose that X and Θ are distributed uniformly and independently:

X ∼ U[0, L], Θ ∼ U[0, π),

so that

f(x, θ) = 1/(Lπ), 0 ≤ x ≤ L and 0 ≤ θ < π.

The needle intersects a line if and only if X ≤ l sin Θ. Writing A for this event,

P(A) = ∫∫_A f(x, θ) dx dθ = ∫_0^π (l sin θ)/(πL) dθ = 2l/(πL).
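Inverting P(hit) = 2l/(πL) gives the classic Monte Carlo estimate of π; an illustrative sketch (function name and trial count are ours):

```python
import math
import random

def buffon(l: float, L: float, trials: int) -> float:
    """Estimate pi from Buffon's needle (requires l <= L)."""
    hits = 0
    for _ in range(trials):
        x = random.uniform(0.0, L)            # distance to nearest line
        theta = random.uniform(0.0, math.pi)  # angle with the lines
        hits += x <= l * math.sin(theta)
    return 2 * l * trials / (L * hits)        # invert P(hit) = 2l/(pi L)

print(f"pi ~ {buffon(1.0, 1.0, 1_000_000):.4f}")
```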
Definition 6.2. The expectation or mean of a continuous random variable X is

E[X] = ∫_{−∞}^∞ x f(x) dx,

provided that not both of ∫_0^∞ x f(x) dx and ∫_{−∞}^0 x f(x) dx are infinite.
Example (Normal Distribution). Let

f(x) = (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.

This is non-negative; for it to be a pdf we also need to check that

∫_{−∞}^∞ f(x) dx = 1.

Make the substitution z = (x − µ)/σ. Then

I = (1/(√(2π)σ)) ∫_{−∞}^∞ e^{−(x−µ)²/(2σ²)} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz.

Thus

I² = (1/2π) ∫_{−∞}^∞ e^{−x²/2} dx ∫_{−∞}^∞ e^{−y²/2} dy
= (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ e^{−(x²+y²)/2} dx dy
= (1/2π) ∫_0^{2π} ∫_0^∞ r e^{−r²/2} dr dθ
= (1/2π) ∫_0^{2π} dθ = 1.
Therefore I = 1. A random variable with the pdf f(x) given above has a normal distribution with parameters µ and σ²; we write this as

X ∼ N[µ, σ²].

The expectation is

E[X] = (1/(√(2π)σ)) ∫_{−∞}^∞ x e^{−(x−µ)²/(2σ²)} dx
= (1/(√(2π)σ)) ∫_{−∞}^∞ (x − µ) e^{−(x−µ)²/(2σ²)} dx + (1/(√(2π)σ)) ∫_{−∞}^∞ µ e^{−(x−µ)²/(2σ²)} dx.

The first integral vanishes, since the integrand is odd about µ; hence

E[X] = 0 + µ = µ.
Theorem 6.4. If X is a continuous random variable, then

E[X] = ∫_0^∞ P(X ≥ x) dx − ∫_0^∞ P(X ≤ −x) dx.

Proof.

∫_0^∞ P(X ≥ x) dx = ∫_0^∞ ∫_x^∞ f(y) dy dx
= ∫_0^∞ ∫_0^∞ I[y ≥ x] f(y) dy dx
= ∫_0^∞ (∫_0^y dx) f(y) dy
= ∫_0^∞ y f(y) dy.

Similarly ∫_0^∞ P(X ≤ −x) dx = −∫_{−∞}^0 y f(y) dy, and the result follows.
Note: this holds for discrete random variables too, and is useful as a general way of finding the expectation whether the random variable is discrete or continuous. If X takes values in {0, 1, ...}, the theorem states

E[X] = Σ_{n=1}^∞ P(X ≥ n),

since

Σ_{n=1}^∞ P(X ≥ n) = Σ_{n=1}^∞ Σ_{m=0}^∞ I[m ≥ n] P(X = m)
= Σ_{m=0}^∞ (Σ_{n=1}^∞ I[m ≥ n]) P(X = m)
= Σ_{m=0}^∞ m P(X = m).
Theorem 6.5. Let X be a continuous random variable with pdf f(x), and let h(x) be a continuous real-valued function. Then, provided

∫_{−∞}^∞ |h(x)| f(x) dx < ∞,

E[h(X)] = ∫_{−∞}^∞ h(x) f(x) dx.

Proof.

∫_0^∞ P(h(X) ≥ y) dy = ∫_0^∞ [∫_{x:h(x)≥y} f(x) dx] dy
= ∫_0^∞ ∫_{x:h(x)≥0} I[h(x) ≥ y] f(x) dx dy
= ∫_{x:h(x)≥0} [∫_0^{h(x)} dy] f(x) dx
= ∫_{x:h(x)≥0} h(x) f(x) dx.

Similarly ∫_0^∞ P(h(X) ≤ −y) dy = −∫_{x:h(x)≤0} h(x) f(x) dx, and the result follows from Theorem 6.4.
Var X = E[(X − E[X])²].

Example.

Var X = E[X²] − (E[X])² = ∫_{−∞}^∞ x² f(x) dx − (∫_{−∞}^∞ x f(x) dx)².
Example. Suppose X ∼ N[µ, σ²]. Let Z = (X − µ)/σ. Then

P(Z ≤ z) = P((X − µ)/σ ≤ z) = P(X ≤ µ + σz)
= ∫_{−∞}^{µ+σz} (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)} dx
= ∫_{−∞}^z (1/√(2π)) e^{−u²/2} du   (substituting u = (x − µ)/σ)
= Φ(z), the distribution function of an N(0, 1) random variable.

Thus Z ∼ N(0, 1).
Var Z = E[Z²] − (E[Z])²   (the last term is zero)
= (1/√(2π)) ∫_{−∞}^∞ z² e^{−z²/2} dz
= [−(1/√(2π)) z e^{−z²/2}]_{−∞}^∞ + (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz   (integrating by parts)
= 0 + 1 = 1.

What about the variance of X? Since X = µ + σZ, E[X] = µ (which we knew already) and

Var X = σ² Var Z = σ².

Thus X ∼ N[µ, σ²] has mean µ and variance σ².
6.2 Transformation of Random Variables

Suppose Y_1, ..., Y_n are given by

Y_1 = r_1(X_1, X_2, ..., X_n)
Y_2 = r_2(X_1, X_2, ..., X_n)
...
Y_n = r_n(X_1, X_2, ..., X_n),

where X_1, ..., X_n have joint density f with

P((X_1, X_2, ..., X_n) ∈ R) = 1.

Let S be the image of R under the above transformation, and suppose the transformation from R to S is 1-1 (bijective).
Write the inverse transformation as

x_1 = s_1(y_1, y_2, ..., y_n), x_2 = s_2(y_1, y_2, ..., y_n), ..., x_n = s_n(y_1, y_2, ..., y_n).

Assume that ∂s_i/∂y_j exists and is continuous at every point (y_1, y_2, ..., y_n) in S, and let J be the Jacobian determinant

J = det(∂s_i/∂y_j),  (6.3)

the n × n determinant with (i, j) entry ∂s_i/∂y_j.
If A ⊂ R and B is its image under the transformation, then

P((X_1, ..., X_n) ∈ A) = ∫···∫_A f(x_1, ..., x_n) dx_1 ··· dx_n
= ∫···∫_B f(s_1, ..., s_n) |J| dy_1 ··· dy_n
= P((Y_1, ..., Y_n) ∈ B).
Example (density of products and quotients). Suppose that (X, Y) has density

f(x, y) = 4xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise.  (6.4)

Let U = X/Y and V = XY. Then

X = √(UV), Y = √(V/U), i.e. x = √(uv), y = √(v/u),

with

∂x/∂u = (1/2)√(v/u), ∂x/∂v = (1/2)√(u/v),
∂y/∂u = −(1/2) v^{1/2}/u^{3/2}, ∂y/∂v = 1/(2√(uv)).

Therefore |J| = 1/(2u), and so

g(u, v) = (4xy)(1/(2u)) = (1/(2u)) × 4√(uv)√(v/u) = 2v/u if (u, v) ∈ D (the image of the unit square), 0 otherwise.
In particular, for a linear transformation Y = AX with A invertible,

g(y_1, ..., y_n) = (1/|det A|) f(A^{−1} y).
Example. Suppose X_1, X_2 have the joint pdf f(x_1, x_2). Calculate the pdf of X_1 + X_2.

Let Y = X_1 + X_2 and Z = X_2. Then X_1 = Y − Z and X_2 = Z, so

A^{−1} = (1 −1; 0 1),  (6.5)

and det A^{−1} = 1 = 1/det A. Then

g(y, z) = f(x_1, x_2) = f(y − z, z),

so the pdf of Y = X_1 + X_2 is g_Y(y) = ∫_{−∞}^∞ f(y − z, z) dz.
Example (Order statistics). Let X_1, ..., X_n be independent, identically distributed random variables with distribution function F and density f, and let Y_1 ≤ Y_2 ≤ ··· ≤ Y_n be the values of the X_i arranged in increasing order. The distribution function of Y_n = max X_i is (F(y))^n, so its density is

g(y) = (d/dy)(F(y))^n = n(F(y))^{n−1} f(y).

For the minimum,

P(Y_1 ≤ y) = 1 − P(X_1 ≥ y, ..., X_n ≥ y) = 1 − (1 − F(y))^n.

For the joint distribution of Y_1 and Y_n,

G(y_1, y_n) = P(Y_1 ≤ y_1, Y_n ≤ y_n)
= P(Y_n ≤ y_n) − P(Y_n ≤ y_n, Y_1 ≥ y_1)
= P(Y_n ≤ y_n) − P(y_1 ≤ X_1 ≤ y_n, y_1 ≤ X_2 ≤ y_n, ..., y_1 ≤ X_n ≤ y_n)
= (F(y_n))^n − (F(y_n) − F(y_1))^n.

Hence

g(y_1, y_n) = ∂²G(y_1, y_n)/∂y_1∂y_n = n(n − 1)(F(y_n) − F(y_1))^{n−2} f(y_1) f(y_n), −∞ < y_1 ≤ y_n < ∞,

and 0 otherwise.
What happens if the mapping is not 1-1? Suppose X has density f and consider |X|, with density g say. For 0 ≤ a < b,

P(|X| ∈ (a, b)) = ∫_a^b (f(x) + f(−x)) dx, so g(x) = f(x) + f(−x).
Example. Let X_1, ..., X_n be independent exponential(λ) random variables, Y_1 ≤ ··· ≤ Y_n their order statistics, and consider the spacings

Z_1 = Y_1, Z_2 = Y_2 − Y_1, ..., Z_n = Y_n − Y_{n−1},

that is Z = AY, where

A = (1 0 0 ... 0 0; −1 1 0 ... 0 0; 0 −1 1 ... 0 0; ...; 0 0 ... −1 1),  (6.7)

with det(A) = 1. The joint density of the order statistics is g(y_1, ..., y_n) = n! f(y_1)···f(y_n), so

h(z_1, ..., z_n) = g(y_1, ..., y_n)
= n! λ^n e^{−λ(y_1+···+y_n)}
= n! λ^n e^{−λ(nz_1+(n−1)z_2+···+z_n)}
= ∏_{i=1}^n (λi) e^{−λi z_{n+1−i}}.

Thus the spacings are independent, with Z_{n+1−i} ∼ exp(λi).
Example. Let X and Y be independent N[0, 1] random variables, and let

D = R² = X² + Y², tan Θ = Y/X,

so that d = x² + y² and θ = arctan(y/x). Then

|J| = |det( 2x, 2y; −(y/x²)/(1 + (y/x)²), (1/x)/(1 + (y/x)²) )| = 2.  (6.8)
Now

f(x, y) = (1/√(2π)) e^{−x²/2} (1/√(2π)) e^{−y²/2} = (1/2π) e^{−(x²+y²)/2}.

Thus

g(d, θ) = (1/4π) e^{−d/2}, 0 ≤ d < ∞, 0 ≤ θ ≤ 2π.

But this is just the product of the densities

g_D(d) = (1/2) e^{−d/2}, 0 ≤ d < ∞,
g_Θ(θ) = 1/(2π), 0 ≤ θ ≤ 2π.

Thus D and Θ are independent, with D exponential with mean 2 and Θ ∼ U[0, 2π].
Note that this is useful for the simulation of normal random variables. We can simulate a random variable with distribution function F by X = F^{−1}(U) with U ∼ U[0, 1], but this is difficult for an N[0, 1] random variable, since

F(x) = Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−z²/2} dz

cannot be inverted in closed form. Instead, let U_1 and U_2 be independent ∼ U[0, 1]. Let R² = −2 log U_1, so that R² is exponential with mean 2, and let Θ = 2πU_2, so that Θ ∼ U[0, 2π]. Now let

X = R cos Θ = √(−2 log U_1) cos(2πU_2)
Y = R sin Θ = √(−2 log U_1) sin(2πU_2);

then X and Y are independent N[0, 1] random variables.
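A direct transcription into code (an illustrative sketch, not from the notes):

```python
import math
import random

def box_muller():
    """Return two independent N(0,1) samples from two U(0,1) samples."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))   # R, with R^2 exponential, mean 2
    theta = 2.0 * math.pi * u2           # Theta ~ U[0, 2*pi]
    return r * math.cos(theta), r * math.sin(theta)

xs = [box_muller()[0] for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(f"mean ~ {mean:.3f} (expect 0),  variance ~ {var:.3f} (expect 1)")
```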
(1) gives answer 1/3.
(2) Using (a + √3/2)² = (√3)², gives answer 1/2.
(3) The foot of the perpendicular to the chord from the centre of the circle is uniformly distributed over the diameter of the interior circle.
6.3 Moment Generating Functions

The moment generating function (mgf) of a random variable X is m(θ) = E[e^{θX}], for those θ for which this is finite.

Theorem 6.6. The moment generating function determines the distribution of X, provided m(θ) is finite for some interval containing the origin.

Proof. Not proved.

Theorem 6.7. If X and Y are independent random variables with moment generating functions m_X(θ) and m_Y(θ), then X + Y has moment generating function m_X(θ) m_Y(θ).

Proof.

E[e^{θ(X+Y)}] = E[e^{θX} e^{θY}] = E[e^{θX}] E[e^{θY}] = m_X(θ) m_Y(θ).
Theorem 6.8. The rth moment of X, i.e. the expected value of X^r, E[X^r], is the coefficient of θ^r/r! in the series expansion of m(θ).

Proof (sketch).

e^{θX} = 1 + θX + (θ²/2!)X² + ···
E[e^{θX}] = 1 + θE[X] + (θ²/2!)E[X²] + ···
Example. Suppose X is exponential with parameter λ. Then m(θ) = λ/(λ − θ) for θ < λ, so E[X] = 1/λ and E[X²] = 2/λ². Thus

Var X = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
Example. Let X_1, ..., X_n be independent exponential(λ) random variables. Then

E[e^{θ(X_1+···+X_n)}] = E[e^{θX_1}] ··· E[e^{θX_n}] = (E[e^{θX_1}])^n = (λ/(λ − θ))^n.

Suppose that Y ∼ Γ(n, λ). Then

E[e^{θY}] = ∫_0^∞ e^{θx} λ^n e^{−λx} x^{n−1}/(n − 1)! dx
= (λ/(λ − θ))^n ∫_0^∞ (λ − θ)^n e^{−(λ−θ)x} x^{n−1}/(n − 1)! dx
= (λ/(λ − θ))^n,

since the last integrand is the Γ(n, λ − θ) density. Hence X_1 + ··· + X_n ∼ Γ(n, λ), since the moment generating function characterizes the distribution.
Example (Normal Distribution). Let X ∼ N[µ, σ²]. Then

E[e^{θX}] = ∫_{−∞}^∞ e^{θx} (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)} dx
= ∫_{−∞}^∞ (1/(√(2π)σ)) exp{−(1/(2σ²))(x² − 2xµ + µ² − 2θσ²x)} dx
= ∫_{−∞}^∞ (1/(√(2π)σ)) exp{−(1/(2σ²))[(x − µ − θσ²)² − 2µσ²θ − θ²σ⁴]} dx
= e^{µθ + θ²σ²/2} ∫_{−∞}^∞ (1/(√(2π)σ)) exp{−(1/(2σ²))(x − µ − θσ²)²} dx
= e^{µθ + θ²σ²/2}.
Theorem. Suppose X ∼ N[µ_1, σ_1²] and Y ∼ N[µ_2, σ_2²] are independent. Then:

1. X + Y ∼ N[µ_1 + µ_2, σ_1² + σ_2²],

2. aX ∼ N[aµ_1, a²σ_1²].

Proof. 1.

E[e^{θ(X+Y)}] = E[e^{θX}] E[e^{θY}]
= e^{(µ_1θ + σ_1²θ²/2)} e^{(µ_2θ + σ_2²θ²/2)}
= e^{(µ_1+µ_2)θ + (σ_1²+σ_2²)θ²/2},

which is the mgf of N[µ_1 + µ_2, σ_1² + σ_2²].

2.

E[e^{θ(aX)}] = E[e^{(θa)X}]
= e^{µ_1(θa) + σ_1²(θa)²/2}
= e^{(aµ_1)θ + (a²σ_1²)θ²/2},

which is the mgf of N[aµ_1, a²σ_1²].
6.4 Central Limit Theorem

Suppose X_1, X_2, ... are independent, identically distributed random variables with Var X_i = σ². Then

X_1 + ··· + X_n has variance Var(X_1 + ··· + X_n) = nσ²,
(X_1 + ··· + X_n)/n has variance Var((X_1 + ··· + X_n)/n) = σ²/n,
(X_1 + ··· + X_n)/√n has variance Var((X_1 + ··· + X_n)/√n) = σ².
X
Sn = Xi
1
b
Sn − nσ
Z
1 −z2
lim P a ≤ √ ≤b = √ e 2 dz
n→∞ σ n a 2π
Proof (sketch). Without loss of generality take µ = 0, σ² = 1. The mgf of X_i is

m_{X_i}(θ) = 1 + θE[X_i] + (θ²/2)E[X_i²] + (θ³/3!)E[X_i³] + ···
= 1 + θ²/2 + (θ³/3!)E[X_i³] + ···

The mgf of S_n/√n is

E[e^{θS_n/√n}] = E[e^{(θ/√n)(X_1+···+X_n)}]
= E[e^{(θ/√n)X_1}] ··· E[e^{(θ/√n)X_n}]
= (E[e^{(θ/√n)X_1}])^n
= (m_{X_1}(θ/√n))^n
= (1 + θ²/(2n) + (θ³/3!) E[X³]/n^{3/2} + ···)^n → e^{θ²/2} as n → ∞,

which is the mgf of the N[0, 1] distribution.
Example (Opinion poll). Suppose a proportion p of the population support a particular party, and that we sample n people at random. Let S_n be the number in the sample who support the party, and estimate p by p' = S_n/n. How large should n be so that P(|p' − p| ≤ 0.005) ≥ 0.95?

P(|p' − p| ≤ 0.005) = P(|S_n − np| ≤ 0.005n) = P(|S_n − np|/√(npq) ≤ 0.005√n/√(pq)).

By the central limit theorem this is approximately 0.95 when 0.005√n/√(pq) ≥ 1.96, since

∫_{−1.96}^{1.96} (1/√(2π)) e^{−x²/2} dx = 2Φ(1.96) − 1 = 0.95.

But we don't know p. However, pq ≤ 1/4, with the worst case p = q = 1/2, so it is sufficient that

n ≥ (1.96²/0.005²)(1/4) ≈ 40,000.

If we replace 0.005 by 0.01 then n ≥ 10,000 will be sufficient, and if we replace 0.005 by 0.045 then n ≥ 475 will suffice. Note that the answer does not depend upon the size of the total population.
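The worst-case calculation packages neatly as a tiny function (an illustrative sketch; the names are ours):

```python
import math

def sample_size(half_width: float, conf_z: float = 1.96) -> int:
    """Worst-case (p = 1/2) sample size so that the estimate p' is
    within half_width of p with probability about 95%."""
    return math.ceil((conf_z / half_width) ** 2 / 4)

for w in (0.005, 0.01, 0.045):
    print(f"half-width {w}: n >= {sample_size(w)}")
```

Running this reproduces the three figures quoted above (38,416 ≈ 40,000; 9,604 ≈ 10,000; 475).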
6.5 Multivariate Normal Distribution

Let X_1, ..., X_n be independent N[0, 1] random variables, with joint density

f(x_1, ..., x_n) = (1/(2π)^{n/2}) e^{−(1/2) x⃗^T x⃗}.

Write

X⃗ = (X_1, X_2, ..., X_n)^T,

and let Z⃗ = µ⃗ + AX⃗, where A is an invertible matrix, so that x⃗ = A^{−1}(z⃗ − µ⃗). The density of Z⃗ is

f(z_1, ..., z_n) = (1/((2π)^{n/2} |det A|)) exp{−(1/2)(A^{−1}(z⃗ − µ⃗))^T (A^{−1}(z⃗ − µ⃗))}
= (1/((2π)^{n/2} |Σ|^{1/2})) exp{−(1/2)(z⃗ − µ⃗)^T Σ^{−1} (z⃗ − µ⃗)},

where Σ = AA^T is the covariance matrix, and we write

Z⃗ ∼ MVN[µ⃗, Σ].
If the covariance matrix of the MVN distribution is diagonal, then the components of the random vector Z⃗ are independent, since

f(z_1, ..., z_n) = ∏_{i=1}^n (1/((2π)^{1/2} σ_i)) e^{−(1/2)((z_i − µ_i)/σ_i)²},

where

Σ = diag(σ_1², σ_2², ..., σ_n²).

This is not necessarily true if the distribution is not MVN; recall sheet 2, question 9.
Example (Bivariate normal).

f(x_1, x_2) = (1/(2π(1 − ρ²)^{1/2} σ_1 σ_2)) ×
exp{ −(1/(2(1 − ρ²))) [ ((x_1 − µ_1)/σ_1)² − 2ρ((x_1 − µ_1)/σ_1)((x_2 − µ_2)/σ_2) + ((x_2 − µ_2)/σ_2)² ] }.

Then

Corr(X_1, X_2) = Cov(X_1, X_2)/(σ_1 σ_2) = ρ.