Theory of Probability
Instructor: Ping Li
Cornell University
Cornell University, BTRY 4080 / STSCI 4080 Fall 2009 Instructor: Ping Li 2
General Information
• Homework: Weekly
– Please turn in your homework either in class or to BSCB front desk
(Comstock Hall, 1198).
– Before computing your overall homework grade, the assignment with the
lowest grade (if it is ≥ 25%) will be dropped; the one with the second-lowest
grade (if it is ≥ 50%) will also be dropped.
– It is the students’ responsibility to keep copies of the submitted homework.
• Grading:
1. The homework: 30%
D = 2000
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}
Challenges
• ...
Search Engines
Word 1: 1 0 0 1 0 0 0 1
Word 2: 0 1 0 1 0 0 1 0
Word 3: ...
Word 4: ...
...
Word n: ...
Generate a small number k of random permutations π and, for each permutation, store the minimums.
P = Pr (min{π(S1)} = min{π(S2)}) = |S1 ∩ S2| / |S1 ∪ S2| = a / (f1 + f2 − a),
where a = |S1 ∩ S2|, f1 = |S1|, f2 = |S2|.
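The minwise estimate above can be checked numerically. The sketch below (in Python; the deck's own snippets use Matlab) applies k random permutations of {0, ..., D−1} to the two sets from the slide and compares the collision rate with the exact resemblance a/(f1 + f2 − a). The value k = 500 is an arbitrary choice for illustration.

```python
import random

# The two sets from the slide, living in a universe of size D = 2000.
S1 = {0, 15, 31, 193, 261, 495, 563, 919}
S2 = {5, 15, 19, 193, 209, 325, 563, 1029, 1921}

def exact_resemblance(A, B):
    # R = a / (f1 + f2 - a), with a = |A ∩ B|, f1 = |A|, f2 = |B|
    a = len(A & B)
    return a / (len(A) + len(B) - a)

def minwise_estimate(A, B, D=2000, k=500, seed=0):
    # Generate k random permutations pi of {0, ..., D-1}; the fraction of
    # permutations with min(pi(A)) == min(pi(B)) estimates the resemblance.
    rng = random.Random(seed)
    hits = 0
    for _ in range(k):
        pi = list(range(D))
        rng.shuffle(pi)
        if min(pi[x] for x in A) == min(pi[x] for x in B):
            hits += 1
    return hits / k

print(exact_resemblance(S1, S2))   # a = 3, f1 = 8, f2 = 9  ->  3/14
print(minwise_estimate(S1, S2))
```

Here a = |{15, 193, 563}| = 3, so the exact resemblance is 3/14 ≈ 0.214, and the estimate fluctuates around it with standard error of order 1/√k.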
Combinatorial Analysis
[Figure: a two-stage route diagram from Origin to Destination, with routes 1, 2 through stage A and routes 1, 2 through stage B]
Q: How many different routes are there from the origin to the destination?
[Figure: a route diagram from Origin to Destination, with m routes through stage A and routes 1, 2 through stage B]
Q: How many different routes are there from the origin to the destination?
Example:
—————–
Multiplicative or additive?
Example:
In a 7-place license plate, the first 3 places are to be occupied by letters and the
final 4 by numbers, e.g., ITH-0827.
—————–
Multiplicative or additive?
Example:
How many license plates would be possible if repetition among letters or numbers
were prohibited?
———————
Permutations
Basic formula:
n × (n − 1) × (n − 2) × ... × 1 = n!
Why?
[Figure: n ordered positions, labeled 1, 2, 3, 4, ..., n]
Example:
A class consists of 6 men and 4 women. The students are ranked according to
their performance in the final, assuming no ties.
Q: If the women are ranked among themselves and the men among themselves
how many different rankings are possible?
Example:
Answer:
6! / (3! × 2! × 1!) = 60
Why?
1 2 3 4 5 6 7 8 9 10 11
How many ways are there to switch the elements and keep the same permutation?
Example:
A signal consists of 9 flags hung in a line, which can be made from a set of 4
white flags, 3 red flags, and 2 blue flags. All flags of the same color are identical.
Q: How many different signals are possible?
A: 9! / (4! × 3! × 2!) = 1260
Combinations
C(n, r) = n! / ((n − r)! r!)
Pronounced as “n-choose-r”
Step 1: When orders are relevant, the number of different groups of r objects
formed from n objects = n × (n − 1) × (n − 2) × ... × (n − r + 1)
[Figure: r ordered positions, labeled 1, 2, 3, 4, ..., r]
Step 2: When orders are NOT relevant, each group of r objects can be
formed by r! different ways.
Step 3: Therefore, when orders are NOT relevant, the number of different groups
of r objects would be
[n × (n − 1) × ... × (n − r + 1)] / r! = n! / ((n − r)! r!)
Example:
How many different committees consisting of 2 women and 3 men can be formed,
from a group of 5 women and 7 men?
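Counts like this are one call to Python's `math.comb` (a sketch; the slides themselves do not prescribe software):

```python
from math import comb

# C(5, 2) ways to choose the women times C(7, 3) ways to choose the men.
n_committees = comb(5, 2) * comb(7, 3)
print(n_committees)  # 10 * 35 = 350
```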
Example (1.4.4c):
Q: How many linear orderings are there in which no two defectives are
consecutive?
A: C(n − m + 1, m), where n is the number of items and m the number of defectives.

Useful Facts about the Binomial Coefficient C(n, r)
1. C(n, 0) = C(n, n) = 1. Recall 0! = 1.
2. C(n, r) = C(n, n − r)
3. C(n, r) = C(n − 1, r − 1) + C(n − 1, r)
Proof of 1 by algebra:
C(n − 1, r − 1) + C(n − 1, r) = (n − 1)!/((n − r)!(r − 1)!) + (n − 1)!/((n − r − 1)! r!)
= (n − 1)! (r + (n − r)) / ((n − r)! r!) = n!/((n − r)! r!) = C(n, r)
• Avoid computing the factorial n! directly; it quickly becomes very large: 170! ≈ 7.2574 × 10^306.
• Take advantage of C(n, r) = C(n, n − r).
• Use accurate approximations of the log-factorial function log n!. Check
http://en.wikipedia.org/wiki/Factorial.
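One hedged way to follow that advice in Python is to work with log-factorials via `math.lgamma` (log n! = lgamma(n + 1)), which never forms n! itself:

```python
from math import lgamma, exp, comb

def log_comb(n, r):
    # log C(n, r) = log n! - log r! - log (n-r)!, using log n! = lgamma(n + 1)
    return lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)

# Exponentiate only at the end, and only when the result fits in a double.
approx = exp(log_comb(2000, 3))
exact = comb(2000, 3)
print(approx, exact)
```

For results too large for a double (e.g. C(1000, 500)), one keeps the answer in log space instead of exponentiating.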
Example for n = 2:
(x + y)² = x² + 2xy + y², with C(2, 0) = C(2, 2) = 1 and C(2, 1) = 2.
Proof:
Assume
(x + y)^(n−1) = Σ_{k=0}^{n−1} C(n − 1, k) x^k y^(n−1−k)
Then
Σ_{k=0}^{n} C(n, k) x^k y^(n−k) = Σ_{k=1}^{n−1} C(n, k) x^k y^(n−k) + x^n + y^n
= Σ_{k=1}^{n−1} C(n − 1, k) x^k y^(n−k) + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^(n−k) + x^n + y^n
= y Σ_{k=0}^{n−1} C(n − 1, k) x^k y^(n−1−k) − y^n + Σ_{k=1}^{n−1} C(n − 1, k − 1) x^k y^(n−k) + x^n + y^n
= y(x + y)^(n−1) + Σ_{j=0}^{n−2} C(n − 1, j) x^(j+1) y^(n−1−j) + x^n,  (let j = k − 1)
= y(x + y)^(n−1) + x Σ_{j=0}^{n−1} C(n − 1, j) x^j y^(n−1−j) − x^n + x^n
= y(x + y)^(n−1) + x(x + y)^(n−1) = (x + y)^n
Multinomial Coefficients
Answer:
C(n; n1, n2, ..., nr) = n! / (n1! n2! ... nr!)
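The formula translates directly into code; a small Python sketch:

```python
from math import factorial

def multinomial(n, *parts):
    # C(n; n1, n2, ..., nr) = n! / (n1! n2! ... nr!); the parts must sum to n.
    assert sum(parts) == n
    out = factorial(n)
    for k in parts:
        out //= factorial(k)
    return out

# The 9-flag signal example: 9!/(4! 3! 2!) distinguishable orderings.
print(multinomial(9, 4, 3, 2))
```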
Proof
Step 1: There are C(n, n1) ways to form the first group of size n1.
Step 2: There are C(n − n1, n2) ways to form the second group of size n2.
...
Step r: There are C(n − n1 − n2 − ... − n(r−1), nr) ways to form the last group of size nr.
Example:
Q: How many distinct positive-integer vectors (x1, x2, ..., xr) satisfy x1 + x2 + ... + xr = n?
A: C(n − 1, r − 1)
Q: How many distinct nonnegative-integer vectors (x1, x2, ..., xr) satisfy x1 + x2 + ... + xr = n?
A: C(n + r − 1, r − 1),
by padding r zeros after the n one’s.
Example:
From a group of 8 women and 6 men, committees consisting of 3 women and 3 men are to be formed.
Solution:
(a): How many committees are possible?
C(8, 3) × C(6, 3) = 56 × 20 = 1120
(b): How many, if 2 of the men refuse to serve together?
C(8, 3) × [C(4, 3) + C(2, 1) × C(4, 2)] = 56 × 16 = 896
(c): How many, if 2 of the women refuse to serve together?
[C(6, 3) + C(2, 1) × C(6, 2)] × C(6, 3) = 50 × 20 = 1000
(d): How many, if 1 man and 1 woman refuse to serve together?
C(7, 3) × C(5, 3) + C(7, 3) × C(5, 2) + C(7, 2) × C(5, 3) = 350 + 350 + 210 = 910
Or C(8, 3) × C(6, 3) − C(7, 2) × C(5, 2) = 910
C(n, k) = Σ_{i=k}^{n} C(i − 1, k − 1),  n ≥ k
There is an interesting proof (different from the book) using the result for the
number of integer solutions of equations, by a recursive argument.
Proof:
Let f(n, k) denote the number of positive-integer solutions of x1 + x2 + ... + xk = n, so f(n, k) = C(n − 1, k − 1).
If x1 = 1, there are still f(n, k) possibilities, i.e., C(n − 1, k − 1).
If x1 = 2, there are still f(n − 1, k) possibilities, i.e., C(n − 2, k − 1).
...
If x1 = (n + 1) − k, there are still f(k, k) possibilities, i.e., C(k − 1, k − 1).
Therefore
f(n + 1, k + 1) = Σ_{j=1}^{n+1−k} f(n + 1 − j, k) = Σ_{j=1}^{n+1−k} C(n − j, k − 1)
= C(n − 1, k − 1) + C(n − 2, k − 1) + ... + C(k − 1, k − 1)
= C(k − 1, k − 1) + C(k, k − 1) + ... + C(n − 1, k − 1)
= Σ_{i=k}^{n} C(i − 1, k − 1)
Since f(n + 1, k + 1) = C(n, k), the identity follows.
A:
C(50, 4) − C(40, 4) = 138910 = Σ_{k=1}^{4} C(10, k) × C(40, 4 − k)
———-
Can we also compute the answer using C(10, 1) × C(49, 3) = 184240? No.
It is not correct because the two sub-tasks, C(10, 1) and C(49, 3), overlap, and
hence we cannot simply apply the multiplicative principle.
Axioms of Probability
Section 2.1
In this chapter we introduce the concept of the probability of an event and then
show how these probabilities can be computed in certain situations. As a
preliminary, however, we need the concept of the sample space and the events of
an experiment.
Example:
S = {girl, boy}, if the output of an experiment consists of the determination of
the gender of a newborn.
Example:
If the experiment consists of flipping two coins (H/T), then the sample space
S = {(H, H), (H, T ), (T, H), (T, T )}
Example:
If the outcome of an experiment is the order of finish in a race among 7 horses
having post positions 1, 2, 3, 4, 5, 6, 7, then
S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}
Example:
If the experiment consists of tossing two dice, then
S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}. What is the size |S|?
Example:
If the experiment consists of measuring (in hours) the lifetime of a transistor, then
S = {t : 0 ≤ t < ∞}.
Definition of Events:
Example:
S = {(H, H), (H, T ), (T, H), (T, T )},
E = {(H, H)} is an event. E = {(H, T ), (T, T )} is also an event
Example:
S = {t : 0 ≤ t < ∞},
E = {t : 1 ≤ t ≤ 5} is an event,
But {t : −10 ≤ t ≤ 5} is NOT an event.
Set Union:
A ∪ B = the event which occurs if either A OR B occurs.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1}, B = {1, 3, 5}, C = {2, 4, 6},
A ∪ B = {1, 3, 5}, A ∪ C = {1, 2, 4, 6}, B ∪ C = {1, 2, 3, 4, 5, 6}
Set Intersection:
A ∩ B = AB = the event which occurs if both A AND B occur.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1}, B = {1, 3, 5}, C = {2, 4, 6},
A ∩ B = {1}, B ∩ C = {} = ∅, A ∩ B ∩ C = ∅
Mutually Exclusive:
If A ∩ B = ∅, then A and B are mutually exclusive.
Set Complement:
Ac = S − A = the event which occurs only if A does not occur.
Example:
S = {1, 2, 3, 4, 5, ...} = the set of all positive integers.
A = {1, 3, 5, ...} = the set of all positive odd integers.
B = {2, 4, 6, ...} = the set of all positive even integers.
Ac = S − A = B , B c = S − B = A.
Commutative Laws:
A∪B =B∪A A∩B =B∩A
Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C = ABC
Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) = AC ∪ BC
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
DeMorgan’s Laws
(∪_{i=1}^{n} Ei)^c = ∩_{i=1}^{n} Ei^c
(∩_{i=1}^{n} Ei)^c = ∪_{i=1}^{n} Ei^c
Proof of (∪_{i=1}^{n} Ei)^c = ∩_{i=1}^{n} Ei^c:
Let A = (∪_{i=1}^{n} Ei)^c and B = ∩_{i=1}^{n} Ei^c.
1. Prove A ⊆ B.
Suppose x ∈ A. ⇒ x does not belong to any Ei.
⇒ x belongs to every Ei^c. ⇒ x ∈ ∩_{i=1}^{n} Ei^c = B.
2. Prove B ⊆ A.
Suppose x ∈ B. ⇒ x belongs to every Ei^c.
⇒ x does not belong to any Ei. ⇒ x ∈ (∪_{i=1}^{n} Ei)^c = A.
3. Consequently, A = B.
Proof of (∩_{i=1}^{n} Ei)^c = ∪_{i=1}^{n} Ei^c:
Let Fi = Ei^c.
We have proved (∪_{i=1}^{n} Fi)^c = ∩_{i=1}^{n} Fi^c. Therefore,
(∩_{i=1}^{n} Ei)^c = (∩_{i=1}^{n} Fi^c)^c = ((∪_{i=1}^{n} Fi)^c)^c = ∪_{i=1}^{n} Fi = ∪_{i=1}^{n} Ei^c
For each event E in the sample space S, there exists a value P (E), referred to
as the probability of E, satisfying the following three axioms:
Axiom 1: 0 ≤ P (E) ≤ 1
Axiom 2: P (S) = 1
Axiom 3: for any sequence of mutually exclusive events E1, E2, ...,
P (∪_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P (Ei)
Consequence 1: P (∅) = 0.
Consequence 2:
If S = ∪_{i=1}^{n} Ei and the mutually exclusive events Ei are equally likely, meaning
P (E1) = P (E2) = ... = P (En), then
P (Ei) = 1/n,  i = 1 to n
Proposition 1: P (E^c) = 1 − P (E)
Proposition 2: If E ⊂ F, then P (E) ≤ P (F).
Proof:
E ⊂ F =⇒ F = E ∪ (E^c ∩ F)
=⇒ P (F) = P (E) + P (E^c ∩ F) ≥ P (E)
Proposition 3: P (E ∪ F) = P (E) + P (F) − P (EF)
Proof:
E ∪ F = (E − EF) ∪ (F − EF) ∪ EF, a union of three mutually exclusive sets.
=⇒ P (E ∪ F) = P (E − EF) + P (F − EF) + P (EF)
= P (E) − P (EF) + P (F) − P (EF) + P (EF)
= P (E) + P (F) − P (EF)
A Generalization of Proposition 3:
P (E ∪ F ∪ G)
=P (E) + P (F ) + P (G) − P (EF ) − P (EG) − P (F G) + P (EF G)
Proof:
Let B = F ∪ G. Then P (E ∪ B) = P (E) + P (B) − P (EB), where
P (B) = P (F ∪ G) = P (F) + P (G) − P (F G) and
P (EB) = P (EF ∪ EG) = P (EF) + P (EG) − P (EF G). Substituting gives the result.
We will temporarily skip this proposition and the relevant examples 5m, 5n, and 5o.
However, we will come back to that material at a later time.
Example:
Q: How many members of this club play at least one of these sports?
Solution:
P (E) = Number of outcomes in E / Number of outcomes in S = |E| / |S|
Example: If two dice are rolled, what is the probability that the sum of the
upturned faces will equal 7?
S = {(i, j), i, j = 1, 2, 3, 4, 5, 6}
E = {??}
Example: If 3 balls are randomly drawn from a bag containing 6 white and 5
black balls, what is the probability that one of the drawn balls is white and the
other two black?
S = {??}
E = {??}
Poker Example (2.5.5f): A poker hand consists of 5 cards. If the cards have
distinct consecutive values and are not all of the same suit, we say that the hand
is a straight. What is the probability that one is dealt a straight?
A 2 3 4 5 6 7 8 9 10 J Q K A
Example (The Birthday Attack): If n people are present in a room, what is the
probability that no two of them celebrate their birthday on the same day of the
year (365 days)? How large need n be so that this probability is less than 1/2?
Example (The Birthday Attack): If n people are present in a room, what is the
probability that no two of them celebrate their birthday on the same day of the
year? How large need n be so that this probability is less than 1/2?
[Figure: p(n), the probability that no two of the n people share a birthday, plotted for n = 0 to 60; p(n) decreases from 1 toward 0 and drops below 1/2 near n = 23]
Matlab Code
n = 5:60;
for i = 1:length(n)
    p(i) = 1;
    for j = 2:n(i)
        p(i) = p(i) * (365-j+1)/365;
    end;
end;
figure; plot(n,p,'linewidth',2); grid on; box on;
xlabel('n'); ylabel('p(n)');
(a) What is the probability that there are no offensive-defensive roommate pairs?
(b) What is the probability that there are 2i offensive-defensive roommate pairs?
Solution Steps:
Stirling’s approximation: n! ≈ n^(n+1/2) e^(−n) √(2π)
Be careful!
A Famous Brain-Teaser
• Q: Do you stick with the door you chose in the first place, or do you switch?
• Jerry: I’ve heard that you will take a plane next week. Are you afraid that
there might be terrorist activities?
• Jerry: So?
• Tom: Therefore, to dramatically reduce the chance from 1/10^6 to 1/10^12, I will
bring one bomb onto the plane myself!
• Jerry: AH...
Two events E, F ⊆ S.
P (E|F ): conditional probability that E occurs given that F occurs.
If P (F ) > 0, then
P (E|F) = P (EF) / P (F)
P (E|F) = P (EF) / P (F)
Example (3.2.2c):
In the card game bridge the 52 cards are dealt out equally to 4 players — called
East, West, North, and South. If North and South have a total of 8 spades among
them, what is the probability that East has 3 of the remaining 5 spades?
P (E|F) = P (EF) / P (F)
=⇒ P (EF) = P (E|F) P (F)
Example:
Two solutions:
1. Working in the reduced sample space.
2. Working with the original sample space and the basic formula — more difficult!
Desired probability:
P (B) = b / (r + b),
P (Bk) = C(b, k) C(r, n − k) / C(r + b, n),
P (Bk|B) = C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)
Therefore,
P (B|Bk) = P (Bk|B) P (B) / P (Bk)
= [C(b − 1, k − 1) C(r, n − k) / C(r + b − 1, n − 1)] × [b / (r + b)] × [C(r + b, n) / (C(b, k) C(r, n − k))]
= k/n
Example (3.2.2f):
Suppose an urn contains 8 red balls and 4 white balls. We draw 2 balls from the
urn without replacement. Assume that at each draw each ball in the urn is equally
likely to be chosen.
Two solutions:
(8/12) × (7/11) = 14/33
P (E1 E2 E3 ...En ) = P (E1 )P (E2 |E1 )P (E3 |E1 E2 )...P (En |E1 E2 ...En−1 )
Proof?
Example (3.2.2g):
A: Two solutions:
1. Direct approach.
• Consider 4! permutations
• The desired probability
[C(48, 12) C(36, 12) C(24, 12) / (C(52, 13) C(39, 13) C(26, 13))] × 4! = [48! (13!)^4 / (52! (12!)^4)] × 4! = 0.1055
Define
P (E1 E2 E3 E4)
= P (E1) P (E2|E1) P (E3|E1 E2) P (E4|E1 E2 E3)
• P (E1) = 1
• P (E2|E1) = 39/51
• P (E3|E1 E2) = 26/50
• P (E4|E1 E2 E3) = 13/49
• P (E1 E2 E3 E4) = (39/51) × (26/50) × (13/49) = 0.1055
Proof:
• By Venn diagram. Or
• By algebra using the conditional probability formula
Example (3.3.3a):
Q 1: What is the probability that a random person will have an accident within a
fixed one-year period?
Q 2: Suppose a person has an accident within a year. What is the probability that
he/she is accident prone?
Example:
Let p be the probability that the student knows the answer and 1 − p the
probability that the student guesses. Assume that a student who guesses will be
correct with probability 1/m, where m is the total number of multiple-choice
alternatives.
Q: What is the conditional probability that a student knew the answer, given that
he/she answered it correctly?
Solution:
P (K|C) =??
Solution:
Example (3.3.3d):
Q: What is the probability that a person has the disease given that the test result
is positive?
Solution:
P (D|E) = ??
Suppose F1, F2, ..., Fn are mutually exclusive events such that ∪_{i=1}^{n} Fi = S.
Then
P (E) = Σ_{i=1}^{n} P (E|Fi) P (Fi)
Example (3.3.3m):
Q 1: What is the probability that a randomly chosen flashlight will give more than
100 hours of use?
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability
that it was type j ?
Q 1: What is the probability that a randomly chosen flashlight will give more than
100 hours of use?
0.7 × 0.2 + 0.4 × 0.3 + 0.3 × 0.5 = 0.14 + 0.12 + 0.15 = 0.41
Q 2: Given the flashlight lasted over 100 hours, what is the conditional probability
that it was type j ?
Type 1: 0.14/0.41 = 0.3415
Type 2: 0.12/0.41 = 0.2927
Type 3: 0.15/0.41 = 0.3659
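The two steps (total probability, then Bayes' formula) can be checked in a few lines of Python; the lists below simply restate the numbers from the slide.

```python
# P(type i) and P(lasts > 100 hours | type i), from the slide.
prior = [0.2, 0.3, 0.5]
p_long = [0.7, 0.4, 0.3]

# Law of total probability: P(E) = sum_i P(E|Fi) P(Fi)
p_e = sum(pl * pr for pl, pr in zip(p_long, prior))

# Bayes' formula: P(Fi|E) = P(E|Fi) P(Fi) / P(E)
posterior = [pl * pr / p_e for pl, pr in zip(p_long, prior)]
print(p_e)        # 0.41
print(posterior)  # approximately [0.3415, 0.2927, 0.3659]
```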
Independent Events
Definition:
P (EF ) = P (E)P (F )
Example: Two coins are flipped and all 4 outcomes are equally likely. Let E be
the event that the first lands heads and F the event that the second lands tails.
Example: Suppose we toss 2 fair dice. Let E denote the event that the sum of
two dice is 6 and F denote the event that the first die equals 4.
Proposition 4.1:
Proof:
Definition:
Definition:
Example (3.4.4f):
Solution:
1. At least one success occurs in the first n trials:
1 − (1 − p)^n
2. Exactly k successes occur in the first n trials:
C(n, k) p^k (1 − p)^(n−k)
How is this formula related to the binomial theorem?
Note that p^n → 0 as n → ∞ if p < 1.
Example (3.4.4h):
Independent trials, consisting of rolling a pair of fair dice, are performed. What is
the probability that an outcome of 5 appears before an outcome of 7 when the
outcome is the sum of the dice?
Solution 1:
Let En denote the event that no 5 or 7 appears on the first n − 1 trials and a 5
appears on the nth trial, then the desired probability is
P (∪_{n=1}^{∞} En) = Σ_{n=1}^{∞} P (En)
By independence,
P (En) = P (no 5 or 7 on the first n − 1 trials) × P (a 5 on the nth trial)
P (no 5 or 7 on the first n − 1 trials) = [P (no 5 or 7 on any trial)]^(n−1)
P (En) = (1 − 4/36 − 6/36)^(n−1) × (4/36) = (13/18)^(n−1) × (1/9)
Summing the geometric series, the desired probability is (1/9) / (1 − 13/18) = 2/5.
A General Result: the probability that E occurs before F is
P (E) / (P (E) + P (F))
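For the 5-before-7 example, the general result gives (4/36)/(4/36 + 6/36) = 2/5, which agrees with summing the series P(En); a quick Python check (truncating the infinite sum, harmless since the terms decay geometrically):

```python
# P(a 5 on a trial) and P(a 7 on a trial) when rolling two fair dice.
p5, p7 = 4/36, 6/36

# General result: P(E before F) = P(E) / (P(E) + P(F))
general = p5 / (p5 + p7)

# Direct series: P(En) = (13/18)^(n-1) * (1/9), summed over n (truncated).
series = sum((13/18) ** (n - 1) * (1/9) for n in range(1, 2000))
print(general, series)  # both are 2/5
```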
If it is assumed that the successive flips of the coin are independent and each flip
results in a head with probability p, what is the probability that A ends up with all
the money if he starts with i units and B starts with N − i units?
Solution:
E = event that A ends with all the money when he starts with i.
H = event that first flip lands heads.
P0 =??
PN =??
P (E|H) =??
P (E|H c ) =??
=⇒ Pi+1 = (Pi − q Pi−1)/p  =⇒  Pi+1 − Pi = (q/p)(Pi − Pi−1)
P2 − P1 = (q/p) P1
P3 − P2 = (q/p)(P2 − P1) = (q/p)² P1
...
Pi − Pi−1 = (q/p)^(i−1) P1
Pi = P1 [1 + (q/p) + (q/p)² + ... + (q/p)^(i−1)]
= P1 [1 − (q/p)^i] / [1 − (q/p)],  if q ≠ p
Because PN = 1, we know
P1 = [1 − (q/p)] / [1 − (q/p)^N],  if q ≠ p
Pi = [1 − (q/p)^i] / [1 − (q/p)^N],  if q ≠ p
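The closed form can be checked against the boundary conditions and the recursion numerically; a Python sketch (the q = p case, Pi = i/N, is the limit of the formula and is included as an assumption for completeness):

```python
def ruin_prob(i, N, p):
    # P(A ends with all the money | A starts with i of the N units)
    q = 1 - p
    if abs(p - q) < 1e-12:
        return i / N                    # limiting case q = p
    r = q / p
    return (1 - r ** i) / (1 - r ** N)

# Boundary conditions P0 = 0 and PN = 1, plus the fair-coin case.
print(ruin_prob(0, 10, 0.6), ruin_prob(10, 10, 0.6), ruin_prob(3, 10, 0.5))
```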
At a party n men take off their hats. The hats are then mixed up, and each man
randomly selects one. We say that a match occurs if a man selects his own hat.
P (E|M) = 0, P (M^c) = (n − 1)/n, Pn = [(n − 1)/n] P (E|M^c)
P (E|M^c) = Pn−1 + [1/(n − 1)] Pn−2, so Pn = [(n − 1)/n] Pn−1 + (1/n) Pn−2.
4. Identify the boundary conditions.
P1 = 0, P2 = 1/2.
5. Prove Pn by induction.
Pn = 1/2! − 1/3! + 1/4! − ... + (−1)^n/n!
(What is a good approximation of Pn ?)
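The recursion and the series agree, and both approach e^(−1) ≈ 0.3679; a short Python check:

```python
from math import factorial, e

def p_no_match(n):
    # P_n via the recursion P_n = ((n-1)/n) P_{n-1} + (1/n) P_{n-2},
    # with boundary conditions P_1 = 0, P_2 = 1/2.
    if n == 1:
        return 0.0
    prev2, prev = 0.0, 0.5          # P_1, P_2
    for m in range(3, n + 1):
        prev2, prev = prev, ((m - 1) * prev + prev2) / m
    return prev

# Series form: P_n = 1/2! - 1/3! + ... + (-1)^n / n!, here for n = 10.
series = sum((-1) ** k / factorial(k) for k in range(2, 11))
print(p_no_match(10), series, 1 / e)
```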
Solution (b):
3. The probability that none of the remaining n − k men select their own hats:
Pn−k
4. Multiplicative counting principle:
[(n − k)!/n!] × Pn−k
5. Adjustment for selecting the k men:
[(n − k)!/n!] × Pn−k × C(n, k)
P (·|F) Is a Probability
Proposition 5.1
• 0 ≤ P (E|F ) ≤ 1
• P (S|F ) = 1
• If Ei, i = 1, 2, ..., are mutually exclusive events, then
P (∪_{i=1}^{∞} Ei | F) = Σ_{i=1}^{∞} P (Ei|F)
Definition:
or, equivalently
There are k + 1 coins in a box. The ith coin will, when flipped, turn up heads with
probability i/k , i = 0, 1, ..., k . A coin is randomly selected from the box and is
then repeatedly flipped.
Q: If the first n flips all result in heads, what is the conditional probability that the
(n + 1)st flip will do likewise?
Key: Conditioning on that the ith coin is selected, the outcomes will be
conditionally independent.
Solution:
P (H|Fn) = Σ_{i=0}^{k} P (H|Fn Ci) P (Ci|Fn)
By conditional independence, P (H|Fn Ci) = P (H|Ci) = i/k
Definition:
A random variable is a function from the sample space S to the real numbers.
A discrete random variable is a random variable that takes only on a finite (or at
most a countably infinite) number of values.
Example (1a):
Suppose the experiment consists of tossing 3 fair coins. Let Y denote the
number of heads appearing, then Y is a discrete random variable taking on one
of the values 0, 1, 2, 3 with probabilities
P (Y = 0) = P {(T, T, T )} = 1/8
P (Y = 1) = P {(H, T, T ), (T, H, T ), (T, T, H)} = 3/8
P (Y = 2) = P {(T, H, H), (H, T, H), (H, H, T )} = 3/8
P (Y = 3) = P {(H, H, H)} = 1/8
Check
P (∪_{i=0}^{3} {Y = i}) = Σ_{i=0}^{3} P {Y = i} = 1.
Example (1c)
A coin that lands heads with probability p is flipped until either a head appears or
a total of n flips have been made. Let X denote the number of flips.
P (X = 1) = P {H} = p
P (X = 2) = P {(T, H)} = (1 − p)p
P (X = 3) = P {(T, T, H)} = (1 − p)² p
...
P (X = n − 1) = (1 − p)^(n−2) p
P (X = n) = (1 − p)^(n−1)
Check: Σ_{i=1}^{n} P (X = i) = 1??
Example (1d)
3 balls are randomly chosen from an urn containing 3 white, 3 red, and 5 black
balls. Suppose that we win $1 for each white ball selected and lose $1 for each
red ball selected. Let X denote our total winnings from the experiments.
P (X = 0) = [C(5, 3) + C(3, 1) C(3, 1) C(5, 1)] / C(11, 3) = 55/165
P (X = 3) = P (X = −3) = C(3, 3) / C(11, 3) = 1/165
There are N distinct types of coupons, and each coupon obtained is, independently
of prior selections, equally likely to be of any of the N types. Let T denote the
number of coupons that must be collected to obtain a complete set of all N types.
Q: What is P (T = n)?
Solution:
1. It suffices to study P (T > n),
because P (T = n) = P (T > n − 1) − P (T > n).
2. Let Aj denote the event that no type-j coupon is contained among the first n
coupons collected.
3. P (T > n) = P (∪_{j=1}^{N} Aj)
Then what?
P (∪_{j=1}^{N} Aj) = Σ_j P (Aj) − Σ_{j1<j2} P (Aj1 Aj2)
+ Σ_{j1<j2<j3} P (Aj1 Aj2 Aj3) − ... + (−1)^(N+1) P (A1 A2 ... AN)
P (Aj) = [(N − 1)/N]^n
P (Aj1 Aj2) = [(N − 2)/N]^n
...
P (Aj1 Aj2 ... Ajk) = [(N − k)/N]^n
...
P (Aj1 Aj2 ... Aj(N−1)) = (1/N)^n
P (A1 A2 ... AN) = 0
P (T > n) = P (∪_{j=1}^{N} Aj) = Σ_j P (Aj) − Σ_{j1<j2} P (Aj1 Aj2)
+ Σ_{j1<j2<j3} P (Aj1 Aj2 Aj3) − ... + (−1)^(N+1) P (A1 A2 ... AN)
= N [(N − 1)/N]^n − C(N, 2) [(N − 2)/N]^n + C(N, 3) [(N − 3)/N]^n
− ... + (−1)^N C(N, N − 1) (1/N)^n
= Σ_{j=1}^{N−1} (−1)^(j+1) C(N, j) [(N − j)/N]^n
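The final expression is easy to evaluate; a Python sketch with two sanity checks (with N types one can never finish within N − 1 draws, and P(T > n) → 0 as n grows):

```python
from math import comb

def p_T_greater(n, N):
    # P(T > n) = sum_{j=1}^{N-1} (-1)^(j+1) C(N, j) ((N - j)/N)^n
    return sum((-1) ** (j + 1) * comb(N, j) * ((N - j) / N) ** n
               for j in range(1, N))

print(p_T_greater(3, 4))    # 1.0: four types cannot all appear in 3 draws
print(p_T_greater(50, 4))   # nearly 0
```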
F (x) = P {X ≤ x}
Q: Find
Solution:
• P {X = 0};
P {X = 0} = c = e^(−λ).
• P {X > 2};
P {X > 2} = 1 − P {X = 0} − P {X = 1} − P {X = 2}
= 1 − c − cλ − c λ²/2
= 1 − e^(−λ) − λ e^(−λ) − λ² e^(−λ)/2.
Definition:
E(X) = Σ_{x: p(x)>0} x p(x)
The expected value is a weighted average of possible values that X can take on.
Example: Find E(X) where X is the outcome when we roll a fair die.
E(X) = Σ_{i=1}^{6} i × (1/6) = (1/6) Σ_{i=1}^{6} i = 7/2
E(X) =??
Solution:
X = {??}
P (X) =??
Example (4a)
Find E(X²).
Solution: Let Y = X².
P {Y = 0} =P {X = 0} = 0.5
P {Y = 1} =P {X = 1 OR X = −1}
=P {X = 1} + P {X = −1}
=0.2 + 0.3 = 0.5
Therefore,
E(X²) = E(Y ) = 0 × 0.5 + 1 × 0.5 = 0.5
Verify directly:
Σ_i P {X = xi} xi² = 0.5
Proof:
Σ_i g(xi) p(xi) = Σ_j Σ_{i: g(xi)=yj} g(xi) p(xi)
= Σ_j yj Σ_{i: g(xi)=yj} p(xi)
= Σ_j yj P {g(X) = yj}
= E(g(X)).
E(aX + b) = aE(X) + b.
V ar(X) = E[(X − µ)²]
V ar(X) = E(X²) − (E(X))² = E(X²) − µ²
Proof:
V ar(X) = E[(X − µ)²]
= Σ_x (x − µ)² p(x)
= Σ_x (x² − 2µx + µ²) p(x)
= Σ_x x² p(x) − 2µ Σ_x x p(x) + µ² Σ_x p(x)
= E(X²) − 2µ² + µ²
= E(X²) − µ².
V ar(a) = V ar(b) = 0 (for constants a, b)
V ar(aX + b) = a² V ar(X)
E(aX + b) = aE(X) + b
P {X = 1} = p(1) = p,
P {X = 0} = p(0) = 1 − p.
Properties:
E(X) = p
V ar(X) = p(1 − p)
X ∼ Binomial(n, p).
X = Σ_{i=1}^{n} Yi, where the Yi ∼ Bernoulli(p) are independent.
Example: Five fair coins are flipped. If the outcomes are assumed
independent, find the probability mass function of the numbers of heads obtained.
Solution:
• P {X = i} =?
Properties of X ∼ binomial(n, p)
Proof:??
E(X) = Σ_{i=0}^{n} i C(n, i) p^i (1 − p)^(n−i)
= Σ_{i=0}^{n} i [n!/(i!(n − i)!)] p^i (1 − p)^(n−i)
= Σ_{i=1}^{n} i [n!/(i!(n − i)!)] p^i (1 − p)^(n−i)
= np Σ_{i=1}^{n} [(n − 1)!/((i − 1)!((n − 1) − (i − 1))!)] p^(i−1) (1 − p)^((n−1)−(i−1))
= np Σ_{j=0}^{n−1} [(n − 1)!/(j!((n − 1) − j)!)] p^j (1 − p)^((n−1)−j)
= np
E(X²) = Σ_{i=0}^{n} i² C(n, i) p^i (1 − p)^(n−i) = Σ_{i=1}^{n} i² [n!/(i!(n − i)!)] p^i (1 − p)^(n−i)
= np Σ_{i=1}^{n} i [(n − 1)!/((i − 1)!((n − 1) − (i − 1))!)] p^(i−1) (1 − p)^((n−1)−(i−1))
= np Σ_{j=0}^{n−1} (j + 1) [(n − 1)!/(j!((n − 1) − j)!)] p^j (1 − p)^((n−1)−j)
= np E(Y + 1),  Y ∼ binomial(n − 1, p)
= np((n − 1)p + 1)
V ar(X) = E(X²) − (E(X))² = np((n − 1)p + 1) − (np)² = np(1 − p)
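Both moments can be verified directly from the pmf; a small Python check against np and np(1 − p):

```python
from math import comb

def binomial_moments(n, p):
    # Mean and variance computed directly from the pmf C(n,i) p^i (1-p)^(n-i)
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    mean = sum(i * q for i, q in enumerate(pmf))
    ex2 = sum(i * i * q for i, q in enumerate(pmf))
    return mean, ex2 - mean ** 2

mean, var = binomial_moments(20, 0.2)
print(mean, var)  # np = 4.0 and np(1-p) = 3.2
```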
P {X = i} = [n!/((n − i)! i!)] p^i (1 − p)^(n−i)
= n(n − 1)(n − 2)...(n − i + 1) × [(1 − p)^n / i!] × [p^i / (1 − p)^i]
= [(1 − p)^n (np)^i / i!] × [n(n − 1)(n − 2)...(n − i + 1) / (n^i (1 − p)^i)]
≈ e^(−λ) λ^i / i!,  where λ = np
lim_{n→∞} (1 − λ/n)^n = e^(−λ)
Check
Σ_{i=0}^{∞} p(i) = Σ_{i=0}^{∞} e^(−λ) λ^i / i! = 1??
C(n, i) p^i (1 − p)^(n−i) ≈ e^(−λ) λ^i / i!
• The Poisson distribution is widely used for modeling rare events: p small, n large, with λ = np moderate.
– The number of misprints on a page of a book
– ...
[Figure: two panels (n = 20, p = 0.2) comparing the binomial(n, p) pmf with the Poisson(λ = np) pmf; p(i) plotted against i = 0 to 20]
Matlab Code
function comp_binom_poisson(n,p);
Q: Calculate the probability that there is at least one error on this page.
P (X ≥ 1) = 1 − P (X = 0) = 1 − e−1/2 = 0.393.
X ∼ Poisson(λ)
E(X) = Σ_{i=1}^{∞} i e^(−λ) λ^i / i! = λ Σ_{i=1}^{∞} e^(−λ) λ^(i−1) / (i − 1)! = λ Σ_{j=0}^{∞} e^(−λ) λ^j / j! = λ
E(X²) = Σ_{i=1}^{∞} i² e^(−λ) λ^i / i! = λ Σ_{i=1}^{∞} i e^(−λ) λ^(i−1) / (i − 1)!
= λ Σ_{j=0}^{∞} (j + 1) e^(−λ) λ^j / j! = λ (E(X) + 1) = λ² + λ
V ar(X) = λ² + λ − λ² = λ
P {X = n} = (1 − p)n−1 p, n = 1, 2, 3, ...
Check: Σ_{n=1}^{∞} P {X = n} = Σ_{n=1}^{∞} (1 − p)^(n−1) p = 1??
Expectation X ∼ Geometric(p).
E(X) = Σ_{n=1}^{∞} n (1 − p)^(n−1) p  (q = 1 − p)
= p + 2qp + 3q²p + 4q³p + 5q⁴p + ...
= p (1 + 2q + 3q² + 4q³ + 5q⁴ + ...) = ??
We know
1 + q + q² + q³ + q⁴ + ... = 1/(1 − q)
How about
I = 1 + 2q + 3q² + 4q³ + 5q⁴ + ... = ??
I = 1 + 2q + 3q² + 4q³ + 5q⁴ + ... = ??
First trick
q × I = q + 2q² + 3q³ + 4q⁴ + 5q⁵ + ...
I − qI = 1 + q + q² + q³ + q⁴ + ... = 1/(1 − q)
=⇒ I = 1/(1 − q)²
=⇒ E(X) = pI = p/(1 − q)² = 1/p
I = 1 + 2q + 3q² + 4q³ + 5q⁴ + ... = ??
Second trick
I = d/dq (q + q² + q³ + q⁴ + q⁵ + ...)
I = d/dq [1/(1 − q) − 1] = 1/(1 − q)²
Variance X ∼ Geometric(p).
E(X²) = Σ_{n=1}^{∞} n² (1 − p)^(n−1) p  (q = 1 − p)
= p (1 + 2²q + 3²q² + 4²q³ + 5²q⁴ + ...)
J = 1 + 2²q + 3²q² + 4²q³ + 5²q⁴ + ... = ??
J = 1 + 2²q + 3²q² + 4²q³ + 5²q⁴ + ... = ??
Trick 1
qJ = q + 2²q² + 3²q³ + 4²q⁴ + 5²q⁵ + ...
J(1 − q) = 1 + (2² − 1²)q + (3² − 2²)q² + (4² − 3²)q³ + ...
= 1 + 3q + 5q² + 7q³ + ...
J(1 − q)q = q + 3q² + 5q³ + 7q⁴ + ...
J(1 − q)² = 1 + 2q + 2q² + 2q³ + ... = −1 + 2/(1 − q) = (1 + q)/(1 − q)
J = (1 + q)/(1 − q)³,  E(X²) = pJ = p(2 − p)/p³ = (2 − p)/p²,  V ar(X) = (1 − p)/p²
J = 1 + 2²q + 3²q² + 4²q³ + 5²q⁴ + ... = ??
Trick 2
J = Σ_{n=1}^{∞} n² q^(n−1) = Σ_{n=1}^{∞} d/dq (n q^n)
= d/dq [Σ_{n=1}^{∞} n q^n] = d/dq [(q/p) E(X)]
= d/dq [q/(1 − q)²] = (1 + q)/(1 − q)³
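Both tricks can be sanity-checked by summing the series numerically; a Python sketch that truncates once the pmf terms fall below a tolerance:

```python
def geometric_moments(p, tol=1e-15):
    # E(X) = sum n q^(n-1) p and E(X^2) = sum n^2 q^(n-1) p, truncated.
    q = 1 - p
    mean = ex2 = 0.0
    n, term = 1, p
    while term > tol:
        mean += n * term
        ex2 += n * n * term
        n += 1
        term = q ** (n - 1) * p
    return mean, ex2 - mean ** 2

mean, var = geometric_moments(0.25)
print(mean, var)  # close to 1/p = 4 and (1-p)/p^2 = 12
```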
X ∼ N B(r, p)
n−1 r
P {X = n} = p (1 − p)n−r , n = r, r + 1, r + 2, ...
r−1
∞
X n−1 r
E(X k ) = nk p (1 − p)n−r
n=r
r−1
∞
X (n − 1)!
= nk pr (1 − p)n−r
n=r
(r − 1)!(n − r)!
Now what?
E(X^k) = ∑_{n=r}^∞ n^{k−1} n!/[(r − 1)!(n − r)!] p^r (1 − p)^{n−r}

       = (r/p) ∑_{n=r}^∞ n^{k−1} n!/[r!(n − r)!] p^{r+1} (1 − p)^{n−r}        (let m = n + 1)

       = (r/p) ∑_{m=r+1}^∞ (m − 1)^{k−1} (m − 1)!/[r!(m − 1 − r)!] p^{r+1} (1 − p)^{m−1−r}

       = (r/p) E[(Y − 1)^{k−1}],    Y ∼ NB(r + 1, p)

Let k = 1. Then

E(X) = (r/p) E[(Y − 1)^0] = r/p

Let k = 2. Then

E(X^2) = (r/p) E[Y − 1] = (r/p) [(r + 1)/p − 1]

Var(X) = E(X^2) − [E(X)]^2 = r(1 − p)/p^2
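A simulation sketch (not from the slides; r = 3, p = 0.4 are arbitrary choices): the trial count of the r-th success should have mean r/p and variance r(1 − p)/p^2.

```python
# Simulate X = number of Bernoulli(p) trials needed for the r-th success.
import random

random.seed(1)
r, p = 3, 0.4
samples = []
for _ in range(100_000):
    successes, n = 0, 0
    while successes < r:
        n += 1
        if random.random() < p:
            successes += 1
    samples.append(n)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)   # near r/p = 7.5 and r(1-p)/p^2 = 11.25
```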
Example (4.8.8a) An urn contains N white and M black balls. Balls are
randomly selected, one at a time, until a black one is obtained. Each selected ball
is replaced before the next one is drawn.
Solution: This is a geometric random variable with p = M/(M + N).

P(X = n) = (1 − p)^{n−1} p = [N/(M + N)]^{n−1} · M/(M + N) = M N^{n−1}/(M + N)^n

E(X) = 1/p = (M + N)/M = 1 + N/M

Solution (matchbox example, sketch):

p = 1/2, r = N + 1.

4. Remember there are k matches remaining in the right-hand pocket:
   n = 2N − k + 1.

6. Adjustment, because either the left-hand or the right-hand box can be the empty one:

P = 2 C(2N − k, N) (1/2)^{2N−k+1}
Hypergeometric: X ∼ HG(N, m, n).

P{X = i} = C(m, i) C(N − m, n − i) / C(N, n),   i = 0, 1, ..., n,

with n − (N − m) ≤ i ≤ min(n, m).

If the n balls are chosen with replacement, then X is binomial(n, p = m/N).

Without replacement, is it still (approximately) true that

E(X) ≈ np = nm/N ??

Var(X) ≈ np(1 − p) = (nm/N)(1 − m/N) ??
E(X^k) = ∑_{i=1}^n i^k C(m, i) C(N − m, n − i) / C(N, n)

       = ∑_{i=1}^n i^{k−1} · m!/[(i − 1)!(m − i)!] · (N − m)!/[(n − i)!(N − m − n + i)!] · n!(N − n)!/N!

       = (nm/N) ∑_{j=0}^{n−1} (j + 1)^{k−1} · (m − 1)!/[j!(m − 1 − j)!] · (N − m)!/[(n − 1 − j)!(N − m − n + 1 + j)!] · (n − 1)!(N − n)!/(N − 1)!        (j = i − 1)

       = (nm/N) E[(Y + 1)^{k−1}],    Y ∼ HG(N − 1, m − 1, n − 1)
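An exact check (not from the slides; N = 20, m = 8, n = 5 are arbitrary choices) that the hypergeometric mean is nm/N, summed directly from the pmf.

```python
# E(X) = sum_i i * C(m,i) C(N-m,n-i) / C(N,n) for the hypergeometric pmf.
from math import comb

N, m, n = 20, 8, 5
mean = sum(i * comb(m, i) * comb(N - m, n - i) for i in range(n + 1)) / comb(N, n)
print(mean)   # equals n*m/N = 2.0 exactly
```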
F(x) = P{X ≤ x}

The function f is called the probability density function of the random variable X.

Some Facts

• P{X ∈ (−∞, ∞)} = ∫_{−∞}^∞ f(x) dx = 1

• P{X ∈ (a, b)} = P{a ≤ X ≤ b} = ∫_a^b f(x) dx

• P{X = a} = ∫_a^a f(x) dx = 0

• F(a) = P{X ≤ a} = P{X < a} = ∫_{−∞}^a f(x) dx

• F(∞) = 1, F(−∞) = 0

• f(x) = d/dx F(x)
Solution:

fY(y) = d/dy FY(y) = d/dy FX(y/2) = (1/2) fX(y/2)
http://en.wikipedia.org/wiki/Derivative
• Trigonometric functions:
  d/dx sin x = cos x,   d/dx cos x = −sin x,   d/dx tan x = sec^2 x

• Inverse trigonometric functions:
  d/dx arcsin x = 1/√(1 − x^2),   d/dx arccos x = −1/√(1 − x^2),   d/dx arctan x = 1/(1 + x^2)
Integration by Parts

∫_a^b f(x) dx = f(x) x |_a^b − ∫_a^b x f′(x) dx

∫_a^b f(x) d[g(x)] = f(x) g(x) |_a^b − ∫_a^b g(x) f′(x) dx

∫_a^b f(x) h(x) dx = ∫_a^b f(x) d[H(x)] = f(x) H(x) |_a^b − ∫_a^b H(x) f′(x) dx

Examples:

∫_a^b log x dx = x log x |_a^b − ∫_a^b x [log x]′ dx
              = b log b − a log a − ∫_a^b x (1/x) dx
              = b log b − a log a − b + a.

∫_a^b x log x dx = (1/2) ∫_a^b log x d[x^2] = (1/2) x^2 log x |_a^b − (1/2) ∫_a^b x^2 [log x]′ dx
                = (1/2) b^2 log b − (1/2) a^2 log a − (1/2) ∫_a^b x dx
                = (1/2) b^2 log b − (1/2) a^2 log a − (1/4)(b^2 − a^2)
∫_a^b x^n e^x dx = ∫_a^b x^n d[e^x] = x^n e^x |_a^b − n ∫_a^b e^x x^{n−1} dx

∫_a^b x^n cos x dx = ∫_a^b x^n d[sin x] = x^n sin x |_a^b − n ∫_a^b sin x · x^{n−1} dx

∫_a^b sin^n x sin 2x dx = 2 ∫_a^b sin^{n+1} x cos x dx = 2 ∫_a^b sin^{n+1} x d[sin x]
                       = (2/(n + 2)) sin^{n+2} x |_a^b
Expectation

Find E(e^X).

Two solutions:

Solution 1: Let Y = e^X.

FY(y) = P{Y ≤ y} = P{e^X ≤ y} = P{X ≤ log y} = ∫_0^{log y} 2x dx = log^2 y

Therefore,

fY(y) = (2/y) log y,   1 ≤ y ≤ e

E(Y) = ∫_{−∞}^∞ y fY(y) dy = ∫_1^e 2 log y dy
     = 2y log y |_1^e − ∫_1^e 2y (1/y) dy = 2
Solution 2:

E(e^X) = ∫_0^1 e^x 2x dx = ∫_0^1 2x d[e^x]
       = 2x e^x |_0^1 − 2 ∫_0^1 e^x dx
       = 2e − 2(e − 1) = 2
Proof:

∫_0^∞ P{X > x} dx = ∫_0^∞ ∫_x^∞ fX(y) dy dx
                  = ∫_0^∞ ∫_0^y fX(y) dx dy        (switch the order of integration)
                  = ∫_0^∞ fX(y) ( ∫_0^y dx ) dy
                  = ∫_0^∞ y fX(y) dy = E(X)
• Var(aX + b) = a^2 Var(X)

Uniform: X ∼ Uniform(a, b).

Cumulative function

F(x) = 0,                (x ≤ a)
       (x − a)/(b − a),  (a < x < b)
       1,                (x ≥ b)

Expectation

E(X) = (a + b)/2

Variance

Var(X) = (b − a)^2/12
Definition: X is a normal random variable, X ∼ N(µ, σ^2), if its density is given by

f(x) = 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)},   −∞ < x < ∞

Check that

∫_{−∞}^∞ f(x) dx = ∫_{−∞}^∞ 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} dx = 1.

Note that, by letting z = (x − µ)/σ,

∫_{−∞}^∞ 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} dx = ∫_{−∞}^∞ 1/√(2π) e^{−z^2/2} dz = I.

E(X) = ∫_{−∞}^∞ x · 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} dx

     = ∫_{−∞}^∞ (x − µ) 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} dx + µ ∫_{−∞}^∞ 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} dx

     = ∫_{−∞}^∞ (x − µ) 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} d[x − µ] + µ

     = ∫_{−∞}^∞ z 1/(√(2π) σ) e^{−z^2/(2σ^2)} dz + µ        (z = x − µ; odd integrand)

     = µ.
Var(X) = E[(X − µ)^2]

       = ∫_{−∞}^∞ (x − µ)^2 1/(√(2π) σ) e^{−(x−µ)^2/(2σ^2)} d[x − µ]

       = ∫_{−∞}^∞ z^2 1/(√(2π) σ) e^{−z^2/(2σ^2)} dz

       = σ^2 ∫_{−∞}^∞ (z/σ)^2 1/√(2π) e^{−z^2/(2σ^2)} d[z/σ]

       = σ^2 ∫_{−∞}^∞ t^2 1/√(2π) e^{−t^2/2} d[t]        (t = z/σ)

       = σ^2 ∫_{−∞}^∞ t 1/√(2π) e^{−t^2/2} d[t^2/2]

       = σ^2 ∫_{−∞}^∞ −t 1/√(2π) d[e^{−t^2/2}]

       = σ^2 [ ∫_{−∞}^∞ 1/√(2π) e^{−t^2/2} dt − 1/√(2π) t e^{−t^2/2} |_{−∞}^∞ ] = σ^2
Why lim_{t→∞} t e^{−t^2/2} = 0??

lim_{t→∞} t e^{−t^2/2} = lim_{t→∞} t/e^{t^2/2} = lim_{t→∞} [t]′/[e^{t^2/2}]′ = lim_{t→∞} 1/(t e^{t^2/2}) = 0
Normal approximation: Binomial(n, p) ≈ N(µ, σ^2), with µ = np and σ^2 = np(1 − p).

[Figures: bar plots of the Binomial(n, p = 0.2) mass function with the matching
normal density curve overlaid, for n = 10, 20, 50, 100, and 1000; the normal
approximation improves as n grows.]
Matlab code

function NormalApprBinomial(n,p);
mu = n*p;  sigma2 = n*p*(1-p);   % mean and variance of Binomial(n,p)
figure;
bar((0:n),binopdf(0:n,n,p),'g'); hold on; grid on;
x = mu - 3*sqrt(sigma2):0.001:mu + 3*sqrt(sigma2);   % mean +/- 3 standard deviations
plot(x,normpdf(x,mu,sqrt(sigma2)),'r-','linewidth',2);
xlabel('x'); ylabel('Density (mass) function');
title(['n = ' num2str(n) ' p = ' num2str(p)]);
• If X ∼ N(µ, σ^2), then Y = (X − µ)/σ ∼ N(0, 1).

Solution:

Let Y = (X − µ)/σ = (X − 3)/3. Then Y ∼ N(0, 1).

P{2 < X < 5} = P{ (2 − 3)/3 < (X − 3)/3 < (5 − 3)/3 }
             = P{ −1/3 < Y < 2/3 } = Φ(2/3) − Φ(−1/3)
             = Φ(0.6667) − [1 − Φ(0.3333)]
             ≈ 0.7475 − 1 + 0.6306 ≈ 0.3781
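A numerical check (not from the slides) of the probability above, using the identity Φ(z) = (1 + erf(z/√2))/2 with the standard-library error function.

```python
# Evaluate P{-1/3 < Y < 2/3} for Y ~ N(0,1) via the error function.
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

prob = Phi(2 / 3) - Phi(-1 / 3)
print(prob)   # about 0.3781
```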
X ∼ EXP(λ).

Cumulative function:

F(x) = ∫_0^x λ e^{−λt} dt = 1 − e^{−λx},   (x ≥ 0).

F(∞) = ??

Moments:

E(X^n) = ∫_0^∞ x^n λ e^{−λx} dx
       = − ∫_0^∞ x^n d[e^{−λx}]
       = ∫_0^∞ e^{−λx} n x^{n−1} dx
       = (n/λ) E(X^{n−1})

E(X) = 1/λ,   E(X^2) = 2/λ^2,   Var(X) = 1/λ^2,   E(X^n) = n!/λ^n
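A simulation check (not from the slides; λ = 2 is an arbitrary choice) of the first two moments, using inverse-transform sampling X = −log(U)/λ.

```python
# Sample X ~ EXP(lam) by inverting F(x) = 1 - e^{-lam x}.
import random
from math import log

random.seed(2)
lam = 2.0
xs = [-log(random.random()) / lam for _ in range(200_000)]
m1 = sum(xs) / len(xs)
m2 = sum(x * x for x in xs) / len(xs)
print(m1, m2)   # near 1/lam = 0.5 and 2/lam^2 = 0.5
```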
• For instance, the amount of time (starting from now) until an earthquake
occurs, or until a new war breaks out, etc.
Solution: Check ∫_0^∞ f(x) dx = 1??

Useful facts: Γ(1) = 1,  Γ(0.5) = √π,  Γ(n) = (n − 1)! for integer n.
X ∼ Gamma(α, λ).

Expectation and Variance:

E(X) = α/λ,   Var(X) = α/λ^2.

E(X) = ∫_0^∞ x λ e^{−λx} (λx)^{α−1}/Γ(α) dx
     = ∫_0^∞ e^{−λx} (λx)^α/Γ(α) dx
     = [Γ(α + 1)/Γ(α)] (1/λ) ∫_0^∞ λ e^{−λx} (λx)^α/Γ(α + 1) dx
     = [Γ(α + 1)/Γ(α)] (1/λ)
     = α/λ.
X ∼ Beta(a, b).

Expectation and Variance:

E(X) = a/(a + b),   Var(X) = ab/[(a + b)^2 (a + b + 1)]
Given the density function fX (x) and cumulative function FX (x) for a random
variable X , what is the general formula for the density function fY (y) of the
random variable Y = g(X)?
fY(y) = (1/n) y^{1/n−1},   0 ≤ y ≤ 1

      = fX(g^{−1}(y)) |d/dy g^{−1}(y)|

In general,

fY(y) = fX(g^{−1}(y)) |d/dy g^{−1}(y)|    if y = g(x) for some x
        0                                  if y ≠ g(x) for all x
Example 7b: Y = X^2

FY(y) = P{Y ≤ y} = P{X^2 ≤ y}
      = P{−√y ≤ X ≤ √y}
      = FX(√y) − FX(−√y)

fY(y) = 1/(2√y) [fX(√y) + fX(−√y)]

Q: Find the distribution of 1/X.

One can check ∫_{−∞}^∞ f(x) dx = 1, but E(X) does not exist, because E|X| = ∞.
Definition: For any two random variables X and Y , the joint cumulative
probability distribution is defined as
FX (x) =P {X ≤ x}
=P {X ≤ x, Y ≤ ∞}
=F (x, ∞)
The distribution of Y is
FY (y) =P {Y ≤ y} = F (∞, y)
Example:
X
pY (y) = P {Y = y} = p(x, y)
x:p(x,y)>0
Example 1a: Suppose that 3 balls are randomly selected from an urn
containing 3 red, 4 white, and 5 blue balls. Let X and Y denote, respectively, the
number of red and white balls chosen.
p(0, 0) = C(5,3)/C(12,3) = 10/220

p(0, 1) = C(4,1) C(5,2)/C(12,3) = 40/220

p(0, 2) = C(4,2) C(5,1)/C(12,3) = 30/220

p(0, 3) = C(4,3)/C(12,3) = 4/220

pX(0) = p(0, 0) + p(0, 1) + p(0, 2) + p(0, 3) = 84/220

Directly: pX(0) = C(9,3)/C(12,3) = 84/220
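A quick counting check (not from the slides) of the joint pmf values above.

```python
# Verify p(0,0) + p(0,1) + p(0,2) + p(0,3) = C(9,3)/C(12,3) = 84/220.
from math import comb

total = comb(12, 3)
p00 = comb(5, 3) / total                 # 3 blue balls
p01 = comb(4, 1) * comb(5, 2) / total    # 1 white, 2 blue
p02 = comb(4, 2) * comb(5, 1) / total    # 2 white, 1 blue
p03 = comb(4, 3) / total                 # 3 white
print(p00 + p01 + p02 + p03, comb(9, 3) / total)   # both 84/220
```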
X and Y are jointly continuous if there exists a function f (x, y) defined for all
real x and y , having the property that for every set C of pairs of real numbers
P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy

f(x, y) = ∂^2/(∂x ∂y) F(x, y)
• (a) Determine C .
• (b) P {X > 1, Y < 1},
• (c) P {X < Y },
• (d) P {X < a}.
Solution (a):

1 = ∫_0^∞ ∫_0^∞ C e^{−x} e^{−2y} dx dy
  = C ∫_0^∞ e^{−x} dx ∫_0^∞ e^{−2y} dy
  = C × (−e^{−x})|_0^∞ × (−(1/2) e^{−2y})|_0^∞ = C/2

Therefore, C = 2.

Solution (b):

P{X > 1, Y < 1} = ∫_0^1 ∫_1^∞ 2 e^{−x} e^{−2y} dx dy
                = (−e^{−x})|_1^∞ (−e^{−2y})|_0^1 = e^{−1}(1 − e^{−2})
Solution (c):

P{X < Y} = ∫∫_{(x,y): x<y} 2 e^{−x} e^{−2y} dx dy
         = ∫_0^∞ ∫_0^y 2 e^{−x} e^{−2y} dx dy
         = ∫_0^∞ ( ∫_0^y e^{−x} dx ) 2 e^{−2y} dy
         = ∫_0^∞ (1 − e^{−y}) 2 e^{−2y} dy
         = ∫_0^∞ (2 e^{−2y} − 2 e^{−3y}) dy
         = 1 − 2/3 = 1/3
Solution (d):

P{X < a} = ∫_0^a ∫_0^∞ 2 e^{−x} e^{−2y} dy dx = ∫_0^a e^{−x} dx = 1 − e^{−a},   a > 0

Independence: X and Y are independent if

P{X ∈ A, Y ∈ B} = P{X ∈ A} × P{Y ∈ B}

or, equivalently,

P{X ≤ x, Y ≤ y} = P{X ≤ x} × P{Y ≤ y}

or, equivalently,

p(x, y) = pX(x) pY(y) for all x, y, when X and Y are discrete

f(x, y) = fX(x) fY(y) for all x, y, when X and Y are continuous
Example 2b: Suppose the number of people that enter a post office on a
given day is a Poisson random variable with parameter λ.
Each person that enters the post office is male with probability p and a female
with probability 1 − p.
Let X and Y , respectively, be the number of males and females that enter the
post office.
Key: Suppose we know there are N = n people that enter a post office. Then
the number of males is a binomial with parameters (n, p).
[N = X + Y ] ∼ P oi(λ).
Therefore,

P{X = i, Y = j} = ∑_{n=0}^∞ P{X = i, Y = j | N = n} P{N = n}
               = P{X = i, Y = j | N = i + j} P{N = i + j}

because P{X = i, Y = j | N ≠ i + j} = 0.
P{X = i, Y = j}
= P{X = i, Y = j | X + Y = i + j} P{X + Y = i + j}
= C(i + j, i) p^i (1 − p)^j · e^{−λ} λ^{i+j}/(i + j)!
= p^i (1 − p)^j e^{−λ} λ^i λ^j/(i! j!)
= p^i (1 − p)^j e^{−λ(p+(1−p))} λ^i λ^j/(i! j!)
= [e^{−λp} (λp)^i/i!] · [e^{−λ(1−p)} (λ(1 − p))^j/j!]
Solution (b):

P{X = i} = ∑_{j=0}^∞ P{X = i, Y = j}
         = ∑_{j=0}^∞ e^{−λp} (λp)^i/i! · e^{−λ(1−p)} (λ(1 − p))^j/j!
         = e^{−λp} (λp)^i/i! · e^{−λ(1−p)} ∑_{j=0}^∞ (λ(1 − p))^j/j!
         = e^{−λp} (λp)^i/i!
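A simulation sketch (not from the slides; λ = 5, p = 0.3 are arbitrary choices): thinning a Poisson(λ) count with probability p should leave a count that behaves like Poisson(λp), so its mean and variance should both be near λp.

```python
# Thin a Poisson(lam) count: each arrival is kept ("male") with prob p.
import random
from math import exp

random.seed(3)

def poisson(lam):
    # Knuth's method: count uniforms until their product drops below e^{-lam}.
    L, k, prod = exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < L:
            return k
        k += 1

lam, p = 5.0, 0.3
xs = []
for _ in range(100_000):
    n = poisson(lam)
    xs.append(sum(1 for _ in range(n) if random.random() < p))
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)   # both near lam*p = 1.5
```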
Example 2c: A man and a woman decide to meet at a certain location. Each
person independently arrives at a time uniformly distributed between 12 noon and
1 pm.
Q: Find the probability that the first to arrive has to wait longer than 10 minutes.
Solution:

Let X = number of minutes after 12 noon at which one person arrives, and
Y = number of minutes after 12 noon at which the other person arrives.
X and Y are independent, each Uniform(0, 60).

P{|X − Y| > 10} = ∫∫_{|x−y|>10} f(x, y) dx dy
               = ∫∫_{|x−y|>10} fX(x) fY(y) dx dy
               = 2 ∫∫_{y<x−10} (1/60)(1/60) dx dy
               = 2 ∫_{10}^{60} ∫_0^{x−10} (1/3600) dy dx
               = 25/36.
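A simulation check (not from the slides) of P{|X − Y| > 10} = 25/36 ≈ 0.694 for two independent Uniform(0, 60) arrival times.

```python
# Estimate the probability that the first arrival waits more than 10 minutes.
import random

random.seed(4)
trials = 200_000
hits = sum(abs(random.uniform(0, 60) - random.uniform(0, 60)) > 10
           for _ in range(trials))
print(hits / trials)   # near 25/36
```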
Example 2f:
(a): Two random variables X and Y with joint density given by
are independent.
or, equivalently,

P{X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn} = ∏_{i=1}^n P{Xi ≤ xi}

Solution:

P{X ≥ YZ} = ∫∫∫_{x≥yz} f(x, y, z) dx dy dz
          = ∫_0^1 ∫_0^1 ∫_{yz}^1 dx dy dz
          = ∫_0^1 ∫_0^1 (1 − yz) dy dz
          = ∫_0^1 (1 − z/2) dz
          = 3/4
fZ(z) = ∂/∂z FZ(z) = ∫_{−∞}^∞ fY(y) fX(z − y) dy

The discrete case: if X and Y are two independent discrete random variables,
then Z = X + Y has probability mass function

pZ(z) = ∑_j pY(j) pX(z − j)

——————–

Example: for independent X, Y ∼ Bernoulli(p),

Z = X + Y ∼ Binomial(2, p)
pZ(0) = ∑_j pY(j) pX(0 − j) = pY(0) pX(0) = (1 − p)^2 = C(2,0) (1 − p)^2

pZ(1) = ∑_j pY(j) pX(1 − j) = pY(0) pX(1) + pY(1) pX(0)
      = (1 − p)p + p(1 − p) = C(2,1) p(1 − p)

pZ(2) = ∑_j pY(j) pX(2 − j) = pY(1) pX(1) = p^2 = C(2,2) p^2
pZ(z) = ∑_j pY(j) pX(z − j).

Suppose 0 ≤ z ≤ m; then 0 ≤ j ≤ z:

pZ(z) = ∑_j pY(j) pX(z − j) = ∑_{j=0}^z pY(j) pX(z − j)
      = ∑_{j=0}^z C(m, j) p^j (1 − p)^{m−j} C(n, z − j) p^{z−j} (1 − p)^{n−z+j}
      = p^z (1 − p)^{m+n−z} ∑_{j=0}^z C(m, j) C(n, z − j)
      = C(m + n, z) p^z (1 − p)^{m+n−z}

because we have seen (e.g., using a combinatorial argument)

∑_{j=0}^z C(m, j) C(n, z − j) = C(m + n, z).
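A quick check (not from the slides; m = 7, n = 9 are arbitrary choices) of the Vandermonde identity used in the last step.

```python
# sum_j C(m,j) C(n,z-j) should equal C(m+n,z) for every z.
from math import comb

m, n = 7, 9
ok = all(sum(comb(m, j) * comb(n, z - j) for j in range(z + 1)) == comb(m + n, z)
         for z in range(m + 1))
print(ok)   # True
```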
Example: X, Y ∼ Uniform(0, 1), independent, Z = X + Y.

fZ(z) = ∫_{−∞}^∞ fY(y) fX(z − y) dy = ∫_0^1 fX(z − y) dy

1. If 0 ≤ z ≤ 1, then 0 ≤ y ≤ z:

   fZ(z) = ∫_0^z dy = z.

2. If 1 < z ≤ 2, then z − 1 ≤ y ≤ 1:

   fZ(z) = ∫_{z−1}^1 dy = 2 − z.

fZ(z) = z,        0 < z ≤ 1
        2 − z,    1 < z < 2
        0,        otherwise
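A simulation check (not from the slides) of the triangular density: by symmetry P{Z ≤ 1} = 1/2 and E(Z) = 1.

```python
# Sample Z = X + Y for independent Uniform(0,1) X, Y.
import random

random.seed(5)
zs = [random.random() + random.random() for _ in range(200_000)]
frac_below_1 = sum(z <= 1 for z in zs) / len(zs)
mean = sum(zs) / len(zs)
print(frac_below_1, mean)   # near 0.5 and 1.0
```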
fZ(z) = ∫_{−∞}^∞ fY(y) fX(z − y) dy = ∫_0^z fY(y) fX(z − y) dy

      = ∫_0^z [λ e^{−λy} (λy)^{s−1}/Γ(s)] [λ e^{−λ(z−y)} (λ(z − y))^{t−1}/Γ(t)] dy

      = (e^{−λz} λ^{s+t}/(Γ(s)Γ(t))) ∫_0^z y^{s−1} (z − y)^{t−1} dy

      = (λ e^{−λz} λ^{s+t−1}/(Γ(s)Γ(t))) ∫_0^z y^{s−1} (z − y)^{t−1} dy
W ∼ Gamma(s + t, λ).

Because fZ(z) is also a density function, we must have

(Γ(s + t)/(Γ(s)Γ(t))) ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ = 1

The Beta function:

B(s, t) = Γ(s)Γ(t)/Γ(s + t).

Thus,

(Γ(s + t)/(Γ(s)Γ(t))) ∫_0^1 θ^{s−1} (1 − θ)^{t−1} dθ = (Γ(s + t)/(Γ(s)Γ(t))) (Γ(s)Γ(t)/Γ(s + t)) = 1

By induction, for independent Xi ∼ Gamma(si, λ):

X1 + X2 + ... + Xn = ∑_{i=1}^n Xi ∼ Gamma( ∑_{i=1}^n si, λ )
Proposition 3.2: If Xi, i = 1 to n, are independent random variables that are
normally distributed with respective parameters µi and σi^2, i = 1 to n, then

X1 + X2 + ... + Xn = ∑_{i=1}^n Xi ∼ N( ∑_{i=1}^n µi, ∑_{i=1}^n σi^2 )

————————

It suffices to show

X1 + X2 ∼ N(µ1 + µ2, σ1^2 + σ2^2)
fZ(z) = ∫_{−∞}^∞ fX(z − y) fY(y) dy

      = ∫_{−∞}^∞ [1/(√(2π) σx) e^{−(z−y−µx)^2/(2σx^2)}] [1/(√(2π) σy) e^{−(y−µy)^2/(2σy^2)}] dy

      = (1/√(2π)) ∫_{−∞}^∞ 1/(√(2π) σx σy) e^{−(z−y−µx)^2/(2σx^2) − (y−µy)^2/(2σy^2)} dy

      = (1/√(2π)) ∫_{−∞}^∞ 1/(√(2π) σx σy) e^{−[(z−y−µx)^2 σy^2 + (y−µy)^2 σx^2]/(2σx^2 σy^2)} dy

      = (1/√(2π)) ∫_{−∞}^∞ 1/(√(2π) σx σy) e^{−[y^2(σx^2+σy^2) − 2y(µy σx^2 − (µx−z)σy^2) + (µx−z)^2 σy^2 + µy^2 σx^2]/(2σx^2 σy^2)} dy
fZ(z) = ∫_{−∞}^∞ fX(z − y) fY(y) dy

      = (1/√(2π)) ∫_{−∞}^∞ 1/(√(2π) σx σy) e^{−[y^2(σx^2+σy^2) − 2y(µy σx^2 − (µx−z)σy^2) + (µx−z)^2 σy^2 + µy^2 σx^2]/(2σx^2 σy^2)} dy

Write the exponent as −(Cy^2 + Dy + E)/(2B), with

A = σx σy,  B = σx^2 σy^2,  C = σx^2 + σy^2,
D = −2(µy σx^2 − (µx − z)σy^2),  E = (µx − z)^2 σy^2 + µy^2 σx^2.

Then

fZ(z) = (1/√(2π)) ∫_{−∞}^∞ (1/(√(2π) A)) e^{−(Cy^2+Dy+E)/(2B)} dy = (1/√(2π)) √(B/(C A^2)) e^{(D^2−4EC)/(8BC)}

because ∫_{−∞}^∞ 1/(√(2π) σ) e^{−(y−µ)^2/(2σ^2)} dy = 1, for any µ, σ^2.
∫_{−∞}^∞ (1/(√(2π) A)) e^{−(Cy^2+Dy+E)/(2B)} dy

= ∫_{−∞}^∞ (1/(√(2π) A)) e^{−[y^2 + (D/C)y + E/C]/(2B/C)} dy

= ∫_{−∞}^∞ (1/(√(2π) A)) e^{−[(y + D/(2C))^2 − D^2/(4C^2) + E/C]/(2B/C)} dy

= e^{−[E/C − D^2/(4C^2)]/(2B/C)} (1/A)√(B/C) ∫_{−∞}^∞ (1/(√(2π) √(B/C))) e^{−(y + D/(2C))^2/(2B/C)} dy

= √(B/(C A^2)) e^{(D^2−4EC)/(8BC)}
fZ(z) = (1/√(2π)) √(B/(C A^2)) e^{(D^2−4EC)/(8BC)}

      = (1/(√(2π) √(σx^2 + σy^2))) e^{[(µy σx^2 − (µx−z)σy^2)^2 − ((µx−z)^2 σy^2 + µy^2 σx^2)(σx^2 + σy^2)]/(2σx^2 σy^2 (σx^2 + σy^2))}

      = (1/(√(2π) √(σx^2 + σy^2))) e^{−(z − µx − µy)^2/(2(σx^2 + σy^2))}

Therefore, Z = X + Y ∼ N(µx + µy, σx^2 + σy^2).
Recall P(E|F) = P(EF)/P(F).

pX|Y(x|y) = p(x, y)/pY(y)

fX|Y(x|y) = f(x, y)/fY(y)
—————–
————————–
P{X > 1|Y = y} = ∫_1^∞ fX|Y(x|y) dx
Solution:

fX|Y(x|y) = f(x, y)/fY(y) = [(1/y) e^{−x/y} e^{−y}] / ∫_0^∞ (1/y) e^{−x/y} e^{−y} dx
          = e^{−x/y} / ∫_0^∞ e^{−x/y} dx = (1/y) e^{−x/y}

P{X > 1|Y = y} = ∫_1^∞ fX|Y(x|y) dx = ∫_1^∞ (1/y) e^{−x/y} dx = e^{−1/y}
The random variables X and Y have a bivariate normal distribution if, for
constants µx, µy, σx, σy, −1 < ρ < 1, their joint density function is given, for
all −∞ < x, y < ∞, by

f(x, y) = 1/(2π σx σy √(1 − ρ^2)) exp{ −1/(2(1 − ρ^2)) [ (x − µx)^2/σx^2 + (y − µy)^2/σy^2 − 2ρ (x − µx)(y − µy)/(σx σy) ] }
fX(x) = ∫_{−∞}^∞ f(x, y) dy

      = ∫_{−∞}^∞ 1/(2π σx σy √(1 − ρ^2)) e^{−[ (x−µx)^2/σx^2 + (y−µy)^2/σy^2 − 2ρ(x−µx)(y−µy)/(σx σy) ]/(2(1−ρ^2))} dy

      = ∫_{−∞}^∞ 1/(2π σx √(1 − ρ^2)) e^{−[ (x−µx)^2/σx^2 + t^2 − 2ρ(x−µx)t/σx ]/(2(1−ρ^2))} dt        (t = (y − µy)/σy)

      = ∫_{−∞}^∞ 1/(2π σx √(1 − ρ^2)) e^{−[ (x−µx)^2 + t^2 σx^2 − 2ρ(x−µx)t σx ]/(2(1−ρ^2)σx^2)} dt

      = ∫_{−∞}^∞ (1/(√(2π) A)) e^{−(Ct^2+Dt+E)/(2B)} dt = √(B/(C A^2)) e^{(D^2−4EC)/(8BC)} = 1/(√(2π) σx) e^{−(x−µx)^2/(2σx^2)}

with

A = √(2π) √(1 − ρ^2) σx,  B = (1 − ρ^2) σx^2,  C = σx^2,
D = −2ρ(x − µx)σx,  E = (x − µx)^2

That is, X ∼ N(µx, σx^2).
Therefore,
f (x, y)
fX|Y (x|y) =
fY (y)
A useful trick: Only worry about the terms that are functions of x; the rest
terms are just normalizing constants.
If g(x) ≥ 0, then

f(x) = g(x)/∫_{−∞}^∞ g(x) dx = C g(x) ∝ g(x),   where C = 1/∫_{−∞}^∞ g(x) dx,

is a probability density function. (One can verify ∫_{−∞}^∞ f(x) dx = 1.)

For example, g(x) = e^{−(x−1)^2/2} is essentially the density function of N(1, 1).

Similarly, g(x) = e^{−(x^2−2x)/2} and g(x) = e^{−(x^2−2xy)/2} are both (essentially)
the density functions of normals.
————–

1/C = ∫_{−∞}^∞ e^{−(x^2−2xy)/2} dx = √(2π) e^{y^2/2}

Therefore,

f(x) = C g(x) = (1/√(2π)) e^{−y^2/2} × g(x) = (1/√(2π)) e^{−(x−y)^2/2}

is the density function of N(y, 1), where y is merely a constant.
(X, Y) is a bivariate normal.

fX|Y(x|y) = f(x, y)/fY(y) ∝ f(x, y)

∝ e^{−[ (x−µx)^2/σx^2 − 2ρ(x−µx)(y−µy)/(σx σy) ]/(2(1−ρ^2))}

∝ e^{−[ (x−µx)^2 − 2ρ(x−µx)(y−µy) σx/σy ]/(2(1−ρ^2)σx^2)}

∝ e^{−[ x − µx − ρ(y−µy) σx/σy ]^2/(2(1−ρ^2)σx^2)}

Therefore,

X|Y ∼ N( µx + ρ(y − µy) σx/σy,  (1 − ρ^2) σx^2 )

Y|X ∼ N( µy + ρ(x − µx) σy/σx,  (1 − ρ^2) σy^2 )
Q: What is the conditional distribution of this success probability given that the
n + m trials result in n successes?
Solution:
Define
The ordered values X(1) ≤ X(2) ≤ ... ≤ X(n) are known as the order
statistics corresponding to the random variables X1 , X2 , ..., Xn .
Then, by independence,

FX(n)(x) = P(X1 ≤ x) P(X2 ≤ x) ... P(Xn ≤ x) = F^n(x)

fX(n)(x) = d/dx F^n(x) = n F^{n−1}(x) f(x)
Then, by independence,

FX(1)(x) = 1 − P(X1 > x) P(X2 > x) ... P(Xn > x)
         = 1 − [1 − F(x)]^n

Joint density: for x1 ≤ x2 ≤ ... ≤ xn,

fX(1),X(2),...,X(n)(x1, x2, ..., xn) = n! f(x1) f(x2) ... f(xn)

Heuristic, for n = 2:

P{X(1) = x1, X(2) = x2}
= P{ {X1 = x1, X2 = x2} OR {X2 = x1, X1 = x2} }
= P{X1 = x1, X2 = x2} + P{X2 = x1, X1 = x2}
= f(x1) f(x2) + f(x1) f(x2)
= 2! f(x1) f(x2)

Warning: this is a heuristic argument, not a rigorous proof. Please read the book.
Therefore,

fX(k)(x) = [n!/(1!(k − 1)!(n − k)!)] f(x) [F(x)]^{k−1} [1 − F(x)]^{n−k}

FX(k)(y) = ∫_{−∞}^y fX(k)(x) dx
         = ∫_{−∞}^y [n!/(1!(k − 1)!(n − k)!)] f(x) [F(x)]^{k−1} [1 − F(x)]^{n−k} dx
         = [n!/(1!(k − 1)!(n − k)!)] ∫_0^{F(y)} z^{k−1} (1 − z)^{n−k} dz        (z = F(x))
Alternatively,

FX(k)(y) = ∑_{j=k}^n P{j of the Xi's are ≤ y}
         = ∑_{j=k}^n C(n, j) [F(y)]^j [1 − F(y)]^{n−j}

Also,

P{y ∈ [X(n), ∞)} = C(n, n) [F(y)]^n

P{y ∈ [X(n−1), X(n))} = C(n, n−1) [F(y)]^{n−1} [1 − F(y)]

P{y ∈ [X(k), X(k+1))} = C(n, k) [F(y)]^k [1 − F(y)]^{n−k}
Y1 = g1(X1, X2),  Y2 = g2(X1, X2).

x1 = h1(y1, y2),  x2 = h2(y1, y2)

Two conditions.

Solution:

y1 = g1(x1, x2) = x1 + x2,  y2 = g2(x1, x2) = x1 − x2

=⇒ x1 = h1(y1, y2) = (y1 + y2)/2,  x2 = h2(y1, y2) = (y1 − y2)/2

J(x1, x2) = | 1  1 |
            | 1 −1 |  = −2
Therefore,

∂g1/∂x = x/√(x^2 + y^2) = cos θ

∂g1/∂y = y/√(x^2 + y^2) = sin θ

∂g2/∂x = [1/(1 + (y/x)^2)] (−y/x^2) = −y/(x^2 + y^2) = −sin θ/r

∂g2/∂y = [1/(1 + (y/x)^2)] (1/x) = x/(x^2 + y^2) = cos θ/r
Y1 = X1 + X2 + X3
Y2 = X1 − X2
Y3 = X1 − X3
Solution:
X1 = (Y1 + Y2 + Y3)/3,  X2 = (Y1 − 2Y2 + Y3)/3,  X3 = (Y1 + Y2 − 2Y3)/3
| a b c |
| d e f |  = aei + bfg + cdh − ceg − bdi − afh
| g h i |

For a triangular matrix, the determinant is the product of the diagonal entries:

| x1  0   0   ...  0  |
| ×   x2  0   ...  0  |
| ×   ×   x3  ...  0  |  = x1 x2 x3 ... xn
| ...                 |
| ×   ×   ×   ...  xn |
If a ≤ X ≤ b, then a ≤ E(X) ≤ b.
Proposition 2.1
Solution:
Z Z
E(|X − Y |) = |x − y|f (x, y)dxdy
Solution:
Linearity of Expectations
E(X + Y + Z) = E((X + Y ) + Z)
=E(X + Y ) + E(Z) = E(X) + E(Y ) + E(Z)
E(X̄) = E[ (1/n) ∑_{i=1}^n Xi ]
      = (1/n) E[ ∑_{i=1}^n Xi ]
      = (1/n) ∑_{i=1}^n E[Xi]
      = (1/n) ∑_{i=1}^n µ
      = (1/n) nµ
      = µ
X ∼ Binomial(n, p)
X = Z1 + Z2 + ... + Zn
Because E(Zi) = p,

E(X) = E( ∑_{i=1}^n Zi ) = ∑_{i=1}^n E(Zi) = np
Q: Find E(X).
Note that the Xi ’s are not independent; but we only need the expectation.
Therefore,
mn
E(X) = .
N
A group of N people throw their hats into the center of a room. The hats are
mixed up and each person randomly selects one.
Q: Find the expected number of people that select their own hat.
X = X1 + X2 + ... + XN,

where

Xi = 1 if the i-th person selects his own hat
     0 otherwise

E(Xi) = P{Xi = 1} = 1/N

Thus

E(X) = ∑_{i=1}^N E(Xi) = N (1/N) = 1
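A simulation check (not from the slides; N = 10 is an arbitrary choice): the expected number of matches in a random permutation is 1 for any N.

```python
# Shuffle N hats and count how many people get their own hat back.
import random

random.seed(6)
N, trials = 10, 100_000
total = 0
for _ in range(trials):
    hats = list(range(N))
    random.shuffle(hats)
    total += sum(i == h for i, h in enumerate(hats))
mean = total / trials
print(mean)   # near 1
```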
The Cauchy density:

f(x) = (1/π) · 1/(x^2 + 1),   −∞ < x < ∞

E(X) = ∫_{−∞}^∞ (x/π) · 1/(x^2 + 1) dx = 0 ??? (Answer is NO; E(X) does not exist)

because

∫_{−∞}^∞ (|x|/π) · 1/(x^2 + 1) dx = ∫_0^∞ (2x/π) · 1/(x^2 + 1) dx
                                  = (1/π) ∫_0^∞ 1/(x^2 + 1) d[x^2 + 1]
                                  = (1/π) log(x^2 + 1) |_0^∞
                                  = (1/π) log ∞ = ∞
Ii = 1 if Ai occurs
     0 otherwise

X = ∑_{i=1}^n Ii = number of these events Ai that occur. We know

E(X) = ∑_{i=1}^n E(Ii) = ∑_{i=1}^n P(Ai)

Suppose we are interested in the number of pairs of events that occur, e.g., Ai Aj.

X = ∑_{i=1}^n Ii = number of these events Ai that occur.

C(X, 2) = X(X − 1)/2 = number of pairs of events that occur

⇐⇒

∑_{i<j} Ii Ij = number of pairs of events that occur, because

Ii Ij = 1 if both Ai and Aj occur
        0 otherwise

Therefore,

E[ X(X − 1)/2 ] = ∑_{i<j} P(Ai Aj)

E(X^2) − E(X) = 2 ∑_{i<j} P(Ai Aj)
E(X^2) − E(X) = 2 ∑_{i<j} P(Ai Aj) = 2 C(n, 2) p^2 = n(n − 1)p^2

E(X^2) = n^2 p^2 − np^2 + np

Var(X) = n^2 p^2 − np^2 + np − [np]^2 = np(1 − p)
P(Ai Aj) = P(Ai) P(Aj|Ai) = (m/N) · (m − 1)/(N − 1),   i ≠ j

E(X^2) − E(X) = 2 ∑_{i<j} P(Ai Aj) = 2 C(n, 2) (m/N) · (m − 1)/(N − 1)

E(X^2) = n(n − 1)(m/N) · (m − 1)/(N − 1) + nm/N

Let Ai = event that person i selects his/her own hat in the match problem, i = 1 to N.

P(Ai Aj) = P(Ai) P(Aj|Ai) = (1/N) · 1/(N − 1),   i ≠ j

X = number of people who select their own hat. We have shown E(X) = 1.

E(X^2) − E(X) = 2 ∑_{i<j} P(Ai Aj) = N(N − 1) · 1/(N(N − 1)) = 1
E(XY ) = E(X)E(Y ).
Proof:

E[h(X) g(Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x) g(y) f(x, y) dx dy
             = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x) g(y) fX(x) fY(y) dx dy
             = ∫_{−∞}^∞ h(x) fX(x) dx ∫_{−∞}^∞ g(y) fY(y) dy
             = E[h(X)] E[g(Y)]
Covariance

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]

Equivalently, Cov(X, Y) = E(XY) − E(X)E(Y).

Properties:

• Cov(X, Y) = Cov(Y, X)
• Cov(X, X) = Var(X)
• Cov(aX, Y) = a Cov(X, Y)
• Cov( ∑_{i=1}^n Xi, ∑_{j=1}^m Yj ) = ∑_{i=1}^n ∑_{j=1}^m Cov(Xi, Yj)
Proof:

Cov( ∑_{i=1}^n Xi, ∑_{j=1}^m Yj ) = E[ ∑_{i=1}^n Xi ∑_{j=1}^m Yj ] − E[ ∑_{i=1}^n Xi ] E[ ∑_{j=1}^m Yj ]
  = E[ ∑_{i=1}^n ∑_{j=1}^m Xi Yj ] − ∑_{i=1}^n E(Xi) ∑_{j=1}^m E(Yj)
  = ∑_{i=1}^n ∑_{j=1}^m E(Xi Yj) − ∑_{i=1}^n ∑_{j=1}^m E(Xi) E(Yj)
  = ∑_{i=1}^n ∑_{j=1}^m [ E(Xi Yj) − E(Xi) E(Yj) ]
  = ∑_{i=1}^n ∑_{j=1}^m Cov(Xi, Yj)
Cov( ∑_{i=1}^n Xi, ∑_{j=1}^m Yj ) = ∑_{i=1}^n ∑_{j=1}^m Cov(Xi, Yj)

If n = m and Xi = Yi, then

Var( ∑_{i=1}^n Xi ) = Cov( ∑_{i=1}^n Xi, ∑_{j=1}^n Xj ) = ∑_{i=1}^n ∑_{j=1}^n Cov(Xi, Xj)
  = ∑_{i=1}^n Cov(Xi, Xi) + ∑_{i≠j} Cov(Xi, Xj)
  = ∑_{i=1}^n Var(Xi) + ∑_{i≠j} Cov(Xi, Xj)
  = ∑_{i=1}^n Var(Xi) + 2 ∑_{i<j} Cov(Xi, Xj)
X̄ = (1/n) ∑_{i=1}^n Xi,    S^2 = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)^2

Solution:

Var(X̄) = Var( (1/n) ∑_{i=1}^n Xi )
        = (1/n^2) Var( ∑_{i=1}^n Xi )
        = (1/n^2) ∑_{i=1}^n Var(Xi)        (independence)
        = (1/n^2) ∑_{i=1}^n σ^2 = σ^2/n
E(X̄^2) = (1/n^2) E[ ( ∑_{i=1}^n Xi )^2 ] = (1/n^2) E[ ∑_i Xi^2 + 2 ∑_{i<j} Xi Xj ]
        = (1/n^2) [ ∑_i E(Xi^2) + 2 ∑_{i<j} E(Xi Xj) ]
        = (1/n^2) [ ∑_i (µ^2 + σ^2) + 2 ∑_{i<j} µ^2 ]
        = (1/n^2) [ n(µ^2 + σ^2) + n(n − 1)µ^2 ] = σ^2/n + µ^2

Or, directly,

E(X̄^2) = Var(X̄) + [E(X̄)]^2 = σ^2/n + µ^2
Therefore,

E(S^2) = (n/(n − 1)) [ µ^2 + σ^2 − E(X̄^2) ]
       = (n/(n − 1)) [ µ^2 + σ^2 − σ^2/n − µ^2 ]
       = σ^2

Sample variance S^2 = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)^2 is an unbiased estimator of σ^2.

Why 1/(n − 1) instead of 1/n?

∑_{i=1}^n (Xi − X̄)^2 is a sum of n terms with only n − 1 degrees of freedom,
because X̄ = (1/n) ∑_{i=1}^n Xi.
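A simulation sketch (not from the slides; n = 5 and the standard normal samples are arbitrary choices): dividing by n − 1 is unbiased for σ^2 = 1, while dividing by n underestimates it by the factor (n − 1)/n.

```python
# Compare the 1/(n-1) and 1/n versions of the sample variance on average.
import random

random.seed(7)
n, trials = 5, 200_000
sum_unbiased = sum_biased = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_unbiased += ss / (n - 1)
    sum_biased += ss / n
print(sum_unbiased / trials, sum_biased / trials)   # near 1.0 and 0.8
```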
The Standard Clever Technique for evaluating E(S^2) (as in the textbook):

(n − 1)S^2 = ∑_{i=1}^n (Xi − X̄)^2 = ∑_{i=1}^n (Xi − µ + µ − X̄)^2
           = ∑_{i=1}^n (Xi − µ)^2 + ∑_{i=1}^n (X̄ − µ)^2 − 2 ∑_{i=1}^n (Xi − µ)(X̄ − µ)
           = ∑_{i=1}^n (Xi − µ)^2 − n(X̄ − µ)^2

E[(n − 1)S^2] = (n − 1)E(S^2) = E[ ∑_{i=1}^n (Xi − µ)^2 − n(X̄ − µ)^2 ]
             = n Var(Xi) − n Var(X̄) = nσ^2 − n (σ^2/n) = (n − 1)σ^2
ρ(X, Y) = Cov(X, Y)/√(Var(X) Var(Y))

−1 ≤ ρ(X, Y) ≤ 1
−1 ≤ ρ(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)) ≤ 1

Proof:

Var( X/√(Var(X)) + Y/√(Var(Y)) )
  = Var(X)/Var(X) + Var(Y)/Var(Y) + 2 Cov(X, Y)/√(Var(X) Var(Y))
  = 2(1 + ρ(X, Y)) ≥ 0

Therefore, ρ(X, Y) ≥ −1.

Var( X/√(Var(X)) − Y/√(Var(Y)) )
  = Var(X)/Var(X) + Var(Y)/Var(Y) − 2 Cov(X, Y)/√(Var(X) Var(Y))
  = 2(1 − ρ(X, Y)) ≥ 0

Therefore, ρ(X, Y) ≤ 1.
f(x, y) = (1/y) e^{−x/y} e^{−y},   0 < x, y < ∞

Q: Compute

E[X|Y = y] = ∫_{−∞}^∞ x fX|Y(x|y) dx

Solution:

fX|Y(x|y) = f(x, y)/fY(y) = f(x, y)/∫_{−∞}^∞ f(x, y) dx
          = [(1/y) e^{−x/y} e^{−y}] / ∫_0^∞ (1/y) e^{−x/y} e^{−y} dx
          = [(1/y) e^{−x/y} e^{−y}] / [e^{−y} (−e^{−x/y})|_0^∞]
          = [(1/y) e^{−x/y} e^{−y}] / e^{−y}
          = (1/y) e^{−x/y},   y > 0

E[X|Y = y] = ∫_{−∞}^∞ x fX|Y(x|y) dx = ∫_0^∞ (x/y) e^{−x/y} dx = y
E(X) = E[E[X|Y]]

Q: What is the expected amount of money spent in the store on a given day?

Solution:

E(X) = E( ∑_{i=1}^N Xi )
     = E[ E( ∑_{i=1}^N Xi | N ) ]
     = E[ ∑_{i=1}^N E(Xi) ]
     = E(8N) = 8 E(N)
     = 8 × 50 = $400
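A simulation sketch (not from the slides): a random sum of N i.i.d. amounts with E(N) = 50 and E(Xi) = 8 has mean 50 × 8 = 400. Here N is assumed Binomial(200, 0.25) (mean 50) and each amount Uniform(0, 16) (mean 8), purely for illustration.

```python
# Random-sum mean: E(sum_{i=1}^N X_i) = E(N) E(X).
import random

random.seed(8)
totals = []
for _ in range(20_000):
    n = sum(1 for _ in range(200) if random.random() < 0.25)   # N, mean 50
    totals.append(sum(random.uniform(0, 16) for _ in range(n)))
mean = sum(totals) / len(totals)
print(mean)   # near 400
```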
—————————–

Let P(S = i) = Pi.

P2 = 1/36, P3 = 2/36, P4 = 3/36,
P5 = 4/36, P6 = 5/36, P7 = 6/36,
P8 = 5/36, P9 = 4/36, P10 = 3/36,
P11 = 2/36, P12 = 1/36

Equivalently,

Pi = P14−i = (i − 1)/36,   i = 2, 3, ..., 7
Obviously,

E(R|S = i) = 1,            if i = 2, 3, 7, 11, 12
             1 + E(Z),     otherwise

Therefore,

E(R) = 1 + ∑_{i=4}^{6} Pi/(Pi + P7) + ∑_{i=8}^{10} Pi/(Pi + P7) = 3.376
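An exact computation (not from the slides) of E(R) = 1 + ∑ Pi/(Pi + P7) over the point totals 4, 5, 6, 8, 9, 10, using exact rational arithmetic.

```python
# Exact evaluation of E(R) with the two-dice pmf P_i = min(i-1, 13-i)/36.
from fractions import Fraction

P = {i: Fraction(min(i - 1, 13 - i), 36) for i in range(2, 13)}
ER = 1 + sum(P[i] / (P[i] + P[7]) for i in [4, 5, 6, 8, 9, 10])
print(float(ER))   # about 3.376
```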
If Y is discrete, then

P(E) = ∑_y P(E|Y = y) P(Y = y)

If Y is continuous, then

P(E) = ∫_{−∞}^∞ P(E|Y = y) fY(y) dy
We are presented with n distinct prizes in sequence. After being presented with a
prize we must decide whether to accept it or to reject it and consider the next
prize. The objective is to maximize the probability of obtaining the best prize.
Assuming that all n! orderings are equally likely.
—————————–
Pk(best) = ∑_{i=1}^n Pk(best|X = i) P(X = i)
         = (1/n) ∑_{i=1}^n Pk(best|X = i)

Obviously,

Pk(best|X = i) = 0,   if i ≤ k
Exhaustive search over k is feasible, since it only involves a linear scan.

Pk(best) = (k/n) ∑_{i=k+1}^n 1/(i − 1)
         = (k/n) [ 1/k + 1/(k + 1) + ... + 1/(n − 1) ]
         = (k/n) [ ∑_{i=1}^{n−1} 1/i − ∑_{i=1}^{k−1} 1/i ]
         ≈ (k/n) [log(n − 1) − log(k − 1)]
         = (k/n) log( (n − 1)/(k − 1) )
Pk(best) ≈ (k/n) log( (n − 1)/(k − 1) ) ≈ (k/n) log(n/k)

d/dk [Pk(best)] ≈ (1/n) log(n/k) − 1/n = 0  =⇒ log(n/k) = 1  =⇒ k = n/e

Also, when k = n/e, Pk(best) ≈ (1/e) log(e) = 1/e ≈ 0.36788, as n → ∞.
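A simulation sketch (not from the slides; n = 50 is an arbitrary choice): rejecting the first k ≈ n/e candidates and then accepting the first one better than all of them wins with probability roughly 1/e.

```python
# Secretary-problem strategy: skip the first k candidates, then take the
# first candidate who beats everyone seen so far; count wins (best chosen).
import random
from math import e

random.seed(9)
n = 50
k = round(n / e)
trials, wins = 100_000, 0
for _ in range(trials):
    ranks = list(range(n))        # rank 0 is the best candidate
    random.shuffle(ranks)
    best_seen = min(ranks[:k])
    chosen = None
    for r in ranks[k:]:
        if r < best_seen:
            chosen = r
            break
    wins += (chosen == 0)
print(wins / trials)   # near 1/e
```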
Conditional Variance

Proposition 5.2: Var(X) = E[Var(X|Y)] + Var(E[X|Y])

Q: Compute Var( ∑_{i=1}^N Xi ).

Solution:

Var( ∑_{i=1}^N Xi ) = E[ Var( ∑_{i=1}^N Xi | N ) ] + Var( E[ ∑_{i=1}^N Xi | N ] )
                    = E( N Var(X) ) + Var( N E(X) )
                    = Var(X) E(N) + E^2(X) Var(N)
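A simulation check (not from the slides; N ∼ Binomial(20, 0.5) and Xi ∼ Uniform(0, 1) are arbitrary choices) of the random-sum variance formula Var(∑Xi) = Var(X)E(N) + E^2(X)Var(N).

```python
# For N ~ Binomial(20, 0.5) and X_i ~ Uniform(0,1):
# Var(X) = 1/12, E(N) = 10, E(X) = 1/2, Var(N) = 5  =>  10/12 + 5/4 = 25/12.
import random

random.seed(10)
totals = []
for _ in range(200_000):
    n = sum(1 for _ in range(20) if random.random() < 0.5)
    totals.append(sum(random.random() for _ in range(n)))
mean = sum(totals) / len(totals)
var = sum((t - mean) ** 2 for t in totals) / len(totals)
print(var)   # near 25/12 ≈ 2.083
```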