
Week 1: Review of Probability and Statistics

Key Concepts:
Sample space and events
Rules of probability
Conditional probability and independence
Computing probabilities

Probability: Definition and Properties
(i) 0 ≤ P(A) ≤ 1
(ii) P(S) = 1
(iii) If A1, A2, ··· are mutually exclusive, i.e. Ai ∩ Aj = ∅ for i ≠ j, then
     P(∪i Ai) = Σi P(Ai)
(iv) It is not hard to see that P(∅) = 0
(v) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
    More generally, if A1, A2, ···, An are n events, then (inclusion-exclusion)
    P(A1 ∪ ··· ∪ An) = Σi P(Ai) − Σ_{i<j} P(Ai ∩ Aj) + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − ··· + (−1)^{n−1} P(A1 ∩ ··· ∩ An)
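To make rule (v) concrete, here is a minimal simulation sketch, assuming Python with NumPy is available; the die-rolling events A and B are illustrative choices, not taken from the slides.

```python
import numpy as np

# Simulate rolls of a fair six-sided die (illustrative example).
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)

# Two example events: A = "roll is even", B = "roll is at least 4".
A = (rolls % 2 == 0)
B = (rolls >= 4)

# Check rule (v): P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
print((A | B).mean(), A.mean() + B.mean() - (A & B).mean())  # the two numbers agree
```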
Week 1: Review of Probability

Conditional Probability & Independence:
The conditional probability of A given B is
    P(A|B) = P(A ∩ B) / P(B)
Events A and B are independent if
    P(A ∩ B) = P(A)P(B)
Discrete Probability
A discrete distribution is specified by a list of values x1, x2, ···, xn with associated probabilities P(x1), P(x2), ···, P(xn), where
(i) P(xi) ≥ 0
(ii) Σi P(xi) = 1
Examples: Bernoulli(p), Binomial(n, p), hypergeometric, and Poisson.
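As a quick check of (i) and (ii) for the families just listed, here is a small sketch, assuming scipy.stats is available; the parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

# Example parameters, chosen only for illustration.
dists = {
    "Bernoulli(0.3)":       (stats.bernoulli(0.3),       np.arange(0, 2)),
    "Binomial(10, 0.4)":    (stats.binom(10, 0.4),       np.arange(0, 11)),
    "Hypergeom(20, 7, 12)": (stats.hypergeom(20, 7, 12), np.arange(0, 8)),
    "Poisson(3)":           (stats.poisson(3),           np.arange(0, 100)),  # truncated infinite support
}

for name, (dist, support) in dists.items():
    p = dist.pmf(support)
    # (i) probabilities are non-negative, (ii) they sum to (essentially) 1
    print(name, p.min() >= 0, round(p.sum(), 6))
```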
Continuous Probability
A continuous probability distribution is completely defined via the probability density function f. The p.d.f. satisfies
(i) f ≥ 0
(ii) ∫_{−∞}^{∞} f(x) dx = 1
(iii) Probabilities are calculated by
     P(a, b) = ∫_a^b f(x) dx
(iv) Note: for a continuous distribution, P(X = x) = 0 for all x, so P(a, b) = P[a, b]
Examples: Normal, Uniform, exponential, and Pareto.
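A minimal numerical sketch of (ii) and (iii), assuming scipy is available; the exponential density with rate 1 is just one example.

```python
import numpy as np
from scipy import integrate, stats

# Exponential(rate = 1) density as an example continuous p.d.f. (it vanishes for x < 0).
f = lambda x: np.exp(-x) if x >= 0 else 0.0

# (ii) the density integrates to 1
total, _ = integrate.quad(f, 0, np.inf)

# (iii) P(a, b) is the integral of f from a to b; compare with the exact CDF difference
a, b = 0.5, 2.0
p_ab, _ = integrate.quad(f, a, b)

print(total)                                           # ≈ 1
print(p_ab, stats.expon.cdf(b) - stats.expon.cdf(a))   # the two agree
```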
Week 1: Review of Probability

Continuous Probability on R²
Given a p.d.f. f(x1, x2), for any a < b and c < d,
    P(a < X1 < b, c < X2 < d) = ∫_c^d ∫_a^b f(x1, x2) dx1 dx2
More generally, for densities on Rⁿ and any A ⊂ Rⁿ,
    P(A) = ∫_A f(x1, x2, ···, xn) dx1 dx2 ··· dxn
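As an illustration of the rectangle probability above, here is a sketch assuming scipy is available; the joint density f(x1, x2) = x1 + x2 on the unit square is a standard textbook example, not one from the slides.

```python
from scipy import integrate

# Example joint density on the unit square: f(x1, x2) = x1 + x2 (it integrates to 1).
f = lambda x1, x2: x1 + x2

a, b = 0.2, 0.7   # bounds for X1 (inner integral)
c, d = 0.1, 0.9   # bounds for X2 (outer integral)

# dblquad integrates func(inner, outer): the outer variable runs over [c, d]
# and the inner variable over [a, b].
prob, _ = integrate.dblquad(lambda x1, x2: f(x1, x2), c, d, a, b)
print("P(a < X1 < b, c < X2 < d) ≈", prob)
```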
Random Variables
A real-valued random variable X is a function from a probability space (Ω, P) to R:
    X : Ω → R
The distribution of X is the probability on R given by
    P(X ∈ A) = P(X⁻¹(A)) = P({ω : X(ω) ∈ A})
Typically, all that matters is the distribution of X; the underlying sample space is not very relevant.
Cumulative Distribution Function
The function F : R → [0, 1] defined by F(t) = P(X ≤ t) is called the (cumulative) distribution function of X.
F is increasing in t, right continuous, with lim_{t→−∞} F(t) = 0 and lim_{t→∞} F(t) = 1.
If P(X = x) > 0, then F has a jump at x with jump size equal to P(X = x).
If the distribution is continuous, then F is continuous.
If f is the p.d.f., then F(t) = ∫_{−∞}^t f(x) dx and f(x) = dF(t)/dt evaluated at t = x.
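The last two relations can be checked numerically; a minimal sketch, assuming scipy is available and using the standard normal purely as an example.

```python
import numpy as np
from scipy import integrate, stats

# F(t) = integral of the density from −∞ to t, with f the standard normal p.d.f.
t = 1.3
F_t, _ = integrate.quad(stats.norm.pdf, -np.inf, t)
print(F_t, stats.norm.cdf(t))          # both ≈ Φ(1.3)

# f(x) = dF/dt at x, approximated by a central finite difference
x, h = 0.4, 1e-5
f_est = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
print(f_est, stats.norm.pdf(x))        # both ≈ φ(0.4)
```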
Moment Generating Function
The function MX(t) := E(e^{tX}) is called the moment generating function (MGF) of X.
The MGF, if it exists in an interval containing 0, uniquely determines the distribution.
    E(X) = dMX(t)/dt evaluated at t = 0
The MGF may not always exist.
E(e^{itX}) is called the characteristic function. It always exists and has nice properties.
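A small sketch of the relation E(X) = M′(0), assuming Python is available; the Exponential(rate 2) MGF, M(t) = 2/(2 − t) for t < 2, is an illustrative choice, not one from the slides.

```python
# Example: Exponential with rate 2, whose MGF is M(t) = 2 / (2 - t) for t < 2 and whose mean is 1/2.
lam = 2.0
M = lambda t: lam / (lam - t)

# E(X) = M'(0), approximated here by a central finite difference at 0
h = 1e-6
mean_from_mgf = (M(h) - M(-h)) / (2 * h)

print(mean_from_mgf, 1 / lam)   # both ≈ 0.5
```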
Week 1: Review of Probability

Joint Distribution: Discrete
If (X, Y) are two discrete random variables, their joint probabilities are described by the joint probability distribution
    P(xi, yj) = P(X = xi, Y = yj)
for all xi, yj, with P(xi, yj) ≥ 0 and Σ_{i,j} P(xi, yj) = 1.
PX(xi) = Σj P(xi, yj) is called the marginal distribution of X.
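A minimal sketch of computing marginals from a joint table, assuming NumPy is available; the 2 × 3 joint pmf here is made up for illustration.

```python
import numpy as np

# Example joint pmf P(xi, yj): rows index values of X, columns index values of Y.
P = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.30, 0.20]])

print(P.sum())          # the entries sum to 1

P_X = P.sum(axis=1)     # marginal of X: row sums, PX(xi) = Σj P(xi, yj)
P_Y = P.sum(axis=0)     # marginal of Y: column sums
print(P_X, P_Y)
```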
Joint Distribution: Continuous
The joint density of (X, Y) is given by the joint p.d.f. f(x, y):
    P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
The marginal density is given by
    fX(x) = ∫_{−∞}^{∞} f(x, y) dy
The conditional density of Y given X = x is
    f(y|x) = f(x, y) / fX(x)
Independence
X and Y are independent if and only if f(x, y) = fX(x) fY(y) for all x, y.
If X and Y are independent, then ρ(X, Y) = 0. The converse is not true.
If X and Y are independent, V(X + Y) = V(X) + V(Y).
If X1 and X2 are independent, then
    M_{X1+X2}(t) = M_{X1}(t) × M_{X2}(t)
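A quick simulation sketch of two of these facts, assuming NumPy is available; the particular distributions of X and Y are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Independent X and Y (arbitrary illustrative distributions)
X = rng.exponential(scale=2.0, size=n)
Y = rng.normal(loc=1.0, scale=3.0, size=n)

# Under independence: correlation ≈ 0 and V(X + Y) ≈ V(X) + V(Y)
print(np.corrcoef(X, Y)[0, 1])
print(np.var(X + Y), np.var(X) + np.var(Y))
```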
Inequalities
Markov's inequality: if X ≥ 0, then
    P(X ≥ a) ≤ E(X) / a
Chebyshev's inequality:
    P(|X − E(X)| > a) ≤ V(X) / a²
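Both bounds can be checked by simulation; a minimal sketch assuming NumPy is available, with an Exponential(1) sample as an arbitrary non-negative example.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=500_000)   # X ≥ 0 with E(X) = 1, V(X) = 1
a = 3.0

# Markov: P(X ≥ a) ≤ E(X)/a
print((X >= a).mean(), X.mean() / a)

# Chebyshev: P(|X − E(X)| > a) ≤ V(X)/a²
print((np.abs(X - X.mean()) > a).mean(), X.var() / a**2)
```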
Week 1: Limit Theorems

Key Concepts:
Notions of convergence
Law of Large Numbers (LLN)
Central Limit Theorem (CLT)
Notions of Convergence
Let (Ω, P) be a probability space. For each ω ∈ Ω, we define a sequence of random variables X1(ω), X2(ω), ···
We know what it means to say a sequence of numbers an converges to a.
X1(ω), X2(ω), ··· are functions. A natural definition would be
    Xn → X if Xn(ω) → X(ω) for all ω
Note that the underlying probability plays no role here.
We need a notion of convergence that uses P.
Convergence in Probability
We say that Xn → X in probability if, for every positive ε,
    P(|Xn − X| > ε) → 0 as n → ∞
or, equivalently,
    P(|Xn − X| ≤ ε) → 1 as n → ∞.
This means that by taking n sufficiently large, one can achieve arbitrarily high probability that Xn is arbitrarily close to X.
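A minimal simulation sketch of this definition, assuming NumPy is available; the choice of X̄n as averages of U(0, 1) draws (with limit µ = 0.5) and ε = 0.05 is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, eps, reps = 0.5, 0.05, 2000

# Estimate P(|X̄n − µ| > ε) for averages of n Uniform(0, 1) draws, for increasing n
for n in [10, 100, 1000, 10_000]:
    xbar = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # shrinks toward 0 as n grows
```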
Week 1: Limit Theorems

Convergence in Distribution:
Xn → X in distribution if
    F_{Xn}(t) → F_X(t) for all t such that F_X is continuous at t.
The restriction to points of continuity makes this definition look artificial (we will discuss this in a moment).
Note: if the limit X has a continuous distribution, then F_X is continuous and we have
    F_{Xn}(t) → F_X(t) for all t.
Now, suppose Yn ∼ N(0, σn) and σn → 0. Then
    P(−a < Yn < a) = P(−a/σn < Z < a/σn) → 1
(why?), since a/σn → ∞.
So we would like to say that the distribution of Yn converges to the probability concentrated at 0.
But Fn(0) = 0.5 for all n and F(0) = 1, so Fn(0) does not converge to F(0).
F is not continuous at 0. At all other t, Fn(t) → F(t).
In the topics we cover, F will typically be a Normal distribution, so we do not have to worry about discontinuity points: there are none.
A useful tool for showing convergence in distribution is the following:
    If M_{Xn}(t) → M_X(t), then Xn converges to X in distribution.
Example: Poisson Approximation to Binomial
If n pn → λ, then Bin(n, pn) converges to Poisson(λ).
MGF of Bin(n, pn): Mn(t) = (1 − pn + pn e^t)^n
Write pn = λ/n, so that
    Mn(t) = (1 − (λ/n)(1 − e^t))^n
Since (1 − x/n)^n → e^{−x}, we get Mn(t) → exp(λ(e^t − 1)), which is the MGF of Poisson(λ).
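A quick numerical sketch of this approximation, assuming scipy is available; λ = 3 and the values of n are arbitrary choices.

```python
import numpy as np
from scipy import stats

lam = 3.0
k = np.arange(0, 15)

# The Bin(n, λ/n) pmf approaches the Poisson(λ) pmf as n grows
for n in [10, 100, 1000]:
    max_diff = np.max(np.abs(stats.binom.pmf(k, n, lam / n) - stats.poisson.pmf(k, lam)))
    print(n, max_diff)   # decreases with n
```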
Example: Normal Approximation to Poisson
Let λ1, λ2, ··· be an increasing sequence with λn → ∞, with corresponding random variables Xn ∼ Poisson(λn). Then the standardized Xn converge to Z ∼ N(0, 1).
MGF of Poisson(λn): Mn(t) = exp(λn(e^t − 1))
Write Xn in its standardized form as Zn = (Xn − λn)/√λn.
One can check that M_{Zn}(t) converges to M_Z(t).
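A numerical sketch of the approximation, assuming scipy is available; the grid of x values and the values of λ are arbitrary choices.

```python
import numpy as np
from scipy import stats

# Standardized Poisson(λ) CDF versus Φ at a few points, for growing λ
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for lam in [5, 50, 500]:
    # P(Zn ≤ x) where Zn = (Xn − λ)/√λ and Xn ~ Poisson(λ)
    approx = stats.poisson.cdf(lam + x * np.sqrt(lam), lam)
    print(lam, np.max(np.abs(approx - stats.norm.cdf(x))))   # discrepancy shrinks as λ grows
```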
Week 1: Limit Theorems

Law of Large Numbers:

Theorem (WLLN)
Let X1, X2, ···, Xn be a sequence of independent random variables with E(Xi) = µ and Var(Xi) = σ². Let X̄n = n⁻¹ Σ_{i=1}^n Xi. Then, for any ε > 0,
    P(|X̄n − µ| > ε) → 0 as n → ∞

Proof.
We first find E(X̄n) and Var(X̄n):
    E(X̄n) = (1/n) Σ_{i=1}^n E(Xi) = µ
Since the Xi's are independent,
    Var(X̄n) = (1/n²) Σ_{i=1}^n Var(Xi) = σ²/n
The desired result now follows immediately from Chebyshev's inequality:
    P(|X̄n − µ| > ε) ≤ Var(X̄n)/ε² = σ²/(n ε²) → 0 as n → ∞
Law of Large Numbers: Problems
1. Let X1, X2, ··· be a sequence of independent random variables with E(Xi) = µ and Var(Xi) = σi². Show that if n⁻² Σ_{i=1}^n σi² → 0, then X̄ → µ in probability.
2. Let Xi be as in Problem 1 but with E(Xi) = µi and n⁻¹ Σ_{i=1}^n µi → µ. Show that X̄ → µ in probability.
Solution (Problem 1)
X1, X2, ··· is a sequence of independent random variables with E(Xi) = µ and Var(Xi) = σi². If n⁻² Σ σi² → 0, show that the WLLN holds.
    E(X̄) = µ,   Var(X̄) = Var(Σ_{i=1}^n Xi / n) = (Σ_{i=1}^n σi²) / n²
By Chebyshev,
    P(|X̄ − µ| > ε) ≤ (1/ε²) (Σ_{i=1}^n σi²) / n²
The last term goes to zero by assumption.
Week 1: Limit Theorems

Example (Monte Carlo):
Suppose we want to evaluate ∫_0^1 f(x) dx, which is difficult to evaluate analytically.
Note: ∫_0^1 f(x) dx = E(f(X)) with respect to the U(0, 1) distribution.
Simulate x1, x2, ···, xn (large n) from the U(0, 1) distribution.
By the WLLN,
    (1/n) Σ_{i=1}^n f(xi) ≈ ∫_0^1 f(x) dx
This is called Monte Carlo integration.
Homework 1, Problems 19 & 20
Find a Monte Carlo approximation to ∫_0^1 cos(2πx) dx.
Find an estimate of the standard deviation of the approximation.
One possible way to set up the computation is sketched below.
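A minimal sketch of the Monte Carlo recipe from the previous slide applied to this integrand, assuming NumPy is available; the sample size n and the seed are arbitrary, and the standard deviation of the estimate is taken as the sample standard deviation of f(xi) divided by √n.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

x = rng.uniform(size=n)
fx = np.cos(2 * np.pi * x)

estimate = fx.mean()                      # Monte Carlo estimate of the integral of cos(2πx) over [0, 1]
std_error = fx.std(ddof=1) / np.sqrt(n)   # estimated standard deviation of the estimate

print(estimate, std_error)
```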


Week 1: Limit Theorems

Central Limit Theorem (CLT):

Theorem (CLT)
Let X1, X2, ··· be a sequence of independent random variables with E(Xi) = 0 and Var(Xi) = σ², with common distribution F (that is, X1, X2, ··· i.i.d. ∼ F).
Assume that F has MGF M(t) defined in an interval around 0.
Let Sn = Σ_{i=1}^n Xi. Then for all −∞ < x < ∞,
    P(Sn/(σ√n) ≤ x) → Φ(x) as n → ∞
where Φ(x) = P(Z ≤ x) is the CDF of the standard normal.
Note: sd(Sn) = √n σ, so Sn/(σ√n) has mean 0 and s.d. 1.
Dividing the numerator and denominator of Sn/(σ√n) by n, we get
    P(√n X̄n / σ ≤ x) → Φ(x) as n → ∞
If E(Xi) = µ, we can apply the CLT to Xi − µ (which has expected value 0), and so in this case
    P(√n (X̄n − µ) / σ ≤ x) → Φ(x) as n → ∞
Typically we use the CLT to get an approximation of P(√n (X̄n − µ)/σ ≤ x).

Extensions of CLT
The central limit theorem can be proved in greater generality.

How good is the approximation?
If F is symmetric and has tails that die rapidly, the approximation is good.
When F is highly skewed or has tails that go to 0 very slowly, we need a large n to get a good approximation.
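A simulation sketch of the approximation quality, assuming NumPy and scipy are available; the Exponential(1) distribution (skewed, with µ = σ = 1) and the chosen values of n are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma = 1.0, 1.0                 # Exponential(rate 1): skewed, with mean 1 and sd 1
x_grid = np.array([-1.0, 0.0, 1.0])
reps = 50_000

for n in [5, 30, 200]:
    samples = rng.exponential(scale=1.0, size=(reps, n))
    z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma
    # empirical P(√n(X̄n − µ)/σ ≤ x) versus Φ(x); agreement improves with n for this skewed F
    print(n, [round((z <= x).mean(), 3) for x in x_grid], np.round(stats.norm.cdf(x_grid), 3))
```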
Week 1: Limit Theorems

Proof of the Central Limit Theorem (CLT):

Proof.
M(t) = E(e^{tX}),   M_{Sn}(t) = E(e^{t Sn}) = [M(t)]^n
Let
    Zn = Sn/(σ√n),   so M_{Zn}(t) = E(e^{t Sn/(σ√n)}) = M_{Sn}(t/(σ√n))
Thus,
    M_{Zn}(t) = [M(t/(σ√n))]^n
We want to show that, as n → ∞, this goes to e^{t²/2}.
We will make use of the following result:
    (1 + (b + an)/n)^n → e^b as an → 0
So we also need to express M_{Zn}(t) in this form (how?).
Note M(0) = 1.
Since E(X) = 0, M′(0) = 0; since E(X²) = σ², we have M″(0) = σ².
By Taylor expansion,
    M(s) = M(0) + s M′(0) + (s²/2) M″(0) + (s³/6) M‴(0)
         = 1 + (s²/2) σ² + (s³/6) M‴(0)
Proof Continued:
With s = t/(σ√n),
    M(s) = 1 + (s²/2) σ² + (s³/6) M‴(0)
becomes
    M(t/(σ√n)) = 1 + t²/(2n) + εn,   where εn = (t³/(6 σ³ n^{3/2})) M‴(0).
Show that M(t/(σ√n)) can be written as
    M(t/(σ√n)) = 1 + (t²/2 + an)/n
with an → 0. We then have
    M_{Zn}(t) = [M(t/(σ√n))]^n = [1 + (t²/2 + an)/n]^n → e^{t²/2}
as required.
Week 1: Limit Theorems

Problems
17. Suppose that a measurement has mean µ and variance σ² = 25. Let X̄ be the average of n such independent measurements. How large should n be so that
    P{|X̄ − µ| < 1} = .95?

Solution:
    P{|X̄ − µ| < 1} = P{ |√n (X̄ − µ)/5| < √n/5 } ≈ P{ |Z| < √n/5 } = .95
But we also know that
    P{|Z| < 1.96} = .95
so
    √n/5 = 1.96,   n = (1.96)² × 5² ≈ 96.04,
and we round up to n = 97.
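A short numerical check of this sample-size calculation, assuming scipy is available; the variable names are just illustrative.

```python
import numpy as np
from scipy import stats

sigma, half_width, conf = 5.0, 1.0, 0.95

z = stats.norm.ppf(1 - (1 - conf) / 2)    # ≈ 1.96
n = (z * sigma / half_width) ** 2         # ≈ 96.04
print(z, n, int(np.ceil(n)))              # round up to n = 97
```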
Problem (total weight)
Let Xi be the weight of the i-th package, with E(Xi) = 15 and σ = 10. The total weight of 100 packages is T = Σ_{i=1}^{100} Xi. Find P(T > 1700).
    P(T > 1700) = P( (T − 1500)/(10 × 10) > (1700 − 1500)/(10 × 10) ) ≈ P(Z > 2) ≈ 0.023
Week 1: Limit Theorems

Problems
Let X1, X2, ···, Xn ∼ U(0, 1) and let M = max(X1, X2, ···, Xn). Then
    P{1 − M < t} = P{M > 1 − t} = 1 − (1 − t)^n
    P{1 − M < t/n} = P{n(1 − M) < t} = 1 − (1 − t/n)^n
As n → ∞, 1 − (1 − t/n)^n → 1 − e^{−t}, so
    n(1 − M) → Exp(1) in distribution.
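A simulation sketch of this limit, assuming NumPy is available; n = 500 and the values of t are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 100_000

# M = max of n Uniform(0, 1) draws; compare P(n(1 − M) < t) with the Exp(1) CDF 1 − e^{−t}
M = rng.uniform(size=(reps, n)).max(axis=1)
scaled = n * (1 - M)

for t in [0.5, 1.0, 2.0]:
    print(t, (scaled < t).mean(), 1 - np.exp(-t))
```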
