MA 6.101 Probability and Statistics: Assistant Professor, IIIT Hyderabad

MA 6.
101
Probability and Statistics
Tejas Bodas
Assistant Professor, IIIT Hyderabad
1 / 16
Recap: Stochastic Simulation
2 / 16
▶ Inverse transform method
2 / 16
▶ Generating Samples from a Discrete random variable
2 / 16
▶ Generating samples from a Continuous random variable Y

with CDF FY (·)
2 / 16

with CDF FY (·)
▶ Define X := FY−1 (U). X also has the same distribution FY .
2 / 16

with CDF FY (·)
▶ FY (Y ) is a uniform random variable.
2 / 16

with CDF FY (·)
▶ FY (Y ) is a uniform random variable.
▶ This helps in checking if data samples you are are from

random variable Y or not.
2 / 16
Convergence of Random Variables
2 / 16
Pointwise Convergence
3 / 16
▶ When do we say that {xn } converges to x ∈ R ?
3 / 16
We say that {xn } converges to x ∈ R (denoted by xn →

x ) if for every ϵ > 0, we can find an N(ϵ) ∈ N such that
for |xn − x | < ϵ for n > N(ϵ).
3 / 16

▶ What about convergence of functions?
3 / 16

▶ When do we say that a sequence of functions Fn (·) converge

to F (·) on the domain R?
3 / 16

▶ When do we say that a sequence of functions Fn (·) converge

to F (·) on the domain R?
We say that the sequence of function Fn (·) converge

to F (·) pointwise if the sequence {Fn (x )} converges to
F (x ) (Fn (x ) → F (x )) for all x ∈ R.
3 / 16
Uniform Convergence
4 / 16
Uniform Convergence
We say that the sequence of function Fn (·) converge to

F (·) pointwise if the sequence {Fn (x )} converges to F (x )
(Fn (x ) → F (x )) for all x ∈ R.
4 / 16
Uniform Convergence

(Fn (x ) → F (x )) for all x ∈ R.
▶ For every x , the sequence {Fn (x )} coverges to F (x ).
4 / 16
Uniform Convergence

(Fn (x ) → F (x )) for all x ∈ R.
▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .
4 / 16
Uniform Convergence

(Fn (x ) → F (x )) for all x ∈ R.
▶ Only those Fn (x ) are ϵ close to F (x ) for which n > N(ϵ, x ).
4 / 16
Uniform Convergence

(Fn (x ) → F (x )) for all x ∈ R.
▶ Only those Fn (x ) are ϵ close to F (x ) for which n > N(ϵ, x ).
If N(ϵ, x ) = N(ϵ) (i.e., independent of x ) for every x ∈ R,

then such convergence of Fn (·) to F (·) is called as uniform
convergence.
4 / 16
Convergence of Sequence of random variables
5 / 16
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .
5 / 16
random variable X .
▶ What does the convergence Xn → X even mean ?
5 / 16
random variable X .
▶ When you perform the random experiment once, you get a
sequence of realizations {xn } and x .
5 / 16
random variable X .
▶ If you are ’lucky’, maybe xn → x .
5 / 16
random variable X .
▶ But if you were to perform the experiment again, you may not
be so ’lucky’ and get a different sequence {xn′ } which may not
converge to x ′ .
5 / 16
random variable X .
▶ But if you were to perform the experiment again, you may not
be so ’lucky’ and get a different sequence {xn′ } which may not
converge to x ′ .
▶ We will come up with notions of convergence that depend on
how often you see the sequence of realizations converging.
5 / 16
▶ Convergence of Xn → X
6 / 16
▶ Here X could even be a deterministic number.
6 / 16
▶ Xn′ s could be dependent on each other.
6 / 16
▶ Xn′ s could be dependent on each other.
▶ Each random variable Xn could have a different law

(pmf/pdf).
6 / 16
Modes of Convergence (Xn → X )
7 / 16
Pointwise or Sure convergence
7 / 16
{Xn , n ≥ 0} converges to X pointwise or surely if for all

ω ∈ Ω we have limn→∞ Xn (ω) = X (ω)
7 / 16

▶ Consider Ω = {H, T }.
7 / 16

(
1
n if ω = H
▶ Further, Xn =
1
1+ n if ω = T .
7 / 16

( (
1
n if ω = H 0 if ω = H
▶ Further, Xn = and X =
1
1+ n if ω = T . 1 if ω = T .
7 / 16
Almost sure convergence
Xn converges to X almost surely if
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.
▶ The set of outcomes where the convergence does not happen

has measure 0.
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ Consider Ω = [0, 1] where you pick a number uniformly in

[0, 1].
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.


[0, 1]. Let Xn (ω) = ω n for all ω ∈ Ω and X (ω) = 0 for all ω.
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.


▶ Xn (ω) → X (ω) for ω ∈ [0, 1).
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.


▶ Xn (ω) → X (ω) for ω ∈ [0, 1).
▶ Xn (ω) ↛ X (ω) for ω = 1 and P{ω = 1}.
8 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.


▶ Xn (ω) → X (ω) for ω ∈ [0, 1).
▶ Xn (ω) ↛ X (ω) for ω = 1 and P{ω = 1}.
▶ This is almost sure convergence as P{[0, 1)} = 1.
8 / 16
Almost sure (a.s.) convergence
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.
9 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.
▶ Example 2: Strong law of large numbers (SLLN).
Let {Xn , n ≥ 0} denote a sequence of i.i.d random vari-

Pn
ables with mean µ and denote Sn = i=1 Xi . Then
Sn
n → µ a.s.
9 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

Pn
Sn
n → µ a.s.
▶ Toss a biased coin (probability of head is µ) repeatedly. What

is ω and Ω?
9 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

Pn
Sn
n → µ a.s.

is ω and Ω?
▶ Let Xi denote the outcome of the i th toss and Sn denotes the

number of heads in n tosses.
9 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

Pn
Sn
n → µ a.s.

is ω and Ω?

▶ The empirical mean is given by Sn

n .
9 / 16
P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

Pn
Sn
n → µ a.s.

is ω and Ω?

▶ The empirical mean is given by Sn

n .
9 / 16
Detour: Incremental formula for sample mean
10 / 16
▶ Now that we know Snn → µ we can use µˆn := Snn as an

’estimator’ for the mean especially in cases when the
underlying distribution is not known.
10 / 16

▶ Note that the estimator µˆn is a random variable. What is its

cdf? what is its mean & Variance?
10 / 16


▶ µˆn = Sn
n is an ’unbiased estimator’ since E [µˆn ] = µ.
10 / 16


▶ µˆn = Sn
▶ Var (µˆn ) = σ2
n
10 / 16


▶ µˆn = Sn
▶ Var (µˆn ) = σ2
n
▶ We will soon see CLT that will tell the CDF of µˆn without any
information on the law of Xi .
10 / 16
11 / 16
▶ Now given µˆn , suppose you see an additional sample Xn+1 .
11 / 16
▶ How will you compute µ̂n+1 ?
11 / 16

Pn+1
Xi
▶ Naive way : µ̂n+1 = i=1
n+1 .
11 / 16

Pn+1
Xi
n+1 .
▶ There is an incremental formula that uses µˆn .
11 / 16

Pn+1
Xi
n+1 .
1
µ̂n+1 = µˆn + n+1 [Xn+1 − µˆn ]
11 / 16

Pn+1
Xi
n+1 .
1
µ̂n+1 = µˆn + n+1 [Xn+1 − µˆn ]
▶ Such averaging formulas are used extensively in Reinforcement

learning.
11 / 16

MA 6.101 Probability and Statistics: Assistant Professor, IIIT Hyderabad

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

MA 6.101 Probability and Statistics: Assistant Professor, IIIT Hyderabad

Uploaded by

Copyright:

Available Formats

MA 6.

Assistant Professor, IIIT Hyderabad

▶ Inverse transform method

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y

▶ Define X := FY−1 (U). X also has the same distribution FY .

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y

▶ Define X := FY−1 (U). X also has the same distribution FY .

▶ FY (Y ) is a uniform random variable.

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y

▶ Define X := FY−1 (U). X also has the same distribution FY .

▶ FY (Y ) is a uniform random variable.

▶ This helps in checking if data samples you are are from

We say that {xn } converges to x ∈ R (denoted by xn →

We say that {xn } converges to x ∈ R (denoted by xn →

▶ What about convergence of functions?

We say that {xn } converges to x ∈ R (denoted by xn →

▶ What about convergence of functions?

▶ When do we say that a sequence of functions Fn (·) converge

We say that {xn } converges to x ∈ R (denoted by xn →

▶ What about convergence of functions?

▶ When do we say that a sequence of functions Fn (·) converge

We say that the sequence of function Fn (·) converge

We say that the sequence of function Fn (·) converge to

We say that the sequence of function Fn (·) converge to

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

We say that the sequence of function Fn (·) converge to

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .

We say that the sequence of function Fn (·) converge to

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .

▶ Only those Fn (x ) are ϵ close to F (x ) for which n > N(ϵ, x ).

We say that the sequence of function Fn (·) converge to

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .

▶ Only those Fn (x ) are ϵ close to F (x ) for which n > N(ϵ, x ).

If N(ϵ, x ) = N(ϵ) (i.e., independent of x ) for every x ∈ R,

▶ Here X could even be a deterministic number.

▶ Here X could even be a deterministic number.

▶ Xn′ s could be dependent on each other.

▶ Here X could even be a deterministic number.

▶ Xn′ s could be dependent on each other.

▶ Each random variable Xn could have a different law

Pointwise or Sure convergence

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all