You are on page 1of 65

MA 6.

101
Probability and Statistics
Tejas Bodas

Assistant Professor, IIIT Hyderabad

1 / 16
Recap: Stochastic Simulation

2 / 16
Recap: Stochastic Simulation

▶ Inverse transform method

2 / 16
Recap: Stochastic Simulation

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

2 / 16
Recap: Stochastic Simulation

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y


with CDF FY (·)

2 / 16
Recap: Stochastic Simulation

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y


with CDF FY (·)

▶ Define X := FY−1 (U). X also has the same distribution FY .

2 / 16
Recap: Stochastic Simulation

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y


with CDF FY (·)

▶ Define X := FY−1 (U). X also has the same distribution FY .

▶ FY (Y ) is a uniform random variable.

2 / 16
Recap: Stochastic Simulation

▶ Inverse transform method

▶ Generating Samples from a Discrete random variable

▶ Generating samples from a Continuous random variable Y


with CDF FY (·)

▶ Define X := FY−1 (U). X also has the same distribution FY .

▶ FY (Y ) is a uniform random variable.

▶ This helps in checking if data samples you are are from


random variable Y or not.

2 / 16
Convergence of Random Variables

2 / 16
Pointwise Convergence

3 / 16
Pointwise Convergence
▶ When do we say that {xn } converges to x ∈ R ?

3 / 16
Pointwise Convergence
▶ When do we say that {xn } converges to x ∈ R ?

We say that {xn } converges to x ∈ R (denoted by xn →


x ) if for every ϵ > 0, we can find an N(ϵ) ∈ N such that
for |xn − x | < ϵ for n > N(ϵ).

3 / 16
Pointwise Convergence
▶ When do we say that {xn } converges to x ∈ R ?

We say that {xn } converges to x ∈ R (denoted by xn →


x ) if for every ϵ > 0, we can find an N(ϵ) ∈ N such that
for |xn − x | < ϵ for n > N(ϵ).

▶ What about convergence of functions?

3 / 16
Pointwise Convergence
▶ When do we say that {xn } converges to x ∈ R ?

We say that {xn } converges to x ∈ R (denoted by xn →


x ) if for every ϵ > 0, we can find an N(ϵ) ∈ N such that
for |xn − x | < ϵ for n > N(ϵ).

▶ What about convergence of functions?

▶ When do we say that a sequence of functions Fn (·) converge


to F (·) on the domain R?

3 / 16
Pointwise Convergence
▶ When do we say that {xn } converges to x ∈ R ?

We say that {xn } converges to x ∈ R (denoted by xn →


x ) if for every ϵ > 0, we can find an N(ϵ) ∈ N such that
for |xn − x | < ϵ for n > N(ϵ).

▶ What about convergence of functions?

▶ When do we say that a sequence of functions Fn (·) converge


to F (·) on the domain R?

We say that the sequence of function Fn (·) converge


to F (·) pointwise if the sequence {Fn (x )} converges to
F (x ) (Fn (x ) → F (x )) for all x ∈ R.

3 / 16
Uniform Convergence

4 / 16
Uniform Convergence

We say that the sequence of function Fn (·) converge to


F (·) pointwise if the sequence {Fn (x )} converges to F (x )
(Fn (x ) → F (x )) for all x ∈ R.

4 / 16
Uniform Convergence

We say that the sequence of function Fn (·) converge to


F (·) pointwise if the sequence {Fn (x )} converges to F (x )
(Fn (x ) → F (x )) for all x ∈ R.

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

4 / 16
Uniform Convergence

We say that the sequence of function Fn (·) converge to


F (·) pointwise if the sequence {Fn (x )} converges to F (x )
(Fn (x ) → F (x )) for all x ∈ R.

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .

4 / 16
Uniform Convergence

We say that the sequence of function Fn (·) converge to


F (·) pointwise if the sequence {Fn (x )} converges to F (x )
(Fn (x ) → F (x )) for all x ∈ R.

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .

▶ Only those Fn (x ) are ϵ close to F (x ) for which n > N(ϵ, x ).

4 / 16
Uniform Convergence

We say that the sequence of function Fn (·) converge to


F (·) pointwise if the sequence {Fn (x )} converges to F (x )
(Fn (x ) → F (x )) for all x ∈ R.

▶ For every x , the sequence {Fn (x )} coverges to F (x ).

▶ For every ϵ, there exists N(ϵ, x ) which can depend on x .

▶ Only those Fn (x ) are ϵ close to F (x ) for which n > N(ϵ, x ).

If N(ϵ, x ) = N(ϵ) (i.e., independent of x ) for every x ∈ R,


then such convergence of Fn (·) to F (·) is called as uniform
convergence.

4 / 16
Convergence of Sequence of random variables

5 / 16
Convergence of Sequence of random variables
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .

5 / 16
Convergence of Sequence of random variables
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .
▶ What does the convergence Xn → X even mean ?

5 / 16
Convergence of Sequence of random variables
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .
▶ What does the convergence Xn → X even mean ?
▶ When you perform the random experiment once, you get a
sequence of realizations {xn } and x .

5 / 16
Convergence of Sequence of random variables
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .
▶ What does the convergence Xn → X even mean ?
▶ When you perform the random experiment once, you get a
sequence of realizations {xn } and x .
▶ If you are ’lucky’, maybe xn → x .

5 / 16
Convergence of Sequence of random variables
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .
▶ What does the convergence Xn → X even mean ?
▶ When you perform the random experiment once, you get a
sequence of realizations {xn } and x .
▶ If you are ’lucky’, maybe xn → x .
▶ But if you were to perform the experiment again, you may not
be so ’lucky’ and get a different sequence {xn′ } which may not
converge to x ′ .

5 / 16
Convergence of Sequence of random variables
▶ We will now be interested in the convergence properties of an
infinite sequence of random variables {Xn } to some limiting
random variable X .
▶ What does the convergence Xn → X even mean ?
▶ When you perform the random experiment once, you get a
sequence of realizations {xn } and x .
▶ If you are ’lucky’, maybe xn → x .
▶ But if you were to perform the experiment again, you may not
be so ’lucky’ and get a different sequence {xn′ } which may not
converge to x ′ .
▶ We will come up with notions of convergence that depend on
how often you see the sequence of realizations converging.

5 / 16
Convergence of Sequence of random variables

▶ Convergence of Xn → X

6 / 16
Convergence of Sequence of random variables

▶ Convergence of Xn → X

▶ Here X could even be a deterministic number.

6 / 16
Convergence of Sequence of random variables

▶ Convergence of Xn → X

▶ Here X could even be a deterministic number.

▶ Xn′ s could be dependent on each other.

6 / 16
Convergence of Sequence of random variables

▶ Convergence of Xn → X

▶ Here X could even be a deterministic number.

▶ Xn′ s could be dependent on each other.

▶ Each random variable Xn could have a different law


(pmf/pdf).

6 / 16
Modes of Convergence (Xn → X )

7 / 16
Modes of Convergence (Xn → X )

Pointwise or Sure convergence

7 / 16
Modes of Convergence (Xn → X )

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all


ω ∈ Ω we have limn→∞ Xn (ω) = X (ω)

7 / 16
Modes of Convergence (Xn → X )

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all


ω ∈ Ω we have limn→∞ Xn (ω) = X (ω)

▶ Consider Ω = {H, T }.

7 / 16
Modes of Convergence (Xn → X )

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all


ω ∈ Ω we have limn→∞ Xn (ω) = X (ω)

▶ Consider Ω = {H, T }.
(
1
n if ω = H
▶ Further, Xn =
1
1+ n if ω = T .

7 / 16
Modes of Convergence (Xn → X )

Pointwise or Sure convergence

{Xn , n ≥ 0} converges to X pointwise or surely if for all


ω ∈ Ω we have limn→∞ Xn (ω) = X (ω)

▶ Consider Ω = {H, T }.
( (
1
n if ω = H 0 if ω = H
▶ Further, Xn = and X =
1
1+ n if ω = T . 1 if ω = T .

7 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0.

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.

▶ Consider Ω = [0, 1] where you pick a number uniformly in


[0, 1].

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.

▶ Consider Ω = [0, 1] where you pick a number uniformly in


[0, 1]. Let Xn (ω) = ω n for all ω ∈ Ω and X (ω) = 0 for all ω.

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.

▶ Consider Ω = [0, 1] where you pick a number uniformly in


[0, 1]. Let Xn (ω) = ω n for all ω ∈ Ω and X (ω) = 0 for all ω.

▶ Xn (ω) → X (ω) for ω ∈ [0, 1).

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.

▶ Consider Ω = [0, 1] where you pick a number uniformly in


[0, 1]. Let Xn (ω) = ω n for all ω ∈ Ω and X (ω) = 0 for all ω.

▶ Xn (ω) → X (ω) for ω ∈ [0, 1).

▶ Xn (ω) ↛ X (ω) for ω = 1 and P{ω = 1}.

8 / 16
Almost sure convergence

Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ The set of outcomes where the convergence does not happen


has measure 0. P{ω ∈ Ω : limn→∞ Xn (ω) = X (ω)} = 0.

▶ Consider Ω = [0, 1] where you pick a number uniformly in


[0, 1]. Let Xn (ω) = ω n for all ω ∈ Ω and X (ω) = 0 for all ω.

▶ Xn (ω) → X (ω) for ω ∈ [0, 1).

▶ Xn (ω) ↛ X (ω) for ω = 1 and P{ω = 1}.

▶ This is almost sure convergence as P{[0, 1)} = 1.

8 / 16
Almost sure (a.s.) convergence
Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

9 / 16
Almost sure (a.s.) convergence
Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ Example 2: Strong law of large numbers (SLLN).

Let {Xn , n ≥ 0} denote a sequence of i.i.d random vari-


Pn
ables with mean µ and denote Sn = i=1 Xi . Then
Sn
n → µ a.s.

9 / 16
Almost sure (a.s.) convergence
Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ Example 2: Strong law of large numbers (SLLN).

Let {Xn , n ≥ 0} denote a sequence of i.i.d random vari-


Pn
ables with mean µ and denote Sn = i=1 Xi . Then
Sn
n → µ a.s.

▶ Toss a biased coin (probability of head is µ) repeatedly. What


is ω and Ω?

9 / 16
Almost sure (a.s.) convergence
Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ Example 2: Strong law of large numbers (SLLN).

Let {Xn , n ≥ 0} denote a sequence of i.i.d random vari-


Pn
ables with mean µ and denote Sn = i=1 Xi . Then
Sn
n → µ a.s.

▶ Toss a biased coin (probability of head is µ) repeatedly. What


is ω and Ω?

▶ Let Xi denote the outcome of the i th toss and Sn denotes the


number of heads in n tosses.

9 / 16
Almost sure (a.s.) convergence
Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ Example 2: Strong law of large numbers (SLLN).

Let {Xn , n ≥ 0} denote a sequence of i.i.d random vari-


Pn
ables with mean µ and denote Sn = i=1 Xi . Then
Sn
n → µ a.s.

▶ Toss a biased coin (probability of head is µ) repeatedly. What


is ω and Ω?

▶ Let Xi denote the outcome of the i th toss and Sn denotes the


number of heads in n tosses.

▶ The empirical mean is given by Sn


n .
9 / 16
Almost sure (a.s.) convergence
Xn converges to X almost surely if

P (ω ∈ Ω : limn→∞ Xn (ω) = X (ω)) = 1.

▶ Example 2: Strong law of large numbers (SLLN).

Let {Xn , n ≥ 0} denote a sequence of i.i.d random vari-


Pn
ables with mean µ and denote Sn = i=1 Xi . Then
Sn
n → µ a.s.

▶ Toss a biased coin (probability of head is µ) repeatedly. What


is ω and Ω?

▶ Let Xi denote the outcome of the i th toss and Sn denotes the


number of heads in n tosses.

▶ The empirical mean is given by Sn


n .
9 / 16
Detour: Incremental formula for sample mean

10 / 16
Detour: Incremental formula for sample mean

▶ Now that we know Snn → µ we can use µˆn := Snn as an


’estimator’ for the mean especially in cases when the
underlying distribution is not known.

10 / 16
Detour: Incremental formula for sample mean

▶ Now that we know Snn → µ we can use µˆn := Snn as an


’estimator’ for the mean especially in cases when the
underlying distribution is not known.

▶ Note that the estimator µˆn is a random variable. What is its


cdf? what is its mean & Variance?

10 / 16
Detour: Incremental formula for sample mean

▶ Now that we know Snn → µ we can use µˆn := Snn as an


’estimator’ for the mean especially in cases when the
underlying distribution is not known.

▶ Note that the estimator µˆn is a random variable. What is its


cdf? what is its mean & Variance?

▶ µˆn = Sn
n is an ’unbiased estimator’ since E [µˆn ] = µ.

10 / 16
Detour: Incremental formula for sample mean

▶ Now that we know Snn → µ we can use µˆn := Snn as an


’estimator’ for the mean especially in cases when the
underlying distribution is not known.

▶ Note that the estimator µˆn is a random variable. What is its


cdf? what is its mean & Variance?

▶ µˆn = Sn
n is an ’unbiased estimator’ since E [µˆn ] = µ.

▶ Var (µˆn ) = σ2
n

10 / 16
Detour: Incremental formula for sample mean

▶ Now that we know Snn → µ we can use µˆn := Snn as an


’estimator’ for the mean especially in cases when the
underlying distribution is not known.

▶ Note that the estimator µˆn is a random variable. What is its


cdf? what is its mean & Variance?

▶ µˆn = Sn
n is an ’unbiased estimator’ since E [µˆn ] = µ.

▶ Var (µˆn ) = σ2
n

▶ We will soon see CLT that will tell the CDF of µˆn without any
information on the law of Xi .

10 / 16
Detour: Incremental formula for sample mean

11 / 16
Detour: Incremental formula for sample mean

▶ Now given µˆn , suppose you see an additional sample Xn+1 .

11 / 16
Detour: Incremental formula for sample mean

▶ Now given µˆn , suppose you see an additional sample Xn+1 .

▶ How will you compute µ̂n+1 ?

11 / 16
Detour: Incremental formula for sample mean

▶ Now given µˆn , suppose you see an additional sample Xn+1 .

▶ How will you compute µ̂n+1 ?


Pn+1
Xi
▶ Naive way : µ̂n+1 = i=1
n+1 .

11 / 16
Detour: Incremental formula for sample mean

▶ Now given µˆn , suppose you see an additional sample Xn+1 .

▶ How will you compute µ̂n+1 ?


Pn+1
Xi
▶ Naive way : µ̂n+1 = i=1
n+1 .

▶ There is an incremental formula that uses µˆn .

11 / 16
Detour: Incremental formula for sample mean

▶ Now given µˆn , suppose you see an additional sample Xn+1 .

▶ How will you compute µ̂n+1 ?


Pn+1
Xi
▶ Naive way : µ̂n+1 = i=1
n+1 .

▶ There is an incremental formula that uses µˆn .

1
µ̂n+1 = µˆn + n+1 [Xn+1 − µˆn ]

11 / 16
Detour: Incremental formula for sample mean

▶ Now given µˆn , suppose you see an additional sample Xn+1 .

▶ How will you compute µ̂n+1 ?


Pn+1
Xi
▶ Naive way : µ̂n+1 = i=1
n+1 .

▶ There is an incremental formula that uses µˆn .

1
µ̂n+1 = µˆn + n+1 [Xn+1 − µˆn ]

▶ Such averaging formulas are used extensively in Reinforcement


learning.

11 / 16

You might also like