
Lecture 11:

Distribution of sample mean $\bar{X}$


Monte Carlo Simulations
ECO220Y – Intro to Data Analysis and
Applied Econometrics

Blanchenay ECO220
Last lecture
• Sample statistics are RVs
– Can find their distributions analytically, theoretically, or empirically
• Proportion in sample is a linear transformation of a binomial
• Can be approximated by a Normal distribution under certain conditions (large sample size)
• Can use the approximation to find probabilities
• Variance goes down with sample size
Firefox for Android – Google Play ratings
$\mu_X = E(X) \approx 4.376$

Draw $n = 10$ users at random, compute mean rating $\bar{X}$
• Expectation, Variance, Distribution?
• What is $P(\bar{X} \ge 4.5)$?
Expectation of $\bar{X} = \frac{1}{10}(X_1 + \cdots + X_{10})$

$E(\bar{X}) = \frac{1}{10} E(X_1 + X_2 + \cdots + X_{10})$
$= \frac{1}{10} \left( E(X_1) + \cdots + E(X_{10}) \right) = \frac{1}{10} \cdot 10 \cdot E(X)$

$E(\bar{X}) = E(X) = \mu_X = 4.376$

Still true if we take a sample of $n = 50$?
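Variance of $\bar{X} = \frac{1}{10}(X_1 + \cdots + X_{10})$

$V(\bar{X}) = \frac{1}{10^2} V(X_1 + X_2 + \cdots + X_{10})$
Assume independent draws
$= \frac{1}{10^2} \left( V(X_1) + \cdots + V(X_{10}) \right) = \frac{1}{10^2} \cdot 10 \cdot V(X)$

$V(\bar{X}) = \frac{V(X)}{10} = \frac{\sigma_X^2}{10} = 0.124$

$\mathrm{sd}(\bar{X}) = \frac{\sigma_X}{\sqrt{10}}$

Still true if we take a sample of $n = 50$?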
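The two derivations above can be checked exactly on a small example. The sketch below is an illustration only: it assumes the population is a single fair die and uses $n = 3$ (both choices are mine, not the lecture's), then enumerates every possible sample so that no simulation is needed.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4, 5, 6]   # assumed population: one fair die (illustration only)
n = 3                        # small sample size so every sample can be listed

def mean(values):
    """Exact mean of a list of equally likely values."""
    return Fraction(sum(values), len(values))

def variance(values):
    """Exact (population) variance of a list of equally likely values."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

# Every ordered sample of n independent rolls is equally likely, so this
# list of sample means is the exact distribution of X-bar.
sample_means = [mean(rolls) for rolls in product(faces, repeat=n)]

e_x, v_x = mean(faces), variance(faces)
e_xbar, v_xbar = mean(sample_means), variance(sample_means)

print("E(X) =", e_x, "  E(X-bar) =", e_xbar)
print("V(X)/n =", v_x / n, "  V(X-bar) =", v_xbar)
```

Changing `n` changes `v_xbar` but not `e_xbar`, which is one way to think about the "still true for $n = 50$?" question.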
10% Rule
For draws of a sample to be considered independent:
• Either draw with replacement (e.g. from a population of dice rolls)
• Or draw without replacement, but with small enough $n$: less than 10% of the population
– Intuition: if you have drawn 50% of the population, it's easier to guess the next draw
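To see why the rule matters, the sketch below compares the variance of the sample mean with and without replacement when the sample is half of the population. The tiny population of six values is hypothetical and chosen only so that every sample can be enumerated exactly.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 3, 4, 5, 6]   # hypothetical population of N = 6 values
n = 3                             # sample size = 50% of the population

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

# With replacement: all ordered samples equally likely, draws independent.
with_repl = [mean(s) for s in product(population, repeat=n)]
# Without replacement: all subsets of size n equally likely, draws dependent.
without_repl = [mean(s) for s in combinations(population, n)]

v_formula = variance(population) / n
v_with = variance(with_repl)
v_without = variance(without_repl)

print("V(X)/n                        =", v_formula)
print("V(X-bar), with replacement    =", v_with)
print("V(X-bar), without replacement =", v_without)
```

With replacement the formula $V(X)/n$ holds exactly. Without replacement the variance is noticeably smaller, because each draw carries information about the next one. When $n$ is a small share of the population the two numbers are close, which is the situation the 10% rule is meant to describe.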
Summary
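For $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$
• $E(\bar{X}) = E(X) = \mu_X$
• $V(\bar{X}) = \frac{V(X)}{n} = \frac{\sigma_X^2}{n}$
• $\mathrm{sd}(\bar{X}) = \frac{\mathrm{sd}(X)}{\sqrt{n}} = \frac{\sigma_X}{\sqrt{n}}$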
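The three summary formulas translate directly into a small helper. This is a minimal sketch: the function name `sample_mean_moments` is hypothetical, and the population mean and variance are the Firefox values reported in the lecture's summary statistics.

```python
from math import sqrt

def sample_mean_moments(mu, sigma2, n):
    """Return (E, V, sd) of the mean of n independent draws."""
    variance = sigma2 / n
    return mu, variance, sqrt(variance)

# Population mean and variance of the Firefox ratings, as reported in the lecture.
mu_x, sigma2_x = 4.376244, 1.237804

for n in (10, 50, 100, 1000):
    e, v, sd = sample_mean_moments(mu_x, sigma2_x, n)
    print(f"n={n:5d}  E={e:.3f}  V={v:.4f}  sd={sd:.3f}")
```

The expectation does not depend on $n$, while the variance shrinks proportionally to $1/n$ and the standard deviation proportionally to $1/\sqrt{n}$.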
Distribution of $\bar{X}$?
If $X$ is Normally distributed, $X \sim N(\mu, \sigma^2)$
• $\bar{X}$ is a linear combination of $n$ independent Normally distributed r.v.
• With previous info:
$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$

• NB: if we take a sample of different size $n'$, we get another random variable
– If there's confusion denote $\bar{X}_{10}$ … $\bar{X}_{50}$
Central Limit Theorem
For $n$ large enough, the distribution of the sample mean $\bar{X}$ approximately follows a Normal, regardless of the original distribution of $X$

• Computations of $E(\bar{X})$ and $V(\bar{X})$ still apply

• "$n$ large enough":
– Rule of thumb: $n \ge 30$
– If distribution of $X$ almost Normal, can use less
– If distribution of $X$ very different from Normal, probably need more
Average of 𝑛 = 50 dice rolls
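The dice example can be reproduced with a short simulation. The sketch below is an assumption-laden illustration: the number of repetitions (20,000) and the seed are arbitrary choices, not values from the lecture.

```python
import random
import statistics

rng = random.Random(220)          # arbitrary fixed seed for reproducibility
n, reps = 50, 20_000              # n from the slide; reps is an assumption

# Each repetition: roll a fair die n times and record the average.
means = [statistics.fmean(rng.choices(range(1, 7), k=n)) for _ in range(reps)]

mu, sigma2 = 3.5, 35 / 12         # mean and variance of one fair die roll
sd_theory = (sigma2 / n) ** 0.5   # sd of X-bar under independent draws

centre = statistics.fmean(means)
spread = statistics.pstdev(means)
within_1sd = sum(abs(m - mu) <= sd_theory for m in means) / reps

print(f"mean of X-bar: {centre:.3f} (theory {mu})")
print(f"sd of X-bar:   {spread:.3f} (theory {sd_theory:.3f})")
print(f"share within 1sd of mu: {within_1sd:.3f}")
```

A single die roll is uniform on 1 to 6, which looks nothing like a bell curve, yet the share of averages within 1sd should land in the neighbourhood of the Normal benchmark. It will not match exactly, since the average of 50 rolls can only take a discrete set of values.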

True CLT
For $n$ large enough, the distribution of a linear combination of $n$ independent RVs approximately follows a Normal, regardless of the individual distributions of the RVs

• The sample mean is just a rescaled sum, so it is a linear combination
Empirical method

MONTE CARLO SIMULATION

Monte Carlo simulation
Manhattan Project: secret development of nuclear weapons
• Q: how far do neutrons travel through material (shielding)?
• Von Neumann & Ulam: can't solve equations exactly
• Solution: simulate them as a random process a large number of times using ENIAC (one of the first electronic computers)
Monte Carlo simulation
1. Fix 𝑛
2. Draw 𝑛 values at random from population
3. Compute the statistic and record it
4. Repeat steps 2 and 3 a large number of times (e.g. 100,000)

• Use the 100,000 samples to infer distribution of sample statistic
– Graphically
– Numerically (STATA)
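The four steps above can be sketched in a few lines of Python (the lecture itself uses STATA). Everything named here is an assumption for illustration: the helper `monte_carlo`, the seed, the number of repetitions, and above all the population of star ratings, which is made up and is not the Firefox data.

```python
import random
import statistics

def monte_carlo(population, n, statistic, reps, seed=0):
    """Return `reps` simulated values of `statistic` on samples of size n."""
    rng = random.Random(seed)                   # fixed seed: reproducible run
    results = []
    for _ in range(reps):                       # step 4: repeat many times
        sample = rng.choices(population, k=n)   # step 2: draw n values (with replacement)
        results.append(statistic(sample))       # step 3: compute and record
    return results

# Hypothetical population of 1-5 star ratings (made up for illustration).
population = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 20 + [5] * 65

# Step 1: fix n, then run the simulation for the sample mean.
means = monte_carlo(population, n=10, statistic=statistics.fmean, reps=20_000)

sim_mean = statistics.fmean(means)
sim_var = statistics.pvariance(means)
share_high = sum(m >= 4.5 for m in means) / len(means)

print("simulated E(X-bar):", round(sim_mean, 3))
print("simulated V(X-bar):", round(sim_var, 4))
print("share of samples with X-bar >= 4.5:", round(share_high, 3))
```

Passing a different function as `statistic` (for example `statistics.median` or `max`) gives the simulated distribution of that statistic instead, without changing the rest of the procedure.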
Firefox for Android – Google Play ratings

                     X (Firefox rating)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            3              1       Obs           2,954,953
25%            4              1       Sum of Wgt.   2,954,953

50%            5                      Mean           4.376244
                        Largest       Std. Dev.      1.112566
75%            5              5
90%            5              5       Variance       1.237804
95%            5              5       Skewness      -1.903797
99%            5              5       Kurtosis        5.68557

• $E(X) \approx 4.376$

Draw $n = 10$ users at random, compute mean rating $\bar{X}$
• Expectation, Variance, Distribution?
• What is $P(\bar{X} \ge 4.5)$?
Firefox Monte Carlo simulation
1. Fix 𝑛 = 10
2. Draw 10 values at random from population
3. Compute the statistic (here $\bar{X}$) and record it
4. Repeat steps 2 and 3 for 1,000,000 samples

Use the 1,000,000 simulated samples to infer distribution of $\bar{X}$
Distribution of $\bar{X}$
• 68.0% of values between 4.378 − 0.351 and 4.378 + 0.351
• 96.8% of values within 2sd
• 99.3% of values within 3sd
$P(\bar{X} \ge 4.5)$?
If we had assumed $\bar{X} \sim N(4.376, 0.124)$:
$P(\bar{X} \ge 4.5) = P\left(Z \ge \frac{4.5 - 4.376}{\sqrt{0.124}}\right) = 0.362$

Using Monte Carlo: $P(\bar{X} \ge 4.5) = 0.459$
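The Normal-approximation figure can be recomputed with Python's standard library. This is a sketch of that one calculation only, using the mean 4.376 and variance 0.124 from the lecture. Note that the standardization divides by the standard deviation $\sqrt{0.124}$, not by the variance.

```python
from math import sqrt
from statistics import NormalDist

mu, var = 4.376, 0.124                 # E(X-bar) and V(X-bar) for n = 10
sd = sqrt(var)                         # standardize with the sd, not the variance

z = (4.5 - mu) / sd                    # z-score of the threshold 4.5
p_normal = 1 - NormalDist().cdf(z)     # P(Z >= z) for a standard Normal

print(f"z = {z:.3f}")
print(f"P(X-bar >= 4.5) under the Normal assumption: {p_normal:.3f}")
```

The result should agree with the 0.362 quoted above up to rounding. The gap to the Monte Carlo figure of 0.459 suggests that $n = 10$ is too small for the Normal approximation to be accurate with ratings this skewed.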

Discrepancies
If we take one sample of 10 and find $\bar{X} \ge 4.5$:
• Sampling error
• Simulation error: chance difference between true probability distribution and simulated probability distribution
– Do a large number of samples to reduce it
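Simulation error can be made visible on a case where the true probability is known. The sketch below is an illustration under my own assumptions: it estimates the probability that the mean of two fair dice is at least 5 (true value 6/36), with arbitrary seed and repetition counts.

```python
import random

def estimate(reps, seed):
    """Monte Carlo estimate of P(mean of 2 fair dice >= 5) from `reps` samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        hits += total >= 10            # mean >= 5 is the same as sum >= 10
    return hits / reps

true_p = 6 / 36    # outcomes (4,6), (6,4), (5,5), (5,6), (6,5), (6,6)

for reps in (100, 10_000, 200_000):
    p_hat = estimate(reps, seed=1)
    print(f"reps={reps:>7}  estimate={p_hat:.4f}  error={abs(p_hat - true_p):.4f}")
```

The error for a given run is random, so it does not have to fall at every step, but it tends to shrink as the number of simulated samples grows.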

Distribution of $\bar{X}_{50}$
• 68.8% of values between 4.377 − 0.157 and 4.377 + 0.157 (1sd, with $\mathrm{sd}(\bar{X}_{50}) = \sigma_X / \sqrt{50} \approx 0.157$)
• 95.2% of values within 2sd
• 99.6% of values within 3sd
Reminder: Distrib. of $\hat{P}$ as sample size ↑
Precision
$E(\bar{X}) = \mu_X$    $V(\bar{X}) = \frac{\sigma_X^2}{n}$

With larger sample size,
• Distribution becomes Normal (CLT starts working)
• Variance of $\bar{X}$ becomes smaller
Distribution of $\bar{X}$ as $n$ ↗
Distribution of $\bar{X}$ (100,000 repetitions)

Sample size n   E(X-bar)   Var(X-bar)   % obs. within 1sd   % obs. within 2sd   % obs. within 3sd
10              4.378      0.123        68.2                96.7                99.3
50              4.377      0.025        68.8                95.2                99.6
100             4.377      0.012        67.5                95.3                99.7
1000            4.376      0.0012       68.8                95.5                99.7
+∞              4.376      0            68.3                95.4                99.7
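The last row of the table is the Normal benchmark that the simulated percentages approach as $n$ grows. A minimal sketch of where those three numbers come from, using only the standard Normal distribution:

```python
from statistics import NormalDist

z = NormalDist()                       # standard Normal: mean 0, sd 1

shares = {}
for k in (1, 2, 3):
    shares[k] = z.cdf(k) - z.cdf(-k)   # P(-k <= Z <= k)
    print(f"within {k}sd: {100 * shares[k]:.1f}%")
```

Comparing each simulated row with these benchmarks is a quick way to judge how close the distribution of $\bar{X}$ is to a Normal for a given $n$.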
Sample mean as an estimate of population mean

Imagine we do not know $\mu_X$ and would like to find its value:
• Collect a sample of $n$ observations in population
• In expectation, sample mean equal to $\mu_X$
– Because $E(\bar{X}) = \mu_X$
• Higher $n$ ⇒ lower $V(\bar{X})$ ⇒ $\bar{X}$ more precise estimate of $\mu_X$
Key messages
For a given $n$: $E(\bar{X}) = E(X)$ and $V(\bar{X}) = \frac{V(X)}{n}$
– 10 percent rule
Shape: CLT: distribution of $\bar{X}$ approximately Normal if $n$ large enough (the less Normally distributed $X$ is, the higher $n$ should be)
• If not sure: Monte Carlo simulations let us empirically estimate distribution
– Can be generalized to other statistics than $\bar{X}$
• As $n$ increases, variance of $\bar{X}$ decreases
• Estimate $\mu_X$ using $\bar{X}$
