Distribution of Sample Mean Monte Carlo Simulations: ECO220Y - Intro To Data Analysis and Applied Econometrics

Lecture 11:
Distribution of sample mean $\bar{X}$
Monte Carlo Simulations
ECO220Y – Intro to Data Analysis and Applied Econometrics
Blanchenay ECO220
Last lecture
• Sample statistics are random variables (RVs)
– Can find their distributions analytically, theoretically, or empirically
• The sample proportion is a linear transformation of a binomial
• It can be approximated by a Normal distribution under certain conditions (large sample size)
• Can use the approximation to find probabilities
• Variance goes down with sample size
Firefox for Android – Google Play ratings
$\mu_X = E(X) \approx 4.376$
Draw $n = 10$ users at random, compute mean rating $\bar{X}$
• Expectation, Variance, Distribution?
• What is $P(\bar{X} \geq 4.5)$?
Expectation of $\bar{X} = \frac{1}{10}(X_1 + \dots + X_{10})$
$$E(\bar{X}) = \frac{1}{10} E(X_1 + X_2 + \dots + X_{10}) = \frac{1}{10}\left[E(X_1) + \dots + E(X_{10})\right] = \frac{1}{10} \cdot 10 \cdot E(X)$$
$$E(\bar{X}) = E(X) = \mu_X = 4.376$$
Still true if we take a sample of $n = 50$?
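A quick numerical check of this result, including the $n = 50$ question. The Firefox population itself is not available here, so this sketch assumes a fair six-sided die as a stand-in population ($\mu_X = 3.5$); the function name `mean_of_sample_means` and the number of repetitions are illustrative choices.

```python
import random
import statistics

# Hypothetical stand-in population: a fair six-sided die, so mu_X = 3.5 exactly.
FACES = [1, 2, 3, 4, 5, 6]


def mean_of_sample_means(n, reps=20_000, seed=1):
    """Simulate `reps` samples of n independent rolls and average their sample means."""
    rng = random.Random(seed)
    sample_means = [statistics.fmean(rng.choices(FACES, k=n)) for _ in range(reps)]
    return statistics.fmean(sample_means)


for n in (10, 50):
    print(n, round(mean_of_sample_means(n), 3))  # both should be close to 3.5
```

Both averages land near 3.5: nothing in the derivation depended on $n = 10$, so $E(\bar{X}) = \mu_X$ for any sample size.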
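Variance of $\bar{X} = \frac{1}{10}(X_1 + \dots + X_{10})$
$$V(\bar{X}) = \frac{1}{10^2} V(X_1 + X_2 + \dots + X_{10})$$
Assume independent draws:
$$= \frac{1}{10^2}\left[V(X_1) + \dots + V(X_{10})\right] = \frac{1}{10^2} \cdot 10 \cdot V(X)$$
$$V(\bar{X}) = \frac{V(X)}{10} = \frac{\sigma_X^2}{10} = 0.124$$
$$sd(\bar{X}) = \frac{\sigma_X}{\sqrt{10}}$$
Still true if we take a sample of $n = 50$?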
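The same formula answers the $n = 50$ question: the variance is divided by $n$, so it changes with the sample size. A minimal sketch, plugging in the variance of Firefox ratings reported in the Stata summary in this lecture ($V(X) \approx 1.2378$); `var_of_mean` is a hypothetical helper name.

```python
import math

# Variance of Firefox ratings, V(X), from the Stata summary in this lecture
VAR_X = 1.237804


def var_of_mean(var_x, n):
    """V(X̄) = V(X) / n; valid only when the n draws are independent."""
    return var_x / n


for n in (10, 50):
    v = var_of_mean(VAR_X, n)
    print(f"n = {n}: V(X̄) = {v:.4f}, sd(X̄) = {math.sqrt(v):.4f}")
```

For $n = 10$ this gives about 0.124 (sd about 0.352); for $n = 50$ the variance is five times smaller (sd about 0.157).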
10% Rule
For draws of a sample to be considered independent:
• Either draw with replacement (e.g. from a population of dice rolls)
• Or draw without replacement, but with a small enough $n$: less than 10% of the population
– Intuition: if you have drawn 50% of the population, it's easier to guess the next draw
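The rule can be checked by simulation. This sketch assumes a hypothetical population of $N = 100$ values (1, 2, ..., 100) and compares the simulated variance of $\bar{X}$ with $\sigma_X^2/n$ for $n = 5$ (5% of the population) and $n = 50$ (50%), drawing with and without replacement. The names and parameters are illustrative.

```python
import random
import statistics


def var_of_sample_mean(population, n, replace, reps=20_000, seed=2):
    """Simulated variance of the sample mean when drawing n units from `population`."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        if replace:
            sample = rng.choices(population, k=n)  # independent draws
        else:
            sample = rng.sample(population, n)     # no unit drawn twice
        means.append(statistics.fmean(sample))
    return statistics.pvariance(means)


# Hypothetical population of N = 100 values: 1, 2, ..., 100
population = list(range(1, 101))
sigma2 = statistics.pvariance(population)

for n in (5, 50):  # 5% vs 50% of the population
    print(n,
          round(sigma2 / n, 1),  # formula that assumes independent draws
          round(var_of_sample_mean(population, n, replace=True), 1),
          round(var_of_sample_mean(population, n, replace=False), 1))
```

With replacement, the simulated variance matches $\sigma_X^2/n$ for both sizes. Without replacement it stays close for $n = 5$ but is roughly half as large for $n = 50$: the draws are no longer independent, so the formula no longer applies.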
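Summary
For $\bar{X} = \frac{1}{n}(X_1 + \dots + X_n)$:
• $E(\bar{X}) = E(X) = \mu_X$
• $V(\bar{X}) = \frac{V(X)}{n} = \frac{\sigma_X^2}{n}$ (independent draws)
• $sd(\bar{X}) = \frac{sd(X)}{\sqrt{n}} = \frac{\sigma_X}{\sqrt{n}}$ (independent draws)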
Distribution of $\bar{X}$?
If $X$ is Normally distributed, $N(\mu_X, \sigma_X^2)$:
• $\bar{X}$ is a linear combination of $n$ independent, Normally distributed r.v.s, so it is Normal too
• With previous info:
$$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$$
• NB: if we take a sample of a different size $n'$, we get another random variable
– If there's confusion, denote them $\bar{X}_{10}, \dots, \bar{X}_{50}$
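A simulation sketch of this result, assuming a hypothetical Normal population $N(10, 3^2)$ and samples of size $n = 9$, so that $sd(\bar{X}) = 3/\sqrt{9} = 1$. The constants are arbitrary; the point is that the simulated mean, variance, and 95% share all match $\bar{X} \sim N(\mu_X, \sigma_X^2/n)$.

```python
import random
import statistics

# Hypothetical Normal population N(MU, SIGMA^2) and sample size N
MU, SIGMA, N = 10.0, 3.0, 9


def simulate_means(reps=50_000, seed=3):
    """Sample means of N independent draws from N(MU, SIGMA^2), repeated `reps` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(reps)]


means = simulate_means()
sd_bar = SIGMA / N ** 0.5  # theoretical sd(X̄) = 1

print(statistics.fmean(means))      # close to MU = 10
print(statistics.pvariance(means))  # close to SIGMA**2 / N = 1
# If X̄ ~ N(MU, SIGMA^2 / N), about 95% of simulated means lie within 1.96 sd of MU
print(sum(abs(m - MU) <= 1.96 * sd_bar for m in means) / len(means))
```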
Central Limit Theorem
For $n$ large enough, the distribution of the sample mean $\bar{X}$ approximately follows a Normal, regardless of the original distribution of $X$
• The computations of $E(\bar{X})$ and $V(\bar{X})$ still apply
• "$n$ large enough":
– Rule of thumb: $n \geq 30$
– If the distribution of $X$ is almost Normal, can use fewer
– If the distribution of $X$ is very different from Normal, probably need more
Figure: average of $n = 50$ dice rolls
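How large $n$ must be depends on how far $X$ is from Normal. This sketch assumes a hypothetical, strongly skewed population (Exponential with rate 1, skewness 2; for comparison, the Firefox ratings in the Stata summary have skewness about −1.90) and tracks the skewness of simulated sample means, which theory puts at $2/\sqrt{n}$ for this population. A Normal has skewness 0. Function names and repetition counts are illustrative.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: average cubed standardized deviation (0 for a Normal)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


def simulated_means(n, reps=20_000, seed=4):
    """Sample means of n draws from a hypothetical Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


for n in (1, 5, 30, 100):
    # Simulated skewness of X̄ next to the theoretical value 2 / sqrt(n)
    print(n, round(skewness(simulated_means(n)), 2), round(2 / n ** 0.5, 2))
```

At $n = 30$ the theoretical skewness is still about 0.37, so for a population this skewed the rule of thumb may be on the low side.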
True CLT
For $n$ large enough, the distribution of a linear combination of $n$ independent RVs approximately follows a Normal, regardless of the individual distributions of the RVs (as long as each has a finite variance and no single RV dominates the sum)
• The sample mean is just a rescaled sum, so it is a linear combination and the CLT applies to it
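To illustrate the more general statement, this sketch sums $k = 60$ independent RVs that cycle through three different distributions (Uniform(0, 1), Exponential(1), Bernoulli(0.3)), standardizes the sum with its exact mean and variance, and checks that about 95% of simulated values fall within 1.96 sd, as they would for a Normal. The distributions, $k$, and the seed are arbitrary assumptions.

```python
import random

K = 60  # number of independent RVs in the sum (20 of each type)


def draw_sum(rng, k=K):
    """One draw of the sum of k independent RVs with different distributions."""
    total = 0.0
    for i in range(k):
        if i % 3 == 0:
            total += rng.uniform(0, 1)                    # mean 1/2, variance 1/12
        elif i % 3 == 1:
            total += rng.expovariate(1.0)                 # mean 1, variance 1
        else:
            total += 1.0 if rng.random() < 0.3 else 0.0   # Bernoulli(0.3): mean 0.3, variance 0.21
    return total


# Exact mean and variance of the sum (variances add because the RVs are independent)
MEAN_S = (K // 3) * (0.5 + 1.0 + 0.3)
VAR_S = (K // 3) * (1 / 12 + 1.0 + 0.21)

rng = random.Random(5)
z = [(draw_sum(rng) - MEAN_S) / VAR_S ** 0.5 for _ in range(20_000)]
share = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(share, 3))  # close to 0.95, as for a Normal
```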
Empirical method
MONTE CARLO SIMULATION
Monte Carlo simulation
Manhattan Project: secret development of nuclear weapons
• Q: how far do neutrons travel through material (shielding)?
• Von Neumann & Ulam: can't solve the equations exactly
• Solution: simulate the neutrons' paths as a random process a large number of times using ENIAC (one of the first general-purpose electronic computers)
Monte Carlo simulation
1. Fix $n$
2. Draw $n$ values at random from the population
3. Compute the statistic and record it
4. Repeat steps 2 and 3 a large number of times (e.g. 100,000)
• Use the 100,000 samples to infer the distribution of the sample statistic
– Graphically
– Numerically (Stata)
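The four steps translate almost line by line into code. A minimal Python sketch (the course uses Stata; this is an illustrative translation, not the course's code): draws are made with replacement, so they are independent, and the usage example assumes a hypothetical population of fair-die rolls.

```python
import random
import statistics


def monte_carlo(population, n, statistic, reps=100_000, seed=0):
    """Steps 1-4: with n fixed, draw n values at random (with replacement),
    compute the statistic, record it, and repeat `reps` times."""
    rng = random.Random(seed)
    return [statistic(rng.choices(population, k=n)) for _ in range(reps)]


# Usage with a hypothetical population of fair-die rolls and the sample mean
draws = monte_carlo([1, 2, 3, 4, 5, 6], n=10, statistic=statistics.fmean)

# Numerical summary of the simulated distribution of X̄
print(statistics.fmean(draws))      # close to E(X) = 3.5
print(statistics.pvariance(draws))  # close to V(X) / 10 = (35/12) / 10 ≈ 0.292
```

Passing `statistic` as a function is what lets the same procedure generalize to statistics other than $\bar{X}$ (e.g. `statistics.median`).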
Firefox for Android – Google Play ratings

                     X (Firefox rating)
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            3              1       Obs           2,954,953
25%            4              1       Sum of Wgt.   2,954,953

50%            5                      Mean           4.376244
                        Largest       Std. Dev.      1.112566
75%            5              5
90%            5              5       Variance       1.237804
95%            5              5       Skewness      -1.903797
99%            5              5       Kurtosis        5.68557

• $E(X) \approx 4.376$
Draw $n = 10$ users at random, compute mean rating $\bar{X}$
• Expectation, Variance, Distribution?
• What is $P(\bar{X} \geq 4.5)$?
Firefox Monte Carlo simulation
1. Fix $n = 10$
2. Draw 10 values at random from the population
3. Compute the statistic (here $\bar{X}$) and record it
4. Repeat steps 2 and 3 for 1,000,000 samples
Use the 1,000,000 simulated samples to infer the distribution of $\bar{X}$
Distribution of $\bar{X}$
• 68.0% of values between $4.378 - 0.351$ and $4.378 + 0.351$ (within 1 sd)
• 96.8% of values within 2 sd
• 99.3% of values within 3 sd
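The shares above can be computed directly from the simulated values. This sketch uses a hypothetical stand-in for the simulated Firefox means (means of 10 fair-die rolls) and a hypothetical helper `share_within`; applied to the simulated Firefox means, the same function would give the kind of percentages reported on this slide.

```python
import random
import statistics


def share_within(values, k):
    """Fraction of values within k standard deviations of their own mean."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return sum(abs(v - m) <= k * sd for v in values) / len(values)


# Hypothetical stand-in for the simulated sample means: means of 10 fair-die rolls
rng = random.Random(6)
means = [statistics.fmean(rng.choices(range(1, 7), k=10)) for _ in range(100_000)]

for k in (1, 2, 3):
    # Compare with roughly 68 / 95 / 99.7 percent for a Normal
    print(k, round(100 * share_within(means, k), 1))
```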
$P(\bar{X} \geq 4.5)$?
If we had assumed $\bar{X} \sim N(4.376, 0.124)$:
$$P(\bar{X} \geq 4.5) = P\left(Z \geq \frac{4.5 - 4.376}{\sqrt{0.124}}\right) = P(Z \geq 0.352) \approx 0.362$$
Using Monte Carlo: $P(\bar{X} \geq 4.5) = 0.459$
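The Normal-approximation number can be reproduced with Python's standard library. Note that `NormalDist` takes the standard deviation, so the variance 0.124 enters as its square root; dividing by 0.124 itself would give $P(Z \geq 1) \approx 0.159$ instead.

```python
from statistics import NormalDist

mean_xbar = 4.376  # E(X̄) = μ_X
var_xbar = 0.124   # V(X̄) = σ_X² / 10

approx = NormalDist(mu=mean_xbar, sigma=var_xbar ** 0.5)
p = 1 - approx.cdf(4.5)
print(round(p, 3))  # 0.362
```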
Discrepancies
If we take one sample of 10 and find $\bar{X} \geq 4.5$:
• Sampling error
• Simulation error: chance difference between the true probability distribution and the simulated probability distribution
– Simulate a large number of samples to reduce it
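Simulation error shrinks as the number of repetitions grows, roughly like $1/\sqrt{\text{reps}}$. This sketch assumes a case where the true answer is known, $P(\text{die roll} \geq 5) = 1/3$, and compares the estimates for 100, 10,000, and 1,000,000 repetitions; the helper name and seed are illustrative.

```python
import random


def estimate_prob(reps, seed):
    """Monte Carlo estimate of P(die roll >= 5), whose true value is 1/3."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 6) >= 5 for _ in range(reps))
    return hits / reps


for reps in (100, 10_000, 1_000_000):
    est = estimate_prob(reps, seed=7)
    # Typical size of the simulation error: sqrt(p * (1 - p) / reps)
    typical = ((1 / 3) * (2 / 3) / reps) ** 0.5
    print(reps, round(est, 4), round(abs(est - 1 / 3), 4), round(typical, 4))
```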
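Distribution of $\bar{X}_{50}$
• 68.8% of values within 1 sd of the mean, i.e. roughly between $4.377 - 0.157$ and $4.377 + 0.157$ (since $sd(\bar{X}_{50}) = \sigma_X/\sqrt{50} \approx 0.157$)
• 95.2% of values within 2 sd
• 99.6% of values within 3 sd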
Reminder: distribution of $\hat{P}$ as sample size increases (figure)
Precision
$$E(\bar{X}) = \mu_X \qquad V(\bar{X}) = \frac{\sigma_X^2}{n}$$
With larger sample size:
• The distribution becomes closer to Normal (the CLT starts working)
• The variance of $\bar{X}$ becomes smaller
Distribution of $\bar{X}$ as $n$ increases (figure)
Distribution of $\bar{X}$ (100,000 repetitions)

Sample    E(X̄)    Var(X̄)   % obs.       % obs.       % obs.
size n                      within 1 sd  within 2 sd  within 3 sd
10        4.378   0.123     68.2         96.7         99.3
50        4.377   0.025     68.8         95.2         99.6
100       4.377   0.012     67.5         95.3         99.7
1000      4.376   0.0012    68.8         95.5         99.7
+∞        4.376   0         68.3         95.4         99.7
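The theoretical columns of the table can be checked directly: $V(\bar{X}) = V(X)/n$ using the Firefox variance from the Stata summary, and the $+\infty$ row's percentages are the shares of a standard Normal within 1, 2, and 3 sd. A short sketch using Python's standard library:

```python
from statistics import NormalDist

VAR_X = 1.237804  # V(X) of Firefox ratings, from the Stata summary

# Theoretical V(X̄) = V(X) / n for each sample size in the table
for n in (10, 50, 100, 1000):
    print(n, round(VAR_X / n, 4))

# Limiting (+∞) row: share of a standard Normal within k sd of its mean
z = NormalDist()
for k in (1, 2, 3):
    print(k, round(100 * (z.cdf(k) - z.cdf(-k)), 1))
```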
Sample mean as an estimate of the population mean
Imagine we do not know $\mu_X$ and would like to find its value:
• Collect a sample of $n$ observations from the population
• In expectation, the sample mean equals $\mu_X$
– Because $E(\bar{X}) = \mu_X$
• Higher $n$ ⇒ lower $V(\bar{X})$ ⇒ $\bar{X}$ is a more precise estimate of $\mu_X$
Key messages
For a given $n$: $E(\bar{X}) = E(X)$ and $V(\bar{X}) = \frac{V(X)}{n}$
– Requires independent draws (10 percent rule)
Shape: CLT: distribution of $\bar{X}$ approximately Normal if $n$ large enough (the less Normally distributed $X$ is, the higher $n$ should be)
• If not sure: Monte Carlo simulations let us estimate the distribution empirically
– Can be generalized to statistics other than $\bar{X}$
• As $n$ increases, the variance of $\bar{X}$ decreases
• Estimate $\mu_X$ using $\bar{X}$
