Overview: This report illustrates the Central Limit Theorem (CLT) and its properties using 1000
simulations of the average of 40 exponential random variables.
The exponential distribution is parameterized by the rate lambda. In this experiment lambda is set to 0.2.
Simulations:
First, let us look at the distribution of 1000 draws from an exponential distribution with lambda = 0.2.
> #generate histogram of 1000 simulations of exponential distribution with lambda = 0.2
> n<-1000
> lambda<-0.2
> hist(rexp(n,lambda))
This is what an exponential distribution looks like. However, we learn from the CLT that the distribution
of averages of iid variables becomes approximately normal as the sample size grows (and the standardized
average becomes approximately standard normal). The only requirement on the underlying distribution is
a finite variance. The data may come from a normal, a Poisson or, as in our case, the exponential
distribution.
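For reference, the statement above can be written out for the exponential case. This is the standard form of the CLT, not something specific to this experiment; the symbols \mu, \sigma and n are notation introduced here, with n = 40 and lambda = 0.2 taken from the text.

```latex
% CLT for iid X_1, ..., X_n with mean \mu and finite variance \sigma^2
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1),
\qquad\text{so}\qquad
\bar{X}_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n.

% Exponential distribution with rate \lambda
\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}.

% With \lambda = 0.2 and n = 40
\mu = 5, \qquad \frac{\sigma^2}{n} = \frac{25}{40} = 0.625,
\qquad \frac{\sigma}{\sqrt{n}} \approx 0.79.
```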
Now we investigate the distribution of the average of 40 exponential draws, repeated 1000 times. For
this we run the following code in R:
> avgexp = NULL
# simulate 1000 times the average of 40 random exponential draws with parameter lambda = 0.2
> for (i in 1 : 1000) avgexp = c(avgexp, mean(rexp(40,0.2)))
# generate histogram
> hist(avgexp)
The output chart from the code above looks something like this:
1. Show the sample mean and compare it to the theoretical mean of the distribution.
We also know that this distribution should be centered at the theoretical mean of the exponential
distribution, which is 1/lambda. We compare the theoretical mean with the mean of the simulated
averages in the code below.
#calculate theoretical mean
> theo_mean<-1/lambda
> theo_mean
[1] 5
#calculate sample mean of the distribution
> x<-avgexp
> mean(x)
[1] 4.938934
We notice that the sample mean turns out to be very close to the theoretical mean. As an exercise,
consider a larger simulation of 10,000 averages of 40 iid exponential draws. We should expect the
sample mean to be closer still to the theoretical mean.
> avgexp = NULL
#for 10,000 simulations instead of 1000
> for (i in 1 : 10000) avgexp = c(avgexp, mean(rexp(40,0.2)))
> x<-avgexp
> mean(x)
[1] 4.998599
Notice that the sample mean is almost equal to the theoretical mean. This illustrates the law of large
numbers: as the amount of data grows, the sample mean converges to the population mean.
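The effect of the number of simulations can be sketched in one run, as below. This is a minimal illustration rather than the code used for the results above: the helper `sim_means`, the seed and the chosen simulation counts are assumptions made here, and `replicate` stands in for the explicit `for` loop.

```r
# Sketch: sample mean of simulated averages for increasing numbers of simulations.
set.seed(1)              # assumed seed, used only so the run is reproducible
lambda <- 0.2            # rate of the exponential distribution (from the text)
n <- 40                  # number of exponential draws per average (from the text)
theo_mean <- 1 / lambda  # theoretical mean, 5

# Hypothetical helper: simulate `nsim` averages of `n` exponential draws.
sim_means <- function(nsim, n, lambda) {
  replicate(nsim, mean(rexp(n, lambda)))
}

nsims <- c(100, 1000, 10000)  # assumed simulation counts
est <- sapply(nsims, function(k) mean(sim_means(k, n, lambda)))

# One row per simulation count, with the distance from the theoretical mean
data.frame(nsim = nsims, sample_mean = est, abs_error = abs(est - theo_mean))
```

The absolute error tends to shrink as the number of simulations grows, although in any single run it need not decrease at every step.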
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the
distribution.
For this part, we compare the exponential distribution with n = 1000 and the average of 40 exponential
draws repeated 1000 times. We compare the QQ plots of both:
2. Average of 40 exponential draws with lambda = 0.2, simulated 1000 times
Code:
> qqnorm(avgexp);qqline(avgexp, col = 2)
The QQ plot shows that the second distribution is approximately normal (with some visible noise).
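The variance comparison named in the heading of this part can be sketched as follows. This is an illustration under stated assumptions: the averages are simulated again so that the block runs on its own, the seed is arbitrary, and the theoretical variance of the average is taken as (1/lambda)^2 / 40, which follows from the exponential variance 1/lambda^2.

```r
# Sketch: compare the variance of the simulated averages with its theoretical value.
set.seed(2)    # assumed seed, used only so the run is reproducible
lambda <- 0.2  # rate of the exponential distribution (from the text)
n <- 40        # number of exponential draws per average (from the text)

# Simulated again here so the block is self-contained
avgexp <- replicate(1000, mean(rexp(n, lambda)))

theo_var <- (1 / lambda)^2 / n  # variance of the mean of n draws: 25 / 40 = 0.625
sample_var <- var(avgexp)       # variance of the 1000 simulated averages

c(theoretical = theo_var, sample = sample_var)
```

The sample variance should land near 0.625; the exact value depends on the random draws.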