
Coursera Statistical Inference Project 1

Author: Kartik Sahni

Overview: Illustrating the Central Limit Theorem (CLT) and its properties using 1000 simulations
of the average of 40 exponential random variables.

The PDF of the exponential distribution is given by:

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

The distribution is parameterized by the rate lambda. In this experiment lambda is taken to be 0.2.

 The theoretical mean is thus 1/lambda = 5.
 The standard deviation (S.D.) is also 1/lambda = 5.
 Thus the theoretical variance is 5^2 = 25.
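
As a sanity check on these theoretical values, the moments can be computed numerically from the density. This is only a sketch: the variable names (theo_mean, theo_second, theo_var) are illustrative and not part of the original analysis. It uses base R's integrate() and dexp().

```r
# Numerically check the theoretical moments of an exponential with rate 0.2.
# Variable names here are illustrative assumptions, not from the original report.
lambda <- 0.2

# E[X] = integral of x * f(x) over [0, Inf)
theo_mean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# E[X^2] = integral of x^2 * f(x) over [0, Inf)
theo_second <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value

# Var(X) = E[X^2] - (E[X])^2
theo_var <- theo_second - theo_mean^2
theo_sd <- sqrt(theo_var)

print(c(mean = theo_mean, sd = theo_sd, variance = theo_var))
```

The printed values should agree with 5, 5 and 25 up to numerical integration error.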

Simulations:

First, let's look at the distribution of 1000 draws from an exponential distribution with lambda = 0.2.
> #generate histogram of 1000 simulations of exponential distribution with lambda = 0.2
> n<-1000
> lambda<-0.2
> hist(rexp(n,lambda))

The resulting chart looks something like this

This is what an exponential distribution looks like. However, we learn from the CLT that the distribution of
averages of iid variables, once centered and scaled, approaches that of a standard normal as the sample size
grows. The main requirement on the underlying distribution is that its variance is finite. The data may come
from a normal, a Poisson or, as in our case, an exponential distribution.
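
To make the normalization concrete, here is a minimal sketch that standardizes simulated averages using the theoretical mean and standard deviation. The seed and the variable names (means, z) are arbitrary choices for illustration. If the CLT approximation is reasonable for n = 40, the standardized values should have a mean near 0 and a standard deviation near 1.

```r
# Standardize simulated averages: z = (xbar - mu) / (sigma / sqrt(n)).
# The seed and variable names are illustrative assumptions.
set.seed(1)

lambda <- 0.2
n <- 40               # draws per average
mu <- 1 / lambda      # theoretical mean of one draw
sigma <- 1 / lambda   # theoretical standard deviation of one draw

# 1000 averages of n exponential draws each
means <- replicate(1000, mean(rexp(n, lambda)))

# Center and scale each average
z <- (means - mu) / (sigma / sqrt(n))

print(c(mean = mean(z), sd = sd(z)))
```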

Now we investigate the distribution of the average of 40 exponential draws, repeated 1000
times. For this we submit the following code in R:
> avgexp = NULL
# simulate 1000 times the average of 40 random exponential draws with parameter lambda = 0.2
> for (i in 1 : 1000) avgexp = c(avgexp, mean(rexp(40,0.2)))
# generate histogram
> hist(avgexp)
The output chart from the code above looks something like this:

Notice that this looks like an approximately normal distribution.
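
One way to judge the resemblance by eye is to draw the histogram on a density scale and overlay the normal curve suggested by the CLT. The following is a sketch, not the original author's code: it simulates the averages with a matrix and rowMeans() instead of a loop, and the seed, number of breaks and plot labels are arbitrary.

```r
# Histogram of simulated averages with the CLT normal density overlaid.
# Seed, breaks and labels are illustrative assumptions.
set.seed(2)

lambda <- 0.2
n <- 40        # draws per average
nsim <- 1000   # number of averages

# Each row holds n exponential draws; rowMeans gives one average per row
avgexp <- rowMeans(matrix(rexp(nsim * n, lambda), nrow = nsim))

# freq = FALSE puts the histogram on a density scale so the curve is comparable
hist(avgexp, freq = FALSE, breaks = 30,
     main = "Averages of 40 exponentials", xlab = "Sample mean")

# Normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n)
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, col = 2, lwd = 2)
```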

1. Show the sample mean and compare it to the theoretical mean of the distribution.

We also know that this distribution should be centered about the theoretical mean of the distribution,
which is 1/lambda. We compare the theoretical mean with the mean of the simulated averages
in the code below.
#calculate theoretical mean (1/lambda, with lambda = 0.2)
> theoretical_mean<-1/0.2
> theoretical_mean
[1] 5
#calculate sample mean of the distribution
> x<-avgexp
> mean(x)
[1] 4.938934

We notice that the sample mean turns out to be very close to the theoretical mean. As an exercise,
let's consider a larger sample of 10,000 averages, each of 40 iid exponential draws. We should
expect the sample mean to be even closer to the theoretical mean.
> avgexp = NULL
#for 10,000 simulations instead of 1000
> for (i in 1 : 10000) avgexp = c(avgexp, mean(rexp(40,0.2)))
> x<-avgexp
> mean(x)
[1] 4.998599

Notice that the sample mean is now almost equal to the theoretical mean. This illustrates the law of
large numbers: the more averages we simulate, the closer their sample mean tends to be to the population mean.
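
The convergence can also be followed step by step with a running mean. In this sketch the seed and the names avgs and running_mean are arbitrary illustrative choices. The running mean of the first k simulated averages should settle toward 5 as k grows.

```r
# Running mean of simulated averages, to watch it settle near 1/lambda = 5.
# Seed and variable names are illustrative assumptions.
set.seed(3)

# 10,000 averages of 40 exponential draws each
avgs <- replicate(10000, mean(rexp(40, 0.2)))

# running_mean[k] is the mean of the first k averages
running_mean <- cumsum(avgs) / seq_along(avgs)

plot(running_mean, type = "l",
     xlab = "Number of simulated averages", ylab = "Running mean")
abline(h = 5, col = 2)   # theoretical mean

# Running mean after 10, 100, 1000 and 10,000 averages
print(running_mean[c(10, 100, 1000, 10000)])
```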

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the
distribution.

The sample variance is derived using the code below:


> avgexp = NULL
> # simulate 1000 times the average of 40 random exponential draws with parameter lambda = 0.2
> for (i in 1 : 1000) avgexp = c(avgexp, mean(rexp(40,0.2)))
>
> y<-avgexp
> variance<-var(y)
> variance
[1] 0.6296424
To compare variability, we compute the sample variance of our simulated averages, which is 0.6296424.
Recall that the theoretical variance of the exponential distribution itself is 25. The average of n iid
draws, however, has mean mu and standard deviation sigma/sqrt(n), and by the CLT it is approximately
normal. Our sample variance should therefore be close to sigma^2/n, i.e. 25/40 or 0.625.
The sample variance of 0.6296 is indeed very close to this theoretical value.
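
For reference, the sigma^2/n value follows from the independence of the draws. Here X_1, ..., X_n are the n = 40 iid exponential draws and \bar{X} is their average (notation introduced only for this derivation):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\,\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n}
  = \frac{25}{40}
  = 0.625
```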

3. Show that the distribution is approximately normal

For this part, we compare 1000 draws from the exponential distribution with the averages of 40 exponential
draws repeated 1000 times. We compare the normal Q-Q plots of both:

1. Exponential distribution with lambda = 0.2 and n = 1000


Code:
> ## Plot using a qqplot
> n<-1000
> lambda<-0.2
> # named expsample rather than exp, so the base function exp() is not masked
> expsample<-rexp(n,lambda)
> qqnorm(expsample);qqline(expsample, col = 2)

2. Average of 40 exponential draws with lambda = 0.2, simulated 1000 times
Code:
> qqnorm(avgexp);qqline(avgexp, col = 2)

From the Q-Q plots it is clear that the second distribution is approximately normal: its points follow the
reference line closely, apart from some noise.
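
A numeric complement to the Q-Q plots is the sample skewness. The exponential distribution has skewness 2, and the skewness of an average of n iid draws shrinks by a factor of 1/sqrt(n), so for n = 40 it should be roughly 2/sqrt(40), about 0.32. The sketch below uses a small helper, skewness(), that is defined here for illustration only. It is not a base R function, and the seed is arbitrary.

```r
# Compare sample skewness of raw exponential draws and of averages of 40 draws.
# skewness() is a hypothetical helper defined here, not a base R function.
set.seed(4)

lambda <- 0.2
raw <- rexp(1000, lambda)                          # 1000 raw exponential draws
avgs <- replicate(1000, mean(rexp(40, lambda)))    # 1000 averages of 40 draws

# Simple moment-based skewness estimate
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

print(c(raw = skewness(raw), averages = skewness(avgs)))
```

A skewness much closer to 0 for the averages than for the raw draws is consistent with the averages being closer to normal, although a small positive skew remains at n = 40.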
