CLT

The Central Limit Theorem (CLT from now on) states that, irrespective of the distribution of a population (with mean μ and standard deviation σ), if you take a number of samples of size n from the population, then the sample means approximately follow a normal distribution with a mean of μ and a standard deviation of σ / sqrt(n). The approximation gets better as n grows.
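In symbols, the statement can be sketched as follows. Here μ is the population mean and σ its standard deviation. The notation X̄_n for the mean of a sample of size n is introduced only for this illustration, and the draws X_1, ..., X_n are assumed to be independent with finite σ.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
\bar{X}_n \ \overset{\text{approx.}}{\sim}\ \mathcal{N}\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\quad\text{for large } n,
\qquad
\operatorname{sd}\!\left(\bar{X}_n\right) = \frac{\sigma}{\sqrt{n}}
```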

The keyword is "irrespective of the distribution of the population" - the sample means will still be approximately normally distributed. That is the power of the CLT. We will soon see practical applications of this.

Let's take a very simple example. Suppose you have a uniform distribution of 5 numbers: 1, 3, 4, 5, 7. This has a mean of 4 and a standard deviation of 2.24 (here and below, standard deviations are computed with the n - 1 sample formula). Now you take all possible samples of 2 out of this population and take the means of all those samples. The possible samples are (1 & 3); (1 & 4); (1 & 5); (1 & 7); (3 & 4); (3 & 5); (3 & 7); (4 & 5); (4 & 7) and (5 & 7).

The means of these samples are 2, 2.5, 3, 4, 3.5, 4, 5, 4.5, 5.5 and 6. The distribution of these means has a mean of 4 and a standard deviation of 1.29. If you now take all samples of size 3 (as we did for size 2) and take their means, you get the following values: 2.67, 3, 3.67, 3.33, 4, 4.33, 4, 4.67, 5 and 5.33. These have a mean of 4 and a standard deviation of 0.86.
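The figures above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper name `sample_means` is made up for this illustration. Note that `statistics.stdev` uses the n - 1 formula, which is what matches the quoted standard deviations.

```python
from itertools import combinations
from statistics import mean, stdev

population = [1, 3, 4, 5, 7]


def sample_means(values, size):
    """Return the mean of every possible sample of `size` values,
    drawn without replacement (hypothetical helper for this example)."""
    return [mean(combo) for combo in combinations(values, size)]


# The population itself: mean 4, standard deviation about 2.24.
print("population", mean(population), round(stdev(population), 2))

# All samples of size 2 and of size 3, with the mean and spread of their means.
for size in (2, 3):
    means = sample_means(population, size)
    print(size, [round(m, 2) for m in means],
          round(mean(means), 2), round(stdev(means), 2))
```

The spread of the sample means shrinks as the sample size goes from 2 to 3, while their mean stays at the population mean.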

The next slides give histograms of the main distribution and of the distributions of the sample means (for samples of size 2 and 3).

[Figure: histograms of Frequency for C1 and C3]

We have also superimposed a normal distribution with the same mean and standard deviation over each distribution of means.

[Figure: histogram of Frequency for C5]

This simplistic example just demonstrates that even for a very simple uniform distribution, the means of small samples already move towards normality. As you take larger sample sizes and your populations are bigger, you see a very close approximation to the normal curve. This is the amazing nature of the central limit theorem.
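The effect of larger samples can be explored with a small simulation. The sketch below is an illustration, not part of the original example: it assumes an exponential population with mean 1 and standard deviation 1 (chosen because it is strongly skewed), and the function names, the number of samples and the seed are arbitrary choices. It compares the spread of the sample means with σ / sqrt(n) and uses skewness (0 for a normal curve) as a rough measure of non-normality.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(sample_size, n_samples=5000, seed=0):
    """Draw `n_samples` samples of `sample_size` values from an exponential
    population (mean 1, standard deviation 1) and return their means."""
    rng = random.Random(seed)
    return [
        fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Population skewness: about 0 for a symmetric, normal-like shape."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)


for n in (2, 10, 30):
    means = simulate_sample_means(n)
    print(
        f"n={n:2d}  mean={fmean(means):.3f}  "
        f"sd={pstdev(means):.3f}  sigma/sqrt(n)={1 / n ** 0.5:.3f}  "
        f"skewness={skewness(means):.2f}"
    )
```

With these assumptions, the standard deviation of the means should track σ / sqrt(n) and the skewness should shrink towards 0 as n increases.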

Where can you apply the CLT in Six Sigma? The most important application is in hypothesis testing. Most of our hypothesis tests assume the data is normal. (There are also tests for non-normal data, but the tests for normal data are more numerous and of better quality.)

Suppose you have data that has a non-normal distribution (e.g. uniform, exponential or whatever). Now you make an improvement in the process and want to do a statistical test to prove the improvement. To do so, you would have to use one of the tests that can show statistical validation for that kind of distribution. Or you can apply the benefits of the central limit theorem.

You know that if you use samples of data, then the sample means approximately follow a normal distribution.

So to apply a normal data approach, you can take a few groups of samples from the pre-improvement data and a few groups of samples from the post-project data. The means of these data groups will approximately follow a normal distribution as per the central limit theorem. You can then use the hypothesis tests on these means to see if, as a result of your project, the means of the samples have changed statistically or not.
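The grouping approach above can be sketched in code. Everything specific here is an assumption made for illustration: the before and after data are simulated exponential values (means 10 and 8), the group size of 10 is arbitrary, and Welch's t-test is one common choice of test that the text does not name. The p-value uses a normal approximation to the t distribution to stay within the standard library, so treat it as rough.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def group_means(values, group_size):
    """Split `values` into consecutive groups of `group_size` and return
    each group's mean; a trailing incomplete group is dropped."""
    last_start = len(values) - group_size
    return [
        fmean(values[i:i + group_size])
        for i in range(0, last_start + 1, group_size)
    ]


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical non-normal process data (e.g. cycle times), before and after.
rng = random.Random(1)
before = [rng.expovariate(1 / 10) for _ in range(300)]
after = [rng.expovariate(1 / 8) for _ in range(300)]

# Test the group means rather than the raw, skewed values.
before_means = group_means(before, 10)
after_means = group_means(after, 10)
t_stat, df = welch_t(before_means, after_means)

# Rough two-sided p-value (normal approximation to the t distribution).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"t = {t_stat:.2f}, df = {df:.1f}, approximate p = {p_approx:.4f}")
```

Larger groups make the means closer to normal but leave fewer means to test, so the group size is a trade-off.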

Another application of the Central Limit Theorem is in control charts. There again, a lot of charts assume the data to be normal (in calculating the control limits). So if your data is non-normal, you can plot the means of groups instead of individual values, because the group means would approximate a normal distribution and would therefore behave better on the control charts.
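As an illustration of plotting group means, the sketch below computes control limits in the style of an X-bar chart, which is one common chart for subgroup means and is not named in the text. It assumes subgroups of size 5 and uses the standard tabulated constant A2 = 0.577 for that size; the function name and the data are hypothetical.

```python
from statistics import fmean

A2_N5 = 0.577  # standard X-bar chart constant for subgroups of size 5


def xbar_limits(subgroups, a2=A2_N5):
    """Return (lower limit, center line, upper limit) for subgroup means,
    using the average subgroup range to estimate the spread."""
    means = [fmean(group) for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    center = fmean(means)
    half_width = a2 * fmean(ranges)
    return center - half_width, center, center + half_width


# Hypothetical measurements, already collected in subgroups of 5.
subgroups = [
    [4.1, 9.8, 2.3, 6.0, 1.2],
    [3.3, 7.5, 0.9, 12.4, 5.1],
    [2.2, 8.8, 4.7, 1.5, 6.3],
    [5.9, 0.7, 3.8, 10.1, 2.6],
]

lcl, center, ucl = xbar_limits(subgroups)
for group in subgroups:
    m = fmean(group)
    status = "in control" if lcl <= m <= ucl else "out of control"
    print(f"subgroup mean {m:.2f}: {status}")
```

A different subgroup size needs a different A2 value from the standard control chart constants table.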

