If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population mean exactly; by chance it will be a little bit higher or a little bit lower. If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean and some would be lower. Imagine sampling 10 numbers and computing the mean over and over again, say about 1,000 times, and then constructing a relative frequency distribution of those 1,000 means. This distribution of means is a very good approximation to the sampling distribution of the mean. The sampling distribution of the mean is a theoretical distribution that is approached as the number of samples in the relative frequency distribution increases. With 1,000 samples, the relative frequency distribution is quite close; with 10,000 it is even closer. As the number of samples approaches infinity, the relative frequency distribution approaches the sampling distribution.

The sampling distribution of the mean for a sample size of 10 was just an example; there is a different sampling distribution for other sample sizes. Also, keep in mind that the relative frequency distribution approaches a sampling distribution as the number of samples increases, not as the sample size increases, since there is a different sampling distribution for each sample size.

A sampling distribution can also be defined as the relative frequency distribution that would be obtained if all possible samples of a particular sample size were taken. For example, the sampling distribution of the mean for a sample size of 10 would be constructed by computing the mean for each of the possible ways in which 10 scores could be sampled from the population and creating a relative frequency distribution of these means.
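The repeated-sampling procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the text's own method: the population (the integers 1 to 100), the bin width of 5, the fixed random seed, and the helper names sample_means and relative_frequencies are all hypothetical, and each sample is drawn with replacement.

```python
import math
import random
import statistics
from collections import Counter


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples samples (with replacement) and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=sample_size))
            for _ in range(num_samples)]


def relative_frequencies(values, bin_width):
    """Group values into bins of width bin_width; return {bin start: proportion}."""
    counts = Counter(bin_width * math.floor(v / bin_width) for v in values)
    return {start: count / len(values) for start, count in sorted(counts.items())}


# Hypothetical population: the integers 1..100 (population mean 50.5).
population = list(range(1, 101))
means = sample_means(population, sample_size=10, num_samples=1000)
freq = relative_frequencies(means, bin_width=5)

for start, proportion in freq.items():
    print(f"{start:3d}-{start + 5:3d}  {proportion:.3f}")
print("mean of the 1,000 sample means:", round(statistics.fmean(means), 2))
```

Raising num_samples to 10,000 makes the relative frequencies settle closer to the theoretical sampling distribution; changing sample_size instead produces a different sampling distribution altogether.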
Although these two definitions may seem different, they are actually the same: both procedures produce exactly the same sampling distribution.

Statistics other than the mean have sampling distributions too. The sampling distribution of the median is the distribution that would result if the median instead of the mean were computed in each sample.

Students often define "sampling distribution" as the sampling distribution of the mean. That is a serious mistake. Sampling distributions are very important, since almost all inferential statistics are based on sampling distributions.
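The "all possible samples" definition, and the point that the median has its own sampling distribution, can be checked by brute force on a tiny population. This sketch assumes sampling with replacement, so every ordered sample of size 3 from the hypothetical population {1, 2, 3} is equally likely, and it uses exact fractions; the helper names are hypothetical. Applying the mean and the median to the same 27 samples yields two different sampling distributions.

```python
import itertools
import statistics
from collections import Counter
from fractions import Fraction


def exact_sampling_distribution(population, sample_size, statistic):
    """Apply statistic to every ordered sample of sample_size values drawn with
    replacement from population; return {value: exact probability}."""
    samples = list(itertools.product(population, repeat=sample_size))
    counts = Counter(statistic(sample) for sample in samples)
    return {value: Fraction(count, len(samples))
            for value, count in sorted(counts.items())}


def exact_mean(sample):
    """Mean as an exact fraction, so that equal means share one key."""
    return Fraction(sum(sample), len(sample))


population = [1, 2, 3]  # hypothetical tiny population
mean_dist = exact_sampling_distribution(population, 3, exact_mean)
median_dist = exact_sampling_distribution(population, 3, statistics.median)

print("mean:  ", {str(v): str(p) for v, p in mean_dist.items()})
print("median:", {str(v): str(p) for v, p in median_dist.items()})
```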
Given a population with a mean of μ and a standard deviation of σ, the sampling distribution of the mean has a mean of μ and a standard deviation of $\sigma/\sqrt{N}$, where N is the sample size. The standard deviation of the sampling distribution of the mean is called the standard error of the mean. It is designated by the symbol $\sigma_M$. Note that the spread of the sampling distribution of the mean decreases as the sample size increases.

An example of the effect of sample size is shown above. Notice that the mean of the distribution is not affected by sample size.
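The standard error formula σ/√N can be compared against a simulation. In this sketch the population is assumed to be normal with μ = 500 and σ = 100 (any population with finite σ would do); the function names, the 20,000 simulated samples, and the seed are hypothetical choices.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Theoretical standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_mean_distribution(mu, sigma, n, num_samples=20_000, seed=1):
    """Return (mean, standard deviation) of num_samples simulated sample means,
    each computed from n scores drawn from a normal population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(num_samples)]
    return statistics.fmean(means), statistics.pstdev(means)


sims = {}
for n in (1, 4, 16):
    sims[n] = simulate_mean_distribution(500, 100, n)
    print(f"N={n:2d}  theory SE={standard_error(100, n):6.2f}  "
          f"simulated SD={sims[n][1]:6.2f}  simulated mean={sims[n][0]:7.2f}")
```

The simulated means stay near 500 for every N while their standard deviation shrinks in proportion to 1/√N, which is the sense in which sample size changes the spread but not the mean.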
The central limit theorem states that, given a population with a finite mean μ and a finite non-zero variance σ², the sampling distribution of the mean approaches a normal distribution with a mean of μ and a variance of $\sigma^2/N$ as N, the sample size, increases.

The amazing and counter-intuitive thing about the central limit theorem is that no matter what the shape of the original distribution, the sampling distribution of the mean approaches a normal distribution. Furthermore, for most distributions, a normal distribution is approached very quickly as N increases. Keep in mind that N is the sample size for each mean and not the number of samples. Remember that in a sampling distribution the number of samples is assumed to be infinite. The sample size is the number of scores in each sample; it is the number of scores that goes into the computation of each mean.

The following simulation exercise demonstrates the central limit theorem. The computer sampled N scores from a uniform distribution and computed the mean. This procedure was performed 500 times for each of the sample sizes 1, 4, 7, and 10.
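A sketch of the simulation described above, assuming the uniform distribution runs from 0 to 1 (the text does not give its range), so that μ = 0.5 and σ² = 1/12. It compares the mean and variance of the 500 simulated means with the theorem's μ and σ²/N; the function name and seed are hypothetical.

```python
import random
import statistics


def uniform_sample_means(n, num_means=500, seed=0):
    """Return num_means means, each computed from n scores drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n))
            for _ in range(num_means)]


MU = 0.5        # mean of Uniform(0, 1)
VAR = 1 / 12    # variance of Uniform(0, 1)

results = {}
for n in (1, 4, 7, 10):
    means = uniform_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(f"N={n:2d}  mean of means={results[n][0]:.3f} (theory {MU})  "
          f"variance={results[n][1]:.4f} (theory sigma^2/N = {VAR / n:.4f})")
```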
Below are shown the resulting frequency distributions, each based on 500 means. For N = 4, 4 scores were sampled from a uniform distribution 500 times and the mean computed each time. The same method was followed with means of 7 scores for N = 7 and 10 scores for N = 10.

Two things should be noted about the effect of increasing N:
1. The distributions become more and more normal.
2. The spread of the distributions decreases.
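Both observations can be reproduced roughly in text form. This sketch repeats the hypothetical Uniform(0, 1) setup, draws 500 means for each N, prints one star per 5 means in each bin of width 0.1, and reports the standard deviation of the means; the helper names and seed are hypothetical.

```python
import random
import statistics


def uniform_sample_means(n, num_means=500, seed=0):
    """Return num_means means, each computed from n scores drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n))
            for _ in range(num_means)]


def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Count values falling in equal-width bins on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # guard the top edge
    return counts


hists, spreads = {}, {}
for n in (1, 4, 7, 10):
    means = uniform_sample_means(n)
    hists[n] = histogram(means)
    spreads[n] = statistics.stdev(means)
    print(f"N = {n}  (SD of means = {spreads[n]:.3f})")
    for i, count in enumerate(hists[n]):
        print(f"  {i / 10:.1f}-{(i + 1) / 10:.1f} {'*' * (count // 5)}")
```

With N = 1 the stars are spread roughly evenly across the bins; as N grows they pile up around 0.5 in a bell shape and the standard deviation falls.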
Area under the sampling distribution of the mean
Assume a test with a mean of 500 and a standard deviation of 100. Which is more likely: (1) that the mean of a sample of 5 people is greater than 580 or (2) that the mean of a sample of 10 people is greater than 580? Using your intuition, you may have been able to figure out that a mean over 580 is more likely to occur with the smaller sample. One way to approach problems of this kind is to think of the extremes. What is the probability that the mean of a sample of 1,000 people would be greater than 580? The probability is practically zero, since the mean of a sample that large will almost certainly be very close to the population mean. The chance that it is more than 80 points away is practically nil. On the other hand, with a small sample, the sample mean could very well be as many as 80 points from the population mean. Therefore, the larger the sample size, the less likely it is that a sample mean will deviate greatly from the population mean. It follows that it is more likely that the sample of 5 people will have a mean greater than 580 than will the sample of 10 people.

To figure out the probabilities exactly, it is necessary to make the assumption that the distribution is normal. Given normality and the formula for the standard error of the mean, the probability that the mean of 5 people is over 580 can be calculated in a manner almost identical to that used in calculating the area under portions of the normal curve.
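Under the normality assumption, the calculation reduces to the upper tail of a normal distribution with mean 500 and standard deviation σ/√N. A minimal sketch using Python's standard-library statistics.NormalDist; the helper name prob_mean_exceeds is hypothetical.

```python
import math
from statistics import NormalDist


def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold), assuming a normal population."""
    se = sigma / math.sqrt(n)                  # standard error of the mean
    return 1 - NormalDist(mu, se).cdf(threshold)


p5 = prob_mean_exceeds(580, 500, 100, 5)
p10 = prob_mean_exceeds(580, 500, 100, 10)
p1000 = prob_mean_exceeds(580, 500, 100, 1000)
print(f"N=5: {p5:.4f}  N=10: {p10:.4f}  N=1000: {p1000:.2e}")
```

For N = 5 the standard error is 100/√5 ≈ 44.7, so 580 lies about 1.79 standard errors above the mean, giving a probability of roughly 0.037; for N = 10 the standard error is about 31.6, z ≈ 2.53, and the probability drops to roughly 0.006.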