There's an important result that arises out of these facts: the variance of the sample mean of a collection of independent and identically distributed random variables is sigma squared over n. So let's assume that we have a collection Xi, i equals one to n, that are independent and identically distributed, IID, and that the variance of the distribution that they're drawn from is sigma squared. So let's calculate the variance of X bar. Well, that's just the variance of one over n times the sum of the Xs. That's just the sample mean formula, the sum of everything divided by the number of things you added up. The one over n is a constant, so we can pull it out of the variance, where it becomes one over n squared, and we're left with the variance of the sum. The variance of the sum is the sum of the variances because the Xs are independent, hence uncorrelated. And then because they're IID, the variance of each Xi is the same: it's sigma squared. We've added up n of them, so we get n sigma squared, and n sigma squared over n squared works out to be sigma squared over n for the final line here.

So, what does this mean? It's really quite an interesting fact. What this says is, if I want to know the variance of the distribution of averages of ten random variables, say, from a distribution, I don't actually have to know what that distribution of averages is. All I have to know is the variance of the original distribution that the individual observations are drawn from, and I just have to divide that by n. So if I want the variance, I divide by n. If I want the standard deviation, I take the original standard deviation and divide by square root n.

And why is this important? Because remember, eventually we'd like to connect all of these population model ideas to data. If we have a bunch of things that we're willing to model as if they were IID, we get multiple draws from the distribution of individual observations. All the Xi's are separate draws from the original distribution, so we can estimate things like sigma squared. But we only get one sample mean. Let's say we have a sample of 100 observations: we only get one sample of 100. So if we calculate the sample mean of those 100 observations, we have nothing empirical to estimate the variance of sample means of 100 variables with, because we don't have repeated samples of 100 variables. We only have the one. What this result says is that you don't need that, because all you need is the variance of the original population, divided by n. And the variance of the original population is something we can estimate. So it's a very nifty result.

Let me give you an example of this property that you could do at home to test this result and make sure it's true. Recall that in the last lecture we said the variance of a die roll, which takes the values one to six with equal likelihood, one-sixth for each number, was 2.92. What that says is that if you roll a die over and over and look at the distribution, you'll get about one-sixth of each number. And if you were to roll it thousands and thousands of times and take the variance of those measurements, you would get around 2.92. So do that: roll a die a lot of times, take the sample variance of, for example, a thousand die rolls, and you'll get about 2.92. Why is that? Because the sample variance of lots of die rolls estimates the variance of the population of die rolls, which is this uniform distribution on one to six, and its variance is 2.92.

Now here's the question that the calculation on the slide is answering. Suppose that instead of rolling one die over and over again, you rolled ten dice and took their average, and repeated that process over and over again. The distribution would no longer be uniform on the numbers one to six. Still, the minimum would be one: if you got all ten 1s, the average of ten 1s is one. And the maximum would still be six, since the average of ten 6s is six. So the bounds are one and six. But it would not look like a uniform distribution on the numbers between one and six, because you can get all sorts of different numbers: numbers between one and two, between two and three, and so on. So the distribution of averages of ten die rolls is kind of a funny distribution.

So imagine you were to do that. Roll your ten dice, take the average, and do that over and over again so that you got, say, 10,000 averages of ten die rolls. And you wanted to know the variance of that distribution. Well, it seems like kind of a hard calculation. First you'd have to figure out the distribution of the average of ten die rolls, which seems like a hard distribution. We'll actually discuss later on that it's maybe a little easier to calculate than you might have thought. But this calculation says you don't even have to worry about that. We know that the variance of the distribution of individual die rolls is 2.92, so the variance of the distribution of averages of ten die rolls is 2.92 over ten, which is 0.292.

And so we could run this experiment in R, for example, where we rolled a digital die thousands of times and took the variance of those die rolls, and you'd find it's about 2.92. And then we could also do the experiment where we rolled ten dice, took the average, repeated that process over and over again to get 10,000 averages of ten die rolls, and you would find that the variance of those averages was about 0.292. Very interesting, and it's a very simple formula. So let's belabor this point on the next slide. When the Xs are independent with a common variance, the variance of X bar is sigma squared over n.
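The die-rolling experiment described above can be sketched in a few lines. The lecture mentions R; this version uses Python's standard library instead, and the function name `die_roll_variances`, the seed, and the repetition count are illustrative assumptions rather than anything from the lecture.

```python
import random
import statistics


def die_roll_variances(n_dice=10, n_reps=10_000, seed=0):
    """Simulate single die rolls and averages of n_dice rolls.

    Returns the sample variance of the single rolls and the sample
    variance of the averages. All parameter values are illustrative.
    """
    rng = random.Random(seed)

    # Experiment 1: roll one fair die n_reps times.
    single_rolls = [rng.randint(1, 6) for _ in range(n_reps)]

    # Experiment 2: roll n_dice dice, average them, repeat n_reps times.
    averages = [
        statistics.fmean(rng.randint(1, 6) for _ in range(n_dice))
        for _ in range(n_reps)
    ]

    # statistics.variance divides by (number of values - 1).
    return statistics.variance(single_rolls), statistics.variance(averages)


var_single, var_average = die_roll_variances()
print(f"variance of single rolls:       {var_single:.3f}")   # theory: 35/12, about 2.92
print(f"variance of averages of ten:    {var_average:.3f}")  # theory: 2.92/10, about 0.292
```

The two printed values should land near 2.92 and 0.292 respectively, though the exact digits depend on the random seed.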
The square root of this quantity, sigma over square root n, is so important that we give it a name: we call it the standard error of the sample mean. Basically, a standard error is nothing other than the standard deviation of a statistic. In this case the statistic is the sample mean, but you might have a standard error of another statistic. For example, the median is a statistic, so it has a standard error too. It may be a little harder to calculate, but nonetheless it has one. So, what is the standard error? The standard error of a sample mean is the standard deviation of the distribution of the sample mean. Sigma, the standard deviation, talks about how variable the population is. Sigma over square root n talks about how variable averages of size n from that population are. So these are two different statements, and they describe different things. For example, if the Xs are IQ measurements, sigma talks about how variable IQs are. Sigma over square root ten, say, talks about how variable averages of ten IQs are. So they're obviously related, but they're different concepts, and it's easy to confuse the two. An easy way to remember this, by the way, is that the sample mean has to be less variable than a single observation, therefore its standard deviation is divided by square root n. That also gives you a sense of the rate at which standard deviations of averages decline as you collect more data.

So, since we've talked about the sample variance a lot, why don't we actually define it. The sample variance is the quantity that we compute from data in order to estimate the population variance. Recall that the population variance was the expected squared deviation of a random variable around its population mean. So what is the sample variance? Well, it's the average squared deviation of the sample values around the sample mean. So it's quite a convenient analog. Now notice it's not exactly the average. We divide by n - one instead of n, which is a little annoying, but we do it.
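The slides are not reproduced in this transcript, so the notation below is an assumption, but the quantities described so far can be written out as follows for IID observations with variance sigma squared:

```latex
\begin{align*}
\operatorname{Var}(\bar X)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\sigma^2}{n^2}
   = \frac{\sigma^2}{n}, \\[4pt]
\text{standard error of } \bar X
  &= \sqrt{\operatorname{Var}(\bar X)} = \frac{\sigma}{\sqrt{n}}, \\[4pt]
S^2
  &= \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2 .
\end{align*}
```

The second equality in the first line is where independence is used: the variance of a sum of uncorrelated variables is the sum of their variances.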
So imagine for the time being that this is an n in the denominator, not an n - one. Then the sample variance is nothing other than the average squared deviation of the observations around the sample mean. So the sample variance is an estimator of the population variance, sigma squared. And just like the population variance has a shortcut formula, the sample variance also has a shortcut formula. The numerator of the variance calculation, the sum of Xi minus X bar squared, is equal to the sum of the Xi squared minus n times X bar squared. So if someone gives you the sum of the squared observations and the sample mean, then you can calculate the sample variance really quickly.

So why do we divide by n - one instead of n? Again, for large samples it's irrelevant. The factor n - one over n is close to one, so you're going to get about the same answer either way. But for small samples it can make a difference. So why do we choose to divide by n - one? Recall that we have this property, unbiasedness. Unbiasedness means that the expected value of the statistic equals the quantity that it's estimating. Just to remind you, the sample variance is a function of our observed data. It's a function of our random variables, so it is itself a statistic, and it is itself a random variable. So it has a distribution, and that distribution has a variance and a mean. What we're going to talk about right now is that the mean of that distribution turns out to be sigma squared if you do the calculation where you divide by n - one. I'm going to show it by showing that the expected value of the numerator of the statistic is equal to n - one times sigma squared. That's the same thing as showing that the sample variance is unbiased, because then you just divide both sides of the equation by n - one and you get the result. So let's do that. Just to say it again, because it's important: the sample variance is itself a random variable, that random variable has a distribution, that distribution has a population mean, and we want to say that that population mean is in fact sigma squared.

So take the expected value of the numerator of the sample variance calculation, the sum of the squared deviations around the sample mean. If we use the shortcut formula, that's the sum of the expected values of the Xi squared, minus n times the expected value of X bar squared. And now let's use a really nifty fact. Recall that the shortcut variance formula was the expected value of the random variable squared, minus the square of the expected value of the random variable. Well, we can rearrange that formula to say that the expected value of a random variable squared is the variance plus the mean squared. And that's what we do right here: the expected value of Xi squared is the variance of Xi plus mu squared. And the same thing is true for the sample mean, because the sample mean is itself a random variable with mean mu. So the expected value of X bar squared is the variance of X bar plus mu squared, and then we have this n out front. The variance of Xi is sigma squared, so each term in the sum is sigma squared plus mu squared, which is a constant, and we wind up with n of those. The variance of X bar we derived a little bit ago as sigma squared over n, so the second term is n times the quantity sigma squared over n plus mu squared. Collect terms: n sigma squared plus n mu squared, minus sigma squared, minus n mu squared, and you get n - one times sigma squared. So this is a really interesting fact. It says that the expected value of the sample variance is in fact the quantity it's trying to estimate, provided you divide by n - one instead of n, and that's why we divide by n - one.

Another way to think about this is that we don't know the population mean, mu. If we knew it, instead of plugging X bar into the sample variance formula we would plug in mu. We would calculate the deviations of the observations around the population mean rather than around the sample mean. And so the idea is that we sort of lose a degree of freedom by plugging in X bar, its sample analog, instead of mu. That's the heuristic behind why you divide by n - one.

It's an interesting fact, though, that it's not 100 percent clear that you do want to divide by n - one. Pretty much every introductory statistics textbook divides by n - one, but there's this interesting phenomenon called the bias-variance trade-off. In this case we've obtained an unbiased estimator by dividing by n - one instead of n, but what if we'd divided by n? As an exercise, calculate the expected value of the sample variance if it were calculated with n in the denominator instead of n - one. In other words, what is the expected value of n - one over n times S squared? You can calculate that very easily: it is not sigma squared, but it is quite close to it. So it's a biased estimator. But the other thing I would ask is which of the two estimators, S squared calculated with n - one in the denominator or the version with n in the denominator, has the lower variance. What do I mean by that? Remember, the sample variance is a random variable. It has a distribution, and that distribution has a variance. The question is which of the two calculations, dividing by n or dividing by n - one, results in a smaller variance of that distribution. That tells you how precise your estimate of the variance is. I'll give you the punch line. The sample variance computed by dividing by n has a slightly lower variance than the sample variance computed by dividing by n - one. So it's another classic bias-variance trade-off. We divide by n - one because we want unbiasedness, but then we wind up with slightly greater variance. If we divide by n, we wind up with a slightly lower variance of our sample variance, but it's slightly biased. I know extremely well-established statisticians who say they would prefer to have the lower variance.
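The trade-off between dividing by n and dividing by n - one can be checked by simulation. The sketch below is not from the lecture: it reuses the fair-die population from the earlier example (population variance 35/12), and the function name `compare_variance_estimators`, the sample size, and the repetition count are illustrative assumptions.

```python
import random
import statistics


def compare_variance_estimators(n=10, n_reps=20_000, seed=0):
    """Compare the divide-by-(n - 1) and divide-by-n variance estimators.

    Draws n_reps samples of n fair die rolls and, for each sample,
    computes both versions of the sample variance. Returns the mean and
    the variance of each estimator across the repeated samples.
    """
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(n_reps):
        sample = [rng.randint(1, 6) for _ in range(n)]
        s2 = statistics.variance(sample)   # divides by n - 1
        unbiased.append(s2)
        biased.append((n - 1) / n * s2)    # same sum of squares divided by n
    return {
        "mean_unbiased": statistics.fmean(unbiased),
        "mean_biased": statistics.fmean(biased),
        "var_unbiased": statistics.variance(unbiased),
        "var_biased": statistics.variance(biased),
    }


results = compare_variance_estimators()
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, the mean of the n - one version should sit near 35/12 (about 2.92), while the mean of the n version should sit near (n - one)/n times that, about 2.63 for n = 10. Because the n version is just a constant less than one times the other, its variance across samples is always the smaller of the two.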
But pretty much every introductory statistics textbook divides by n - one. It's kind of an interesting discussion.

One of the confusions that always comes up seems quite simple. We divide by n - one when we calculate the sample variance, and people have a tendency to confuse that with the n that we divide by when we talk about the standard error of the mean. So let's try to avoid some of this confusion. Suppose you have a bunch of observations that you're willing to model as IID with population mean mu and population variance sigma squared. Then the sample variance, S squared, estimates the population variance, sigma squared. The calculation of S squared involves dividing by n - one, and we just spent forever talking about the difference between dividing by n and dividing by n - one. Then, the standard error of the mean is sigma over square root n, so S over square root n estimates the standard error of the mean. So we've already divided by n - one to get S squared, then we take the square root to get S, and then we divide by an additional square root of n if we want the standard error of the mean. I'm just trying to avoid some confusion, because people seem to get confused by that. I guess if you wanted to attach a label to the quantity S over square root n, it's the sample standard error of the mean. What does it estimate? It estimates the population standard error of the mean, sigma over square root n.

Let's tie this down with some actual numbers. I was involved in a study of organolead workers, and in this case I took a subset of 495 of them and looked at their total brain volume. The researchers were interested in studying how the workers' exposure to lead on the job changed their brain volume. TBV stands for total brain volume, in this case a measure of the brain volume inside the skull, and all of the measures are in cubic centimeters. The mean, in this case, is 1151. Suppose we're willing to assume these organolead workers are, say, an IID draw of organolead workers from a population that we're interested in.
Then the sample mean, 1151, would be an estimate of that population mean. The sum of the squared observations works out to be this number. So the sample variance works out to be that number, minus 495 times the sample mean squared, all divided by 494, with that minus one in the denominator. Square root the whole thing and you end up with about 112 for the sample standard deviation. So what does 112 describe? 112 describes the standard deviation of brain volumes of organolead workers. It's a direct measure of the variation in my sample, and, if you view my data as a sample from a population of organolead workers, it's also an attempt to estimate the population standard deviation of that distribution. So we can, for example, use Chebyshev's rule to interpret what the combination of the mean and the standard deviation says about brain volumes of lead workers in the population.

Now, if I take this 112.6 and divide it by the square root of 495, that gives me about five as the numerical result. But what does that five actually estimate? Well, the five is no longer talking about the variation in total brain volumes in the population. It's talking about the variation in averages of 495 organolead workers. The idea is that if we're willing to model our 495 organolead workers as a draw from a population of organolead workers, then five estimates the standard deviation of the distribution of averages of 495 draws from that population. It talks about how variable averages of 495 brain volumes are. The 112 talks about how variable brain volumes are. Let me just repeat that, because it's very important. The 112 talks about how variable brain volumes of organolead workers are: it describes that directly in the sample, and it's an estimate of the population standard deviation. The five is an estimate of the population standard deviation of averages of 495 organolead workers. So I hope you're getting a sense of what these numbers are trying to calculate.

There are several concepts being used here. First we have our observed data, and these quantities, the sample mean, the sample standard deviation and the standard error, tell us things about our observed data. Then there are the assumptions, for example that the observations are IID, that help connect the data to a population, so that we can maybe generalize the results from this data to a population of organolead workers, say, if you wanted to use this data to inform policy. Then these numbers would be estimates of the population quantities. And dividing by the square root of 495 tells us how variable this mean is relative to the variability in the population. So those are the concepts we're trying to use, and we'll formalize them much more when we actually do things like generate confidence intervals and perform hypothesis tests.
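As a rough sketch of the arithmetic in the brain volume example, the shortcut formula and the standard error can be written as two small functions. The function names and the toy data are hypothetical. The sum of squared observations from the study is not given in the transcript, so only the reported standard deviation (112.6) and sample size (495) are used from the lecture.

```python
import math
import statistics


def sample_variance_shortcut(sum_of_squares, sample_mean, n):
    """Sample variance from the sum of squared observations and the mean.

    Uses the shortcut: (sum of Xi^2 - n * X bar^2) / (n - 1).
    """
    return (sum_of_squares - n * sample_mean ** 2) / (n - 1)


def standard_error(sample_sd, n):
    """Estimated standard error of the mean: S / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Toy data (made up for illustration) to check the shortcut formula
# against the direct calculation.
toy = [2.0, 4.0, 4.0, 5.0, 7.0]
toy_s2 = sample_variance_shortcut(sum(x * x for x in toy),
                                  statistics.fmean(toy), len(toy))
print(toy_s2, statistics.variance(toy))

# Values reported in the lecture: S is about 112.6 with n = 495.
se_tbv = standard_error(112.6, 495)
print(f"estimated standard error of the mean: {se_tbv:.2f}")
```

The second result comes out at roughly five cubic centimeters, matching the figure quoted in the lecture. The shortcut formula can lose precision in floating point when the mean is large relative to the spread, so for real data the direct calculation is usually the safer choice.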