# Okay. So, welcome back troops.

We are going to talk about covariance and correlation now, and what happens when random variables are independent. So, if we have two random variables X and Y, then their covariance is defined as Cov(X, Y) = E[(X - E(X))(Y - E(Y))], the expected value of X minus its mean times Y minus its mean, whole quantity. And just like the variance, there's a shortcut formula for covariances, and it works out to be E(XY) - E(X)E(Y).

So there are some very useful facts about covariance. First of all, you can interchange the variables and you get the same number, so Cov(X, Y) = Cov(Y, X). The covariance can be negative or positive. But an application of the so-called Cauchy-Schwarz inequality will tell you that the absolute value of Cov(X, Y) is less than or equal to the square root of the variance of X times the variance of Y. So we could just write this right-hand side as the standard deviation of X times the standard deviation of Y.

This final property is very useful because, as we go on to the next slide, we use it to define the correlation. The correlation of X and Y is nothing other than the covariance divided by the product of the standard deviations, Cor(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)). So, there are a couple of reasons why we might want to do this. From the previous slide, notice that this normalizes the covariance so that it's between -1 and +1. And that's a very useful thing to do, so that we can compare the idea of covariance across different kinds of random variables.

Now, another rationale for doing this is, if we look back at our covariance formula, right, it's the expected value of X minus the mean of X times Y minus the mean of Y. Well, that has units of X times units of Y. Right? So the top part of the calculation has units of X times units of Y. The standard deviation of X has units of X, and the standard deviation of Y has units of Y. So the bottom part also has units of X times units of Y. So the correlation is a unit-free measurement, and that's a useful property to have.
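To make the definition, the shortcut formula, and the unit-free property concrete, here is a minimal sketch in Python. The joint probability mass function `pmf` and the helper names (`expect`, `cov_definition`, `cov_shortcut`, `cor`) are made up for illustration and are not from the lecture; the numbers are hypothetical heights in inches and weights in pounds.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical joint pmf for (X, Y): maps (x, y) -> probability.
# Think of x as inches and y as pounds; the values are invented.
pmf = {
    (60, 110): F(1, 4),
    (64, 130): F(1, 4),
    (68, 150): F(1, 4),
    (72, 200): F(1, 4),
}

def expect(g, pmf):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov_definition(pmf):
    """Cov(X, Y) = E[(X - E(X))(Y - E(Y))]."""
    mx = expect(lambda x, y: x, pmf)
    my = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - mx) * (y - my), pmf)

def cov_shortcut(pmf):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    return (expect(lambda x, y: x * y, pmf)
            - expect(lambda x, y: x, pmf) * expect(lambda x, y: y, pmf))

def variances(pmf):
    """Return (Var(X), Var(Y)) using the shortcut variance formula."""
    mx = expect(lambda x, y: x, pmf)
    my = expect(lambda x, y: y, pmf)
    var_x = expect(lambda x, y: x * x, pmf) - mx * mx
    var_y = expect(lambda x, y: y * y, pmf) - my * my
    return var_x, var_y

def cor(pmf):
    """Cor(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)), as a float."""
    var_x, var_y = variances(pmf)
    return float(cov_shortcut(pmf)) / sqrt(float(var_x * var_y))

# Re-express X in centimeters (1 inch = 127/50 cm exactly).
INCH_TO_CM = F(127, 50)
pmf_cm = {(x * INCH_TO_CM, y): p for (x, y), p in pmf.items()}

print("covariance (inch * pound):", cov_definition(pmf))
print("covariance (cm * pound):  ", cov_definition(pmf_cm))
print("correlation, inches:", cor(pmf))
print("correlation, cm:    ", cor(pmf_cm))
```

Exact fractions are used so that the definition and the shortcut agree exactly rather than up to rounding. Changing the units of X rescales the covariance but leaves the correlation alone.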
So if X is in inches and Y is in pounds, then the covariance is in inches times pounds, but the correlation is unit free, which is useful.

Correlations have some nice properties. One is that the correlation is plus or minus one if and only if the random variables are linearly related, X = a + bY for some constants a and b, with b nonzero. Correlation is unitless, as we have discussed already. And then we say X and Y are uncorrelated if Cor(X, Y) is zero. And the more positively correlated they are, the closer Cor(X, Y) gets to one, and the more negatively correlated they are, the closer Cor(X, Y) gets to -1.

And this is, again, a description of a population quantity, not a sample quantity. Right, so if you're using a joint probability mass function or a joint density function to model the population behavior of X and Y, then we want ways to summarize that joint mass function or joint density function, and the correlation is a summary of how related the random variables from this distribution are. So it's a summary of a population quantity.

And of course, if something's a population quantity, we want sample quantities that are able to estimate it. So probably what you've heard of, if you've never had a mathematical statistics class before, is the sample correlation, and, again, the goal of the sample correlation is to estimate the population correlation, if you're using a probability model. So, if you've ever had a sample correlation and you've had a probability model, what you are trying to estimate is the population correlation. The estimand is the population correlation. So it follows the same rule we have had so far for everything. The sample variance estimates the population variance. The sample standard deviation estimates the population standard deviation. The sample median estimates the population median. So all these sample quantities have analogous population quantities.
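As a rough illustration of the sample correlation estimating the population correlation, the sketch below simulates pairs from a model whose population correlation is known and compares the sample value to it. The construction Y = rho X + sqrt(1 - rho^2) Z with independent standard normals X and Z is an assumption chosen for the example because it gives Cor(X, Y) = rho; the function names, the value 0.6, and the sample size are all hypothetical.

```python
import random
from math import sqrt

def sample_correlation(xs, ys):
    """Sample correlation: sample covariance over the product of sample SDs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # The 1/(n - 1) factors cancel between numerator and denominator.
    return sxy / sqrt(sxx * syy)

def simulate(rho, n, seed=0):
    """Draw n pairs (X, Y) with population correlation rho (assumed model)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + sqrt(1.0 - rho ** 2) * z)
    return xs, ys

RHO = 0.6  # the population correlation, i.e. the estimand
xs, ys = simulate(RHO, n=20000)
r = sample_correlation(xs, ys)
print("population correlation:", RHO)
print("sample correlation:    ", r)
```

With a large sample the two numbers should be close but not identical, which is the usual relationship between an estimator and its estimand.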
So, if two random variables X and Y are independent then their correlation is zero.

The reverse is not true. Things can be uncorrelated but not independent. So if they're independent then they're uncorrelated, but if they're uncorrelated they're not necessarily independent. In the case of jointly Gaussian random variables, by the way, the two things always agree, but in general that's not the case.

So, let's talk about some useful results that rely on correlations and covariances, and probably the most useful one we'll talk about is this variance idea. So say we have a collection of random variables X1 to Xn. When the X's are uncorrelated, and here I wrote out a very general form, the variance of the sum of the X's, maybe times some constants a_i, plus a constant b, works out to be the sum of a_i^2 times the variances of the individual X's. That is, Var(a_1 X1 + ... + a_n Xn + b) = a_1^2 Var(X1) + ... + a_n^2 Var(Xn).

And let's think about the specific case when b is zero and the a's are all one. That just means that the variance of the sum is the sum of the variances. And here we just wrote out the slightly more general form. We know in general that constants pull out of variances and get squared. That's why you have the a_i^2. And we also know that when you have a random variable and you shift it by a constant b, it doesn't change its variance. It just moves the density to the left or right. So the a's and the b are kind of fluff on top of this equation. The core of this equation is, just think about the instance when b is zero and the a's are one. When the X's are uncorrelated, and they don't have to be independent, they just have to be uncorrelated for this result to hold, then the variance of the sum is the sum of the variances.

If they are not uncorrelated, then you can still calculate the variance, in a way that depends on the covariances. It works out to be the sum of the variances with the a_i squared out front, plus twice the sum over all the pairs i < j of a_i a_j times Cov(Xi, Xj). So this is a very useful formula.
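A small worked example may help with both points here: that uncorrelated does not imply independent, and that the variance result only needs uncorrelatedness. The sketch below uses a standard textbook construction, not one from the lecture: X uniform on {-1, 0, 1} and Y = X^2. The constants 2, -3, and 5 in the linear combination are arbitrary choices for illustration.

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1} and Y = X^2, so Y is completely determined by X.
outcomes = [(x, x * x) for x in (-1, 0, 1)]
p = F(1, 3)  # each outcome is equally likely

def expect(g):
    """E[g(X, Y)] over the three equally likely outcomes."""
    return sum(p * g(x, y) for x, y in outcomes)

def var(g):
    """Var(g(X, Y)) via the shortcut formula E[g^2] - (E[g])^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

# Uncorrelated: Cov(X, Y) = E(XY) - E(X)E(Y) = 0.
cov_xy = (expect(lambda x, y: x * y)
          - expect(lambda x, y: x) * expect(lambda x, y: y))

# Not independent: P(X = 0, Y = 0) differs from P(X = 0) P(Y = 0).
p_joint = sum(p for x, y in outcomes if x == 0 and y == 0)
p_x0 = sum(p for x, y in outcomes if x == 0)
p_y0 = sum(p for x, y in outcomes if y == 0)

# The variance results still hold because they only need zero covariance.
var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
var_sum = var(lambda x, y: x + y)
var_lin = var(lambda x, y: 2 * x - 3 * y + 5)  # a = (2, -3), b = 5

print("Cov(X, Y) =", cov_xy)
print("P(X=0, Y=0) =", p_joint, "but P(X=0) P(Y=0) =", p_x0 * p_y0)
print("Var(X + Y) =", var_sum, "and Var(X) + Var(Y) =", var_x + var_y)
print("Var(2X - 3Y + 5) =", var_lin, "and 4 Var(X) + 9 Var(Y) =", 4 * var_x + 9 * var_y)
```

Note that the shift by 5 has no effect on the variance, and the constants 2 and -3 come out squared.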
We won't use it in this class, but I thought I'd give it to you. And notice, if they're all uncorrelated, all the covariance terms are zero and then we get the top formula. The top formula is what we're really going to use in this class, and it basically says that the variance of the sum is the sum of the variances if you have independent, or even just uncorrelated, random variables. The other important thing that this says is that, you know, you shouldn't be adding standard deviations, you should be adding variances, which is another way to think about it in general. So this leads to an interesting proof of a useful property, that the variance of X bar, the sample mean, is sigma squared over n, and it also leads to the fact that the expected value of the sample variance is sigma squared. These are two very important properties that we'll go on and on about.

Okay. So, I don't want to prove the general facts from the previous slide. Let's just go through the simplest case, because it's pretty easy to do. So let's prove that the variance of X + Y is the variance of X plus the variance of Y plus twice the covariance of X and Y. So, at the top line, let's start with the variance of X + Y. Well, just by the definition of variance, right, that's E[(X + Y)^2] - [E(X + Y)]^2. Right? So, if you're confused by that, just replace X + Y with the random variable Z, and it's the expected value of Z squared minus the expected value of Z, quantity squared, directly plugging into the shortcut variance formula. Well, for the right-hand element here, expected values always commute across sums, so we have (mu_X + mu_Y)^2. And for the left-hand element, let's just expand out (X + Y)^2 to get X^2 + 2XY + Y^2, and then let's just move the expected value across the three elements of this expression. Well then, let's just organize terms. And we get E(X^2) - mu_X^2, plus E(Y^2) - mu_Y^2, plus two times the quantity E(XY) - mu_X mu_Y. Well, this first one, E(X^2) - mu_X^2, that's the variance of X. The second one, E(Y^2) - mu_Y^2, that's the variance of Y.
And this latter part, two times the quantity E(XY) - mu_X mu_Y, right? That's the expected value of X times the expected value of Y for this right-hand part. Well, that's just two times Cov(X, Y), completing the proof. So you can see that it only requires the basic rules for expected values and the definition of covariance to perform this calculation.

So, just to reiterate some things we discussed earlier. If a collection of random variables is uncorrelated, then the variance of the sum is the sum of the variances. So what it basically means is that sums of variances tend to be useful and not sums of standard deviations, and the issue I'm trying to raise is just this: don't sum standard deviations. So, in other words, the standard deviation of the sum of a bunch of independent random variables is the square root of the sum of the variances, not the sum of the standard deviations. So, it's just a common little problem, so maybe try and avoid it from the start.
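The two closing points can be checked numerically with a short sketch. The joint pmf below is a hypothetical example with positively correlated X and Y, and `sd_of_independent_sum` is an illustrative helper name, not something from the lecture. The first part checks that Var(X + Y) matches Var(X) + Var(Y) + 2 Cov(X, Y); the second shows why standard deviations should not be added.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical joint pmf on {0, 1} x {0, 1} with positive correlation.
pmf = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(prob * g(x, y) for (x, y), prob in pmf.items())

def var(g):
    """Var(g(X, Y)) via the shortcut formula."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov_xy = (expect(lambda x, y: x * y)
          - expect(lambda x, y: x) * expect(lambda x, y: y))

lhs = var(lambda x, y: x + y)       # Var(X + Y) computed directly
rhs = var_x + var_y + 2 * cov_xy    # Var(X) + Var(Y) + 2 Cov(X, Y)
print("Var(X + Y) directly:", lhs)
print("via the formula:    ", rhs)

def sd_of_independent_sum(sds):
    """SD of a sum of independent variables: sqrt of the summed variances."""
    return sqrt(sum(s ** 2 for s in sds))

# Two independent variables with standard deviations 3 and 4.
print("correct SD of the sum:", sd_of_independent_sum([3, 4]))
print("naive sum of SDs:     ", 3 + 4)
```

For standard deviations 3 and 4 the variances are 9 and 16, so the sum has variance 25 and standard deviation 5, which is smaller than the naive 3 + 4 = 7.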