
Introduction to Probability

Stat 134, Fall 2005, Berkeley

Lectures prepared by:
Elchanan Mossel
Yelena Shvets

Follows Jim Pitman's book: Probability, Section 6.4
Do taller people make more money?

Question: How can this be measured?

[Scatterplot: average wage at 19 against height at 16]

Source: National Longitudinal Survey of Youth 1997 (NLSY97)
Definition of Covariance
Cov(X,Y) = E[(X − μX)(Y − μY)]

Alternative Formula
Cov(X,Y) = E(XY) − E(X)E(Y)

Variance of a Sum
Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y)
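(Aside, not from the slides.) A minimal Python sketch that checks the alternative formula and the variance-of-a-sum identity numerically; the simulated "height/wage" data and the seed are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: X is a height-like variable, Y a wage-like variable with a linear trend in X.
X = rng.normal(67, 3, size=100_000)
Y = 500 * X + rng.normal(0, 5000, size=100_000)

def cov(a, b):
    # Covariance from the definition: E[(a - mean(a)) * (b - mean(b))]
    return np.mean((a - a.mean()) * (b - b.mean()))

# Alternative formula: E(XY) - E(X)E(Y)
print(cov(X, Y), np.mean(X * Y) - X.mean() * Y.mean())
# Variance-of-a-sum identity
print(np.var(X + Y), np.var(X) + np.var(Y) + 2 * cov(X, Y))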
Claim: Covariance is Bilinear
Cov(aX + b, cY + d) = E[(aX + b − E(aX + b))(cY + d − E(cY + d))]
 = E[ac (X − μX)(Y − μY)]
 = ac Cov(X,Y).
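(Aside, not from the slides.) A quick numerical check of the bilinearity claim on simulated data; the constants a, b, c, d and the seed are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=50_000)
Y = rng.normal(size=50_000) + 0.3 * X          # give X and Y some dependence

a, b, c, d = 2.0, 5.0, -3.0, 1.0
cov_XY = np.mean((X - X.mean()) * (Y - Y.mean()))
U, V = a * X + b, c * Y + d
cov_scaled = np.mean((U - U.mean()) * (V - V.mean()))

print(cov_scaled, a * c * cov_XY)              # should match up to simulation noise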
What does the sign of covariance mean?
Look at Y = aX + b.
Then: Cov(X,Y) = Cov(X,aX + b) = aVar(X).
[Two scatterplots of (x, y) with Ave(X) and Ave(Y) marked: left panel a > 0, right panel a < 0]

If a > 0, above the average in X goes with above the average in Y.

If a < 0, above the average in X goes with below the average in Y.
Cov(X,Y) = 0 means that there is no linear trend which connects
X and Y.
Meaning of the value of Covariance
Back to the National Survey of Youth study:
the actual covariance was 3028, where heights are in
inches and wages in dollars.
Question: Suppose we measured all the heights in
centimeters instead (there are 2.54 cm per inch).
What will happen to the covariance?
Solution: Let HI be the height in inches and HC the
height in centimeters, with W the wages.

Cov(HC,W) = Cov(2.54 HI, W) = 2.54 Cov(HI,W).

So the value depends on the units and is not very informative!
Covariance and Correlation
Define the correlation coefficient:

ρ = Corr(X,Y) = E[ ((X − E(X))/SD(X)) · ((Y − E(Y))/SD(Y)) ]

Using the linearity of expectation we get:

ρ = Cov(X,Y) / (SD(X) SD(Y))

Notice that Corr(aX+b, cY+d) = Corr(X,Y) (for a, c > 0). This
new quantity is unaffected by a change of
scale and its value is quite informative.
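(Aside, not from the slides.) A Python sketch contrasting covariance and correlation under a change of units; the simulated height/wage data and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
# Hypothetical height (inches) and wage (dollars) data with a linear trend.
height_in = rng.normal(67, 3, size=100_000)
wage = 5000 + 500 * height_in + rng.normal(0, 8000, size=100_000)
height_cm = 2.54 * height_in                   # same heights, different units

def corr(a, b):
    ca, cb = a - a.mean(), b - b.mean()
    return np.mean(ca * cb) / (a.std() * b.std())

print(np.cov(height_in, wage, bias=True)[0, 1])    # covariance in inch-dollars
print(np.cov(height_cm, wage, bias=True)[0, 1])    # 2.54 times larger in cm-dollars
print(corr(height_in, wage), corr(height_cm, wage))  # correlation is identical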
Covariance and Correlation
Properties of correlation:

Let X* = (X − μX)/SD(X) and Y* = (Y − μY)/SD(Y).

Then E(X*) = E(Y*) = 0 and SD(X*) = SD(Y*) = 1, and

Corr(X,Y) = Cov(X*, Y*) = E(X* Y*).
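(Aside, not from the slides.) A short check that the correlation equals E(X*Y*) for the standardized variables; the distributions used are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(2.0, size=200_000)
Y = X + rng.normal(0, 1, size=200_000)

# Standardize: subtract the mean, divide by the SD.
Xs = (X - X.mean()) / X.std()
Ys = (Y - Y.mean()) / Y.std()

print(np.mean(Xs * Ys))            # E(X*Y*)
print(np.corrcoef(X, Y)[0, 1])     # Corr(X,Y) -- should agree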
Covariance and Correlation
Claim: The correlation is always between −1 and +1.

E(X*²) = E(Y*²) = 1

0 ≤ E[(X* + Y*)²] = 1 + 1 + 2 E(X* Y*)
0 ≤ E[(X* − Y*)²] = 1 + 1 − 2 E(X* Y*)

−1 ≤ E(X* Y*) ≤ 1

−1 ≤ Corr(X,Y) ≤ 1

|ρ| = 1 iff Y = aX + b.
Correlation and Independence
X & Y are uncorrelated iff any of the following hold:
Cov(X,Y) = 0,
Corr(X,Y) = 0,
E(XY) = E(X)E(Y).
In particular, if X and Y are independent then they are
uncorrelated.

Example: Let X ~ N(0,1) and Y = X². Then
Cov(X,Y) = E(XY) − E(X)E(Y) = E(X³) = 0,
since the density is symmetric.
Yet X and Y are clearly dependent, so uncorrelated does not
imply independent.
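(Aside, not from the slides.) A simulation of this example: the sample correlation is near 0, yet knowing |X| clearly tells us about Y; the seed and cutoffs are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal(500_000)
Y = X ** 2

print(np.corrcoef(X, Y)[0, 1])     # close to 0: uncorrelated

# ...but clearly dependent: when |X| is large, Y is large.
print(Y[np.abs(X) > 2].mean(), Y[np.abs(X) < 0.5].mean())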
Roll a die N times. Let X be the number of 1's and Y the number of 2's.
Question: What is the correlation between X and Y?

Solution:
To compute the correlation directly from the multinomial
distribution would be difficult. Let's use a trick:
Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y).
Since X+Y is just the number of 1's or 2's, X+Y ~ Binom(p1+p2, N), so
Var(X+Y) = (p1+p2)(1 − p1 − p2) N.
And X ~ Binom(p1, N), Y ~ Binom(p2, N), so
Var(X) = p1(1−p1)N;  Var(Y) = p2(1−p2)N.
Correlations in the Multinomial Distribution
Hence
Cov(X,Y) = (Var(X+Y) − Var(X) − Var(Y))/2
Cov(X,Y) = N((p1+p2)(1 − p1 − p2) − p1(1−p1) − p2(1−p2))/2 = −N p1 p2

ρ = −N p1 p2 / (√(N p1(1−p1)) √(N p2(1−p2)))
 = −√( p1 p2 / ((1−p1)(1−p2)) )

In our case p1 = p2 = 1/6, so ρ = −1/5. The formula holds for
a general multinomial distribution.
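(Aside, not from the slides.) A simulation of the die-rolling example; the number of rolls N and the seed are arbitrary.

import numpy as np

rng = np.random.default_rng(5)
N, trials = 60, 200_000

# Each row: counts of faces 1..6 in N rolls of a fair die.
counts = rng.multinomial(N, [1/6] * 6, size=trials)
X, Y = counts[:, 0], counts[:, 1]          # number of 1's and number of 2's

print(np.corrcoef(X, Y)[0, 1])             # should be close to -1/5 = -0.2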
Variance of the Sum of N Variables

Var(Σi Xi) = Σi Var(Xi) + 2 Σ(j<i) Cov(Xi, Xj)

Proof:
Var(Σi Xi) = E[Σi Xi − E(Σi Xi)]²

[Σi Xi − E(Σi Xi)]² = [Σi (Xi − μi)]²
 = Σi (Xi − μi)² + 2 Σ(j<i) (Xi − μi)(Xj − μj).

Now take expectations and we have the result.
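(Aside, not from the slides.) A numerical check of this identity for several correlated variables; the construction of the variables and the seed are arbitrary.

import numpy as np

rng = np.random.default_rng(6)
# Five correlated variables (columns), built from shared noise.
Z = rng.standard_normal((100_000, 5))
Xs = Z + 0.5 * Z[:, [0]]                        # every column shares part of column 0

total_var = np.var(Xs.sum(axis=1))              # Var of the sum of the five columns
C = np.cov(Xs, rowvar=False, bias=True)         # 5x5 covariance matrix
rhs = np.trace(C) + 2 * np.sum(np.triu(C, k=1)) # sum of variances + 2 * sum of covariances

print(total_var, rhs)                           # the two sides agree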
Variance of the Sample Average
Let the population be a list of N numbers x(1), …, x(N).
Then

x̄ = (Σ(i=1..N) x(i)) / N   and   σ² = (Σ(i=1..N) (x(i) − x̄)²) / N

are the population mean and the population variance.

Let X1, X2, …, Xn be a sample of size n drawn from this
population. Then each Xk has the same distribution as
the entire population, so

E(Xk) = x̄   and   Var(Xk) = σ².

Let X̄n = (X1 + X2 + ... + Xn) / n be the sample average.
Variance of the Sample Average

By linearity of expectation E(X̄n) = x̄, both
for a sample drawn with and without replacement.

When X1, X2, …, Xn are drawn with replacement, they
are independent and each Xk has variance σ². Then

Var(X̄n) = nσ² / n² = σ²/n   and   SD(X̄n) = σ/√n.
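(Aside, not from the slides.) A Python sketch checking SD(X̄n) = σ/√n for sampling with replacement; the population, sample size, and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(7)
# A fixed (hypothetical) population of N numbers.
population = rng.integers(0, 100, size=1_000).astype(float)
sigma = population.std()                       # population SD
n, trials = 25, 100_000

# Draw many samples of size n WITH replacement and look at the sample averages.
samples = rng.choice(population, size=(trials, n), replace=True)
print(samples.mean(axis=1).std())              # observed SD of the sample average
print(sigma / np.sqrt(n))                      # theory: sigma / sqrt(n)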
Variance of the Sample Average
Question: What is the SD for sampling without
replacement?
Solution: Let Sn = X1 + X2 + … + Xn, so that X̄n = Sn / n.

Var(Sn) = Σi Var(Xi) + 2 Σ(j<i) Cov(Xi, Xj)

By symmetry, Cov(Xi, Xj) = Cov(X1, X2), so
Var(Sn) = nσ² + n(n−1) Cov(X1, X2).
In particular, 0 = Var(SN) = Nσ² + N(N−1) Cov(X1, X2).
Therefore Cov(X1, X2) = −σ²/(N−1),
and hence Var(Sn) = nσ² (1 − (n−1)/(N−1)).

Var(X̄n) = Var(Sn) / n²   and   SD(X̄n) = (σ/√n) √((N−n)/(N−1))
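(Aside, not from the slides.) A simulation checking the finite-population correction factor for sampling without replacement; the population size N, sample size n, and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(8)
population = rng.integers(0, 100, size=500).astype(float)   # hypothetical population, N = 500
N = population.size
sigma = population.std()
n, trials = 25, 20_000

# Draw many samples of size n WITHOUT replacement.
means = np.array([rng.choice(population, size=n, replace=False).mean()
                  for _ in range(trials)])

print(means.std())                                          # observed SD of the sample average
print(sigma / np.sqrt(n) * np.sqrt((N - n) / (N - 1)))      # theory with the correction factor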
