
Variance and Covariance of Random Variables

Variance and covariance are terms used frequently in statistics, and despite their
similar-sounding names they have quite different meanings. Covariance measures how
two random variables change together and is used to calculate the correlation
between them. Variance measures the spread of a data set, that is, how far the values
lie from the mean. Variance is particularly useful when assessing how uncertain
future events or performance are.

Definition.
The variance of a random variable $X$ with expected value $E[X] = \mu_X$ is
defined as $\mathrm{var}(X) = E[(X - \mu_X)^2]$. The covariance between random
variables $Y$ and $Z$, with expected values $\mu_Y$ and $\mu_Z$, is defined as
$\mathrm{cov}(Y, Z) = E[(Y - \mu_Y)(Z - \mu_Z)]$. The correlation between $Y$ and $Z$
is defined as

$$\mathrm{corr}(Y, Z) = \frac{\mathrm{cov}(Y, Z)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(Z)}}$$
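
The three definitions can be tried out on a small joint distribution. The sketch below is only an illustration: the joint probabilities in `joint_pmf` and the helper name `expectation` are assumptions made for this example and do not come from the text.

```python
import math

# Hypothetical joint distribution of (Y, Z), for illustration only:
# each key is a pair (y, z) and each value is its probability.
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expectation(g):
    """Return E[g(Y, Z)] under joint_pmf."""
    return sum(g(y, z) * p for (y, z), p in joint_pmf.items())


mu_y = expectation(lambda y, z: y)
mu_z = expectation(lambda y, z: z)

# var(Y) = E[(Y - mu_Y)^2], and likewise for Z.
var_y = expectation(lambda y, z: (y - mu_y) ** 2)
var_z = expectation(lambda y, z: (z - mu_z) ** 2)

# cov(Y, Z) = E[(Y - mu_Y)(Z - mu_Z)].
cov_yz = expectation(lambda y, z: (y - mu_y) * (z - mu_z))

# corr(Y, Z) = cov(Y, Z) / sqrt(var(Y) var(Z)).
corr_yz = cov_yz / math.sqrt(var_y * var_z)

print(round(cov_yz, 3), round(corr_yz, 3))
```

For this assumed distribution the covariance is 0.15 and the correlation is 0.6, so the two variables tend to move in the same direction.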

How to calculate Variance

Variance is calculated by taking the difference between each number in a data set
and the mean, squaring those differences so that they are non-negative, and dividing
the sum of the resulting squares by the number of values in the set.

The formula for variance is as follows:


$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

In this formula, $x$ represents an individual data point,

$\mu$ represents the mean of the data points, and

$N$ represents the total number of data points.

Note that when calculating a sample variance in order to estimate a population
variance, the denominator of the variance equation becomes $N - 1$. This removes
bias from the estimate: dividing by $N$ would, on average, underestimate the
population variance.
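
The difference between the two denominators is easy to see in code. The following is a minimal sketch on a made-up data set (the values in `data` are an assumption for illustration); it computes both versions by hand and compares them with Python's standard `statistics` module.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: divide the sum of squares by N.
population_variance = sum(squared_deviations) / len(data)

# Sample variance: divide the sum of squares by N - 1.
sample_variance = sum(squared_deviations) / (len(data) - 1)

# The standard library agrees with the hand-written versions.
assert abs(population_variance - statistics.pvariance(data)) < 1e-12
assert abs(sample_variance - statistics.variance(data)) < 1e-12

print(population_variance)
print(sample_variance)
```

Here the population variance is 4.0 and the sample variance is 32/7, roughly 4.57. The sample version is always the larger of the two, which is what compensates for the tendency to underestimate.
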
Simplifying the Variance formula

We have seen that variance of a random variable is given by:

$$\sigma^2 = E[(X - \mu)^2]$$

We can attempt to simplify this formula by expanding the quadratic in the formula
above as follows:

$$(X - \mu)^2 = X^2 - 2X\mu + \mu^2$$

We shall see in the next section that the expected value of a linear combination
behaves as follows:

𝐸[𝐴 + 𝐵] = 𝐸[𝐴] + 𝐸[𝐵]

Substituting the expanded form into the variance equation:

$$\sigma^2 = E[X^2 - 2X\mu + \mu^2]$$

$$= E[X^2] + E[-2X\mu] + E[\mu^2]$$

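Remember that once the mean $\mu$ has been calculated it is a constant, and the
expected value of a constant is that same constant:

$$E[\mu^2] = \mu^2$$

A constant factor can also be taken outside the expectation, so
$E[-2X\mu] = -2\mu E[X]$. This simplifies the formula as shown below:

$$\sigma^2 = E[X^2] - 2\mu E[X] + \mu^2$$

But

$$E[X] = \mu$$

which means that

$$\sigma^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2.$$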
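
As a numerical check of this derivation, the sketch below computes the variance of a discrete distribution in two ways: directly from the definition $E[(X - \mu)^2]$, and from the simplified form, which collapses to $E[X^2] - \mu^2$. The distribution used, a fair six-sided die, is an assumption chosen for illustration. Exact fractions are used so that the two results can be compared without rounding error.

```python
from fractions import Fraction

# Assumed distribution for illustration: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = E[X].
mu = sum(x * p for x, p in pmf.items())

# Variance from the definition: E[(X - mu)^2].
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance from the simplified formula: E[X^2] - mu^2.
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = e_x2 - mu ** 2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```

Both routes give 35/12, as the algebra predicts.
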
Example 1.

A software engineering company tested a new product of theirs and found that
the number of errors per 100 CDs of the new software had the following probability
distribution:

x    f(x)
2    0.01
3    0.25
4    0.40
5    0.30
6    0.04

Find the variance of X.

Solution

The probability distribution given is discrete and so we can find the variance
from the following:

$$\mathrm{Var}(X) = \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$$

We need to find the mean $\mu$ first:

$$\mu = (2 \times 0.01) + (3 \times 0.25) + (4 \times 0.4) + (5 \times 0.3) + (6 \times 0.04)$$

$$\mu = 4.11 \text{ errors per 100 CDs}$$

Then we find the variance:


$$\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$$

$$\sigma^2 = (2 - 4.11)^2 (0.01) + (3 - 4.11)^2 (0.25) + (4 - 4.11)^2 (0.4) + (5 - 4.11)^2 (0.3) + (6 - 4.11)^2 (0.04)$$

$$\sigma^2 \approx 0.74$$
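
The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative choices; the probabilities are the ones given in the table for this example.

```python
# Probability distribution from Example 1: number of errors per 100 CDs.
pmf = {2: 0.01, 3: 0.25, 4: 0.40, 5: 0.30, 6: 0.04}

# A valid probability distribution must sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# Mean: sum of x * f(x).
mu = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mu)^2 * f(x).
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(round(mu, 2))        # 4.11
print(round(variance, 2))  # 0.74
```
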
Example 2.

Suppose that the cost of maintaining a car is given by a random variable $X$
with mean 200 and variance 260. If a tax of 20% is introduced on all items associated
with the maintenance of the car, what will be the variance of the cost of maintaining
a car?

Solution:

The new cost is $1.2X$, so its variance is

$$\mathrm{Var}(1.2X) = 1.2^2 \, \mathrm{Var}(X) = 1.44 \cdot 260 = 374.4$$
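
The scaling rule used here, that multiplying a random variable by a constant $a$ multiplies its variance by $a^2$, can be illustrated on data. In the sketch below the list `costs` is a made-up set of maintenance costs (an assumption for illustration, not the distribution in the example); only the ratio of the two variances matters.

```python
import statistics

# Hypothetical maintenance costs, for illustration only.
costs = [150.0, 180.0, 200.0, 220.0, 250.0]

# Apply a 20% tax to every cost.
taxed = [1.2 * c for c in costs]

# The variance should grow by a factor of 1.2 ** 2 = 1.44.
ratio = statistics.pvariance(taxed) / statistics.pvariance(costs)
print(round(ratio, 6))  # 1.44

# Applying the same factor to the variance given in the example.
print(round(1.2 ** 2 * 260, 1))  # 374.4
```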

Exercises

Exercise 1.

The profit for a new product is given by $Z = 3X - Y - 5$, where $X$ and $Y$ are
independent random variables with $\mathrm{Var}(X) = 1$ and $\mathrm{Var}(Y) = 2$.
What is the variance of $Z$?

Exercise 2.

A company insures homes in three cities, J, K, and L. The losses occurring in these
cities are independent. The moment-generating functions for the loss distributions of
the cities are

$$M_J(t) = (1 - 2t)^{-3}, \quad M_K(t) = (1 - 2t)^{-2.5}, \quad M_L(t) = (1 - 2t)^{-4.5}$$

Let $X$ represent the combined losses from the three cities. Calculate $E(X^3)$.
How to calculate the Covariance

The formula for covariance is as follows:


$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

In this formula, $x_i$ represents the values of the independent variable $X$,

$y_i$ represents the values of the dependent variable $Y$,

$n$ represents the number of data points in the sample,

$\bar{x}$ represents the mean of $X$, and

$\bar{y}$ represents the mean of $Y$.

Example 1.

Find the covariance between the given two sets of data.

𝑋: 2,5,8,11

𝑌: 5,9,1,4

Solution:

The means of the two data sets are

𝑥̅ = 6.5

𝑦̅ = 4.75

The covariance can be calculated by using the formula.


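$$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Substituting the values, we get

$$\mathrm{cov}(X, Y) = \frac{(2 - 6.5)(5 - 4.75) + (5 - 6.5)(9 - 4.75) + (8 - 6.5)(1 - 4.75) + (11 - 6.5)(4 - 4.75)}{4 - 1}$$

$$= \frac{-16.5}{3} = -5.5$$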
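
The same calculation can be written as a small function. This is a minimal sketch of the sample covariance formula with an $n - 1$ denominator; the function name `sample_covariance` is an illustrative choice, not something defined in the text.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum the products of paired deviations from the means.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


# Data from this example.
x = [2, 5, 8, 11]
y = [5, 9, 1, 4]

print(sample_covariance(x, y))  # -5.5
```

The negative sign indicates that, in this sample, larger values of X tend to be paired with smaller values of Y.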

Example 2.

The marks of 6 students in a class for Mathematics and English are given here.
Find the covariance between them.

English 65 76 61 86 88 70

Mathematics 72 67 56 94 81 98
Solution:

The means of the two data sets are

English, $\bar{x} = 74.33$

Mathematics, $\bar{y} = 78$

Total number of observations, $n = 6$

The covariance can be calculated by using the formula

$$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Substituting the values and calculating, we get

$$\mathrm{cov}(X, Y) = 94.4$$
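
The intermediate arithmetic is not shown above. One way to fill it in is sketched here, using the standard identity $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\,\bar{x}\,\bar{y}$ and the exact mean $\bar{x} = 446/6$ rather than the rounded value 74.33:

```latex
\begin{align*}
\sum_{i=1}^{6} x_i y_i
  &= 65 \cdot 72 + 76 \cdot 67 + 61 \cdot 56 + 86 \cdot 94 + 88 \cdot 81 + 70 \cdot 98 \\
  &= 4680 + 5092 + 3416 + 8084 + 7128 + 6860 = 35260 \\
n\,\bar{x}\,\bar{y}
  &= 6 \cdot \frac{446}{6} \cdot 78 = 34788 \\
\mathrm{cov}(X, Y)
  &= \frac{35260 - 34788}{6 - 1} = \frac{472}{5} = 94.4
\end{align*}
```

The positive value indicates that, in this sample, higher English marks tend to go with higher Mathematics marks.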

Exercise 1.

Suppose the Joint PMF (probability mass function) is given by the insurance
company in the accompanying joint probability table:

p(x, y)      y = 0    y = 100    y = 200
x = 100      .20      .10        .20
x = 250      .05      .15        .30

What is the covariance between 𝑋 𝑎𝑛𝑑 𝑌?

Exercise 2.
An insurance agency services customers who have both a homeowner’s policy
and an automobile policy. For each type of policy, a deductible amount must be
specified.

For an automobile policy, the choices are $100 and $250, whereas for a
homeowner’s policy, the choices are $0, $100, and $200.

Suppose an individual, Bob, is selected at random from the agency’s files.
