Often it is convenient to describe a random variable using some location measure. The most important location measure is the expected value (a.k.a. the mean or the weighted average). For a discrete random variable X, its expected value, denoted E[X], is given by
$$E[X] = \sum_k x_k \, p(x_k).$$
Remark: Strictly speaking, E[X] exists only if E[X⁺] and E[X⁻] are not both ∞, where X⁺ = max(X, 0) and X⁻ = max(−X, 0); in that case E[X] = E[X⁺] − E[X⁻].
Example: Suppose X takes values {0, 1, 2, 3} with probabilities {1/8, 3/8, 3/8, 1/8}. Then, E[X] = 0(1/8)+
1(3/8) + 2(3/8) + 3(1/8) = 12/8 = 1.5.
Remark: E[X] need not be a possible outcome of X.
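Computations like these are easy to script for quick numerical checks. The following minimal Python sketch (the helper name expected_value is ours, purely for illustration) computes E[X] for the example above:

```python
# Minimal sketch: E[X] = sum_k x_k * p(x_k) for a pmf given as a dict.
def expected_value(pmf):
    return sum(x * p for x, p in pmf.items())

pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(expected_value(pmf))  # 1.5 -- not itself a possible outcome of X
```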
Frequency, or long-run average, interpretation of the expected value: Example: Suppose a random variable X represents the profit associated with the production of some item that can be defective or non-defective. Suppose that the profit is −2 when the item is defective and 10 when the item is non-defective. Finally, assume that p(−2) = 0.1 and p(10) = 0.9. Then E[X] = −2(0.1) + 10(0.9) = 8.8. Suppose that a very large number n of items are produced, and let n(G) be the number of good items and n(D) the number of defective items. Then the average profit per item is
$$-2\,\frac{n(D)}{n} + 10\,\frac{n(G)}{n}.$$
The frequency interpretation is that n(G)/n converges, in a sense to be defined later, to p(10) = 0.9, so the average profit per item converges to E[X] = 8.8. This convergence is known as the Law of Large Numbers.
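A short simulation makes the long-run average interpretation concrete. This is our own sketch, with an arbitrarily chosen sample size of one million items:

```python
import random

# Profit is -2 with probability 0.1 (defective) and 10 with
# probability 0.9 (non-defective); the average profit per item
# should approach E[X] = 8.8.
random.seed(0)  # arbitrary seed, for reproducibility
n = 1_000_000
total = sum(-2 if random.random() < 0.1 else 10 for _ in range(n))
print(total / n)  # close to 8.8, as the Law of Large Numbers predicts
```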
Markov's inequality: If X is a non-negative random variable, then for any c > 0,
$$P(X \ge c) \le \frac{E[X]}{c}.$$
Notice that the inequality is non-trivial only if c > E[X]. Proof (discrete case):
$$E[X] = \sum_{k \ge 0} k\,p(k) \;\ge\; \sum_{k \ge c} k\,p(k) \;\ge\; c \sum_{k \ge c} p(k) = c\,P(X \ge c).$$
Example: Suppose the expected life L of a machine is 10 years. Then
$$P(L \ge 20) \le \frac{10}{20} = 0.5.$$
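For a concrete comparison, the sketch below assumes, purely for illustration (the notes do not specify a lifetime distribution), that L is exponentially distributed with mean 10, and contrasts the empirical tail probability with the Markov bound:

```python
import random

# Assumed model (illustration only): L ~ Exponential with mean 10.
# Markov's inequality guarantees P(L >= 20) <= 10/20 = 0.5.
random.seed(0)
n = 100_000
hits = sum(random.expovariate(1 / 10) >= 20 for _ in range(n))
print(hits / n)  # about exp(-2) = 0.135, well under the 0.5 bound
```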
Expected value of a function of a random variable: Given Y = G(X), the definition of expectation gives
$$E[G(X)] = \sum_g g\,p_G(g),$$
where p_G is the probability mass function of the random variable G(X). This requires computing p_G, the probability mass function of G(X). Alternatively, E[G(X)] can be computed directly from the probability mass function of X:
$$E[G(X)] = \sum_k G(x_k)\,p(x_k).$$
Example: Suppose X takes values {−2, −1, 0, 1, 2} with probabilities {1/8, 1/4, 1/4, 1/4, 1/8} and Y = X². Then Y takes values {0, 1, 4} with probabilities {1/4, 1/2, 1/4}, so E[Y] = 0(1/4) + 1(1/2) + 4(1/4) = 1.5. Computing directly from the probability mass function of X instead,
$$E[X^2] = (-2)^2(1/8) + (-1)^2(1/4) + 0^2(1/4) + 1^2(1/4) + 2^2(1/8) = 4/8 + 1/4 + 0 + 1/4 + 4/8 = 1.5.$$
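Both routes are easy to mirror in code. This sketch (ours) computes E[X²] once through the pmf of Y and once directly through the pmf of X:

```python
from collections import defaultdict

pmf_x = {-2: 1/8, -1: 1/4, 0: 1/4, 1: 1/4, 2: 1/8}

# Route 1: build the pmf of Y = X^2, then sum y * p_Y(y).
pmf_y = defaultdict(float)
for x, p in pmf_x.items():
    pmf_y[x * x] += p
print(sum(y * p for y, p in pmf_y.items()))      # 1.5

# Route 2: sum G(x) * p(x) directly over the pmf of X.
print(sum(x * x * p for x, p in pmf_x.items()))  # 1.5
```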
1.3 Variance
Consider G(X) = (X − E[X])². Notice that G(X) is a r.v., so we can calculate its expected value.
Definition: E[(X − E[X])²] is known as the variance of X and written Var[X] or σ².
For discrete random variables we have
$$\mathrm{Var}[X] = E[G(X)] = \sum_k (x_k - E[X])^2\, p(x_k).$$
Note that $\sigma = +\sqrt{\mathrm{Var}[X]}$ is called the standard deviation of X.
Variance as a measure of spread of a distribution: Example: p(a) = p(−a) = 0.5; then E[X] = 0, Var[X] = a², and σ_X = a. In this case the standard deviation gives us a good idea of how the distribution is spread around the mean. Example: p(0) = 1 − 1/a, p(a) = 1/a. Then E[X] = 1 and Var[X] = a − 1. In this example the variance increases to infinity as a → ∞, while the random variable becomes more and more concentrated at zero.
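The second example is worth checking numerically; this sketch (ours) evaluates the mean and variance for several values of a:

```python
# p(0) = 1 - 1/a, p(a) = 1/a: the mass piles up at 0 as a grows,
# yet Var[X] = a - 1 grows without bound.
for a in [2, 10, 100, 10_000]:
    pmf = {0: 1 - 1/a, a: 1/a}
    mean = sum(x * p for x, p in pmf.items())               # always 1
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # a - 1
    print(a, mean, var)
```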
Tchebychev's inequality: If X has mean µ and variance σ², then for any a > 0,
$$P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}.$$
Proof: Let Y = (X − µ)². Then Y ≥ 0, so Markov's inequality applies and
$$P\big((X - \mu)^2 \ge a^2\big) \le \frac{E[(X - \mu)^2]}{a^2} = \frac{\sigma^2}{a^2}.$$
The result follows since P((X − µ)² ≥ a²) = P(|X − µ| ≥ a).
Special case: taking a = kσ gives
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
One-sided Tchebychev's inequality:
$$P(X - \mu \ge k\sigma) \le \frac{1}{k^2 + 1} \quad \text{for all } k > 0,$$
and
$$P(X - \mu \le -k\sigma) \le \frac{1}{k^2 + 1} \quad \text{for all } k > 0.$$
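As an illustration (ours), the sketch below compares the exact two-sided tail probability with the Tchebychev bound for the five-point distribution from the E[X²] example above:

```python
# X takes {-2, -1, 0, 1, 2} with probs {1/8, 1/4, 1/4, 1/4, 1/8},
# so mu = 0 and sigma^2 = 1.5.
pmf = {-2: 1/8, -1: 1/4, 0: 1/4, 1: 1/4, 2: 1/8}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

a = 2.0
exact = sum(p for x, p in pmf.items() if abs(x - mu) >= a)
print(exact, var / a ** 2)  # exact 0.25 versus the looser bound 0.375
```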
Expected value of a function of two random variables: To compute E[Z] for Z = g(X, Y) directly, one would first have to find the probability mass function of Z and then use the definition of expectation. Fortunately, there is no need to do this as the following formulas allow a direct calculation of E[Z]. In the discrete case,
$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y).$$
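A small sketch (ours, with a made-up joint pmf chosen only to illustrate the double sum):

```python
# Joint pmf p_{X,Y}(x, y) stored as a dict; the values sum to 1.
pmf_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def expect(g, pmf):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p_{X,Y}(x, y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

print(expect(lambda x, y: x * y, pmf_xy))  # E[XY] = 0.25
```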
Recall that Cov(X, Y) = E[(X − E[X])(Y − E[Y])]. By the Cauchy–Schwarz inequality,
$$|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y.$$
The term
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
is known as the correlation coefficient. Notice that |ρ| ≤ 1.
Example: If X is the number of heads in the first two tosses of a fair coin and Y is the number of heads in the first three tosses, then E[X] = 1, E[Y] = 1.5, Var[X] = 0.5, Var[Y] = 0.75, Cov(X, Y) = 0.5, and ρ = 0.5/√(0.5 · 0.75) ≈ 0.82.
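These values can be verified by enumerating all eight equally likely toss sequences; a sketch (ours):

```python
from itertools import product
from math import sqrt

outcomes = list(product([0, 1], repeat=3))  # 8 equally likely triples
xs = [t[0] + t[1] for t in outcomes]        # heads in first two tosses
ys = [sum(t) for t in outcomes]             # heads in all three tosses

n = len(outcomes)
ex, ey = sum(xs) / n, sum(ys) / n
var_x = sum((x - ex) ** 2 for x in xs) / n
var_y = sum((y - ey) ** 2 for y in ys) / n
cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
print(ex, ey, var_x, var_y, cov, cov / sqrt(var_x * var_y))
# 1.0 1.5 0.5 0.75 0.5 0.816...
```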
It is easy to see that
$$\mathrm{Cov}(aX + bY, X) = a\,\mathrm{Cov}(X, X) + b\,\mathrm{Cov}(Y, X).$$
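For completeness, here is the expansion behind the "easy to see" step (our filling-in, using only the definition of covariance and linearity of expectation):

```latex
\begin{align*}
\mathrm{Cov}(aX + bY, X)
  &= E\big[(aX + bY - E[aX + bY])(X - E[X])\big] \\
  &= a\,E\big[(X - E[X])(X - E[X])\big] + b\,E\big[(Y - E[Y])(X - E[X])\big] \\
  &= a\,\mathrm{Cov}(X, X) + b\,\mathrm{Cov}(Y, X).
\end{align*}
```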
Definition: X and Y are independent if F(x, y) = F_X(x)F_Y(y) for all x and y, where F(x, y) = P(X ≤ x, Y ≤ y) is the joint cumulative distribution function and F_X(x) = P(X ≤ x) and F_Y(y) = P(Y ≤ y) are the marginal cumulative distribution functions. For discrete random variables this is equivalent to p(x, y) = p_X(x)p_Y(y) for all x and y.
Independence is a very important concept. It is easy to see that independent random variables are uncorrelated, i.e., have zero correlation. To see this, notice that
$$\begin{aligned}
\mathrm{Cov}(X, Y) &= \sum_x \sum_y (x - E[X])(y - E[Y])\, p(x, y) \\
&= \sum_x \sum_y (x - E[X])(y - E[Y])\, p_X(x)\, p_Y(y) \\
&= \sum_x (x - E[X])\, p_X(x) \sum_y (y - E[Y])\, p_Y(y) \\
&= (E[X] - E[X])(E[Y] - E[Y]) \\
&= 0.
\end{aligned}$$
However, uncorrelated random variables are not necessarily independent, as shown in the following example.
Example: Toss two coins. Let Xi be the number of heads in the i-th toss. Let D = X1 − X2 and S = X1 + X2. Are D and S independent? No, because if D = 0 then S ≠ 1. Notice that E[D] = 0; consequently,
$$\mathrm{Cov}(D, S) = E[DS] - E[D]E[S] = E[(X_1 - X_2)(X_1 + X_2)] = E[X_1^2] - E[X_2^2] = 0,$$
so D and S are uncorrelated even though they are not independent.
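A quick enumeration (our sketch) confirms that D and S are uncorrelated yet dependent:

```python
from itertools import product

# All four equally likely outcomes of two fair coin tosses.
pairs = list(product([0, 1], repeat=2))
ds = [x1 - x2 for x1, x2 in pairs]
ss = [x1 + x2 for x1, x2 in pairs]
n = len(pairs)

cov = (sum(d * s for d, s in zip(ds, ss)) / n
       - (sum(ds) / n) * (sum(ss) / n))
print(cov)  # 0.0 -- uncorrelated

# Dependence: P(D = 0, S = 1) = 0, yet P(D = 0) * P(S = 1) = 0.25.
p_joint = sum(d == 0 and s == 1 for d, s in zip(ds, ss)) / n
p_d0 = sum(d == 0 for d in ds) / n  # 0.5
p_s1 = sum(s == 1 for s in ss) / n  # 0.5
print(p_joint, p_d0 * p_s1)  # 0.0 versus 0.25
```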