
CL 202: Introduction to Data Analysis

Expectation of a Random Variable and Function of a Random Variable


Sachin C. Patwardhan and Mani Bhushan
Dept. of Chemical Engineering, IIT Bombay

1 Expectation of a Random Variable

Random variables are complicated objects, containing a lot of information on the experiments
that are modeled by them. If we want to summarize a random variable (RV) by a single
number, then this number should undoubtedly be its central tendency. The expected value,
also called the expectation or mean, gives the center in the sense of average value of the
distribution of the random variable.

1.1 Discrete Random Variable

Example 1 (Average Life of a Drill Bit) An oil company needs drill bits in an exploration project. Suppose that it is known that (after rounding to the nearest hour) drill bits of the type used in this particular project will last 2, 3, or 4 hours with probabilities 0.1, 0.7, and 0.2. If a drill bit is replaced by one of the same type each time it wears out, how long could exploration be continued if the company reserves a total of 10 drill bits for the exploration job?
Let X denote the life of a drill bit, which is a random variable that takes values 2, 3, or 4 hours. The question is: if we want to generate an informed estimate of how long the exploration job will last, which value of X should we use? A logical way to answer this question is to use some central tendency, such as the average value of X. Since the probability mass associated with each value of X is different, i.e.

$$P(X = 2) = 0.1, \quad P(X = 3) = 0.7, \quad P(X = 4) = 0.2 \qquad (1)$$

we can use these probabilities to compute a weighted average life of a drill bit as follows

$$\bar{X} = 0.1 \times 2 + 0.7 \times 3 + 0.2 \times 4 = 3.1 \qquad (2)$$

and then conclude that the exploration could continue for $10 \times 3.1 = 31$ hours.
This weighted average is what we call the expected value or expectation of the random variable X, whose distribution is given by equation (1). It might happen that the company is unlucky and that each of the 10 drill bits has worn out after two hours, in which case
exploration ends after 20 hours. At the other extreme, they may be lucky and drill for 40 hours on these bits.

Figure 1: Drill Bit Example: Expectation as Center of Gravity [1]

However, it is a mathematical fact that the conclusion about a 31-hour total drilling time is correct in the following sense: for a large number N of drill bits, the total running time will be around N times 3.1 hours with high probability.
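This long-run claim is easy to check numerically. The following sketch is not part of the original notes; it simulates N drill-bit lifetimes from the pmf in equation (1) (the sample size N is an arbitrary choice) and compares the total running time with N times 3.1 hours:

```python
import random

# pmf of the drill-bit life from equation (1)
values = [2, 3, 4]        # lifetimes in hours
probs = [0.1, 0.7, 0.2]   # associated probabilities

N = 100_000  # number of simulated drill bits (arbitrary, just "large")
lifetimes = random.choices(values, weights=probs, k=N)

total = sum(lifetimes)
print(f"simulated total: {total} hours")
print(f"N x 3.1        : {N * 3.1:.0f} hours")
print(f"sample mean    : {total / N:.4f}  (expectation = 3.1)")
```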
Definition 2 The expectation of a discrete random variable X taking the values $a_1, a_2, \ldots$ and with probability mass function

$$f_X(a_i) = P(X = a_i) = p_i \quad \text{for } i = 1, 2, \ldots, \qquad f_X(a) = 0 \text{ otherwise}$$

is the number

$$E[X] = \sum_i a_i P(X = a_i) = \sum_i a_i f_X(a_i) = \sum_i p_i a_i \qquad (3)$$

We also call E[X] the expected value or mean of X. Since the expectation is determined by the probability distribution of X only, we also speak of the expectation or mean of the distribution.

Looking at an expectation as a weighted average gives a more physical interpretation of this notion, namely as the center of gravity of weights $p_i = f_X(a_i)$ placed at the points $a_i$ (see Figure 1). This point of view also leads the way to how one should define the expected value of a continuous random variable.

2 Continuous Random Variable

Let X be a continuous random variable whose probability density function $f_X(x)$ is zero outside the interval $[a, b]$. Now, let us divide the interval into n small sub-intervals of equal size as follows

$$x_i = a + i\,\Delta x \quad \text{for } i = 0, 1, \ldots, n \qquad (4)$$

where

$$\Delta x = \frac{b - a}{n} \qquad (5)$$

Thus, we have $x_0 = a$ and $x_n = b$. It seems reasonable to approximate X by a discrete random variable Y taking the values

$$Y = x_1, \; Y = x_2, \; \ldots, \; Y = x_n \qquad (6)$$

in the interval $[a, b]$, with as probabilities the masses that X assigns to the intervals $[x_{i-1}, x_i]$, i.e.

$$p_i = P(Y = x_i) = P(x_{i-1} < X \le x_i) = \int_{x_{i-1}}^{x_i} f_X(x)\,dx \approx \Delta x \, f_X(x_i) \qquad (7)$$

For large n, we can write

$$p_i \approx \int_{x_{i-1}}^{x_i} f_X(x)\,dx \approx \Delta x \, f_X(x_i) = f_X\!\left(a + \frac{i}{n}(b - a)\right) \frac{(b - a)}{n} \qquad (8)$$

In other words, we have approximated $(X, f_X(x))$ as $(Y, f_Y(y))$. The center-of-gravity interpretation suggests that the expectation E[Y] of Y should approximate the expectation E[X] of X, i.e.

$$E[X] \approx E[Y] = \sum_{i=1}^{n} p_i x_i = \sum_{i=1}^{n} \left(\Delta x \, f_X(x_i)\right) x_i \qquad (9)$$

By the definition of a definite integral, for sufficiently large n, we have

$$\lim_{n \to \infty} \sum_{i=1}^{n} \left(\Delta x \, f_X(x_i)\right) x_i = \int_{a}^{b} x f_X(x)\,dx$$

This motivates the definition of the expected value of a continuous RV.


Definition 3 The expectation of a continuous random variable X with probability density function $f_X$ is the number

$$\mu = E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx \qquad (10)$$

Figure 2: Expected value as center of gravity, continuous case. [1]

Note that E[X] is indeed the center of gravity of the mass distribution described by the function $f_X$, i.e.

$$E[X] = \frac{\int_{-\infty}^{\infty} x f_X(x)\,dx}{\int_{-\infty}^{\infty} f_X(x)\,dx} = \int_{-\infty}^{\infty} x f_X(x)\,dx \qquad (11)$$

since the density integrates to one. This is illustrated in Figure 2.
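The discretization argument in equations (4)-(9) can also be checked numerically. A minimal sketch (the density $f_X(x) = 2x$ on $[0, 1]$ is my own choice of example, not from the notes), showing the Riemann sum of equation (9) approaching the exact mean as n grows:

```python
# Riemann-sum check of equations (4)-(9) for f_X(x) = 2x on [0, 1],
# whose exact mean is E[X] = \int_0^1 x (2x) dx = 2/3.
def fX(x):
    return 2.0 * x

a, b = 0.0, 1.0
for n in (10, 100, 10_000):
    dx = (b - a) / n
    xs = [a + i * dx for i in range(1, n + 1)]   # grid points x_1, ..., x_n
    approx = sum(x * fX(x) * dx for x in xs)     # E[Y] from equation (9)
    print(f"n = {n:6d}: E[Y] = {approx:.6f}   (exact E[X] = {2/3:.6f})")
```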


Remark 4 At this stage, it may appear that there are separate definitions for the mean of a continuous RV and for the mean of a discrete RV. However, given a discrete RV, say X, taking the values $a_1, a_2, \ldots$ and with the probability mass function

$$f_X(a_i) = P(X = a_i) = p_i \quad \text{for } i = 1, 2, \ldots$$

we can unify the definition of the mean of a random variable if we express the probability mass function as follows

$$f_X(x) = \sum_{i=1}^{n} p_i \, \delta(x - a_i)$$

It follows that

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-\infty}^{\infty} x \left( \sum_{i=1}^{n} p_i \, \delta(x - a_i) \right) dx$$

and, using the properties of the Dirac delta function, we have

$$E[X] = \sum_{i=1}^{n} p_i \left( \int_{-\infty}^{\infty} x \, \delta(x - a_i)\,dx \right) = \sum_{i=1}^{n} p_i a_i$$
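This unification can be reproduced symbolically. A minimal sketch using SymPy's DiracDelta (assuming SymPy is available; the drill-bit pmf is reused purely as an example):

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Drill-bit pmf written as a train of Dirac impulses, as in Remark 4
fX = (sp.Rational(1, 10) * sp.DiracDelta(x - 2)
      + sp.Rational(7, 10) * sp.DiracDelta(x - 3)
      + sp.Rational(2, 10) * sp.DiracDelta(x - 4))

# E[X] via the continuous definition, equation (10)
EX = sp.integrate(x * fX, (x, -sp.oo, sp.oo))
print(EX)  # 31/10, i.e. the 3.1 obtained in equation (2)
```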

3 Expectation of a Function of a Random Variable

Now consider a scenario where we are interested in computing the expected value of some function Z = g(X) of a random variable X. Such situations are encountered very often in engineering applications.
Example 5 (Modified Drill Bit Example) Consider the drill bit example with the following additional assumptions. Let us assume that there are three different qualities of drill bits available in the market: low quality (with life X = 2 hrs), average quality (with life X = 3 hrs) and high quality (with life X = 4 hrs), and that different quality drill bits are stocked in different rooms. Moreover, on a given day, let us further assume that (a) all the drill bits issued to an operator are of identical quality and (b) the room is chosen randomly. In other words, on a given day, an operator can receive either 12 low quality drill bits, or 8 average quality drill bits, or 6 high quality drill bits for carrying out the drilling operation. The associated probability mass function is

$$P(X = 2) = 0.1, \quad P(X = 3) = 0.7, \quad P(X = 4) = 0.2 \qquad (12)$$

Under these constraints, assume that the daily operating cost of the exploration can be expressed as a function of the number of drill bit changes needed in 24 hrs as follows

$$Z = C(24/X) \quad \text{Rs/day}$$

where C is a constant. Since X is a random variable, it follows that Z is also a random variable. The oil exploration company would be interested in finding the average daily operating cost of the exploration.
Example 6 An L-R-C circuit is to be designed as a part of an electronic device. Suppose that the electronic device is to be placed in an industrial environment where the temperature fluctuates randomly between $a \le T \le b$ and the associated probability density function $f_T(x)$ is a uniform distribution, i.e.

$$f_T(x) = \begin{cases} \dfrac{1}{(b - a)} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

Let us further assume that the resistance is a monotonically increasing function of the temperature of the form

$$R = A + B(T - a)^2$$

where the constants (A, B) are known. Obviously, the resistance is also a random variable, and it is difficult to design the circuit for a randomly changing value of resistance. Thus, a designer would be interested in finding an average value of R that can be used for designing the circuit.

Thus, if X is a random variable, then a function of X, i.e. $Z = g(X)$, is also a random variable. The question is: if the probability density/mass function $f_X(x)$ is known, how do we find the average value of g(X)?

3.1 Discrete Random Variable

To begin with, let us assume that X is a discrete RV which can take values $a_1, a_2, \ldots$ and has associated probability mass function $f_X(x)$. By virtue of the fact that X is a RV, it follows that Z is a discrete RV that takes values $z_1 = g(a_1), z_2 = g(a_2), \ldots$ and so on. Moreover, the fact that the probability that $X = a_i$ equals $p_i = f_X(a_i)$ implies that the probability that $Z = z_i \,(= g(a_i))$ is also equal to $p_i$. This leads us to the expected value of a function of a discrete RV.

Definition 7 Consider a discrete random variable X taking the values $a_1, a_2, \ldots$ and with probability mass function $f_X(x)$. The expectation of a function g(X) of X is the number

$$E[g(X)] = \sum_i g(a_i) P(X = a_i) = \sum_i p_i \, g(a_i) \qquad (13)$$

Example 8 Consider the modified drill bit example. For the transformed variable Z, we can find the probability mass function as follows

$$P(Z = 24C/2) = P(X = 2) = 0.1 \qquad (14)$$

$$P(Z = 24C/3) = P(X = 3) = 0.7 \qquad (15)$$

$$P(Z = 24C/4) = P(X = 4) = 0.2 \qquad (16)$$

The average daily operating cost can now be computed as follows

$$E[Z] = E[g(X)] = 0.1 \times 12C + 0.7 \times 8C + 0.2 \times 6C = 8C$$

Note that

$$E[Z] \ne g(E[X]) = 24C/3.1 = 7.742C$$

In fact, $E[g(X)] \ne g(E[X])$ in general, and the equality holds only when the transformation is linear.
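A few lines of code make the gap between E[g(X)] and g(E[X]) concrete. A sketch with C = 1 (any positive C scales both sides equally):

```python
# Modified drill-bit example with C = 1
values = [2, 3, 4]
probs = [0.1, 0.7, 0.2]

def g(x):
    return 24.0 / x  # daily cost in units of C

E_gX = sum(p * g(a) for p, a in zip(probs, values))  # equation (13)
g_EX = g(sum(p * a for p, a in zip(probs, values)))  # g applied to E[X] = 3.1

print(f"E[g(X)] = {E_gX:.3f}")  # 8.000
print(f"g(E[X]) = {g_EX:.3f}")  # 7.742
```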

3.2 Continuous Random Variable

Now, let X represent a continuous random variable with probability density function $f_X(x)$ and cumulative distribution function

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(\xi)\,d\xi$$

Let us further assume that $f_X(x)$ is zero outside the interval $[a, b]$ and $f_X(x) \ge 0$ when $x \in [a, b]$. Consider the approximation of X using a discrete RV Y which takes values $\{x_1, x_2, \ldots, x_n\}$ as given by equations (4) and (5), together with the associated probability mass function

$$p_i = P(Y = x_i) = \Delta x \, f_X(x_i) \qquad (17)$$

where $\Delta x \, f_X(x_i)$ is given by equation (8). Now, consider a transformation $Z = g(Y)$ that takes discrete values $z_1 = g(x_1), z_2 = g(x_2), \ldots$ and so on. Since we have $P(Y = x_i) = \Delta x \, f_X(x_i) = p_i$, it is obvious that

$$P(Z = g(x_i)) = P(Y = x_i) = \Delta x \, f_X(x_i)$$

Now, using the definition of the mean of a function of a discrete RV given by equation (13), it follows that

$$E[Z] = \sum_i g(x_i) P(Z = g(x_i)) = \sum_i p_i \, g(x_i) = \sum_i g(x_i) f_X(x_i) \, \Delta x \qquad (18)$$

By the definition of a definite integral, $\Delta x \to 0$ for sufficiently large n, and the right-hand side of equation (18) is close to

$$\lim_{n \to \infty} \sum_{i=1}^{n} \left(\Delta x \, f_X(x_i)\right) g(x_i) = \int_{a}^{b} g(x) f_X(x)\,dx$$

This motivates the definition of the expected value of a function of a continuous RV.
Definition 9 Consider a continuous random variable X with probability density function $f_X(x)$. The expectation of a function $Z = g(X)$ of X is the number

$$E[Z] = E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \qquad (19)$$

An alternate way of finding E[Z] is to find the associated probability density function $f_Z(z)$ and then evaluate E[Z] as follows

$$E[Z] = \int_{-\infty}^{\infty} z \, f_Z(z)\,dz \qquad (20)$$

Example 10 Continuing with the L-R-C circuit design example, the average value of the resistance R can be computed as follows.

Approach 1: Using $f_T(x)$

$$E[R] = E[g(T)] = \int_{-\infty}^{\infty} g(x) f_T(x)\,dx = \frac{1}{(b - a)} \int_{a}^{b} \left(A + B(x - a)^2\right) dx \qquad (21)$$

$$= \frac{1}{(b - a)} \left[ A(b - a) + \frac{B}{3}(b - a)^3 \right] = A + \frac{B}{3}(b - a)^2 \qquad (22)$$

This result also shows that the average value of the function is not equal to the value of the function evaluated at the mean temperature $\bar{T} = E[T] = (a + b)/2$, i.e.

$$g(\bar{T}) = A + B\left(\frac{b - a}{2}\right)^2 = A + \frac{B}{4}(b - a)^2 \ne E[g(T)]$$
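Approach 1 is straightforward to carry out with numerical quadrature. A hedged sketch using scipy.integrate.quad (the values a = 20, b = 40, A = 100, B = 0.05 are invented purely for illustration, not taken from the notes):

```python
from scipy.integrate import quad

# Hypothetical values chosen only for illustration
a, b = 20.0, 40.0   # temperature range
A, B = 100.0, 0.05  # resistance model constants

fT = lambda x: 1.0 / (b - a)         # uniform density on [a, b]
g = lambda x: A + B * (x - a) ** 2   # R = g(T)

E_R, _ = quad(lambda x: g(x) * fT(x), a, b)  # equation (19) restricted to [a, b]
closed_form = A + B * (b - a) ** 2 / 3.0     # equation (22)

print(E_R, closed_form)  # both ~ 106.667
print(g((a + b) / 2.0))  # 105.0: g at the mean temperature, not equal to E[R]
```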

Approach 2: Using $f_R(r)$

Since R is a monotonically increasing function of T, it follows that $T \in [a, b]$ implies $R \in [R_a, R_b]$ where

$$R_a = A \quad \text{and} \quad R_b = A + B(b - a)^2$$

To derive the probability density function of the transformation, consider

$$F_R(r) = P(R \le r) = P\left(A + B(T - a)^2 \le r\right) = P\left(T \le a + \sqrt{\frac{r - A}{B}}\right)$$

where $r \in [R_a, R_b]$. Since the other root, i.e. $a - \sqrt{(r - A)/B}$, is less than or equal to a, we can write

$$F_R(r) = \int_{-\infty}^{\,a + \sqrt{(r - A)/B}} f_T(x)\,dx = \int_{a}^{\,a + \sqrt{(r - A)/B}} \frac{1}{b - a}\,dx = \frac{1}{b - a}\sqrt{\frac{r - A}{B}}$$

The probability density function of R can now be derived as follows

$$f_R(r) = \frac{dF_R(r)}{dr} = \begin{cases} \dfrac{1}{2\sqrt{B}\,(b - a)\sqrt{r - A}} & \text{when } r \in [R_a, R_b] \\ 0 & \text{when } r \notin [R_a, R_b] \end{cases}$$

Thus, the average value of R can be computed as

$$E[R] = \int_{-\infty}^{\infty} r f_R(r)\,dr = \frac{1}{2\sqrt{B}(b - a)} \int_{R_a}^{R_b} \frac{r}{\sqrt{r - A}}\,dr$$

Substituting $t = r - A$,

$$E[R] = \frac{1}{2\sqrt{B}(b - a)} \int_{R_a - A}^{R_b - A} \frac{t + A}{\sqrt{t}}\,dt = \frac{1}{2\sqrt{B}(b - a)} \left[ \frac{2}{3}t^{3/2} + 2At^{1/2} \right]_{R_a - A}^{R_b - A}$$

Since

$$R_a - A = 0 \quad \text{and} \quad R_b - A = B(b - a)^2$$

it follows that

$$\left[ \frac{2}{3}t^{3/2} + 2At^{1/2} \right]_{R_a - A}^{R_b - A} = \frac{2}{3}\left[B(b - a)^2\right]^{3/2} + 2A\left[B(b - a)^2\right]^{1/2}$$

and

$$E[R] = \frac{1}{2\sqrt{B}(b - a)} \left\{ \frac{2}{3}\left[B(b - a)^2\right]^{3/2} + 2A\left[B(b - a)^2\right]^{1/2} \right\} = A + \frac{B}{3}(b - a)^2$$

which agrees with the result obtained using Approach 1.
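Approach 2 can also be validated by sampling: draw T uniformly, transform to R, and compare the sample mean with the exact E[R]. A sketch reusing the hypothetical constants from the earlier Approach 1 sketch:

```python
import random

# Same hypothetical constants as in the Approach 1 sketch
a, b = 20.0, 40.0
A, B = 100.0, 0.05

N = 200_000
samples = [A + B * (random.uniform(a, b) - a) ** 2 for _ in range(N)]

mc_mean = sum(samples) / N
exact = A + B * (b - a) ** 2 / 3.0  # E[R] from either approach

print(f"Monte Carlo E[R]: {mc_mean:.3f}")
print(f"exact E[R]      : {exact:.3f}")
```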

If Z = g(X) is a complex function, or the integral of $f_T(x)$ does not have a closed-form expression, then deriving the probability density function $f_Z(z)$ can prove to be a difficult task. For example, assume that the probability density function of the temperature in the L-R-C circuit design example is given as follows

$$f_T(x) = \begin{cases} k \exp\left[-\dfrac{(x - \mu)^2}{2\sigma^2}\right] & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

where $\mu \in [a, b]$ and $\sigma$ are constants and $k$ is a normalizing constant. In this case, the integral

$$F_R(r) = \int_{-\infty}^{\,a + \sqrt{(r - A)/B}} f_T(x)\,dx = \int_{a}^{\,a + \sqrt{(r - A)/B}} k \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx$$

does not have a simple closed-form solution. Thus, if it is desired to compute only E[Z], then Approach 1 is generally preferred over Approach 2.
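Even when $f_T$ has no convenient antiderivative, Approach 1 reduces to a single numerical quadrature. A sketch with the truncated Gaussian above (the parameters mu and sigma, like the other constants, are illustrative choices; k is obtained numerically so that $f_T$ integrates to one):

```python
import math
from scipy.integrate import quad

# Hypothetical constants, as before
a, b = 20.0, 40.0
A, B = 100.0, 0.05
mu, sigma = 30.0, 5.0  # truncated-Gaussian parameters (illustrative)

unnorm = lambda x: math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
k = 1.0 / quad(unnorm, a, b)[0]  # normalizing constant, found numerically
fT = lambda x: k * unnorm(x)

g = lambda x: A + B * (x - a) ** 2  # R = g(T)
E_R, _ = quad(lambda x: g(x) * fT(x), a, b)
print(f"E[R] via Approach 1: {E_R:.3f}")
```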

4 Mean as Central Tendency of a RV

An alternate interpretation of the mean or expected value of a random variable is as follows [2]. Given a RV X, consider the problem of finding a (constant) number c that represents the central tendency of X. This could be done by finding c such that some function of the deviation $|x - c|$ is minimized. Let us define

$$J_n = E\left[\,|X - c|^n\,\right] \qquad (23)$$

and seek the c that minimizes $J_n$ for different values of n. In particular, consider minimizing $J_2 = E\left[(X - c)^2\right]$, which expands to

$$J_2 = E\left[X^2 - 2Xc + c^2\right] = E\left[X^2\right] - 2cE[X] + c^2 \qquad (24)$$

because c is a constant. Using the necessary condition for optimality, we have

$$\frac{dJ_2}{dc} = -2E[X] + 2c = 0 \qquad (25)$$

which implies that

$$c_{opt} = \mu = E[X] \qquad (26)$$

and

$$[J_2]_{min} = E\left[(X - E[X])^2\right] \qquad (27)$$

Thus, the mean is the best single representative of the theoretical centroid of a random variable if we are concerned with minimizing the mean squared deviation over all possible values of X [2]. Moreover, the quantity $[J_2]_{min} = E\left[(X - E[X])^2\right]$ is known as the variance of the RV X.

Remark 11 Minimizing $J_1 = E\left[|X - c|\right]$ w.r.t. c yields the median of the distribution, while minimizing $J_n$ as $n \to \infty$ w.r.t. c yields the mode of the distribution [2].
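The optimization view is easy to explore numerically. A sketch (a simple grid search of my own construction, not from the notes) that minimizes $J_n(c)$ for the drill-bit pmf and recovers the mean for n = 2 and the median for n = 1:

```python
# Grid search for the c minimizing J_n(c) = E[|X - c|^n] (equation (23)),
# using the drill-bit pmf from equation (1).
values = [2, 3, 4]
probs = [0.1, 0.7, 0.2]

def J(c, n):
    return sum(p * abs(a - c) ** n for p, a in zip(probs, values))

grid = [2.0 + 0.001 * i for i in range(2001)]  # candidate c values in [2, 4]
for n in (1, 2):
    c_opt = min(grid, key=lambda c: J(c, n))
    print(f"n = {n}: c_opt = {c_opt:.3f}")
# n = 2 recovers the mean (3.100); n = 1 recovers the median (3.000)
```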

References

[1] Dekking, F.M., Kraaikamp, C., Lopuhaä, H.P., Meester, L.E., A Modern Introduction to Probability and Statistics: Understanding Why and How, Springer, 2005.

[2] Ogunnaike, B.A., Random Phenomena: Fundamentals of Probability and Statistics for Engineers, CRC Press, 2010.

