
© Baidurya Bhattacharya, IIT Kharagpur. All rights reserved.

CHAPTER 4. RANDOM VARIABLES


When the possible outcomes of an experiment (or trial) can be given in numerical terms,
then we have a random variable in hand. When an experiment is performed, the outcome
of the random variable is called a “realization.” A random variable can be either discrete
or continuous. A random variable is governed by its probability laws.
If a quantity varies randomly with time, we model it as a stochastic process. A stochastic
process can be viewed as a family of random variables.
If a quantity varies randomly in space, we model it as a random field, which is the
generalization of a stochastic process in two or more dimensions.
Formally, a measurable function⁵ defined on a sample space is called a random variable
(Feller, vol 1, p. 212). That is, X is a random variable if $X = X(\omega)$ is a function defined
on the sample space $\Omega$, and for every real x, the set

$\{\omega : X(\omega) \le x\}$

is an event in $\Omega$. Thus we confine ourselves to the $\sigma$-algebra of events of the type $\{X \le x\}$.
Unless explicitly required, we suppress the argument $\omega$ when referring to a random
variable in the rest of this text.





[Figure: the event $\{\omega : X(\omega) \le x\}$ in $\Omega$ mapped to the interval $(-\infty, x]$ on the real line]

Random variables (RVs) are classified as discrete or continuous depending on whether
their sample space is countable or uncountable, respectively. Discrete random variables
typically arise from counting processes, so that their range is the set of natural
numbers, although a discrete RV can assume any set of discrete values on the real line,
not necessarily integers. Continuous RVs, on the other hand, typically arise from
measurement processes, and their range consists of continuous intervals (possibly infinite) on the
real line. It may be useful to define “mixed” random variables that exhibit properties
of discrete and continuous RVs in different ranges.

4.1 Probability laws for RVs


A random variable is governed by its probability laws. The probability law of a RV can
be described in any of the following four equivalent ways:

⁵ Measurable functions have been defined in Section 2.6.

page 39 Last updated: 16-Jan-20


Design maintenance and reliability of engineering systems: a probability based approach

1. CDF (cumulative distribution function)


2. PDF/PMF (probability density function for continuous RVs, probability mass
function for discrete RVs)
3. CF (characteristic function)
4. MGF (moment generating function)

CF and MGF are introduced after the discussion on moments of random variables in
Section 4.2.

4.1.1 CDF – cumulative distribution function


The cumulative distribution function of the random variable X is defined as:

$F_X(x) = P[X \le x]$   (4.1)

It starts from 0, ends at 1, and is a non-decreasing function of x. It is piecewise
continuous (a step function) for discrete RVs, and continuous for continuous RVs.
Properties of CDF:

FX ()  0
FX ()  1 (4.2)
FX ( x) is a non-decreasing function of x

Thus, the probability of finding the random variable X in the semi-open interval (a,b] is:

$P[a < X \le b] = F_X(b) - F_X(a)$   (4.3)
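As a minimal numerical sketch of Eq. (4.3), assume an exponential distribution with an arbitrarily chosen rate `lam` (the distribution and its parameter are illustrative assumptions, not part of the text):

```python
import math

# Sketch of Eq. (4.3): P[a < X <= b] = F_X(b) - F_X(a).
# Assumed example: exponential distribution, F_X(x) = 1 - exp(-lam*x) for x >= 0.
lam = 1.5
F = lambda x: 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

a, b = 0.5, 2.0
p = F(b) - F(a)   # probability of finding X in the semi-open interval (a, b]
```

For this distribution the same probability has the closed form $e^{-\lambda a} - e^{-\lambda b}$, which can be used as a check.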

4.1.2 Probability density and mass functions

4.1.2.1 PDF
The probability density function (of continuous random variables) is defined as the
derivative of the CDF:

$f_X(x) = \dfrac{dF_X(x)}{dx}$   (4.4)

so that:
$F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$   (4.5)

and,

$\int_{a}^{b} f_X(x)\, dx = F_X(b) - F_X(a)$   (4.6)

The PDF may also be interpreted as giving the small probability of observing the
random variable X around the point x:

$f_X(x)\, dx \approx P[x - dx < X \le x]$   (4.7)

4.1.2.2 PMF
The probability mass function (of discrete random variables) is defined as the probability
that the random variable assumes a particular value:

Probability mass function (pmf):


$p_X(x_i) = P[X = x_i]$

Cumulative distribution function (cdf):


$F_X(x_i) = P[X \le x_i]$

so that the PMF can be derived from the CDF as:

$p_X(x_i) = F_X(x_i) - F_X(x_i - \varepsilon)$ where $0 < \varepsilon \le \min_{i \ne j} |x_i - x_j|$   (4.8)

If the sample space of the discrete RV X is $\{x_1, x_2, \ldots\}$ with $x_i < x_{i+1}$, then the PMF may
be given as the height of the step in the CDF curve:

$p_X(x_i) = F_X(x_i) - F_X(x_{i-1})$   (4.9)
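The step-height relation of Eq. (4.9) can be sketched numerically; the three support points and their probabilities below are made up purely for illustration:

```python
# Sketch of Eq. (4.9): recovering a PMF from the jumps (step heights) of its CDF.
# Hypothetical discrete RV on {1, 2, 3} with probabilities 0.2, 0.5, 0.3.
xs = [1, 2, 3]
ps = [0.2, 0.5, 0.3]

def cdf(x):
    # F_X(x) = P[X <= x], a step function for a discrete RV
    return sum(p for xi, p in zip(xs, ps) if xi <= x)

# p_X(x_i) = F_X(x_i) - F_X(x_{i-1}); the first point's mass is F_X(x_1) itself
pmf = [cdf(xs[0])] + [cdf(xs[i]) - cdf(xs[i - 1]) for i in range(1, len(xs))]
```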

4.1.2.3 Use of delta functions to describe pmfs as pdfs


The delta function is defined as

$\delta(x) = dU(x)/dx$   (4.10)

where U is the unit step function:

$U(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$   (4.11)

The delta function is symmetric: $\delta(x) = \delta(-x)$. Integrating the delta function from a to
b, where $a < 0 < b$, gives unity. More generally:


$g(0) = \int_{a}^{b} \delta(x)\, g(x)\, dx, \quad a < 0 < b$   (4.12)

If X is a discrete RV, and $p_i = p_X(x_i)$, we can write an equivalent PDF as

$f_X(x) = \sum_i p_i\, \delta(x - x_i)$   (4.13)

But, in the following, we will still write formulas and expressions separately for discrete
and continuous RVs.

4.2 Expectation
The expectation of any function g ( X ) of the random variable X is defined as:


$E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx & \text{if } X \text{ continuous} \\ \sum_{\text{all } x_i} g(x_i)\, p_X(x_i) & \text{if } X \text{ discrete} \end{cases}$   (4.14)

The expectation of a constant is the constant itself:

E  c   c where c is a constant (4.15)

Expectation is a linear operator:

E  aX  b   a E  X   b (4.16)

and if $Y = g_1(X) + g_2(X) + \cdots$, then

$E[Y] = E[g_1(X)] + E[g_2(X)] + \cdots$   (4.17)

Thus the mean of X is its expectation:

$\mu = E(X) = \begin{cases} \int_{-\infty}^{\infty} x\, f_X(x)\, dx, & \text{continuous RV} \\ \sum_{\text{all } x_i} x_i\, p_X(x_i), & \text{discrete RV} \end{cases}$   (4.18)

and its variance is the expectation of its squared deviation from the mean:

$\sigma^2 = E[(X - \mu)^2] = \begin{cases} \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx, & \text{continuous RV} \\ \sum_{\text{all } x_i} (x_i - \mu)^2 p_X(x_i), & \text{discrete RV} \end{cases}$   (4.19)

Higher order raw or central moments can be defined by choosing $g(X) = X^n$ or $(X - \mu)^n$
above as appropriate. The characteristic function and the moment generating function
can both be described as expectations of appropriate functions of X.

4.3 Moments
The kth central moment of X:

$\mu_{c,k} = E[(X - \mu)^k]$   (4.20)

Hence, by definition, $\mu_{c,0} = 1$ and $\mu_{c,1} = 0$. The variance of X is the second central
moment:

Variance, $\sigma^2 = \mu_{c,2} = \begin{cases} \sum_{i=1}^{n} p_i (x_i - \mu)^2 & \text{for discrete RV} \\ \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx & \text{for continuous RV} \end{cases}$   (4.21)

The kth raw moment of X: $\mu_{0,k} = E[X^k]$. Hence, the mean of X is simply the first
raw moment:

$\mu = \mu_{0,1} = \begin{cases} \sum_{i=1}^{n} p_i x_i & \text{for discrete RV} \\ \int_{-\infty}^{\infty} x\, f_X(x)\, dx & \text{for continuous RV} \end{cases}$

General formula for central moments (by simply expanding the binomial series⁶):

$\mu_{c,n} = \sum_{k=0}^{n} \binom{n}{k}\, \mu_{0,k}\, (-\mu)^{n-k}$   (4.22)

Similarly, the general formula for raw moments:

$\mu_{0,n} = \sum_{k=0}^{n} \binom{n}{k}\, \mu_{c,k}\, \mu^{n-k}$   (4.23)

n n n n
6
If n is an integer, the binomial series is given by: ( x  y ) n   k 0   x n  k y k   k 0   x k y n  k
k  k 


In particular,

$\mu_{c,3} = \mu_{0,3} - 3\mu_{0,2}\,\mu + 2\mu^3$
$\mu_{0,3} = \mu^3 + 3\mu\sigma^2 + \mu_{c,3}$, etc.

4.4 Characteristic function


The PDF/PMF and the characteristic function (CF) $M_X(\theta)$ form a Fourier transform (FT)
pair. In the continuous case, the characteristic function is the Fourier transform of the
PDF:

$M_X(\theta) = E[\exp(i\theta X)] = \int_{-\infty}^{\infty} e^{i\theta x} f_X(x)\, dx$   (4.24)


such that the inverse transform is:



$f_X(x) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} M_X(\theta)\, e^{-i\theta x}\, d\theta$   (4.25)

If $f_X$ is discontinuous at some $x_1$, the inversion integral equals $\frac{1}{2}[f_X(x_1^+) + f_X(x_1^-)]$ there. In Eq (4.24), E is the
expectation operator discussed in Section 4.2. The requirement is that $f_X$ is absolutely
integrable, i.e., $\int_{-\infty}^{\infty} |f_X(x)|\, dx < \infty$, which is no problem since $f_X$ is a PDF to begin with.

In the discrete case, the FT pair is given by:

M X ( )   p X ( x j )e
i x j

all x j

 (4.26)
1
M ( )e d
 i x
p X ( x) 
2
X


Proof:

$\dfrac{1}{2\pi} \int_{-\infty}^{\infty} M_X(\theta)\, e^{-i\theta x}\, d\theta = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \sum_{\text{all } x_j} p_X(x_j)\, e^{i\theta x_j}\, e^{-i\theta x}\, d\theta$
$= \dfrac{1}{2\pi} \sum_{\text{all } x_j} p_X(x_j) \int_{-\infty}^{\infty} e^{i\theta (x_j - x)}\, d\theta$   (4.27)
$= \dfrac{1}{2\pi} \sum_{\text{all } x_j} p_X(x_j)\, 2\pi\, \delta(x - x_j)$
$= p_X(x)$

The characteristic function of a RV uniquely determines its probability distribution.
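For an integer-valued discrete RV, the inversion integral of Eq. (4.26) can equivalently be taken over one period $[-\pi, \pi]$; a numerical sketch using a fair die as an assumed example (the die, the period-restricted inversion, and the quadrature size are all illustrative choices):

```python
import cmath
import math

# CF of an integer-valued discrete RV and its numerical inversion.
# Assumed example: a fair die on {1, ..., 6}.
pmf = {k: 1.0 / 6.0 for k in range(1, 7)}

def cf(theta):
    # M_X(theta) = sum_j p_X(x_j) * exp(i*theta*x_j)
    return sum(p * cmath.exp(1j * theta * k) for k, p in pmf.items())

def invert(k, n=512):
    # p_X(k) = (1/2pi) * integral over [-pi, pi] of M_X(theta)*exp(-i*theta*k)
    # (midpoint rule; effectively exact here since the integrand is a
    # trigonometric polynomial of low degree)
    h = 2.0 * math.pi / n
    total = 0.0 + 0.0j
    for j in range(n):
        theta = -math.pi + (j + 0.5) * h
        total += cf(theta) * cmath.exp(-1j * theta * k)
    return (total * h / (2.0 * math.pi)).real
```

Inverting the CF recovers the original PMF, illustrating that the CF uniquely determines the distribution.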

4.5 Moment generating function


The moment generating function (MGF) of a distribution is the expectation:

GX  s   E e  
sX
e
sX
f X ( x)dx (4.28)


It exists if the integral is finite for all s in some interval I that contains 0 in its interior
(Resnick, p. 294). If it exists, the MGF uniquely determines the distribution of X. In
comparison, the CF of a RV always exists.
The MGF is infinitely differentiable and the nth derivative,

$\dfrac{d^n}{ds^n} G_X(s) = \int_{-\infty}^{\infty} x^n e^{sx} f_X(x)\, dx$   (4.29)

evaluated at s = 0, gives the nth raw moment of X:

$\dfrac{d^n}{ds^n} G_X(s)\Big|_{s=0} = E[X^n]$   (4.30)

If instead we take successive derivatives of $\ln G_X(s)$ and evaluate them at s = 0, we
get the central moments of X:

$\dfrac{d}{ds} \ln G_X(s) = \mu_X$ at $s = 0$
$\dfrac{d^2}{ds^2} \ln G_X(s) = \sigma_X^2$ at $s = 0$
$\dfrac{d^3}{ds^3} \ln G_X(s) = E[(X - \mu_X)^3]$ at $s = 0$
etc.
where the identity $G_X(0) = 1$ has been used.
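These log-derivative identities can be checked by finite differences. The sketch below assumes the exponential distribution, whose MGF is $G_X(s) = \lambda/(\lambda - s)$ for $s < \lambda$, so the mean should come out as $1/\lambda$ and the variance as $1/\lambda^2$ (the distribution, rate, and step size are illustrative assumptions):

```python
import math

# Central-difference check: derivatives of ln G_X(s) at s = 0 give the
# mean and variance. Assumed example: exponential distribution, rate lam.
lam = 2.0
G = lambda s: lam / (lam - s)          # MGF, valid for s < lam
lnG = lambda s: math.log(G(s))

h = 1e-5
mean = (lnG(h) - lnG(-h)) / (2 * h)            # d/ds  ln G at 0 -> 1/lam
var = (lnG(h) - 2 * lnG(0) + lnG(-h)) / h**2   # d2/ds2 ln G at 0 -> 1/lam^2
```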

4.5.1.1 MGF and density function


We now show that, if one did not know the functional form of a given distribution, one
would need moments of all orders to completely specify the distribution.
Proof: Expand the MGF in its Taylor series around the origin:
 
$G_X(s) = \sum_{n=0}^{\infty} \dfrac{s^n}{n!} \dfrac{d^n}{ds^n} G_X(s)\Big|_{s=0} = \sum_{n=0}^{\infty} \dfrac{s^n}{n!}\, E[X^n]$   (4.31)

where Eq 4.30 has been used. This is valid if all moments are finite and the series
converges absolutely near s = 0. Since $f_X$ can be determined in terms of $G_X$, Eq 4.31
shows that all moments of X are required for the complete specification of $f_X$ under the
conditions stated above.

4.5.1.2 Cramer’s Theorem:


Let $X_1$ and $X_2$ be RVs with PDFs $f_{X_1}$ and $f_{X_2}$, and MGFs $G_{X_1}$ and $G_{X_2}$.
If $G_{X_1} = G_{X_2}$, then $f_{X_1} = f_{X_2}$.

4.5.2 Moments from Characteristic function


The mean can be obtained as:

$E(X) = \mu_X = \dfrac{1}{i}\left[\dfrac{dM_X}{d\theta}\right]_{\theta = 0}$   (4.32)

which can also be obtained using log transformation:

$E(X) = \dfrac{1}{i}\, \dfrac{d}{d\theta} \ln M_X(\theta)\Big|_{\theta = 0}$   (4.33)

Higher moments are given by:

$E[X^n] = \dfrac{1}{i^n}\left[\dfrac{d^n M_X}{d\theta^n}\right]_{\theta = 0}$   (4.34)

and in particular, the variance can be expressed through the log transformation as:

$\mathrm{var}(X) = \dfrac{1}{i^2}\, \dfrac{d^2}{d\theta^2} \ln M_X(\theta)\Big|_{\theta = 0}$   (4.35)

4.6 Basic properties of random variables

4.6.1 Markov’s inequality


If $f_X(x) = 0$ for $x < 0$,

$P[X \ge \varepsilon \mu] \le \dfrac{1}{\varepsilon}, \quad \varepsilon > 0$   (4.36)

4.6.2 Chebyshev’s inequality

$P[|X - \mu| \ge \varepsilon \sigma] \le \dfrac{1}{\varepsilon^2}, \quad \varepsilon > 0$   (4.37)

4.6.2.1 Example involving a discrete RV


Take the Bernoulli RV $X(p)$, for which $P[X = 0] = 1 - p = q$ and $P[X = 1] = p$, so that its mean
is p and variance is pq. Chebyshev's inequality states:

$P[|X - p| \ge \varepsilon\sqrt{pq}\,] \le \dfrac{1}{\varepsilon^2}$ for any $\varepsilon > 0$   (4.38)

We prove it by considering three different ranges for X.

Case (1): $X > p$ (i.e., X = 1)
The LHS of (4.38) $= P[X \ge p + \varepsilon\sqrt{pq}\,] = \begin{cases} p & \text{if } p + \varepsilon\sqrt{pq} \le 1 \quad \text{(Case 1a)} \\ 0 & \text{if } p + \varepsilon\sqrt{pq} > 1 \quad \text{(Case 1b)} \end{cases}$

Case (1a)

LHS = p, and we need to prove $p \le 1/\varepsilon^2$.
It is given that $p + \varepsilon\sqrt{pq} \le 1$
$\Rightarrow \varepsilon \le \dfrac{1 - p}{\sqrt{pq}} = \dfrac{q}{\sqrt{pq}} = \sqrt{\dfrac{q}{p}}$
$\Rightarrow \varepsilon^2 \le \dfrac{q}{p}$
$\Rightarrow p \le \dfrac{q}{\varepsilon^2}$
Since $q \le 1$, we have $p \le \dfrac{1}{\varepsilon^2}$ ... proved

Case (1b)

LHS = 0, and we need to prove $0 \le 1/\varepsilon^2$.
Since $1/\varepsilon^2$ is a positive quantity, we have $0 \le 1/\varepsilon^2$ ... proved


Case (2): $0 \le X < p$ (i.e., X = 0)

LHS $= P[p - X \ge \varepsilon\sqrt{pq}\,] = P[X \le p - \varepsilon\sqrt{pq}\,] = \begin{cases} 0 & \text{if } p < \varepsilon\sqrt{pq} \quad \text{(Case 2a)} \\ q & \text{if } p \ge \varepsilon\sqrt{pq} \quad \text{(Case 2b)} \end{cases}$

Case (2a)
LHS = 0, and we need to prove $0 \le 1/\varepsilon^2$.
It is given that $p < \varepsilon\sqrt{pq}$, i.e., $\varepsilon^2 > \dfrac{p}{q}$.
Since $1/\varepsilon^2$ is a positive quantity, we have $0 \le 1/\varepsilon^2$ ... proved
2
Case (2b)
LHS = q, and we need to prove $q \le 1/\varepsilon^2$.
It is given that $p \ge \varepsilon\sqrt{pq}$
$\Rightarrow \dfrac{p}{q} \ge \varepsilon^2 \Rightarrow q \le \dfrac{p}{\varepsilon^2}$
Since $p \le 1$, we have $q \le \dfrac{1}{\varepsilon^2}$ ... proved
2

Case (3): $X < 0$
This range has zero probability for the Bernoulli RV, so LHS $= 0 \le \dfrac{1}{\varepsilon^2}$.

Proved.
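Since the Bernoulli LHS can be computed exactly by summing PMF values, the case analysis above can also be double-checked by brute force over a grid of p and ε (the grid values are arbitrary):

```python
import math

# Exhaustive check of Eq. (4.38) for the Bernoulli RV: the LHS is an exact
# finite sum over the two support points {0, 1}.
def cheb_lhs(p, eps):
    q = 1.0 - p
    s = math.sqrt(p * q)   # standard deviation of the Bernoulli RV
    return sum(prob for x, prob in ((0, q), (1, p)) if abs(x - p) >= eps * s)

ok = all(cheb_lhs(p, eps) <= 1.0 / eps ** 2 + 1e-12
         for p in (0.1, 0.3, 0.5, 0.9)
         for eps in (0.5, 1.0, 1.5, 2.0, 3.0))
```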

4.6.3 Bienaymé Inequality


$P[|X - a| \ge \varepsilon] \le \dfrac{E[|X - a|^n]}{\varepsilon^n}, \quad \varepsilon > 0$   (4.39)

With n = 2 and a = μ, this reduces to Chebyshev's inequality.

4.6.4 Lyapunov Inequality


Let $\beta_k = E[|X|^k] < \infty$ where $k \ge 1$ is any integer; then

$\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}$   (4.40)

Proof
Let $T = a|X|^{(k-1)/2} + |X|^{(k+1)/2}$

$T^2 = a^2 |X|^{k-1} + 2a|X|^{k} + |X|^{k+1}$
$E[T^2] = a^2 \beta_{k-1} + 2a\beta_k + \beta_{k+1} \ge 0$

Dividing by $\beta_{k-1}$ we get,

$a^2 + 2a\,\dfrac{\beta_k}{\beta_{k-1}} + \dfrac{\beta_{k+1}}{\beta_{k-1}} \ge 0$

Now choose $a = -\dfrac{\beta_k}{\beta_{k-1}}$, which yields

$\dfrac{\beta_{k+1}}{\beta_{k-1}} - \dfrac{\beta_k^2}{\beta_{k-1}^2} \ge 0 \;\Rightarrow\; \beta_{k+1}\,\beta_{k-1} \ge \beta_k^2$   (4.41)

Set k = 1, 2, 3, etc.:

$k = 1 \Rightarrow \beta_0\beta_2 \ge \beta_1^2$; since $\beta_0 = 1$, $\beta_1 \le \beta_2^{1/2}$
$k = 2 \Rightarrow \beta_1\beta_3 \ge \beta_2^2 \Rightarrow \beta_2^2 \le \beta_1\beta_3 \le \beta_2^{1/2}\beta_3 \Rightarrow \beta_2^{3/2} \le \beta_3 \Rightarrow \beta_2^{1/2} \le \beta_3^{1/3}$
$k = 3 \Rightarrow \beta_2\beta_4 \ge \beta_3^2 \Rightarrow \beta_3^2 \le \beta_2\beta_4 \le \beta_3^{2/3}\beta_4 \Rightarrow \beta_3^{4/3} \le \beta_4 \Rightarrow \beta_3^{1/3} \le \beta_4^{1/4}$

We see that proposition (4.40) is true for k = 1, 2, 3. Let us claim it is true for some k − 1,
i.e.,

$\beta_{k-2}^{1/(k-2)} \le \beta_{k-1}^{1/(k-1)}$   (4.42)

is true. We can rewrite (4.41) for k − 1 as:


$\beta_{k-2}\,\beta_k \ge \beta_{k-1}^2$
or, $\beta_k \ge \dfrac{\beta_{k-1}^2}{\beta_{k-2}} \ge \dfrac{\beta_{k-1}^2}{\beta_{k-1}^{(k-2)/(k-1)}}$ (using 4.42)   (4.43)
or, $\beta_k \ge \beta_{k-1}^{k/(k-1)}$, i.e., $\beta_k^{1/k} \ge \beta_{k-1}^{1/(k-1)}$

Thus we prove (4.40) by induction: if (4.40) is true for k – 1, then it must be true for k.
Since we have proved (4.40) is true for k =3, it must be true for 4, 5, ….
Proved.
Since $\beta_n^{1/n} \ge \beta_{n-1}^{1/(n-1)} \ge \beta_{n-2}^{1/(n-2)} \ge \cdots \ge \beta_m^{1/m} \ge \cdots \ge \beta_3^{1/3} \ge \beta_2^{1/2} \ge \beta_1$ for $n \ge m$, an equivalent
way of writing the Lyapunov inequality is:

$\beta_n^{1/n} \ge \beta_m^{1/m}$, i.e., $\beta_n^m \ge \beta_m^n$, for $n \ge m$   (4.44)
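The chain of inequalities can be illustrated numerically: $\beta_k^{1/k}$ should be non-decreasing in k for any RV with finite moments. The discrete RV below (support and probabilities made up) is one such sketch:

```python
# Numerical illustration of Eq. (4.40): beta_k^(1/k) is non-decreasing in k.
# Hypothetical discrete RV; beta_k = E[|X|^k] is an exact finite sum.
xs = [-2, 1, 3]
ps = [0.2, 0.5, 0.3]

beta = lambda k: sum(abs(x) ** k * p for x, p in zip(xs, ps))
norms = [beta(k) ** (1.0 / k) for k in range(1, 8)]
```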
