
© Baidurya Bhattacharya, IIT Kharagpur. All rights reserved.

CHAPTER 4. RANDOM VARIABLES


When the possible outcomes of an experiment (or trial) can be given in numerical terms,
then we have a random variable in hand. When an experiment is performed, the outcome
of the random variable is called a “realization.” A random variable can be either discrete
or continuous. A random variable is governed by its probability laws.
If a quantity varies randomly with time, we model it as a stochastic process. A stochastic
process can be viewed as a family of random variables.
If a quantity varies randomly in space, we model it as a random field, which is the
generalization of a stochastic process in two or more dimensions.
Formally, a measurable function⁵ defined on a sample space is called a random variable
(Feller, vol 1, p. 212). That is, X is a random variable if $X = X(\omega)$ is a function defined
on the sample space $\Omega$, and for every real x, the set

$\{\omega : X(\omega) \le x\}$

is an event in $\Omega$. Thus we confine ourselves to the $\sigma$-algebra of events of the type $\{X \le x\}$.
Unless explicitly required, we suppress the argument $\omega$ when referring to a random
variable in the rest of this text.





[Figure: the event $\{\omega : X(\omega) \le x\}$ in $\Omega$ mapped to the interval $(-\infty, x]$ on the real line]

Random variables (RVs) are classified as discrete or continuous depending on whether
their sample space is countable or uncountable, respectively. Discrete random variables
typically arise from counting processes, so that their range is the set of natural
numbers, although a discrete RV can assume any set of discrete values on the real line,
not necessarily integers. Continuous RVs, on the other hand, typically arise from
measurement processes, and their range consists of continuous intervals (possibly infinite) on the
real line. It may be useful to define “mixed” random variables that exhibit properties
of discrete and continuous RVs in different ranges.

4.1 Probability laws for RVs


A random variable is governed by its probability laws. The probability law of a RV can
be described in any of the following four equivalent ways:

⁵ Measurable functions have been defined in Section 2.6.

page 39 Last updated: 16-Jan-20


Design maintenance and reliability of engineering systems: a probability based approach

1. CDF (cumulative distribution function)


2. PDF/PMF (probability density function for continuous RVs, probability mass
function for discrete RVs)
3. CF (characteristic function)
4. MGF (moment generating function)

CF and MGF are introduced after the discussion on moments of random variables in
Section 4.2.

4.1.1 CDF – cumulative distribution function


The cumulative distribution function of the random variable X is defined as:

$F_X(x) = P[X \le x]$   (4.1)

It starts from 0, ends at 1, and is a non-decreasing function of x. It is piecewise
continuous (a step function) for discrete RVs, and continuous for continuous RVs.
Properties of CDF:

FX ()  0
FX ()  1 (4.2)
FX ( x) is a non-decreasing function of x

Thus, the probability of finding the random variable X in the semi-open interval (a,b] is:

$P[a < X \le b] = F_X(b) - F_X(a)$   (4.3)
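As a minimal numerical sketch of Eq. (4.3), assume an exponential distribution with an arbitrarily chosen rate `lam` (the distribution and its parameter are illustrative assumptions, not part of the text):

```python
import math

# Sketch of Eq. (4.3): P[a < X <= b] = F_X(b) - F_X(a).
# Assumed example: exponential distribution, F_X(x) = 1 - exp(-lam*x) for x >= 0.
lam = 1.5
F = lambda x: 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

a, b = 0.5, 2.0
p = F(b) - F(a)   # probability of finding X in the semi-open interval (a, b]
```

For this distribution the same probability has the closed form $e^{-\lambda a} - e^{-\lambda b}$, which can be used as a check.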

4.1.2 Probability density and mass functions

4.1.2.1 PDF
The probability density function (of continuous random variables) is defined as the
derivative of the CDF:

$f_X(x) = \dfrac{dF_X(x)}{dx}$   (4.4)

so that:
$F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$   (4.5)

and,

$\int_{a}^{b} f_X(x)\, dx = F_X(b) - F_X(a)$   (4.6)

The PDF may also be interpreted as giving the small probability of observing the
random variable X around the point x:

$f_X(x)\, dx \approx P[x - dx < X \le x]$   (4.7)

4.1.2.2 PMF
The probability mass function (of discrete random variables) is defined as the probability
that the random variable assumes a particular value:

Probability mass function (pmf):


$p_X(x_i) = P[X = x_i]$

Cumulative distribution function (cdf):


$F_X(x_i) = P[X \le x_i]$

so that the PMF can be derived from the CDF as:

$p_X(x_i) = F_X(x_i) - F_X(x_i - \varepsilon)$ where $0 < \varepsilon \le \min_{i \ne j} |x_i - x_j|$   (4.8)

If the sample space of the discrete RV X is $\{x_1, x_2, \ldots\}$ with $x_i < x_{i+1}$, then the PMF may
be given as the height of the step in the CDF curve:

$p_X(x_i) = F_X(x_i) - F_X(x_{i-1})$   (4.9)
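The step-height relation of Eq. (4.9) can be sketched numerically; the three support points and their probabilities below are made up purely for illustration:

```python
# Sketch of Eq. (4.9): recovering a PMF from the jumps (step heights) of its CDF.
# Hypothetical discrete RV on {1, 2, 3} with probabilities 0.2, 0.5, 0.3.
xs = [1, 2, 3]
ps = [0.2, 0.5, 0.3]

def cdf(x):
    # F_X(x) = P[X <= x], a step function for a discrete RV
    return sum(p for xi, p in zip(xs, ps) if xi <= x)

# p_X(x_i) = F_X(x_i) - F_X(x_{i-1}); the first point's mass is F_X(x_1) itself
pmf = [cdf(xs[0])] + [cdf(xs[i]) - cdf(xs[i - 1]) for i in range(1, len(xs))]
```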

4.1.2.3 Use of delta functions to describe pmfs as pdfs


The delta function is defined as

$\delta(x) = dU(x)/dx$   (4.10)

where U is the unit step function:

$U(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$   (4.11)

The delta function is symmetric: $\delta(x) = \delta(-x)$. Integrating the delta function from a to
b, where $a < 0 < b$, gives unity. More generally:


$g(0) = \int_{a}^{b} \delta(x)\, g(x)\, dx, \quad a < 0 < b$   (4.12)

If X is a discrete RV, and $p_i = p_X(x_i)$, we can write an equivalent PDF as

$f_X(x) = \sum_i p_i\, \delta(x - x_i)$   (4.13)

But, in the following, we will still write formulas and expressions separately for discrete
and continuous RVs.

4.2 Expectation
The expectation of any function g ( X ) of the random variable X is defined as:


$E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx & \text{if } X \text{ continuous} \\ \sum_{\text{all } x_i} g(x_i)\, p_X(x_i) & \text{if } X \text{ discrete} \end{cases}$   (4.14)

The expectation of a constant is the constant itself:

E  c   c where c is a constant (4.15)

Expectation is a linear operator:

E  aX  b   a E  X   b (4.16)

and if $Y = g_1(X) + g_2(X) + \cdots$, then

$E[Y] = E[g_1(X)] + E[g_2(X)] + \cdots$   (4.17)

Thus the mean of X is its expectation:

$\mu = E(X) = \begin{cases} \int_{-\infty}^{\infty} x\, f_X(x)\, dx, & \text{continuous RV} \\ \sum_{\text{all } x_i} x_i\, p_X(x_i), & \text{discrete RV} \end{cases}$   (4.18)

and its variance is the expectation of its squared deviation from the mean:

$\sigma^2 = E[(X - \mu)^2] = \begin{cases} \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx, & \text{continuous RV} \\ \sum_{\text{all } x_i} (x_i - \mu)^2 p_X(x_i), & \text{discrete RV} \end{cases}$   (4.19)

Higher order raw or central moments can be defined by choosing $g(X) = X^n$ or $(X - \mu)^n$
above as appropriate. The characteristic function and the moment generating function
can both be described as expectations of appropriate functions of X.

4.3 Moments
The kth central moment of X:

$\mu_{c,k} = E[(X - \mu)^k]$   (4.20)

Hence, by definition, $\mu_{c,0} = 1$ and $\mu_{c,1} = 0$. The variance of X is the second central
moment:

Variance, $\sigma^2 = \mu_{c,2} = \begin{cases} \sum_{i=1}^{n} p_i (x_i - \mu)^2 & \text{for discrete RV} \\ \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx & \text{for continuous RV} \end{cases}$   (4.21)

The kth raw moment of X: $\mu_{0,k} = E[X^k]$. Hence, the mean of X is simply the first
raw moment:

$\mu = \mu_{0,1} = \begin{cases} \sum_{i=1}^{n} p_i x_i & \text{for discrete RV} \\ \int_{-\infty}^{\infty} x\, f_X(x)\, dx & \text{for continuous RV} \end{cases}$

General formula for central moments (by simply expanding the binomial series⁶):

$\mu_{c,n} = \sum_{k=0}^{n} \binom{n}{k}\, \mu_{0,k}\, (-\mu)^{n-k}$   (4.22)

Similarly, the general formula for raw moments:

$\mu_{0,n} = \sum_{k=0}^{n} \binom{n}{k}\, \mu_{c,k}\, \mu^{n-k}$   (4.23)

n n n n
6
If n is an integer, the binomial series is given by: ( x  y ) n   k 0   x n  k y k   k 0   x k y n  k
k  k 


In particular,

$\mu_{c,3} = \mu_{0,3} - 3\mu_{0,2}\,\mu + 2\mu^3$
$\mu_{0,3} = \mu^3 + 3\mu\sigma^2 + \mu_{c,3}$, etc.

4.4 Characteristic function


The PDF/PMF and the characteristic function (CF) $M_X(\theta)$ form a Fourier transform (FT)
pair. In the continuous case, the characteristic function is the Fourier transform of the
PDF:

$M_X(\theta) = E[\exp(i\theta X)] = \int_{-\infty}^{\infty} e^{i\theta x} f_X(x)\, dx$   (4.24)


such that the inverse transform is:



$f_X(x) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} M_X(\theta)\, e^{-i\theta x}\, d\theta$   (4.25)

If $f_X$ is discontinuous at some $x_1$, the inversion integral equals $\frac{1}{2}[f_X(x_1^+) + f_X(x_1^-)]$ there. In Eq (4.24), E is the
expectation operator discussed in Section 4.2. The requirement is that $f_X$ is absolutely
integrable, i.e., $\int_{-\infty}^{\infty} |f_X(x)|\, dx < \infty$, which is no problem since $f_X$ is a PDF to begin with.

In the discrete case, the FT pair is given by:

M X ( )   p X ( x j )e
i x j

all x j

 (4.26)
1
M ( )e d
 i x
p X ( x) 
2
X


Proof:

$\dfrac{1}{2\pi} \int_{-\infty}^{\infty} M_X(\theta)\, e^{-i\theta x}\, d\theta = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \sum_{\text{all } x_j} p_X(x_j)\, e^{i\theta x_j}\, e^{-i\theta x}\, d\theta$
$= \dfrac{1}{2\pi} \sum_{\text{all } x_j} p_X(x_j) \int_{-\infty}^{\infty} e^{i\theta (x_j - x)}\, d\theta$   (4.27)
$= \dfrac{1}{2\pi} \sum_{\text{all } x_j} p_X(x_j)\, 2\pi\, \delta(x - x_j)$
$= p_X(x)$

The characteristic function of a RV uniquely determines its probability distribution.
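For an integer-valued discrete RV, the inversion integral of Eq. (4.26) can equivalently be taken over one period $[-\pi, \pi]$; a numerical sketch using a fair die as an assumed example (the die, the period-restricted inversion, and the quadrature size are all illustrative choices):

```python
import cmath
import math

# CF of an integer-valued discrete RV and its numerical inversion.
# Assumed example: a fair die on {1, ..., 6}.
pmf = {k: 1.0 / 6.0 for k in range(1, 7)}

def cf(theta):
    # M_X(theta) = sum_j p_X(x_j) * exp(i*theta*x_j)
    return sum(p * cmath.exp(1j * theta * k) for k, p in pmf.items())

def invert(k, n=512):
    # p_X(k) = (1/2pi) * integral over [-pi, pi] of M_X(theta)*exp(-i*theta*k)
    # (midpoint rule; effectively exact here since the integrand is a
    # trigonometric polynomial of low degree)
    h = 2.0 * math.pi / n
    total = 0.0 + 0.0j
    for j in range(n):
        theta = -math.pi + (j + 0.5) * h
        total += cf(theta) * cmath.exp(-1j * theta * k)
    return (total * h / (2.0 * math.pi)).real
```

Inverting the CF recovers the original PMF, illustrating that the CF uniquely determines the distribution.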

4.5 Moment generating function


The moment generating function (MGF) of a distribution is the expectation:

GX  s   E e  
sX
e
sX
f X ( x)dx (4.28)


It exists if the integral is finite for all s in some interval I that contains 0 in its interior
(Resnick, p. 294). If it exists, the MGF uniquely determines the distribution of X. In
comparison, the CF of a RV always exists.
The MGF is infinitely differentiable and the nth derivative,

$\dfrac{d^n}{ds^n} G_X(s) = \int_{-\infty}^{\infty} x^n e^{sx} f_X(x)\, dx$   (4.29)

evaluated at s = 0, gives the nth raw moment of X:

$\dfrac{d^n}{ds^n} G_X(s)\Big|_{s=0} = E[X^n]$   (4.30)

If instead we take successive derivatives of $\ln G_X(s)$ and evaluate them at s = 0, we
get the central moments of X:

$\dfrac{d}{ds} \ln G_X(s) = \mu_X$ at $s = 0$
$\dfrac{d^2}{ds^2} \ln G_X(s) = \sigma_X^2$ at $s = 0$
$\dfrac{d^3}{ds^3} \ln G_X(s) = E[(X - \mu_X)^3]$ at $s = 0$
etc.
where the identity $G_X(0) = 1$ has been used.
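These log-derivative identities can be checked by finite differences. The sketch below assumes the exponential distribution, whose MGF is $G_X(s) = \lambda/(\lambda - s)$ for $s < \lambda$, so the mean should come out as $1/\lambda$ and the variance as $1/\lambda^2$ (the distribution, rate, and step size are illustrative assumptions):

```python
import math

# Central-difference check: derivatives of ln G_X(s) at s = 0 give the
# mean and variance. Assumed example: exponential distribution, rate lam.
lam = 2.0
G = lambda s: lam / (lam - s)          # MGF, valid for s < lam
lnG = lambda s: math.log(G(s))

h = 1e-5
mean = (lnG(h) - lnG(-h)) / (2 * h)            # d/ds  ln G at 0 -> 1/lam
var = (lnG(h) - 2 * lnG(0) + lnG(-h)) / h**2   # d2/ds2 ln G at 0 -> 1/lam^2
```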

4.5.1.1 MGF and density function


We now show that, if one did not know the functional form of a given distribution, one
would need moments of all orders to completely specify the distribution.
Proof: Expand the MGF in its Taylor series around the origin:
 
$G_X(s) = \sum_{n=0}^{\infty} \dfrac{s^n}{n!} \dfrac{d^n}{ds^n} G_X(s)\Big|_{s=0} = \sum_{n=0}^{\infty} \dfrac{s^n}{n!}\, E[X^n]$   (4.31)

where Eq 4.30 has been used. This is valid if all moments are finite and the series
converges absolutely near s = 0. Since $f_X$ can be determined in terms of $G_X$, Eq 4.31
shows that all moments of X are required for the complete specification of $f_X$ under the
conditions stated above.

4.5.1.2 Cramer’s Theorem:


Let $X_1$ and $X_2$ be RVs with PDFs $f_{X_1}$ and $f_{X_2}$, and MGFs $G_{X_1}$ and $G_{X_2}$.
If $G_{X_1} = G_{X_2}$, then $f_{X_1} = f_{X_2}$.

4.5.2 Moments from Characteristic function


The mean can be obtained as:

$E(X) = \mu_X = \dfrac{1}{i}\left[\dfrac{dM_X}{d\theta}\right]_{\theta = 0}$   (4.32)

which can also be obtained using log transformation:

$E(X) = \dfrac{1}{i}\, \dfrac{d}{d\theta} \ln M_X(\theta)\Big|_{\theta = 0}$   (4.33)

Higher moments are given by:

$E[X^n] = \dfrac{1}{i^n}\left[\dfrac{d^n M_X}{d\theta^n}\right]_{\theta = 0}$   (4.34)

and in particular, the variance can be expressed through the log transformation as:

$\mathrm{var}(X) = \dfrac{1}{i^2}\, \dfrac{d^2}{d\theta^2} \ln M_X(\theta)\Big|_{\theta = 0}$   (4.35)

4.6 Basic properties of random variables

4.6.1 Markov’s inequality


If $f_X(x) = 0$ for $x < 0$,

$P[X \ge \varepsilon \mu] \le \dfrac{1}{\varepsilon}, \quad \varepsilon > 0$   (4.36)

4.6.2 Chebyshev’s inequality

$P[|X - \mu| \ge \varepsilon \sigma] \le \dfrac{1}{\varepsilon^2}, \quad \varepsilon > 0$   (4.37)

4.6.2.1 Example involving a discrete RV


Take the Bernoulli RV $X(p)$, for which $P[X = 0] = 1 - p = q$ and $P[X = 1] = p$, so that its mean
is p and variance is pq. Chebyshev's inequality states:

$P[|X - p| \ge \varepsilon\sqrt{pq}\,] \le \dfrac{1}{\varepsilon^2}$ for any $\varepsilon > 0$   (4.38)

We prove it by considering three different ranges for X.

Case (1): $X > p$ (i.e., X = 1)
The LHS of (4.38) $= P[X \ge p + \varepsilon\sqrt{pq}\,] = \begin{cases} p & \text{if } p + \varepsilon\sqrt{pq} \le 1 \quad \text{(Case 1a)} \\ 0 & \text{if } p + \varepsilon\sqrt{pq} > 1 \quad \text{(Case 1b)} \end{cases}$

Case (1a)

LHS = p, and we need to prove $p \le 1/\varepsilon^2$.
It is given that $p + \varepsilon\sqrt{pq} \le 1$
$\Rightarrow \varepsilon \le \dfrac{1 - p}{\sqrt{pq}} = \dfrac{q}{\sqrt{pq}} = \sqrt{\dfrac{q}{p}}$
$\Rightarrow \varepsilon^2 \le \dfrac{q}{p}$
$\Rightarrow p \le \dfrac{q}{\varepsilon^2}$
Since $q \le 1$, we have $p \le \dfrac{1}{\varepsilon^2}$ ... proved

Case (1b)

LHS = 0, and we need to prove $0 \le 1/\varepsilon^2$.
Since $1/\varepsilon^2$ is a positive quantity, we have $0 \le 1/\varepsilon^2$ ... proved


Case (2): $0 \le X < p$ (i.e., X = 0)

LHS $= P[p - X \ge \varepsilon\sqrt{pq}\,] = P[X \le p - \varepsilon\sqrt{pq}\,] = \begin{cases} 0 & \text{if } p < \varepsilon\sqrt{pq} \quad \text{(Case 2a)} \\ q & \text{if } p \ge \varepsilon\sqrt{pq} \quad \text{(Case 2b)} \end{cases}$

Case (2a)
LHS = 0, and we need to prove $0 \le 1/\varepsilon^2$.
It is given that $p < \varepsilon\sqrt{pq}$, i.e., $\varepsilon^2 > \dfrac{p}{q}$.
Since $1/\varepsilon^2$ is a positive quantity, we have $0 \le 1/\varepsilon^2$ ... proved
2
Case (2b)
LHS = q, and we need to prove $q \le 1/\varepsilon^2$.
It is given that $p \ge \varepsilon\sqrt{pq}$
$\Rightarrow \dfrac{p}{q} \ge \varepsilon^2 \Rightarrow q \le \dfrac{p}{\varepsilon^2}$
Since $p \le 1$, we have $q \le \dfrac{1}{\varepsilon^2}$ ... proved
2

Case (3): $X < 0$
This range has zero probability for the Bernoulli RV, so LHS $= 0 \le \dfrac{1}{\varepsilon^2}$.

Proved.
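Since the Bernoulli LHS can be computed exactly by summing PMF values, the case analysis above can also be double-checked by brute force over a grid of p and ε (the grid values are arbitrary):

```python
import math

# Exhaustive check of Eq. (4.38) for the Bernoulli RV: the LHS is an exact
# finite sum over the two support points {0, 1}.
def cheb_lhs(p, eps):
    q = 1.0 - p
    s = math.sqrt(p * q)   # standard deviation of the Bernoulli RV
    return sum(prob for x, prob in ((0, q), (1, p)) if abs(x - p) >= eps * s)

ok = all(cheb_lhs(p, eps) <= 1.0 / eps ** 2 + 1e-12
         for p in (0.1, 0.3, 0.5, 0.9)
         for eps in (0.5, 1.0, 1.5, 2.0, 3.0))
```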

4.6.3 Bienaymé Inequality


$P[|X - a| \ge \varepsilon] \le \dfrac{E[|X - a|^n]}{\varepsilon^n}, \quad \varepsilon > 0$   (4.39)

With n = 2 and a = μ, this reduces to Chebyshev's inequality.

4.6.4 Lyapunov Inequality


Let $\beta_k = E[|X|^k] < \infty$ where $k \ge 1$ is any integer; then

$\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}$   (4.40)

Proof
Let $T = a|X|^{(k-1)/2} + |X|^{(k+1)/2}$

$T^2 = a^2 |X|^{k-1} + 2a|X|^{k} + |X|^{k+1}$
$E[T^2] = a^2 \beta_{k-1} + 2a\beta_k + \beta_{k+1} \ge 0$

Dividing by $\beta_{k-1}$ we get,

$a^2 + 2a\,\dfrac{\beta_k}{\beta_{k-1}} + \dfrac{\beta_{k+1}}{\beta_{k-1}} \ge 0$

Now choose $a = -\dfrac{\beta_k}{\beta_{k-1}}$, which yields

$\dfrac{\beta_{k+1}}{\beta_{k-1}} - \dfrac{\beta_k^2}{\beta_{k-1}^2} \ge 0 \;\Rightarrow\; \beta_{k+1}\,\beta_{k-1} \ge \beta_k^2$   (4.41)

Set k = 1, 2, 3, etc.:

$k = 1 \Rightarrow \beta_0\beta_2 \ge \beta_1^2$; since $\beta_0 = 1$, $\beta_1 \le \beta_2^{1/2}$
$k = 2 \Rightarrow \beta_1\beta_3 \ge \beta_2^2 \Rightarrow \beta_2^2 \le \beta_1\beta_3 \le \beta_2^{1/2}\beta_3 \Rightarrow \beta_2^{3/2} \le \beta_3 \Rightarrow \beta_2^{1/2} \le \beta_3^{1/3}$
$k = 3 \Rightarrow \beta_2\beta_4 \ge \beta_3^2 \Rightarrow \beta_3^2 \le \beta_2\beta_4 \le \beta_3^{2/3}\beta_4 \Rightarrow \beta_3^{4/3} \le \beta_4 \Rightarrow \beta_3^{1/3} \le \beta_4^{1/4}$

We see that proposition (4.40) is true for k = 1, 2, 3. Let us claim it is true for some k − 1,
i.e.,

$\beta_{k-2}^{1/(k-2)} \le \beta_{k-1}^{1/(k-1)}$   (4.42)

is true. We can rewrite (4.41) for k − 1 as:


$\beta_{k-2}\,\beta_k \ge \beta_{k-1}^2$
or, $\beta_k \ge \dfrac{\beta_{k-1}^2}{\beta_{k-2}} \ge \dfrac{\beta_{k-1}^2}{\beta_{k-1}^{(k-2)/(k-1)}}$ (using 4.42)   (4.43)
or, $\beta_k \ge \beta_{k-1}^{k/(k-1)}$, i.e., $\beta_k^{1/k} \ge \beta_{k-1}^{1/(k-1)}$

Thus we prove (4.40) by induction: if (4.40) is true for k – 1, then it must be true for k.
Since we have proved (4.40) is true for k =3, it must be true for 4, 5, ….
Proved.
Since $\beta_n^{1/n} \ge \beta_{n-1}^{1/(n-1)} \ge \beta_{n-2}^{1/(n-2)} \ge \cdots \ge \beta_m^{1/m} \ge \cdots \ge \beta_3^{1/3} \ge \beta_2^{1/2} \ge \beta_1$ for $n \ge m$, an equivalent
way of writing the Lyapunov inequality is:

$\beta_n^{1/n} \ge \beta_m^{1/m}$, i.e., $\beta_n^m \ge \beta_m^n$, for $n \ge m$   (4.44)
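The chain of inequalities can be illustrated numerically: $\beta_k^{1/k}$ should be non-decreasing in k for any RV with finite moments. The discrete RV below (support and probabilities made up) is one such sketch:

```python
# Numerical illustration of Eq. (4.40): beta_k^(1/k) is non-decreasing in k.
# Hypothetical discrete RV; beta_k = E[|X|^k] is an exact finite sum.
xs = [-2, 1, 3]
ps = [0.2, 0.5, 0.3]

beta = lambda k: sum(abs(x) ** k * p for x, p in zip(xs, ps))
norms = [beta(k) ** (1.0 / k) for k in range(1, 8)]
```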
