
Module 3: Random Variables


Lecture 3: CDF and Descriptors of Random Variables
Cumulative Distribution Functions (CDF)
For a discrete or continuous random variable X, the Cumulative Distribution Function, abbreviated as CDF and denoted by $F_X(x)$, is the nonexceedance probability of X. Its value lies between 0 and 1.
$$F_X(x) = P(X \le x)$$
Sometimes, the CDF of a discrete random variable is denoted as $P_X(x)$. This notation is followed in this course.
Properties of Cumulative Distribution Function (CDF)
The properties of CDF are
- $F_X(x)$ (or $P_X(x)$ in the case of discrete RVs) is bounded by 0 and 1, i.e., $0 \le F_X(x) \le 1$.
- $F_X(x)$ (or $P_X(x)$ in the case of discrete RVs) is a monotonic function that never decreases as x increases.

Cumulative Distribution Function (CDF) of a discrete random variable
For a discrete random variable, the CDF $P_X(x)$ is obtained by summing over values of the PMF. The CDF $P_X(x)$ is the sum of the probabilities of all possible values of X that are less than or equal to the argument x.
$$P_X(x) = \sum_{\text{all } x_i \le x} p_X(x_i) \tag{1}$$
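As a minimal sketch of Eq. (1), the snippet below builds a CDF from a PMF by summation. The support `values` and the probabilities in `pmf` are hypothetical numbers chosen only for illustration, and `discrete_cdf` is an illustrative helper name, not something from the lecture.

```python
from itertools import accumulate

# Hypothetical PMF (illustrative numbers only, not from the lecture).
values = [0, 1, 2, 3]
pmf = [0.1, 0.4, 0.3, 0.2]

# Running sum of the PMF gives the CDF at each support point (Eq. 1).
cdf_at_support = list(accumulate(pmf))

def discrete_cdf(x):
    """Return P(X <= x) by summing the PMF over all support values <= x."""
    return sum(p for v, p in zip(values, pmf) if v <= x)

for v in values:
    print(v, discrete_cdf(v))
```

The CDF obtained this way is a step function: it stays flat between support points and jumps by $p_X(x_i)$ at each $x_i$.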
CDF of a continuous random variable
For a continuous random variable X, the value of the CDF at x is obtained by integrating the pdf from minus infinity to x:
$$F_X(x) = \int_{-\infty}^{x} f_X(x)\,dx \tag{2}$$
Hence,
$$\frac{dF_X(x)}{dx} = f_X(x) \tag{3}$$
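The relationship between Eqs. (2) and (3) can be checked numerically. The sketch below assumes a hypothetical pdf, $f(x) = x/2$ on $[0, 2]$, which is not from the lecture. It integrates this pdf with the trapezoidal rule to approximate the CDF, then differentiates the result by a central finite difference to recover the pdf.

```python
def pdf(x):
    # Hypothetical pdf for illustration: f(x) = x/2 on [0, 2], zero elsewhere.
    return x / 2.0 if 0.0 <= x <= 2.0 else 0.0

def cdf(x, lower=0.0, steps=10_000):
    """Approximate Eq. 2 with the trapezoidal rule.

    Integration starts at `lower` because this pdf is zero below it.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    total += sum(pdf(lower + k * h) for k in range(1, steps))
    return total * h

def cdf_slope(x, eps=1e-4):
    """Central finite difference of the CDF; should approximate the pdf (Eq. 3)."""
    return (cdf(x + eps) - cdf(x - eps)) / (2 * eps)

print(cdf(1.0), cdf_slope(1.0), pdf(1.0))
```

For this pdf the exact CDF is $x^2/4$ on $[0, 2]$, so the numerical values can be compared against a closed form.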
Example problems of CDF
Problem 1. The time between two successive events of rail accidents can be expressed as
$$f_X(x) = \lambda e^{-\lambda x} \quad \text{for } 0 \le x < \infty$$
where $\lambda$ is a parameter estimated as 0.2. Find the probability of the time between two successive events of rail accidents exceeding 10 units.
Solution
The CDF is given by
$$F_X(x) = \int_{0}^{x} f_X(x)\,dx = \int_{0}^{x} \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda x} \quad \text{for } 0 \le x < \infty$$
The probability of the time between two successive events of rail accidents exceeding 10 units is
$$P(X > 10) = 1 - F_X(10) = 1 - \left(1 - e^{-10\lambda}\right) = e^{-2} = 0.135$$
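The arithmetic in Problem 1 can be reproduced in a few lines. This is only a quick check using the stated value $\lambda = 0.2$; the function name `exp_cdf` is an illustrative choice.

```python
import math

lam = 0.2  # parameter value stated in Problem 1

def exp_cdf(x, lam):
    """CDF of the exponential distribution: 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# Exceedance probability is the complement of the CDF.
p_exceed = 1.0 - exp_cdf(10.0, lam)
print(round(p_exceed, 3))  # 0.135
```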

Prob 2. Assume that daily rainfall at a raingauge station follows the given distribution
$$f_X(x) = \begin{cases} 0.4 & \text{for } x = 0 \\ c\,e^{-x/4} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Find c.
(b) What is the probability of daily rainfall exceeding 10 cm?
Solution
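(a) From the axioms of probability, for the given distribution to be valid, the total probability must equal 1:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
$$\text{or, } 0.4 + \int_{0}^{\infty} c\,e^{-0.25x}\,dx = 1$$
$$\text{or, } c\int_{0}^{\infty} e^{-0.25x}\,dx = 0.6$$
$$\text{or, } c\left[\frac{e^{-0.25x}}{-0.25}\right]_{0}^{\infty} = 0.6$$
$$\text{or, } \frac{c}{-0.25}\,(0 - 1) = 0.6$$
$$\text{or, } c = 0.25 \times 0.6$$
$$\text{or, } c = 0.15$$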
Formulation of CDF:
For $x = 0$,
$$F_X(x) = P(X = 0) = 0.4$$
For $x > 0$,
$$F_X(x) = P(X \le x) = 0.4 + \int_{0}^{x} 0.15\,e^{-0.25x}\,dx$$
$$= 0.4 + 0.15\left[\frac{e^{-0.25x}}{-0.25}\right]_{0}^{x}$$
$$= 0.4 + 0.6\left(1 - e^{-0.25x}\right)$$
$$= 1 - 0.6\,e^{-0.25x}$$
Thus the CDF can be expressed as
$$F_X(x) = \begin{cases} 0.4 & \text{for } x = 0 \\ 1 - 0.6\,e^{-0.25x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
(b) The probability of daily rainfall not exceeding 10 cm is given by
$$P(X \le 10) = F_X(10) = 1 - 0.6\,e^{-0.25 \times 10} = 0.9507$$
So, the probability of daily rainfall exceeding 10 cm is 1 - 0.9507 = 0.0493.
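The two parts of Prob 2 can be verified with a short script. This is a sketch under the problem's stated values (mass 0.4 at zero, exponent $-x/4$); the names `p_zero`, `rate` and `rain_cdf` are illustrative and not from the lecture.

```python
import math

p_zero = 0.4  # probability mass at x = 0 (a dry day), from the problem statement
rate = 0.25   # coefficient in the exponent, since e^(-x/4) = e^(-0.25 x)

# The remaining mass 0.6 must equal c * integral_0^inf e^(-rate*x) dx = c / rate.
c = (1.0 - p_zero) * rate

def rain_cdf(x):
    """Mixed CDF: jump of 0.4 at x = 0, then 1 - 0.6 * exp(-0.25 x) for x > 0."""
    if x < 0:
        return 0.0
    return 1.0 - (1.0 - p_zero) * math.exp(-rate * x)

p_exceed_10 = 1.0 - rain_cdf(10.0)
print(round(c, 2), round(rain_cdf(10.0), 4), round(p_exceed_10, 4))
```

Note that the distribution is mixed: it has a discrete jump at zero rainfall and a continuous part for positive rainfall, which is why the CDF starts at 0.4 rather than 0.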



Probabilistic Description of Random Variables


The probabilistic characteristics of random variables can be described completely if:
(a) The form of the distribution function (pdf or pmf) is known
(b) The associated parameters are specified
However, in many cases, the nature of the distribution function of the random variable may not
be known. In such cases, an approximate description becomes necessary. The approximate
description of probabilistic characteristics of random variables can be given in terms of the main
descriptors of the random variables.
Main Descriptors of Random Variables
The main descriptors of random variables are the following.
- Measure of Central Tendency
- Measure of Dispersion
- Measure of Skewness
- Measure of Peakedness
Central Value of a Random Variable
Within the range of the possible values of a random variable, the different values are associated
with different probability densities. So the central value cannot generally be expressed in terms
of the midpoint of the possible range. The central value of the random variable can be expressed
in terms of three quantities:
- Mean or Expected Value
- Mode
- Median
Mean or Expected Value of a Random Variable
The mean value or expected value of a random variable is the weighted average of the different
values of the random variable based on their associated probabilities.
- For a discrete random variable X with pmf $p_X(x_i)$, the expected value is
$$E[X] = \sum_{\text{all } x_i} x_i\,p_X(x_i) \tag{4}$$
- For a continuous random variable X with pdf $f_X(x)$, the expected value is
$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \tag{5}$$
Expected value for a function
If g(X) is a function of the random variable X, then the expected value of that function is
expressed as
$$E[g(X)] = \sum_{\text{all } x_i} g(x_i)\,p_X(x_i), \quad \text{when X is discrete} \tag{6}$$
and
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx, \quad \text{when X is continuous} \tag{7}$$
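For the discrete case, Eqs. (4) and (6) reduce to a weighted sum. The sketch below uses a hypothetical PMF (illustrative numbers only), and the helper name `expectation` is an assumption made for this example.

```python
# Hypothetical PMF (illustrative numbers only, not from the lecture).
values = [0, 1, 2, 3]
pmf = [0.1, 0.4, 0.3, 0.2]

def expectation(g=lambda x: x):
    """E[g(X)] for a discrete X (Eq. 6); with the default g this is E[X] (Eq. 4)."""
    return sum(g(x) * p for x, p in zip(values, pmf))

mean = expectation()                    # E[X]
mean_sq = expectation(lambda x: x**2)   # E[X^2]
print(mean, mean_sq)
```

Passing a different `g` gives the expected value of any function of X without changing the PMF.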

Sample estimate of mean
If the number of observations in a sample is n and $x_i$, for i = 1, 2, ..., n, are the observed values, then the sample mean is given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{8}$$
Mode of a Random Variable
The mode is the most probable value of a random variable. It is the value of the random variable
with the highest probability density. In other words, it is the value of the random variable at
which the pdf attains its peak value.
Median of a Random Variable
The median is the value of the random variable for which the values on either side of it are equally probable. If $x_m$ is the median of a random variable X, then
$$F_X(x_m) = 0.5 \tag{9}$$

It may be noted that the mean, mode and median are each a measure of the central value of the random variable. The mean of a random variable X is conventionally denoted by $\bar{x}$, whereas the mode and median are denoted by $\tilde{x}$ and $x_m$ respectively. If the pdf of a random variable is symmetric and unimodal, then the mean, mode and median coincide. A random variable having a Gaussian distribution is one example of such a case.
Dispersion of a Random Variable
The dispersion of a random variable describes how closely the values of the variate are clustered, or how widely they are spread, around the central value.
In Fig. 8, $X_1$ and $X_2$ have the same mean, but their dispersion about the mean is different.

Fig. 8 pdf of two RVs with same mean and different dispersion about the mean

Measure of Dispersion of a Random Variable
The different measures of dispersion are:
- Variance ($\sigma^2$)
- Standard Deviation ($\sigma$)
- Coefficient of Variation (CV)


Variance
Variance Var(X) is a measure of the dispersion of the variate taking the mean as the central
value.
For a discrete random variable X with pmf $p_X(x_i)$, the variance of X is
$$\mathrm{Var}(X) = \sum_{\text{all } x_i} (x_i - \mu_X)^2\,p_X(x_i) \tag{10}$$
where $\mu_X = E(X)$.
For a continuous random variable X with pdf $f_X(x)$, the variance of X is expressed as
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2\,f_X(x)\,dx \tag{11}$$
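Expanding the integrand,
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} \left(x^2 - 2\mu_X x + \mu_X^2\right) f_X(x)\,dx$$
$$\text{or, } \mathrm{Var}(X) = E(X^2) - 2\mu_X E(X) + \mu_X^2$$
$$\text{or, } \mathrm{Var}(X) = E(X^2) - 2\mu_X^2 + \mu_X^2$$
$$\text{or, } \mathrm{Var}(X) = E(X^2) - \mu_X^2$$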
Thus, the variance can be expressed as
$$\mathrm{Var}(X) = E(X^2) - \mu_X^2 \tag{12}$$
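As a quick numerical check that the definition in Eq. (10) and the shortcut in Eq. (12) agree, the sketch below evaluates both for a hypothetical discrete PMF (illustrative numbers only).

```python
# Hypothetical PMF (illustrative numbers only, not from the lecture).
values = [0, 1, 2, 3]
pmf = [0.1, 0.4, 0.3, 0.2]

# Mean, Eq. 4.
mu = sum(x * p for x, p in zip(values, pmf))

# Variance from the definition, Eq. 10.
var_definition = sum((x - mu) ** 2 * p for x, p in zip(values, pmf))

# Variance from the shortcut E(X^2) - mu^2, Eq. 12.
var_shortcut = sum(x**2 * p for x, p in zip(values, pmf)) - mu**2

print(var_definition, var_shortcut)
```

The two results should match up to floating-point rounding. The shortcut form is often more convenient by hand because it avoids subtracting the mean from every value.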
Standard Deviation
The standard deviation $\sigma_X$ is the positive square root of the variance. Thus,
$$\sigma_X = \sqrt{\mathrm{Var}(X)} \tag{13}$$





Coefficient of Variation
The Coefficient of Variation of a random variable X, denoted as $CV_X$, is a dimensionless measure of dispersion. It is the ratio of the standard deviation to the mean. Thus,
$$CV_X = \frac{\sigma_X}{\mu_X} \tag{14}$$
Sample estimate of measures of dispersion
The sample estimate for variance is given by
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{15}$$
The sample estimate for standard deviation is given by
$$s = \left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{1/2} \tag{16}$$
The sample estimate for coefficient of variation is given by
$$CV_s = \frac{s}{\bar{x}}$$
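The sample estimates in Eqs. (8), (15) and (16) correspond to functions in Python's standard `statistics` module. The observations in `obs` below are hypothetical values used only to show the calls.

```python
import statistics

# Hypothetical observations (illustrative numbers only, not from the lecture).
obs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

x_bar = statistics.mean(obs)    # sample mean, Eq. 8
s2 = statistics.variance(obs)   # sample variance with divisor n - 1, Eq. 15
s = statistics.stdev(obs)       # sample standard deviation, Eq. 16
cv_s = s / x_bar                # sample coefficient of variation

print(x_bar, s2, s, cv_s)
```

`statistics.variance` and `statistics.stdev` use the $n - 1$ divisor, matching Eqs. (15) and (16). The population versions, `pvariance` and `pstdev`, divide by $n$ instead.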
Example problem
Prob 3. The time between two successive rail accidents can be described with an exponential pdf
$$f_T(t) = \begin{cases} \lambda e^{-\lambda t} & \text{for } t \ge 0 \\ 0 & \text{for } t < 0 \end{cases}$$
Find the mean, mode, median and the coefficient of variation for the distribution.
Soln.
The mean time between successive events of rail accidents is given by
$$\mu_T = E[T] = \int_{0}^{\infty} t\,\lambda e^{-\lambda t}\,dt$$
Integrating by parts, we get
$$\mu_T = \frac{1}{\lambda}$$
Thus, the mean of t is
$$\bar{t} = \mu_T = \frac{1}{\lambda}$$
From the pdf it can be observed that the probability density is highest at t = 0.
Thus, the mode is $\tilde{t} = 0$.
The median can be obtained from the expression
$$\int_{0}^{t_m} \lambda e^{-\lambda t}\,dt = 0.5$$
$$\text{or, } t_m = \frac{-\ln 0.5}{\lambda} = \frac{0.693}{\lambda}$$
Therefore the median is
$$t_m = 0.693\,\mu_T$$
The variance of T is
$$\sigma_T^2 = \int_{0}^{\infty} \left(t - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda t}\,dt$$
Integrating by parts, we get
$$\sigma_T^2 = \frac{1}{\lambda^2}$$

Thus, the standard deviation is given by
$$\sigma_T = \frac{1}{\lambda}$$

The coefficient of variation of the exponential distribution is
$$CV_T = \frac{\sigma_T}{\mu_T} = 1$$
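The closed-form results of Prob 3 can be cross-checked by numerical integration. Prob 3 leaves $\lambda$ symbolic, so the sketch below assumes $\lambda = 0.2$ (the value used in Problem 1) purely for illustration. The `integrate` helper is a plain midpoint rule written for this example, and the upper limit is a truncation of the infinite range.

```python
import math

lam = 0.2  # assumed value, borrowed from Problem 1; Prob 3 keeps lambda symbolic

def pdf(t):
    """Exponential pdf for t >= 0."""
    return lam * math.exp(-lam * t)

def integrate(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

upper = 60.0 / lam  # truncation point; the neglected tail is of order e^-60

mean = integrate(lambda t: t * pdf(t), 0.0, upper)               # expect 1/lam
var = integrate(lambda t: (t - mean) ** 2 * pdf(t), 0.0, upper)  # expect 1/lam^2
median = -math.log(0.5) / lam                                    # from F(t_m) = 0.5
cv = math.sqrt(var) / mean                                       # expect 1

print(mean, median, var, cv)
```

With this choice of $\lambda$, the mean should come out close to 5, the variance close to 25, and the coefficient of variation close to 1, independent of $\lambda$.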