
Random Variables

A variable is defined as a characteristic that changes or varies over time and/or across the objects or individuals under consideration. A quantitative variable takes numerical values; it describes a quantity that can assume values in a certain numerical range. A random variable, in addition, associates a probability with each possible value in the discrete case or, in the continuous case, with each interval of values. A numerically valued variable whose values are determined by the outcomes of a random experiment is essentially a random variable.
In the study of random variables we are usually interested in their behavior or probability distributions, that is, in the probabilities with which they take on different values.
Random Variable: For a given probability space (Ω, A, P(.)), a random variable, denoted by X or X(.), is a real-valued (measurable) function defined on the sample space Ω. The values of a random variable are real numbers determined by the outcomes of a random experiment; the domain of a random variable is a sample space. A random variable is denoted by an uppercase letter, often X, Y, or Z, and its values by the corresponding lowercase letter, x, y, or z.
Consider an experiment of selecting three students from male (M) and female (F) students of the
University of Dhaka. Then the sample space is
Ω = {MMM, MMF, MFM, FMM, FFM, FMF, MFF, FFF}.
Suppose we are interested in the number of male students, and let X denote the number of male students per selection. Then a sample point will take one of the numerical values 0, 1,
2, or 3. That is, the sample point FFF takes the value 0 for X, each of the sample points FFM, FMF, MFF takes the value 1, and so on. The real numbers 0, 1, 2, 3 are random quantities determined by the outcomes of the experiment being conceptualized. Here each possible value of X represents an event which is a subset of the sample space for the given experiment. Consequently, X is a function defined on the sample space, and so a random variable. The random variable X takes on the values 0, 1, 2 and 3 with respective probabilities
P(X = 0) = P{FFF} = 1/8
P(X = 1) = P{MFF, FMF, FFM} = 3/8
P(X = 2) = P{MMF, MFM, FMM} = 3/8
P(X = 3) = P{MMM} = 1/8,
where ∑ P(X = x) = 1, the sum running over x = 0, 1, 2, 3. The random variable X thus has a probability distribution.
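The table above can be reproduced by brute-force enumeration. The following Python sketch (an illustrative addition, assuming all eight selections are equally likely) builds the pmf of X:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely sample points MMM, MMF, ..., FFF
# and map each sample point to X = number of male (M) students.
sample_space = ["".join(s) for s in product("MF", repeat=3)]

pmf = {}
for point in sample_space:
    x = point.count("M")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(sample_space))

for x in sorted(pmf):
    print(f"P(X = {x}) = {pmf[x]}")
```

Running it prints P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8 and P(X = 3) = 1/8, matching the distribution above.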

Example. A die is tossed once. Suppose that we will lose $1 if 1 appears, win $1 if 2 or 3 occurs, win $2 if 4 or 5 occurs, and win $3 if 6 appears. If X denotes the total winnings from the experiment, then X is a random variable taking on the possible values -1, 1, 2, and 3 with corresponding probabilities
P(X = -1) = 1/6, P(X = 1) = 2/6, P(X = 2) = 2/6, and P(X = 3) = 1/6.
Probability Distributions of Random Variables
Quantitative random variables are classified as discrete, continuous, or mixed type. A discrete random variable X is a real-valued function that can assume any of a specified finite or countably infinite set of values. That is, the values of X are finite or countably infinite in number, and the values are isolated. The variables cited in the above examples are discrete random variables.
A continuous random variable can take on any numerical value (an uncountable number of values) in a line interval.
A random variable has a probability distribution whether it is discrete or continuous. A probability distribution is an assignment of probabilities to each isolated value of a discrete random variable or to each interval of values of a continuous random variable. A discrete random variable X can be described by a probability function (or probability mass function) that assigns a probability to each value in the image/range of X. A continuous random variable can be described by a probability density function, which assigns probabilities to intervals.

Distributions of Discrete Random Variables


Probability Function: Let X be a discrete random variable, and let the possible values of X be x1, x2, … (finite or countably infinite) with corresponding probabilities f(x1), f(x2), …. If the function f(x) satisfies the conditions

i. f(x) ≥ 0
ii. ∑ f(x) = 1,

then f(x) is called the probability function (pf) of X, and the ordered pairs (x, f(x)) are called the
probability distribution of X.
 The main features of a discrete probability distribution are that a probability is assigned
to each distinct value of the random variable, and the sum of the assigned probabilities
must be 1.
 A discrete probability distribution can be thought of as a relative frequency distribution
based on a very large n (large number of observations).
It is often useful to present the probability function in a graphical format, for example as a bar chart or as a probability histogram, as in the Figures below,

Figure 4.1: Bar Chart    Figure 4.2: Probability Histogram

where Figure 4.1 is a bar chart, and Figure 4.2 is a probability histogram.
Example. A pair of dice is tossed once. Find the probability distribution of the sum of the face values.
Solution: Let X be the random variable whose values are the possible totals of the top face values of the two dice. Then X can assume the integer values from 2 to 12. The two dice can fall in 36 different ways, each with probability 1/36. The probability distribution of X is then

X    | 2    3    4    5    6    7    8    9    10   11   12
f(x) | 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

where ∑ f(x) = 1.
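The same enumeration idea verifies this table; the sketch below (an illustrative addition, not part of the original notes) loops over all 36 ordered outcomes of the two dice:

```python
from fractions import Fraction

# Distribution of X = sum of the top faces of two fair dice:
# each ordered pair (i, j), i, j = 1..6, has probability 1/36.
pmf = {}
for i in range(1, 7):
    for j in range(1, 7):
        pmf[i + j] = pmf.get(i + j, Fraction(0)) + Fraction(1, 36)

print(pmf[7])            # the most likely total, 6/36 = 1/6
print(sum(pmf.values())) # 1
```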
Example. A fair coin is tossed 4 times. Find the distribution of X, where X is the number of heads obtained.
Solution: There are 2^4 = 16 sample points in the sample space. If X = number of heads, then X can assume the values 0, 1, 2, 3 or 4. The probability function of X is then given by
f(x) = C(4, x)/16 ; x = 0, 1, 2, 3, 4,
where ∑ f(x) = 1.
Example. A coin is tossed until the first head appears. Let X be the number of tosses required to get the first head. Then X is a random variable with possible values 1, 2, 3, …. Since the sample space is Ω = {H, TH, TTH, TTTH, …}, the probabilities are evaluated as
P(X = 1) = 1/2
P(X = 2) = (1/2)(1/2) = 1/4
P(X = 3) = (1/2)(1/2)(1/2) = 1/8, and so forth.
Thus,
f(x) = (1/2)^x ; x = 1, 2, 3, …
is the pf of X, and ∑ f(x) = 1.
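That these geometric probabilities sum to 1 is easy to confirm numerically; this short check is an added illustration:

```python
# Partial sums of f(x) = (1/2)**x, x = 1, 2, 3, ... approach 1,
# confirming that f is a valid probability function.
partial = 0.0
for x in range(1, 51):
    partial += 0.5 ** x

print(partial)  # very close to 1
```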
Distributions of Continuous Random Variables

Probability Density Function: Let X be a continuous random variable and f(x) a function of X. If the function f(x) is such that

i. f(x) ≥ 0,
ii. ∫ f(x) dx = 1, the integral running over (−∞, ∞), and
iii. for any a, b with −∞ < a < b < ∞,
P(a < X < b) = ∫ from a to b of f(x) dx,

then f(x) is called the probability density function (pdf) of X, and X is said to have a continuous probability distribution.

The probability P(a < X < b) is given by the shaded area under the curve of f(x) above the X-axis between x = a and x = b (Figure 4.3).

Figure 4.3: Probability of X between a and b


 For continuous X, the probability that X equals any particular number, say P(X = a), is always zero, since
P(X = a) = ∫ from a to a of f(x) dx = 0.
 Only the probability of X in a given interval is positive. Thus, for a continuous random variable X,
P(c ≤ X ≤ d), P(c ≤ X < d), P(c < X ≤ d) and P(c < X < d)
are all the same.

Example. Let
f(x) = 2x ; 0 < x < 1
     = 0 ; elsewhere.
Clearly f(x) ≥ 0, and ∫ from 0 to 1 of f(x) dx = ∫ from 0 to 1 of 2x dx = 1; therefore f(x) is a pdf. To compute P(X < 1/2), we evaluate the integral and get
P(X < 1/2) = ∫ from 0 to 1/2 of 2x dx = 1/4.
We can also find P(X < 1/2 | 1/3 < X < 2/3) using the conditional probability law, which gives
P(X < 1/2 | 1/3 < X < 2/3) = P(1/3 < X < 1/2) / P(1/3 < X < 2/3)
= [∫ from 1/3 to 1/2 of 2x dx] / [∫ from 1/3 to 2/3 of 2x dx] = 5/12.
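Because this density has the closed-form cdf F(x) = x², the conditional probability can be checked exactly; the sketch below is an added illustration using exact fractions:

```python
from fractions import Fraction

def F(x: Fraction) -> Fraction:
    """cdf of f(x) = 2x on (0, 1): F(x) = x**2."""
    return x * x

# P(1/3 < X < 1/2) / P(1/3 < X < 2/3)
num = F(Fraction(1, 2)) - F(Fraction(1, 3))
den = F(Fraction(2, 3)) - F(Fraction(1, 3))
print(num / den)  # 5/12
```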

Example. A continuous random variable has the density function
f(x) = x ; 0 < x < 1
     = 2 − x ; 1 ≤ x < 2
     = 0 ; elsewhere.
Show that P(0 < X < 2) = 1. Find P(X < 1.2). Also draw its graph.
Solution: Here
∫ from 0 to 2 of f(x) dx = ∫ from 0 to 1 of x dx + ∫ from 1 to 2 of (2 − x) dx = 1/2 + 1/2 = 1,
and P(X < 1.2) = ∫ from 0 to 1 of x dx + ∫ from 1 to 1.2 of (2 − x) dx = 1/2 + 0.18 = 0.68.
The pdf is triangular in shape, as can be seen in Figure 4.4.

Figure 4.4: Graph of pdf


Example. Let X have the pdf
f(x) = (x + 1)/8 ; 2 < x < 4.
Show that P(2 < X < 4) = 1. Find P(X < 3.5) and P(2.4 < X < 3.5).
Solution: It can easily be shown that ∫ from 2 to 4 of (x + 1)/8 dx = 1. Also
P(X < 3.5) = ∫ from 2 to 3.5 of (x + 1)/8 dx = 0.70, and
P(2.4 < X < 3.5) = ∫ from 2.4 to 3.5 of (x + 1)/8 dx = 0.54.

Example. Let k > 0 be a constant, and
f(x) = kx(1 − x) ; 0 < x < 1
     = 0 ; elsewhere.
Since ∫ from 0 to 1 of kx(1 − x) dx = k/6, it follows that f(x) defines a pdf if k = 6. Then we have
P(X > 0.3) = 1 − 6 ∫ from 0 to 0.3 of x(1 − x) dx = 0.784.

Distribution Function
Distribution Function: Let X be a random variable, either discrete or continuous. Then the function F(.) is called the cumulative distribution function (cdf) or simply the distribution function (df) if
F(x) = P(X ≤ x)
     = ∑ over t ≤ x of f(t), when X is discrete
     = ∫ from −∞ to x of f(t) dt, when X is continuous.

We shall refer to F(x) as the cdf or df of the random variable X.


Example. Let the distribution of X be given by
X    | 0   1   2
f(x) | 1/3 1/6 1/2
Then the distribution function of X is given by
F(x) = 0 ; x < 0
     = 1/3 ; 0 ≤ x < 1
     = 1/2 ; 1 ≤ x < 2
     = 1 ; x ≥ 2.

The graph of F(x), as given in the Figure 4.5, is a step or jump function.

Figure 4.5: cdf

Example. If f(x) = 2x ; 0 < x < 1, then
F(x) = 0 ; x < 0
     = x² ; 0 ≤ x ≤ 1
     = 1 ; x > 1.
The graph of F(x), as given in Figure 4.6, is a non-decreasing continuous function.

Figure 4.6: Graph of cdf


Properties:

(i). F(−∞) = 0 and F(∞) = 1.
(ii). F(x) is a monotone non-decreasing function; that is,
F(x1) ≤ F(x2), for x1 < x2.
(iii). F(x) is right continuous, that is, F(x⁺) = F(x).
In general, F(x⁻) ≤ F(x) ≤ F(x⁺).
(iv). For discrete X,
f(x_r) = F(x_r) − F(x_{r−1}),
and for continuous X,
f(x) = F′(x) = dF(x)/dx.
(v). For discrete X, F(x) is a jump or step function. The graph of F(x), as can be seen in Figure 4.5, is made up of horizontal line segments. The function F(x) is continuous except at certain values x_i of X. The magnitude of the jump at x_i is P(X = x_i). The function F(x) has a jump at x0 if
P(X = x0) = f(x0) = lim as h→0, h>0, of {F(x0) − F(x0 − h)}
          = F(x0⁺) − F(x0⁻) > 0,
where f(x0) is the size of the jump of F at x0. For continuous X, there is no jump and F(x) is a continuous function.
(vi). For continuous X,
P(a < X < b) = F(b) − F(a).

For discrete X, the following relations are satisfied:
P(X ≤ a) = F(a), P(X < a) = F(a⁻), P(X = a) = F(a) − F(a⁻).
For a < b,
P(a < X ≤ b) = F(b) − F(a), P(a ≤ X ≤ b) = F(b) − F(a⁻).
Using the definition of F(x), these relations can easily be proved.


Example. Given that
F(x) = 0 ; x < 0
     = (x + 1)/2 ; 0 ≤ x < 1
     = 1 ; 1 ≤ x.
Find P(−3 < X ≤ 1/2) and P(X = 0).
Solution: Using the relations above,
P(−3 < X ≤ 1/2) = F(1/2) − F(−3) = 3/4 − 0 = 3/4.
Also, P(X = 0) = F(0⁺) − F(0⁻) = 1/2 − 0 = 1/2.
Thus, there is a discontinuity in the distribution of X at X = 0.


Example. Given that
f(x) = 2x ; 0 ≤ x ≤ 1/2
     = 6 − 6x ; 1/2 < x ≤ 1
     = 0 ; elsewhere.
Show that f(x) defines a pdf. Find F(x) and draw the graphs of f(x) and F(x).
Solution: Since f(x) ≥ 0 and ∫ from 0 to 1/2 of 2x dx + ∫ from 1/2 to 1 of (6 − 6x) dx = 1/4 + 3/4 = 1, f(x) defines a pdf.
For x < 0, F(x) = P(X ≤ x) = 0.
For 0 ≤ x ≤ 1/2, F(x) = ∫ from 0 to x of 2t dt = x².
For 1/2 < x ≤ 1, F(x) = ∫ from 0 to 1/2 of 2t dt + ∫ from 1/2 to x of (6 − 6t) dt = 6x − 3x² − 2.
Thus,
F(x) = 0 ; x < 0
     = x² ; 0 ≤ x ≤ 1/2
     = 6x − 3x² − 2 ; 1/2 < x ≤ 1
     = 1 ; x > 1.

Figure 4.7: Graph of pdf Figure 4.8: Graph of cdf

Example. If f(x) = x/6 ; x = 1, 2, 3, find the cdf. Also draw its graph.

Solution: The cdf is given by
F(x) = P(X ≤ x)
     = 0 ; x < 1
     = 1/6 ; 1 ≤ x < 2
     = 3/6 ; 2 ≤ x < 3
     = 1 ; 3 ≤ x.

The graph of the cdf is as given below.



Figure 4.9: cdf



Mixed Distributions and Decomposition Theorem


Most random variables are either purely discrete or purely continuous. However, we may encounter situations where a random variable assumes certain distinct values, say x1, x2, …, xn, with positive probabilities, and also assumes values in some interval. That means X is partly discrete and partly continuous. Such a random variable has a distribution of mixed type.
Theorem 4.1: A distribution function F(.) is called mixed if it can be written as a linear combination of two distribution functions, that is,
F(x) = α₁ F_d(x) + α₂ F_c(x),
where F_d is the cdf of a discrete random variable and F_c is a continuous cdf. The constants α₁ and α₂ are such that α₁ + α₂ = 1.
This theorem is known as the Lebesgue Decomposition Theorem for distribution functions. We provide a proof of this as follows:
Proof: Since F(x) is a non-decreasing bounded function, it can have at most a countably infinite number of discontinuity points x_i, i = 1, 2, 3, …. Let F_d*(x) = ∑ over x_i ≤ x of f(x_i), the total jump up to x in F(x), which is due to the discontinuities alone, and let
(1) F_c*(x) = F(x) − F_d*(x).
Then F_c*(x) represents the residual probability corresponding to the continuous part of F(x). Let
∑ over all x_i of f(x_i) = α₁ and F_c*(∞) = α₂, with α₁, α₂ > 0.
Since F(∞) = 1, it follows that α₁ + α₂ = 1.
Define
(2) F_d(x) = F_d*(x)/α₁ and F_c(x) = F_c*(x)/α₂.
Then F_d and F_c are distribution functions, the first being discrete and the second continuous. From (1) and (2), we have
F(x) = α₁ F_d(x) + α₂ F_c(x), α₁ + α₂ = 1.
If X is discrete, α₁ = 1 and α₂ = 0. If X is continuous, then α₁ = 0 and α₂ = 1. It can be shown that the decomposition is unique.
Example. Given that
F(x) = 0 ; x < 0
     = x²/2 ; 0 ≤ x < 1
     = 3/4 ; 1 ≤ x < 2
     = (x + 1)/4 ; 2 ≤ x < 3
     = 1 ; x ≥ 3.
Find P(X = 1/2), P(X = 1), P(X < 1), P(X ≤ 1), P(X > 1), and P(X ≤ 2).
(Ans.: 0, 1/4, 1/2, 3/4, 1/4, 3/4.)
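For a mixed cdf like this, the discrete probabilities are the jump sizes of F. The sketch below (an added illustration, with the cdf hard-coded) evaluates F just left of each candidate point:

```python
def F(x: float) -> float:
    """The mixed cdf from the example above."""
    if x < 0:
        return 0.0
    if x < 1:
        return x * x / 2
    if x < 2:
        return 0.75
    if x < 3:
        return (x + 1) / 4
    return 1.0

eps = 1e-9
for point in (0.5, 1.0, 2.0):
    jump = F(point) - F(point - eps)  # approximates F(x) - F(x-)
    print(point, round(jump, 6))
```

Only x = 1 carries positive probability (jump 1/4); F is continuous at 1/2 and at 2.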
Truncated Distribution:
Any distribution may be truncated at some point(s). The domain of the random variable X may be truncated for various reasons: sometimes some data are not available, the underlying variable cannot be observed in a part or parts of its range, or the removed values are theoretically meaningless under the model or practically unobservable. The distribution of X then becomes truncated.
Suppose that the admissible values of the discrete random variable X are …, a, a+1, a+2, …, b, b+1, b+2, …, but in a given situation the observable values of X are a, a+1, …, b. Then the variable X, and hence its distribution, is truncated below 'a' and above 'b'. Truncation by exclusion of values less than 'a' is called truncation from below or left-truncation. Truncation by exclusion of values greater than 'b' is called truncation from above or right-truncation. If the values of the variable X are not available beyond the two end points 'a' and 'b', the distribution is called doubly truncated, and 'a' and 'b' are called the truncation points. A distribution may be doubly truncated, or truncated below or above a single point.
If f(x) is the pf (pdf) of a random variable X, the pf (pdf) of the doubly truncated distribution, truncated below 'a' and above 'b', is given by
f_T(x) = f(x) / ∑ from x=a to b of f(x) ; a ≤ x ≤ b,
if X is discrete. If X is continuous, it is given by
f_T(x) = f(x) / ∫ from a to b of f(t) dt ; a ≤ x ≤ b.
If X is truncated below 'a' (only),
f_T(x) = f(x) / ∑ over x ≥ a of f(x) ; x ≥ a, if X is discrete
       = f(x) / ∫ from a to ∞ of f(t) dt ; x ≥ a, if X is continuous.
Example. Suppose the pf of a discrete distribution is given by
f(x) = p q^x ; x = 0, 1, 2, …, ∞,

where 0 < p < 1 and q = 1 − p. Then the zero-truncated (or truncated below 1) distribution is given by
f_T(x) = p q^x / (1 − p) = p q^(x−1) ; x = 1, 2, 3, …
If the distribution is truncated at x = 1 (or below 2), the pf of the truncated distribution is given by
f_T(x) = p q^x / (1 − p − pq) = p q^(x−2) ; x = 2, 3, 4, …

Compound Distribution:
A compound probability distribution is the probability distribution that results from assuming
that a random variable is distributed according to some distribution with some unknown parame-
ter that is again distributed according to some other distribution.
Let the random variable X have a distribution f(x; λ) with parameter λ. If the parameter λ is itself a random variable with distribution f(λ), then the (marginal) distribution of X is called a compound probability distribution.
Example. Let
f(x; λ) = e^(−λ) λ^x / x! ; x = 0, 1, 2, …
Suppose that λ follows the distribution
f(λ) = (a^v / Γ(v)) λ^(v−1) e^(−aλ) ; λ, a, v > 0.
Then the joint probability
P[X = x, λ in the interval dλ] = P[X = x | λ] · P[λ in dλ]
= (e^(−λ) λ^x / x!) · (a^v / Γ(v)) λ^(v−1) e^(−aλ) dλ.
Hence,
P(X = x) = f(x) = ∫ from 0 to ∞ of (e^(−λ) λ^x / x!) (a^v / Γ(v)) λ^(v−1) e^(−aλ) dλ
= (a^v / (Γ(v) x!)) ∫ from 0 to ∞ of e^(−λ(a+1)) λ^(v+x−1) dλ
= a^v Γ(v + x) / (Γ(v) x! (a + 1)^(v+x))
= [(v + x − 1)! / ((1 + a)^(v+x) x! (v − 1)!)] a^v
= C(v + x − 1, x) p^v q^x,
where p = a/(1 + a) and q = 1 − p. This is the negative binomial distribution.
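The Poisson–gamma mixture can be checked numerically. The sketch below (an added illustration; the values v = 2 and a = 1 are chosen for convenience) integrates the mixture and compares it with the negative binomial form C(v + x − 1, x) p^v q^x:

```python
import math

v, a = 2, 1.0                 # gamma shape and rate (assumed values)
p = a / (1.0 + a)             # negative binomial parameters
q = 1.0 - p

def compound_pmf(x: int, steps: int = 20000, upper: float = 50.0) -> float:
    """Numerically integrate Poisson(x | lam) * Gamma(lam; v, a) over lam."""
    h = upper / steps
    total = 0.0
    for i in range(1, steps):
        lam = i * h
        poisson = math.exp(-lam) * lam ** x / math.factorial(x)
        gamma_pdf = a ** v / math.gamma(v) * lam ** (v - 1) * math.exp(-a * lam)
        total += poisson * gamma_pdf * h
    return total

for x in range(5):
    nb = math.comb(v + x - 1, x) * p ** v * q ** x
    assert abs(compound_pmf(x) - nb) < 1e-4
print("compound pmf matches negative binomial")
```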

Jointly Distributed Random Variables


A random variable is a real-valued function defined on a sample space. In our study of random variables we have, so far, restricted our discussion to a one-dimensional sample space, in that the outcome of the experiment is recorded as values assumed by a single random variable. However, many different random variables can be defined on the same sample space. For example, we might study the heights (X) and the weights (Y) of randomly selected adults, giving rise to a two-dimensional random variable with values (x, y). We might be interested in the totals (X), products (Y) and differences (Z) of face values in rolling a pair of dice, giving rise to a three-dimensional random variable with values (x, y, z).

In this section, we shall focus our discussion mainly on two-dimensional (or bivariate) random variables. The multivariate case will be mentioned occasionally during the discussion. Here also the variables may be either discrete or continuous.
If X and Y are two discrete random variables, the probability distribution for their simultaneous occurrence can be represented by a function f(x, y) of the random variables X and Y, given by
f(x, y) = P(X = x, Y = y),
where f(x, y) is the probability of the joint occurrence of x and y. An arbitrary function of two variables may or may not be a joint probability function; it will be a joint probability function if it satisfies certain conditions.

Joint Probability Function: Let X and Y be two discrete random variables and f(x, y) = P(X = x, Y = y) be a function of X and Y. Then f(x, y) is a joint probability function (jpf) if it satisfies the conditions

i. f(x, y) ≥ 0, for all pairs (x, y)
ii. ∑ over x ∑ over y of f(x, y) = 1,

where the double summation extends over all pairs (x, y); X and Y are then said to have a joint probability distribution.
If we have K discrete random variables X1, X2, …, XK, then their joint function f(x1, x2, …, xK) will define a joint pf if
f(x1, x2, …, xK) = P(X1 = x1, X2 = x2, …, XK = xK) ≥ 0
and the sum of f(x1, x2, …, xK) over all (x1, x2, …, xK) equals 1.

The joint distribution of two variables is called a bivariate distribution; the distribution is a multivariate distribution if more than two variables are involved in the pf.
Example. Consider the joint probability function (jpf)
f(x, y) = kxy ; x = 1, 2, 3 and y = 1, 2, 3,
where k is a constant.

Evaluating the double summation ∑ over x ∑ over y of f(x, y) and setting the sum equal to 1, it can be seen that k = 1/36. That means
f(x, y) = xy/36 ; x = 1, 2, 3 ; y = 1, 2, 3

is a joint probability function and defines a joint distribution of X and Y.
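A quick tabulation (an added check, not from the original notes) confirms that k = 1/36 normalizes f and yields the marginal f(x) = x/6:

```python
from fractions import Fraction

# Joint pf f(x, y) = x*y/36 on x, y = 1, 2, 3.
f = {(x, y): Fraction(x * y, 36) for x in (1, 2, 3) for y in (1, 2, 3)}

total = sum(f.values())
marg_x = {x: sum(f[(x, y)] for y in (1, 2, 3)) for x in (1, 2, 3)}

print(total)      # 1
print(marg_x[2])  # 2/6 = 1/3
```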



The joint probability distribution of two continuous variables can be defined in a similar fashion.
Joint Probability Density Function: Let X and Y be two continuous random variables defined on the same sample space. Then the joint function f(x, y) is called a joint probability density function (pdf) if it satisfies the conditions

i. f(x, y) ≥ 0, for all x and y
ii. the double integral of f(x, y) over the whole plane equals 1,

and (X, Y) is then said to have a joint distribution.

Example. Consider the joint pdf of X and Y,
f(x, y) = (3/5) x(y + x) ; 0 < x < 1, 0 < y < 2.
Find P(0 < X < 1/2, 1 < Y < 2).
Solution: We find that
P(0 < X < 1/2, 1 < Y < 2) = (3/5) ∫ from 1 to 2 ∫ from 0 to 1/2 of x(y + x) dx dy = 11/80.
Joint Distribution Function: If X and Y are jointly distributed random variables, either discrete or continuous, then the function defined by

F(x, y) = P(X ≤ x, Y ≤ y)
        = ∑ over u ≤ x ∑ over v ≤ y of f(u, v), if X and Y are discrete
        = ∫ from −∞ to x ∫ from −∞ to y of f(u, v) dv du, if X and Y are continuous,

is called the joint cumulative distribution function (cdf) or simply the joint distribution function of X and Y. This is also called the bivariate distribution function. It can easily be extended to more than two variables.

The bivariate distribution function possesses the following properties:

(i). F(−∞, −∞) = 0 and F(∞, ∞) = 1.
(ii). F(a, c) ≤ F(b, d), if a < b and c < d.
(iii). f(x, y) = ∂²F(x, y)/∂x∂y, if X and Y are continuous.
For any given numbers a < b and c < d,
(4.11) P(a < X < b, c < Y < d)
= P(a < X < b, Y < d) − P(a < X < b, Y < c)
= P(X < b, Y < d) − P(X < a, Y < d) − [P(X < b, Y < c) − P(X < a, Y < c)]
= F(b, d) − F(a, d) − F(b, c) + F(a, c).
Thus, joint probabilities of the random variables X and Y can be found in terms of their joint distribution function.
Example. Suppose that two marbles are selected from a box containing 3 blue, 2 red and 3 green marbles. If X is the number of blue marbles and Y is the number of red marbles selected, find the joint pf and P(X + Y ≤ 1).

Solution: From 8 marbles, 2 can be selected in C(8, 2) ways. The number of ways of selecting x blue and y red marbles is C(3, x) C(2, y) C(3, 2 − x − y). Then the joint probability is given by

f(x, y) = P(X = x, Y = y) = C(3, x) C(2, y) C(3, 2 − x − y) / C(8, 2) ; x = 0, 1, 2 ; y = 0, 1, 2 ; x + y ≤ 2.

The probability P(X + Y ≤ 1) is given by

P(X + Y ≤ 1) = f(0, 0) + f(1, 0) + f(0, 1)
= 3/28 + 9/28 + 6/28 = 9/14.
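With math.comb the joint pf and the requested probability can be tabulated directly; the sketch below is an added illustration:

```python
from fractions import Fraction
from math import comb

# f(x, y) = C(3,x) C(2,y) C(3, 2-x-y) / C(8,2), for x + y <= 2.
def f(x: int, y: int) -> Fraction:
    if x < 0 or y < 0 or 2 - x - y < 0:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(2, y) * comb(3, 2 - x - y), comb(8, 2))

prob = f(0, 0) + f(1, 0) + f(0, 1)   # P(X + Y <= 1)
print(prob)  # 9/14
```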
Example. The joint probability distribution of X and Y is given by
f(x, y) = (x + y)/30 ; x = 0, 1, 2, 3 ; y = 0, 1, 2.

Find (i) P(X ≤ 2, Y = 1), (ii) P(X > 2, Y ≤ 1), (iii) P(X > Y), and (iv) P(X + Y = 4).

Solution.
(i). P(X ≤ 2, Y = 1) = ∑ from x=0 to 2 of f(x, 1) = 1/30 + 2/30 + 3/30 = 6/30 = 1/5.
(ii). P(X > 2, Y ≤ 1) = f(3, 0) + f(3, 1) = 7/30.
(iii). P(X > Y) = f(1, 0) + f(2, 0) + f(2, 1) + f(3, 0) + f(3, 1) + f(3, 2) = 18/30 = 3/5.
(iv). P(X + Y = 4) = f(3, 1) + f(2, 2) = 8/30 = 4/15.

Example. The joint cdf of X and Y is given by
F(x, y) = (1 − e^(−x))(1 − e^(−y)) ; x > 0, y > 0
        = 0 ; elsewhere.
Find F(x), F(y), f(x, y), and P(1 < X < 3, 1 < Y < 2).
Solution. Here
F(x) = F(x, ∞) = limit as y→∞ of (1 − e^(−x))(1 − e^(−y)) = 1 − e^(−x) ; x > 0.
Similarly,
F(y) = F(∞, y) = limit as x→∞ of (1 − e^(−x))(1 − e^(−y)) = 1 − e^(−y) ; y > 0.
Partial differentiation of F(x, y) yields
f(x, y) = ∂²F(x, y)/∂x∂y = e^(−(x+y)) ; x > 0, y > 0.
Then
P(1 < X < 3, 1 < Y < 2) = ∫ from x=1 to 3 ∫ from y=1 to 2 of e^(−(x+y)) dy dx
= (e^(−1) − e^(−3))(e^(−1) − e^(−2)).

Example. The joint function f(x, y) of two random variables X and Y is given by

a. Verify whether f(x, y) defines a pdf.

b. Find the probability

Solution.

So, f(x, y) is a joint pdf.

Now,

Marginal Distributions

Suppose X and Y are jointly distributed discrete random variables with joint pf f(x, y). If we take the summation over Y or over X, that is, ∑ over y of f(x, y) or ∑ over x of f(x, y), then the resulting function is a function of one variable only. Similarly, in the case of a continuous joint pdf f(x, y), if one variable is integrated out, the resulting function is a function of one variable only. These functions are called the marginal pf or marginal pdf.
Marginal Distribution: If X and Y are random variables, either discrete or continuous, with joint pf (pdf) f(x, y), then the function given by
f(x) = ∑ over y of f(x, y), when X and Y are discrete
     = ∫ from −∞ to ∞ of f(x, y) dy, when X and Y are continuous,
for all x within its range, and
f(y) = ∑ over x of f(x, y), for discrete X and Y
     = ∫ from −∞ to ∞ of f(x, y) dx, for continuous X and Y,
for all y, are called the marginal density functions of X and Y respectively. The resulting distributions of X and of Y are called the marginal distribution of X and the marginal distribution of Y respectively.
Marginal distributions of X and Y are probability distributions and satisfy all the properties of a pf/pdf.
Example. Let the joint pf of X and Y be
f(x, y) = (x + y)/21 ; x = 1, 2, 3 ; y = 1, 2.
Then the marginal pf of X is
f(x) = ∑ from y=1 to 2 of (x + y)/21 = ((x + 1) + (x + 2))/21 = (2x + 3)/21 ; x = 1, 2, 3,
and the marginal pf of Y is
f(y) = ∑ from x=1 to 3 of (x + y)/21 = ((1 + y) + (2 + y) + (3 + y))/21 = (2 + y)/7 ; y = 1, 2.
Then, for example, P(X = 3) or P(Y = 2) can be obtained either from the joint pf or from the marginal pf as
P(X = 3) = 3/7 and P(Y = 2) = 4/7.
Example. The joint pdf of X and Y is given by
f(x, y) = 2 ; 0 < x < y < 1
        = 0 ; elsewhere.
Find the marginal probability density functions of X and Y. Also find P(X ≤ 1/2, Y ≤ 1/2).
Solution. The marginal probability density functions are given by
f(x) = ∫ from x to 1 of 2 dy = 2(1 − x) ; 0 < x < 1 ; = 0 elsewhere,
f(y) = ∫ from 0 to y of 2 dx = 2y ; 0 < y < 1 ; = 0 elsewhere.
Also, since the density is positive only where x < y,
P(X ≤ 1/2, Y ≤ 1/2) = ∫ from y=0 to 1/2 ∫ from x=0 to y of 2 dx dy = 1/4.
Example. Suppose the joint pdf of X and Y is given by
f(x, y) = 6x ; 0 < x < y < 1
        = 0 ; elsewhere.
Find the marginal pdfs of X and Y.
Solution. The marginal pdf of X is given by
f(x) = ∫ from x to 1 of 6x dy = 6x(1 − x) ; 0 < x < 1.
Similarly, the marginal pdf of Y is given by
f(y) = ∫ from 0 to y of 6x dx = 3y² ; 0 < y < 1.
Example. The joint pdf of X, Y, and Z is given by
f(x, y, z) = (x + y) e^(−z) ; 0 < x < 1, 0 < y < 1, z > 0.
Find f(x, y), f(y, z) and f(z).
Solution. The joint pdf f(x, y) is given by
f(x, y) = ∫ from 0 to ∞ of f(x, y, z) dz = (x + y) ∫ from 0 to ∞ of e^(−z) dz
        = x + y ; 0 < x < 1, 0 < y < 1.

For f(y, z), we have
f(y, z) = ∫ from 0 to 1 of (x + y) e^(−z) dx = e^(−z) (1/2 + y) ; 0 < y < 1, z > 0.
The pdf f(z) is obtained by integrating out x and y:
f(z) = e^(−z) ; z > 0.

Conditional Distributions
We have learned that the conditional probability of an event A, given B, is given by
P(A|B) = P(A ∩ B)/P(B), P(B) ≠ 0.
Taking A and B to be the events X = x and Y = y, we can write
P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/f(y), provided f(y) ≠ 0.
Denoting P(X = x | Y = y) = f(x|y), we give the following definition.

Conditional Distribution: If X and Y are jointly distributed random variables, either discrete or continuous, with joint pf/pdf f(x, y), then the functions given by
f(x|y) = f(x, y)/f(y), f(y) ≠ 0
and
f(y|x) = f(x, y)/f(x), f(x) ≠ 0
are called the conditional pf/pdf of X for given Y, and the conditional pf/pdf of Y for given X, respectively.
Conditional probability function and conditional probability density function define distributions
called conditional distributions.
Example. Let
f(x, y) = (x + y)/21 ; x = 1, 2, 3 and y = 1, 2.

We obtain the marginal probability function of X and the marginal probability function of Y respectively as
f(x) = (2x + 3)/21 ; x = 1, 2, 3
and
f(y) = (2 + y)/7 ; y = 1, 2.
Then the conditional distribution of X for given Y, and the conditional distribution of Y for given X, are obtained respectively as
f(x|y) = (x + y)/(3(2 + y)) ; x = 1, 2, 3, for a given Y
and
f(y|x) = (x + y)/(2x + 3) ; y = 1, 2, for a given X.
We also obtain
P(X|Y = 1) = f(x|1) = (x + 1)/9 ; x = 1, 2, 3
and
P(X ≤ 2|Y = 1) = ∑ from x=1 to 2 of (x + 1)/9 = 5/9.
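The conditional calculation can be mirrored in code; this added sketch uses exact fractions:

```python
from fractions import Fraction

# Joint pf f(x, y) = (x + y)/21, x = 1, 2, 3; y = 1, 2.
f = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}

f_y1 = sum(f[(x, 1)] for x in (1, 2, 3))           # marginal P(Y = 1)
cond = {x: f[(x, 1)] / f_y1 for x in (1, 2, 3)}    # f(x | y = 1)

print(cond[1] + cond[2])  # P(X <= 2 | Y = 1) = 5/9
```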
Example. Let
f(x, y) = 2 ; 0 < x < y < 1
        = 0 ; elsewhere.
The marginal pdfs are
f(x) = 2(1 − x) ; 0 < x < 1
f(y) = 2y ; 0 < y < 1.
The conditional distribution of X, given Y, and the conditional distribution of Y, for given X, are obtained respectively as
f(x|y) = 2/(2y) = 1/y ; 0 < x < y
f(y|x) = 2/(2(1 − x)) = 1/(1 − x) ; x < y < 1.

For P(0 < X < 1/2 | 1/2 < Y < 1), we obtain
P(0 < X < 1/2 | 1/2 < Y < 1) = P(0 < X < 1/2, 1/2 < Y < 1) / P(1/2 < Y < 1)
= (1/2) / (3/4) = 2/3.
2

Independence of Random Variables


Jointly distributed random variables can be dependent or independent. The random variables X and Y are said to be independent if and only if
f(x, y) = f(x) · f(y)
for all possible values of X and Y. More generally, the K random variables X1, X2, …, XK are said to be independent if, for all (x1, x2, …, xK),
f(x1, x2, …, xK) = f(x1) f(x2) ⋯ f(xK),
where f(x1, x2, …, xK) is the joint pf/pdf of the variables and f(x1), …, f(xK) are their individual marginal pfs/marginal pdfs.
Example. Suppose that X and Y have the following joint probability function,

        X
Y       2      4      f(y)
1       0.10   0.15   0.25
3       0.20   0.30   0.50
5       0.10   0.15   0.25
f(x)    0.40   0.60   1.00

(i). Find the marginal distributions of X and Y.


(ii). Verify whether the two variables are independent.
(iii). Find P( Y =5| X=2 ).

Solution: Taking summation over Y, the marginal distribution of X is given by

X 2 4

f (x) 0.40 0.60

and the marginal distribution of Y, taking sum over X, is obtained as

Y 1 3 5

f (y) 0.25 0.50 0.25

Since f (x, y) = f (x). f (y) for all values of X and Y, the variables X and Y are independent.

Also, P(Y = 5 | X = 2) = P(X = 2, Y = 5)/P(X = 2) = 0.10/0.40 = 0.25.
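Independence can also be checked mechanically by comparing f(x, y) against the product of the marginals; the sketch below (an added illustration, with the table hard-coded) does exactly that:

```python
# Joint pf from the table, keyed by (x, y).
joint = {
    (2, 1): 0.10, (4, 1): 0.15,
    (2, 3): 0.20, (4, 3): 0.30,
    (2, 5): 0.10, (4, 5): 0.15,
}

fx = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (2, 4)}
fy = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (1, 3, 5)}

independent = all(
    abs(joint[(x, y)] - fx[x] * fy[y]) < 1e-12 for (x, y) in joint
)
print(independent)  # True
```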

Example. Let f(x, y) be the joint pdf of the variables X and Y. Find


(i) (ii) Also find the marginal distributions of X and Y. Are X and Y inde-
pendent?
Solution. (i)

= +
=
(ii) The marginal distributions are given by

=
and

Since f(x, y) = f(x) f(y), X and Y are independent.


Example. The joint pdf of X and Y is given by

Find the marginal pdfs of X and Y. Are X and Y independent?


Solution. The marginal pdfs are given by

and

Here, X and Y are not independent.


Example. Let X denote the number of hours per day a student watches television and Y the daily number of hours s/he spends on homework. The joint pdf is approximated by
f(x, y) = xy e^(−(x+y)) ; x > 0, y > 0.
What is the probability that a randomly chosen student spends at least twice as much time watching television as working on homework?
Solution. According to the question, we have to evaluate the probability P(X ≥ 2Y). It is given by
P(X ≥ 2Y) = ∫ from x=0 to ∞ ∫ from y=0 to x/2 of xy e^(−(x+y)) dy dx
= ∫ from 0 to ∞ of x e^(−x) [1 − (x/2 + 1) e^(−x/2)] dx
= ∫ from 0 to ∞ of x e^(−x) dx − ∫ from 0 to ∞ of (x²/2) e^(−3x/2) dx − ∫ from 0 to ∞ of x e^(−3x/2) dx
= 1 − 16/54 − 4/9 = 7/27.
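The answer 7/27 is easy to sanity-check with a numeric double integral over the region x ≥ 2y; this added sketch uses a simple Riemann sum (grid parameters chosen arbitrarily):

```python
import math

# Double integral of x*y*exp(-(x+y)) over the region 0 < y <= x/2,
# approximated on a truncated grid (x up to 30).
h = 0.01
total = 0.0
for i in range(1, 3000):
    x = i * h
    inner = 0.0
    for j in range(1, int(x / 2 / h) + 1):   # y up to x/2
        y = j * h
        inner += x * y * math.exp(-(x + y)) * h
    total += inner * h

print(round(total, 3), round(7 / 27, 3))
```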

Functions of Random Variables


Before discussing functions of random variables, we first need to capture the idea of a function. We start with the definition of a function and then discuss functions of random variables and their probability distributions.
Function: If a variable Y depends on a variable X in such a way that each value of X determines
exactly one value of Y, then we say that Y is a function of X, written Y = g(X). That is, a function
is a rule (law, formula, recipe) that associates each point in one set of points with one and only
one point in another set of points. In other words, a function is a rule that associates a unique
output with each input. For example, the area A of a circle depends on its radius r through the equation A = πr², so we say that A is a function of r.

Function of Random Variables: Let X be a random variable (rv) with a known probability distribution and let Y = g(X) be a function of X. Then Y is also a random variable, with its own probability distribution.

A function can be a function of one or more random variables. If X 1 , X 2 , … , X n is a set of ran-


dom variables, Y =g (X 1 , X 2 ,… , X n) is a function of the variables X 1 , X 2 , … , X n. A function can
be the function of discrete or continuous random variables.

Our concern here is to find the probability distribution of a function Y of random variables when the probability distributions of the variables involved in the function are known. The applicable methods for finding the distribution of a function of random variables vary depending on whether the variables are discrete or continuous.

Functions of Discrete Random Variables


If X is a discrete random variable (rv), then the function Y = g(X) is also a discrete rv. The probability associated with a value x of X becomes associated with the corresponding value y = g(x). As long as the relationship between the values of X and Y = g(X) is one-to-one, the distribution of Y can easily be obtained by appropriate transformation or substitution. Consider the following example.

Example. Suppose that the random variable X has the probability distribution
X    | -1   0    1
f(x) | 1/3  1/2  1/6
Let Y = g(X) = 3X + 1. Then the possible values of Y are -2, 1 and 4, assumed with probabilities
P(Y = -2) = P(3X + 1 = -2) = P(X = -1) = 1/3,
P(Y = 1) = P(3X + 1 = 1) = P(X = 0) = 1/2,
and P(Y = 4) = P(3X + 1 = 4) = P(X = 1) = 1/6.
That is,
Y    | -2   1    4
f(y) | 1/3  1/2  1/6
where, f (y) = P(Y = y) is the pf of the random variable Y. It may be noted that the values of Y
are determined by the relation Y = 3X + 1, but the probabilities remain the same.
Example. Consider another example, where a housing complex has a service contract on its elevators. The complex pays $100 each month plus $25 for each service visit. Let X denote the number of repair visits during a month. Suppose it is known that the probability distribution of X is as follows.
X    | 0    1    2    3
f(x) | 0.5  0.2  0.2  0.1

The cost Y is a function of X which is Y = 100 + 25X. For each X = x, there is a corresponding Y
= y.

The possible values of Y for the given values of X are 100, 125, 150 and 175, with f(y) = f(x). Thus,
Y    | 100  125  150  175
f(y) | 0.5  0.2  0.2  0.1
In general, we may summarize the distribution of a function of a discrete random variable as follows.
Theorem: If x1, x2, … are the possible values of a discrete random variable X with corresponding probabilities P(X = xi) = f(xi), i = 1, 2, …, and if Y = g(X) is a function such that to each value of Y there corresponds exactly one value of X, then the probability distribution of Y is given by
Possible values of Y:  yi = g(xi)
Probabilities of Y:    f(yi) = P(Y = yi) = f(xi).
The function Y = g(X) does not always possess the above characteristics, and it may happen that
several values of X lead to the same value of Y.
For example, let the distribution of X be
X    | -1   0    1
f(x) | 1/3  1/2  1/6
If Y = g(X) = X² in the above example, the possible values of Y are 0 and 1. For X = 0, Y = 0 with probability 1/2, but we have Y = 1 for two values of X, i.e. X = -1 or X = 1, with corresponding probabilities 1/3 and 1/6. These probabilities need to be combined to form the probability for Y = 1. Thus,

P(Y = 1) = P(X = -1) + P(X = 1) = 1/3 + 1/6 = 1/2.
Thus, the distribution of Y = g(X) = X² is given by

Y    | 0    1
f(y) | 1/2  1/2
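When Y = g(X) is many-to-one, the masses of all x values mapping to the same y must be summed, as in the combination step above. A minimal sketch, using exact fractions; `pmf_of_function` is a hypothetical helper name:

```python
from fractions import Fraction

f_x = {-1: Fraction(1, 3), 0: Fraction(1, 2), 1: Fraction(1, 6)}

def pmf_of_function(pmf, g):
    """pmf of Y = g(X), summing probabilities of x's that collide."""
    f_y = {}
    for x, p in pmf.items():
        y = g(x)
        f_y[y] = f_y.get(y, 0) + p  # combine masses mapping to the same y
    return f_y

f_y = pmf_of_function(f_x, lambda x: x * x)
print(f_y)  # Y = 0 and Y = 1 each get probability 1/2
```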
Example. Let X have the possible values 1, 2, 3, … with pf f(x), and let

Y = 1 if X is even,
Y = -1 if X is odd.

Find the pf of Y.
Solution. Here Y assumes two values, -1 and +1. Y = 1 if and only if X is even, that is, x = 2, or 4, or 6, …. Then

P(Y = 1) = f(2) + f(4) + f(6) + … = Σ f(x), the sum over even x.

Hence

P(Y = -1) = Σ f(x), the sum over odd x, = 1 - P(Y = 1).
Example. Among 10 applicants for an open position, 6 are female and 4 are male. Three applicants are randomly selected for the final interview. Let X be the number of female applicants among the final three.
a. Find the probability function of X.
b. Define Y, the number of males among the final three, as a function of X, and find the pf of Y.
Solution.
a. The pf of X is given by

f(x) = C(6, x) C(4, 3-x) / C(10, 3) ; x = 0, 1, 2, 3,

where C(n, k) denotes the binomial coefficient.

b. Y, the number of males among the randomly selected 3, is given by Y = 3 - X. Then its pf is given by

f(y) = C(4, y) C(6, 3-y) / C(10, 3) ; y = 0, 1, 2, 3
     = C(4, 3-x) C(6, x) / C(10, 3) ; x = 0, 1, 2, 3
     = f(x).
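The identity f_Y(3 - x) = f_X(x) above can be checked numerically with the standard-library binomial coefficient:

```python
from math import comb

def f_x(x):
    # hypergeometric pf: number of females among 3 chosen from 6 F + 4 M
    return comb(6, x) * comb(4, 3 - x) / comb(10, 3)

def f_y(y):
    # pf of Y = number of males among the 3 chosen
    return comb(4, y) * comb(6, 3 - y) / comb(10, 3)

for x in range(4):
    assert abs(f_x(x) - f_y(3 - x)) < 1e-12  # f_Y(3 - x) = f_X(x)

print([round(f_x(x), 4) for x in range(4)])  # [0.0333, 0.3, 0.5, 0.1667]
```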

This substitution technique can also be used when the form of the distribution of a discrete X is known. But it does not work for finding the distribution of functions of continuous random variables.
Functions of Continuous Random Variables


For finding the distribution of a function Y = g(X) of a continuous random variable X, the method discussed above will not work because Y has an infinite number of possible values. A number of techniques are available for finding the probability density function of a function of a continuous random variable.

Method of Distribution Function


For finding the probability distribution of Y = g(X), it is convenient first to find the distribution function of Y in terms of the distribution function of X; the pdf of Y is then obtained by differentiating the distribution function. As an illustration, we consider the following example.
Example. Suppose that X is a continuous random variable with pdf

f(x) = 2x ; 0 < x < 1
     = 0  ; elsewhere

Let Y = 2X + 1. To find the pdf of Y, first we obtain the distribution function of Y,

F_Y(y) = P(Y ≤ y) = P(2X + 1 ≤ y)
       = P(X ≤ (y-1)/2)
       = ∫₀^((y-1)/2) 2x dx = [(y-1)/2]².

By differentiation, we obtain

f(y) = F'_Y(y) = (y-1)/2.

Since f(x) > 0 for 0 < x < 1, f(y) > 0 for 1 < y < 3. Thus,

f(y) = (y-1)/2 ; 1 < y < 3
     = 0       ; elsewhere
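The result can be checked by simulation, a sketch assuming inverse-transform sampling: since F_X(x) = x² on (0, 1), X = √U for U uniform on (0, 1); the empirical cdf of Y = 2X + 1 should then match F_Y(y) = ((y-1)/2)².

```python
import math
import random

random.seed(1)
# inverse transform: F_X(x) = x^2, so X = sqrt(U)
ys = [2 * math.sqrt(random.random()) + 1 for _ in range(100_000)]

for y in (1.5, 2.0, 2.5):
    empirical = sum(v <= y for v in ys) / len(ys)
    exact = ((y - 1) / 2) ** 2
    print(f"F_Y({y}): empirical {empirical:.3f}, exact {exact:.4f}")
    assert abs(empirical - exact) < 0.01  # agreement within Monte Carlo error
```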
Theorem: Let X be a continuous random variable with pdf f(x). Let Y = X². Then

f_Y(y) = (1/(2√y)) [f_X(√y) + f_X(-√y)] ; y > 0.

Proof: This can be proved using the cdf. We may write

F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(-√y ≤ X ≤ √y)
       = F_X(√y) - F_X(-√y).

By differentiation, it follows that

f_Y(y) = (1/(2√y)) f_X(√y) - (-(1/(2√y))) f_X(-√y)
       = (1/(2√y)) [f_X(√y) + f_X(-√y)].
Example. Suppose that

f(x) = 1/2 ; -1 < x < 1
     = 0   ; elsewhere

Let Y = X². Then the pdf of Y is obtained using the rule given above. We have

f_Y(y) = (1/(2√y)) [f_X(√y) + f_X(-√y)]
       = (1/(2√y)) [1/2 + 1/2]
       = 1/(2√y) ; 0 < y < 1.
Example. Let the pdf of X be

f(x) = 1/3 ; -1 < x < 2.

Let Y = X². We shall use the relation

f_Y(y) = (1/(2√y)) [f_X(√y) + f_X(-√y)].

Now, f_X(√y) will be non-zero only if -1 < √y < 2, which (since √y ≥ 0) holds for 0 ≤ y < 4. Similarly, f_X(-√y) will be non-zero only if -1 < -√y < 2, i.e. for 0 ≤ y < 1.
Thus, for 0 ≤ y < 1, f_Y(y) = (1/(2√y)) (1/3 + 1/3) = 1/(3√y), and
for 1 ≤ y < 4, f_Y(y) = (1/(2√y)) (1/3 + 0) = 1/(6√y). Thus,

f_Y(y) = 1/(3√y) ; 0 ≤ y < 1
       = 1/(6√y) ; 1 ≤ y < 4
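As a sanity check, the piecewise density just derived should integrate to 1 over (0, 4). A quick numerical sketch with the midpoint rule (which copes with the integrable singularity at y = 0):

```python
import math

def f_y(y):
    # piecewise density of Y = X^2 when X is uniform on (-1, 2)
    if 0 < y < 1:
        return 1 / (3 * math.sqrt(y))
    if 1 <= y < 4:
        return 1 / (6 * math.sqrt(y))
    return 0.0

# midpoint rule over (0, 4)
n = 200_000
h = 4 / n
total = sum(f_y((k + 0.5) * h) * h for k in range(n))
print(f"integral of f_Y over (0, 4): {total:.4f}")  # close to 1
```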
A generalization of the distribution function method is the following method of transformation.

Method of Transformation: One variable


The method of transformation for finding the probability distribution of a function of a continuous random variable is simply a generalization of the distribution function method. It is given in the following theorem.
Theorem: Let X be a continuous random variable with pdf f_X(x). Suppose that Y = g(X) is a strictly monotone (increasing or decreasing) function of X. Then the random variable Y, defined as Y = g(X), has the pdf given by

f_Y(y) = f_X(x) |dx/dy|,

where x is expressed in terms of y.

Proof. Let Y = g(X) be a function of X with inverse function X = h(Y) such that h is a one-to-one continuous function. To find the pdf of Y = g(X) by the distribution function method, we write, for g increasing,

F_Y(y) = P(Y ≤ y) = P[g(X) ≤ y]
       = P[X ≤ h(y)] = F_X[h(y)],

where F_X(x) is the distribution function of X and x = h(y). Then

f_Y(y) = dF_Y(y)/dy = dF_X[h(y)]/dy = f_X[h(y)] h'(y),

because h is an increasing function, h'(y) > 0.
If g(X) is a decreasing function of x, so that h'(y) < 0,

f_Y(y) = -f_X[h(y)] h'(y).

Thus, in general, if g is a continuous, one-to-one function of x,

f_Y(y) = f_X[h(y)] |h'(y)| = f_X(x) |dx/dy|,

where x is expressed in terms of y.
Example. Find the pdf of Y = -2X + 5 if X has the pdf given by

f(x) = 2x ; 0 < x < 1
     = 0  ; elsewhere

Solution. Solving Y = g(X) = -2X + 5 for X, the inverse function is

X = h(Y) = (5 - Y)/2,

where h is a continuous, one-to-one function (3 < y < 5 for 0 < x < 1). Since h'(y) = -1/2 ≠ 0 for 3 < y < 5,

f_Y(y) = f_X[h(y)] |h'(y)| = 2((5 - y)/2) |-1/2|
       = (5 - y)/2 ; 3 < y < 5
       = 0         ; elsewhere
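The transformation rule f_Y(y) = f_X(h(y)) |h'(y)| for this decreasing map can be coded directly, a minimal sketch; the function names are illustrative:

```python
def f_x(x):
    # pdf of X from the example
    return 2 * x if 0 < x < 1 else 0.0

def f_y(y):
    h = (5 - y) / 2              # inverse transform x = h(y)
    return f_x(h) * abs(-1 / 2)  # multiply by |h'(y)| = 1/2

# spot-check against the closed form (5 - y)/2 on 3 < y < 5
for y in (3.5, 4.0, 4.5):
    assert abs(f_y(y) - (5 - y) / 2) < 1e-12

# the resulting density should integrate to 1 over (3, 5)
n = 100_000
step = 2 / n
total = sum(f_y(3 + (k + 0.5) * step) * step for k in range(n))
print(round(total, 6))  # 1.0 (midpoint rule is exact for a linear density)
```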
Theorem: Let X be a random variable with pf (pdf) f(x). Let Y = aX + b, where a ≠ 0 and b is any constant. Then it follows that

f_Y(y) = f_X((y - b)/a)          if X is discrete,
f_Y(y) = (1/|a|) f_X((y - b)/a)  if X is continuous.

Proof: Let X be discrete and Y = aX + b. Then

f_Y(y) = P(Y = y) = P(aX + b = y)
       = P(X = (y - b)/a) = f_X((y - b)/a).

For continuous X, the pdf of Y can be approached via the cdf. Thus,

F_Y(y) = P(Y ≤ y) = P(aX + b ≤ y)
       = P(X ≤ (y - b)/a) = F_X((y - b)/a), if a > 0.

Differentiating both sides gives

f_Y(y) = (1/a) f_X((y - b)/a).

For a < 0, it can be shown that

f_Y(y) = -(1/a) f_X((y - b)/a).

Thus, since a density cannot be negative, the two cases combine as

f_Y(y) = (1/|a|) f_X((y - b)/a).
Example. Let f(x) = 6x(1 - x); 0 < x < 1. Let Y = 5X + 2.
Then, using the above rule,

f_Y(y) = (1/5) f_X((y - 2)/5) = (1/5) · 6 ((y - 2)/5)(1 - (y - 2)/5)
       = (6/125)(y - 2)(7 - y).

Here, f_Y(y) does not hold for all y. Y = 5X + 2 and 0 < x < 1 imply that f_Y(y) will be non-zero for y between 2 and 7. Thus,

f_Y(y) = (6/125)(y - 2)(7 - y) ; 2 < y < 7
       = 0                     ; elsewhere
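The linear-change-of-variable rule f_Y(y) = (1/|a|) f_X((y - b)/a) can be verified against the closed form above, a sketch with a = 5, b = 2:

```python
def f_x(x):
    return 6 * x * (1 - x) if 0 < x < 1 else 0.0

def f_y(y, a=5, b=2):
    # general linear rule for continuous X
    return f_x((y - b) / a) / abs(a)

for y in (2.5, 4.5, 6.0):
    closed_form = 6 / 125 * (y - 2) * (7 - y)
    assert abs(f_y(y) - closed_form) < 1e-12

print("rule matches (6/125)(y-2)(7-y) on 2 < y < 7")
```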

Other Methods:
Method of Conditioning
Method of Generating Functions

Convolutions
Convolution is a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other. In statistics, the convolution equation can be used as a technique for finding the probability distribution of the sum of two independent random variables.
The following theorem gives a formula for finding the pf (pdf) of the sum of two independent random variables.

Theorem: Let X and Y be independent random variables with respective pfs (pdfs) f_X(x) and f_Y(y). Let Z = X + Y. Then the distribution of Z is given by

f_Z(z) = Σ_x f_X(x) f_Y(z - x)             if X and Y are discrete
f_Z(z) = ∫_{-∞}^{∞} f_X(x) f_Y(z - x) dx   if X and Y are continuous.

The sum or integral of this form is called the convolution of the functions f_X and f_Y.
Proof. Let the variables X and Y be discrete. The probability that Z = z can be written as

f_Z(z) = P(Z = z) = Σ_x P(X = x, Y = z - x)
       = Σ_x P(X = x) P(Y = z - x)
       = Σ_x f_X(x) f_Y(z - x).

For continuous X and Y, the pdf of Z is obtained by differentiating the cdf of Z. We may write

F_Z(z) = P(X + Y ≤ z) = ∬_{x+y ≤ z} f(x, y) dy dx.

Since X and Y are independent, the double integral can be expressed as

F_Z(z) = ∫_{-∞}^{∞} f_X(x) [∫_{-∞}^{z-x} f_Y(y) dy] dx
       = ∫_{-∞}^{∞} f_X(x) F_Y(z - x) dx.

By differentiation, it follows that

f_Z(z) = ∫_{-∞}^{∞} f_X(x) f_Y(z - x) dx.
Convolutions have applications in many areas of mathematical statistics and mathematics. Besides their frequent use in random variable problems, convolutions arise in computer vision, natural language processing, image and signal processing, engineering, differential equations, and other areas of mathematics.
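The discrete convolution formula f_Z(z) = Σ_x f_X(x) f_Y(z - x) can be sketched for a familiar case, the sum of two fair dice; the helper name `convolve` is illustrative:

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # pf of a fair die

def convolve(f_x, f_y):
    """pf of Z = X + Y for independent discrete X and Y."""
    f_z = {}
    for x, px in f_x.items():
        for y, py in f_y.items():
            f_z[x + y] = f_z.get(x + y, 0) + px * py
    return f_z

f_z = convolve(die, die)
assert f_z[7] == Fraction(6, 36)   # the most likely total
assert sum(f_z.values()) == 1      # a valid probability function
print({z: str(p) for z, p in sorted(f_z.items())})
```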
Example. Given that X and Y are independent with pdfs f_X(x) and f_Y(y), find the pdf of Z = X + Y.
Solution. By the convolution equation for f_X and f_Y, we have

f_Z(z) = ∫ f_X(x) f_Y(z - x) dx.

Here, f_X(x) is non-zero only on the support of X, and f_Y(z - x) is non-zero only where z - x lies in the support of Y. Thus, the limits of integration are restricted accordingly, and the integral is evaluated over the x for which both factors are non-zero.
This technique can also be used to find the distribution of the sum of three or more variables.
Example. Let X₁, X₂, and X₃ each be independent random variables with pdf f(xᵢ), i = 1, 2, 3.
Find the pdf of X₁ + X₂ + X₃.
Solution. Let Y = X₁ + X₂. Then the pdf of Y is given by the convolution

f_Y(y) = ∫ f(x₁) f(y - x₁) dx₁.

Now let Z = Y + X₃. The pdf of Z is given by the convolution

f_Z(z) = ∫ f_Y(y) f(z - y) dy.
