Probability

1. Any action by which an observation or a measurement is obtained is called an experiment.


2. A random experiment is an experiment which has the following characteristics:
(a) All outcomes of the experiment are known in advance.
(b) Which of these possible outcomes will actually occur when the random experiment is
performed is not known in advance.
(c) The experiment can be repeated under identical conditions.
3. Any outcome of a random experiment is called an event connected with it.
4. An event connected with a random experiment that cannot be decomposed further into
simpler events is called a simple event whereas an event that can be decomposed further
into simpler events is called a compound event.
5. An event that is sure to occur when a random experiment is performed is called a sure
event or a certain event for that random experiment.
6. An event that can never occur when a random experiment is performed is called an
impossible event for the random experiment.
7. Events connected with a random experiment are said to be equally likely if one cannot be
expected to occur in preference to the others.
8. Let E be an event connected with a random experiment. The complement of the event E is
denoted by Ē, and is defined to be the event that occurs if E does not occur.
9. Among all the simple events connected with a random experiment those simple events
whose occurrence results in the occurrence of an event E, are called the simple events
favorable to the occurrence of E.
10. Classical definition of probability.
Suppose the number of simple events connected with a random experiment is N, a finite
number, which are all equally likely. Suppose m of these simple events are favorable to the
occurrence of an event E, then the probability of the event E is denoted by P(E) and is
defined by P(E) = m/N.
11. How to find the probability of an event E using the classical definition:
(a) Determine the total number of simple events connected with the random experiment.
Suppose this number is N. Verify that these simple events are all equally likely.
(b) Determine the number of simple events that are favorable to the occurrence of the
event E. Suppose this number is m.
(c) P(E) = m/N.

Remark: If E is a sure event then all the N simple events connected with the random
experiment are favorable to E. Thus P(E) = 1. If E is an impossible event then none of the
simple events connected with the random experiment are favorable to E, thus P(E) = 0. If
E is any event connected with the random experiment, then 0 ≤ m ≤ N, thus
0 ≤ P(E) ≤ 1. Also, (N − m) simple events are favorable to the occurrence of Ē. Thus
P(Ē) = 1 − P(E).
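The three steps above can be sketched directly in code: enumerate the equally likely simple events, count the favorable ones, and divide. A minimal illustration, using the throw of two fair dice as the (assumed) random experiment:

```python
from itertools import product

def classical_probability(outcomes, event):
    """Classical definition: P(E) = m/N over equally likely simple events."""
    favorable = [w for w in outcomes if event(w)]   # the m simple events favorable to E
    return len(favorable) / len(outcomes)           # m / N

# All N = 36 equally likely simple events when two fair dice are rolled.
outcomes = list(product(range(1, 7), repeat=2))
# E = "the sum of the two dice is 7"; m = 6 favorable outcomes, so P(E) = 6/36 = 1/6.
p_sum_7 = classical_probability(outcomes, lambda w: w[0] + w[1] == 7)
```

Note the sure event gives probability 1 and the impossible event 0, exactly as in the Remark.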
12. Theorems on probability:
(i) The probability of a sure event is 1.
(ii) The probability of an impossible event is 0.
(iii) If E is any event connected with a random experiment then 0 ≤ P (E) ≤ 1.
(iv) If E is any event connected with a random experiment then P(Ē) = 1 − P(E).
13. Counting techniques
(a) Multiplication Principle
Suppose one operation can be performed in m ways and for each way of performing
this operation a second operation can be performed in n ways. Then the total number
of ways of performing these two operations together is mn.
(b) The Addition principle:
Suppose an operation can be performed either in one of m ways or in one of n ways,
where none of the set of m ways is the same as any of the set of n ways then there are
(m + n) ways of performing this operation.
(c) Factorial
Definition: If n is a positive integer then factorial n is denoted by n! and is defined by
n! = 1·2·3 ⋯ n. By definition, 0! = 1.
(d) Permutation
Any arrangement that can be made out of a given number of objects by taking some
or all of the objects at a time is called a permutation of the objects. If n objects are
given and we form an arrangement of r objects chosen from these n objects, then it is
called a permutation of these n objects taken r at a time. The number of
permutations of n distinct objects taken r at a time is denoted by n Pr .
Theorems on permutation:
i. nPr = n!/(n − r)! = n(n − 1)(n − 2) ⋯ (n − r + 1).
Note: nPn = n!, nP1 = n.
ii. The number of permutations of n objects taken all at a time, where p of the objects
are alike of one kind, q of the objects are alike of a second kind, r of the objects
are alike of a third kind and the rest are all different, is n!/(p! q! r!).
iii. The number of permutations of n different objects taken r at a time, when each
object may occur any number of times, is n^r.
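The permutation theorems can be checked with the standard library. A short sketch (the choice n = 7, r = 3 and the word "BANANA" are illustrative, not from the notes):

```python
import math

n, r = 7, 3
# Theorem i: nPr = n!/(n-r)! = n(n-1)...(n-r+1)
npr = math.factorial(n) // math.factorial(n - r)
assert npr == math.perm(n, r) == 7 * 6 * 5

# Theorem ii: arrangements of all 6 letters of "BANANA" (3 A's, 2 N's, 1 B)
banana = math.factorial(6) // (math.factorial(3) * math.factorial(2))

# Theorem iii: permutations of n objects taken r at a time, repetition allowed
with_repetition = n ** r
```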
(e) Combination
Any group that can be made out of a given number of objects by taking some or all of
the objects at a time is called a combination of the objects. If n objects are given and
we form a group of r objects chosen from these n objects, then it is called a
combination of these n objects taken r at a time. The number of combinations of n
distinct objects taken r at a time is denoted by n Cr .
Theorems on combination:
i. nCr = nPr/r! = n!/(r! (n − r)!).
ii. nCr = nC(n−r).
Note: nCn = 1, nC1 = n, nC0 = 1.
iii. The number of combinations of n different objects taken some or all at a time is
(2^n − 1).
iv. The number of combinations of n objects taken some or all at a time, where p of
the objects are alike of one kind, q of the objects are alike of a second kind, r of
the objects are alike of a third kind and the remaining s objects are all different, is
(p + 1)(q + 1)(r + 1) 2^s − 1.
v. The number of ways in which (m + n) different objects can be divided into two
groups, one containing m objects and the other containing n objects, is
(m + n)!/(m! n!).
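The combination identities above are easy to verify numerically; a small sketch with illustrative values n = 10, r = 4, m = 3:

```python
import math

n, r = 10, 4
# Theorem i: nCr = nPr / r! = n! / (r!(n-r)!)
ncr = math.perm(n, r) // math.factorial(r)
assert ncr == math.comb(n, r)

# Theorem ii: nCr = nC(n-r)
assert math.comb(n, r) == math.comb(n, n - r)

# Theorem iii: some-or-all selections from n distinct objects: 2^n - 1
some_or_all = 2 ** n - 1

# Theorem v: dividing m+n distinct objects into groups of sizes m and n
m = 3
split = math.factorial(m + n) // (math.factorial(m) * math.factorial(n))
assert split == math.comb(m + n, m)   # same as choosing which m objects form the first group
```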
(f) Choosing balls from an urn:
The total number of ways of choosing m white balls and n black balls from an urn
containing M white and N black balls is MCm × NCn.
(g) Ordered Partitions (distinct objects):
The total number of ways of distributing n distinct objects into r boxes marked
1, 2, 3, . . . , r is r^n.
The number of ways in which the n objects can be distributed so that the boxes
contain respectively n1, n2, . . . , nr objects is n!/(n1! n2! ⋯ nr!).
(h) Ordered Partitions (identical objects):
The total number of ways of distributing n identical objects into r boxes marked
1, 2, 3, . . . , r is (n+r−1)C(r−1).
If none of the boxes should remain empty, the total number of ways of distributing the
n objects is (n−1)C(r−1).
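Both counts in (h) can be confirmed by brute-force enumeration for small cases; a sketch with the illustrative choice of 5 identical objects and 3 boxes:

```python
from itertools import product
from math import comb

n, r = 5, 3   # 5 identical objects into 3 marked boxes (illustrative values)

# Boxes may be empty: C(n+r-1, r-1)
formula_any = comb(n + r - 1, r - 1)
brute_any = sum(1 for boxes in product(range(n + 1), repeat=r) if sum(boxes) == n)
assert formula_any == brute_any

# No box empty: C(n-1, r-1)
formula_nonempty = comb(n - 1, r - 1)
brute_nonempty = sum(1 for boxes in product(range(1, n + 1), repeat=r) if sum(boxes) == n)
assert formula_nonempty == brute_nonempty
```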
(i) Sum of points on dice:
When n dice are thrown, the number of ways of getting a total of r points is given by
the coefficient of x^r in the expansion of (x + x^2 + x^3 + x^4 + x^5 + x^6)^n.
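Multiplying out (x + x^2 + ⋯ + x^6)^n one factor at a time is a simple polynomial convolution; a sketch:

```python
def dice_sum_counts(n):
    """Coefficients of (x + x^2 + ... + x^6)^n: result[r] = number of ways
    to get a total of r points when n dice are thrown."""
    counts = {0: 1}                      # the empty product, (..)^0 = 1
    for _ in range(n):                   # multiply by (x + ... + x^6) once per die
        nxt = {}
        for total, ways in counts.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return counts

two_dice = dice_sum_counts(2)   # e.g. two_dice[7] is the coefficient of x^7
```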
(j) Derangements and Matches:
If n objects numbered 1, 2, 3, . . . , n are distributed at random in n places also
numbered 1, 2, 3, . . . , n, a match is said to occur if an object occupies the place
corresponding to its number.
The number of permutations in which no match occurs (such a permutation is also known
as a derangement) is t_n = n! [1 − 1/1! + 1/2! − 1/3! + · · · + (−1)^n 1/n!].
The number of permutations of n objects in which exactly r matches occur is
nCr t_(n−r) = (n!/r!) [1 − 1/1! + 1/2! − 1/3! + · · · + (−1)^(n−r) 1/(n − r)!].
14. Odds in favor/ against an event
If E is an event connected with a random experiment, then
Odds in favor of the event E = P(E)/P(Ē),
Odds against the event E = P(Ē)/P(E).
Remark: If the odds in favor of event E are m:n, then P(E) = m/(m + n). If the odds
against event E are m:n, then P(E) = n/(m + n).
15. Union and intersection of events
The union of two events A and B connected with a random experiment is denoted by
(A + B) and is defined to be the event that occurs if at least one of the events A or B
occur, i.e. if A or B or both the events occur simultaneously. More generally, the union of
the events E1 , E2 , . . . , En connected with a random experiment is denoted by
(E1 + E2 + · · · + En ) and is defined to be the event that occurs if at least one of the events
E1 , E2 , . . . , En occurs.
The intersection of two events A and B connected with a random experiment is denoted
by AB and is defined to be the event that occurs if both the events occur simultaneously.
More generally, the intersection of the events E1 , E2 , . . . , En connected with a random
experiment is denoted by (E1 E2 · · · En ) and is defined to be the event that occurs if all the
events E1 , E2 , . . . , En occur simultaneously.
16. Events connected with a random experiment are said to be mutually exclusive if occurrence
of any one of the events prevents the occurrence of any of the others.
17. Events connected with a random experiment are said to be exhaustive if at least one of
them must necessarily occur if the random experiment is performed.
18. Addition theorems
(i) If A and B are any two events connected with a random experiment then
P (A + B) = P (A) + P (B) − P (AB).
(ii) If A, B, C are any three events connected with a random experiment then
P (A + B + C) = P (A) + P (B) + P (C) − P (AB) − P (BC) − P (CA) + P (ABC).
Corollary:
(a) If the events A and B connected with a random experiment are mutually exclusive,
then P (A + B) = P (A) + P (B).
(b) If the events E1, E2, · · · , En connected with a random experiment are mutually
exclusive then P(E1 + E2 + · · · + En) = P(E1) + P(E2) + · · · + P(En).
(c) If the events E1, E2, · · · , En connected with a random experiment are mutually
exclusive and exhaustive then P(E1) + P(E2) + · · · + P(En) = 1.
(d) If A and B are any two events connected with a random experiment then
P(AB̄) = P(A) − P(AB) and P(ĀB) = P(B) − P(AB).
19. De Morgan’s Laws:
If A and B are events connected with a random experiment then (i) the complement of
(A + B) is Ā B̄ and (ii) the complement of (AB) is Ā + B̄.
20. Conditional probability
Suppose A and B are two events connected with a random experiment and P(B) ≠ 0. The
conditional probability of A given B is denoted by P(A/B) and is defined to be the
probability of occurrence of the event A given that the event B has already occurred.
21. Multiplication theorem
If A and B are two events connected with a random experiment, then
P(AB) = P(A) P(B/A) if P(A) ≠ 0
      = P(B) P(A/B) if P(B) ≠ 0.
Remark: P(A/B) = P(AB)/P(B) and P(B/A) = P(AB)/P(A).
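The multiplication theorem can be checked by direct counting over equally likely outcomes; a sketch using two dice (the particular events A and B are illustrative):

```python
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
P = lambda event: len(event) / len(outcomes)

A = {w for w in outcomes if w[0] + w[1] == 8}    # "the sum is 8"
B = {w for w in outcomes if w[0] == 3}           # "the first die shows 3"

p_A_given_B = P(A & B) / P(B)                    # P(A/B) = P(AB)/P(B)

# Multiplication theorem: P(AB) = P(B) P(A/B)
assert abs(P(A & B) - P(B) * p_A_given_B) < 1e-12
```

Here A∩B = {(3, 5)}, so P(A/B) = (1/36)/(6/36) = 1/6.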
22. Independent events
Let A and B be events connected with a random experiment and P(B) ≠ 0. The event A is
said to be independent of the event B if the probability of occurrence of the event A is not
affected by the occurrence or non-occurrence of B, i.e. if P(A/B) = P(A).
Corollary:
(a) Let A and B be events connected with a random experiment and P(B) ≠ 0, P(A) ≠ 0.
If A is independent of B then B is also independent of A. Thus, we say that the events
A and B are independent.
(b) Let A and B be events connected with a random experiment and P(B) ≠ 0, P(A) ≠ 0.
Then A and B are independent if and only if P(AB) = P(A)P(B).
(c) Let A and B be events connected with a random experiment. If A and B are
independent events then
(i) A and B̄ are independent events. Thus, P(AB̄) = P(A)P(B̄).
(ii) Ā and B are independent events. Thus, P(ĀB) = P(Ā)P(B).
(iii) Ā and B̄ are independent events. Thus, P(Ā B̄) = P(Ā)P(B̄).
(d) Let A and B be events connected with a random experiment. If A and B are
independent events then P(A + B) = P(A) + P(B) − P(A)P(B).
23. Three events A, B, C connected with a random experiment are said to be mutually
independent (or simply independent) if (i) P(AB) = P(A) P(B) (ii) P(BC) = P(B) P(C)
(iii)P(CA)=P(C) P(A) (iv) P(ABC)=P(A) P(B) P(C)
24. Bayes’ Theorem
Suppose an event A can occur only if one of the mutually exclusive and exhaustive set of
events B1, B2, · · · , Bn occurs. Suppose the unconditional probabilities
P(B1), P(B2), · · · , P(Bn) are known and the conditional probabilities
P(A/B1), P(A/B2), · · · , P(A/Bn) are also known. Then the conditional probability
P(Bi/A) of the occurrence of the event Bi given that the event A has already occurred is
given by:
P(Bi/A) = P(Bi)P(A/Bi) / [P(B1)P(A/B1) + P(B2)P(A/B2) + · · · + P(Bn)P(A/Bn)].
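Bayes' theorem is a one-line computation once the priors P(Bi) and likelihoods P(A/Bi) are listed. A sketch with hypothetical numbers (three machines producing 30%, 45% and 25% of output, with defect rates 2%, 3% and 2%; these figures are invented for illustration):

```python
prior = [0.30, 0.45, 0.25]        # P(B_i): mutually exclusive and exhaustive
likelihood = [0.02, 0.03, 0.02]   # P(A/B_i), where A = "item is defective"

# Denominator: P(A) = sum of P(B_i) P(A/B_i) over the exhaustive set.
p_A = sum(p * l for p, l in zip(prior, likelihood))
# Bayes' theorem: P(B_i/A) = P(B_i) P(A/B_i) / P(A)
posterior = [p * l / p_A for p, l in zip(prior, likelihood)]
```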
Random Variable
1. A random variable is a rule that associates with each simple event of a random
experiment a definite real number. Random variables will be denoted by capital letters and
their values by small letters.
Theorem
(a) If X1 , X2 are random variables and c1 , c2 are constants then c1 X1 + c2 X2 and X1 X2 are
also random variables.
(b) If f (x) is a continuous function and X is a random variable, then f (X) is a random
variable.
2. Discrete and Continuous Random Variables:
Random variables can be of two types: discrete and continuous. A random variable that
can assume a finite number of values or a countably infinite number of values is called a
discrete random variable whereas a random variable that can assume all possible values in
a certain interval is called a continuous random variable.
3. Probability Distribution of a Discrete Random Variable:
Let X be a discrete random variable which can assume a finite number of values
x1 , x2 , . . . , xn with probabilities p1 , p2 , . . . , pn respectively. Then the set of values of X
together with their probabilities is called the probability distribution of the discrete
random variable X. The probability distribution of a discrete random variable can be
represented by a formula, a table or a graph.
4. Probability Mass Function of a Discrete Random Variable:
Let X be a discrete random variable which can assume a finite number of values
x1 , x2 , . . . , xn . Then the function f (x) defined by
f(x) = P(X = xi), if x = xi
     = 0, otherwise
is called the probability mass function or p.m.f. of the discrete random variable X.
The p.m.f. f(x) always satisfies the conditions (i) 0 ≤ f(x) ≤ 1 (ii) Σ_(all x) f(x) = 1.
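A p.m.f. is naturally represented as a dictionary from values to probabilities, and both conditions are one-line checks. A sketch using a hypothetical X, the number of heads in two fair coin tosses:

```python
# Hypothetical discrete X: number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

# The two p.m.f. conditions: 0 <= f(x) <= 1, and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-12

def F(x):
    """Distribution function F(x) = P(X <= x) for this discrete X."""
    return sum(p for xi, p in pmf.items() if xi <= x)
```

F is a step function: non-decreasing, 0 far to the left and 1 far to the right, matching the properties listed in item 6.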
5. Probability Distribution Function of a Discrete Random Variable:


Let X be a discrete random variable. A function F whose value for each real number x is
given by F(x) = P[X ≤ x] is called the distribution function of the random variable X.
6. Properties of Distribution Function:
The distribution function F (x) has the following properties:
(a) F(x) is non-decreasing, i.e. if x, y are real numbers such that x ≤ y then F (x) ≤ F (y).
(b) lim_(x→−∞) F(x) = 0 and lim_(x→∞) F(x) = 1.
(c) F(x) is continuous from the right, i.e. lim_(h→0+) F(x + h) = F(x) for all x.
7. Expectation of a Discrete Random Variable:
Let X be a discrete random variable with p.m.f. f(x). Then the expectation or expected
value or mean value of X is denoted by E(X) and is defined by E(X) = Σ_x x f(x).
8. Properties of Expectation of a Discrete Random Variable:


If X is a discrete random variable and c is a constant, then
(a) E(c)=c
(b) E(cX)=cE(X)
(c) E[c g(X)]=c E[g(X)] where g(X) is any function of X.
(d) If X and Y are discrete random variables then E(X+Y)=E(X)+E(Y).
More generally, if X1, X2, . . . , Xn are discrete random variables and a, c1, c2, . . . , cn are
constants then
E(a + c1X1 + c2X2 + · · · + cnXn) = a + c1E(X1) + c2E(X2) + · · · + cnE(Xn).
(e) If g1 (X), g2 (X), . . . , gn (X) are functions of X then
E(g1 (X) + g2 (X) + · · · + gn (X)) = E(g1 (X)) + E(g2 (X)) + · · · + E(gn (X)).
9. Variance of a Discrete Random Variable:
If X is a discrete random variable, the variance of X is denoted by Var(X) and is defined
by Var(X) = E[X − E(X)]² = E(X²) − (E(X))². The variance of X is also denoted by σX².
The positive square root of the variance of X, denoted by σX, is called the standard
deviation of X.
10. Properties of Variance of a Discrete Random Variable:
If X is a discrete random variable and a, c are constants, then
(a) Var(c) = 0.
(b) Var(a + cX) = c² Var(X).
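The shortcut formula Var(X) = E(X²) − (E(X))² and the scaling property (b) can both be verified from a p.m.f. table; a sketch with an invented p.m.f. and arbitrary constants a = 3, c = 2:

```python
pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # a hypothetical p.m.f. for illustration

mean = sum(x * p for x, p in pmf.items())                 # E(X)
second = sum(x * x * p for x, p in pmf.items())           # E(X^2)
variance = second - mean ** 2                             # Var(X) = E(X^2) - (E(X))^2

# Property (b): Var(a + cX) = c^2 Var(X)
a, c = 3, 2
pmf_y = {a + c * x: p for x, p in pmf.items()}            # distribution of Y = a + cX
mean_y = sum(y * p for y, p in pmf_y.items())
var_y = sum(y * y * p for y, p in pmf_y.items()) - mean_y ** 2
assert abs(var_y - c ** 2 * variance) < 1e-9
```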
11. The Binomial Distribution:
(a) If a random experiment is performed repeatedly then each repetition is called a trial.
(b) A sequence of trials is called Bernoulli trials or a Binomial Experiment if they
possess the following properties:
i. The result of each trial can be classified into one of two categories -one we will call
“success” and the other “failure”.
ii. The probability p of success is the same for each trial.
iii. Each trial is independent of all the others.
iv. The experiment consists of a fixed number of identical trials.
(c) Theorem: Let X be a random variable which is equal to the number of successes in a
sequence of n Bernoulli trials and let p be the probability of success in each trial then
the probability of exactly r successes in the n trials is denoted by b(r; n, p) and is given by
P(X = r) = b(r; n, p) = nCr p^r q^(n−r) for r = 0, 1, . . . , n
                      = 0, otherwise
where 0 ≤ p ≤ 1, 0 ≤ q ≤ 1 and p + q = 1.
(d) If n is a positive integer greater than 1 and 0 ≤ p ≤ 1 then a random variable X is said
to have binomial distribution with parameters n, p if the probability mass function
f(x) of X is given by
f(x) = P(X = x) = nCx p^x q^(n−x) for x = 0, 1, . . . , n
                = 0, otherwise
where 0 ≤ p ≤ 1, 0 ≤ q ≤ 1 and p + q = 1. In this case we write X ~ B(n, p).
(e) If X ~ B(n, p) then (i) E(X) = np (ii) σX² = npq (iii) σX = √(npq).
12. The Poisson Distribution:
(a) If λ > 0, a random variable X is said to have Poisson distribution with parameter
λ if the probability mass function f(x) of X is given by
f(x) = P(X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, 3, . . .
                = 0, otherwise.
In this case we write X ~ P(λ).
(b) If X ~ P(λ) then (i) E(X) = λ (ii) σX² = λ (iii) σX = √λ.
(c) Poisson Approximation to Binomial Distribution:
Suppose X ~ B(n, p) where
(i) The number of trials is infinitely large i.e. n → ∞
(ii) The probability of success in each trial is very small i.e. p → 0.
(iii) The mean np = λ is finite.
Then X approximately follows the Poisson distribution with mean λ = np.
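The quality of the approximation is easy to see numerically: with n large, p small and λ = np fixed, the binomial and Poisson p.m.f.'s nearly coincide. A sketch with the illustrative values n = 1000, p = 0.002 (so λ = 2):

```python
from math import exp, factorial, comb

def poisson(x, lam):
    """Poisson p.m.f.: P(X = x) = e^(-lam) lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.002
lam = n * p                                   # lam = 2
for x in range(8):
    exact = comb(n, x) * p ** x * (1 - p) ** (n - x)   # B(n, p) probability
    assert abs(exact - poisson(x, lam)) < 1e-3          # close for every x
```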
13. Continuous Random Variable:
A random variable X is said to be continuous if it can assume all possible values in a
certain interval.
14. Probability Density Function of a Continuous Random Variable:
The probability density function or the p.d.f. of a continuous random variable X, denoted
by fX (x) or simply by f (x) is a function such that for any two real numbers α and β
P(α ≤ X ≤ β) = ∫_α^β f(x) dx   (1)
Thus P(α ≤ X ≤ β) is the area bounded by the curve y = f(x), the x-axis, and the vertical
lines x = α and x = β.
Note: In the case of a continuous random variable X, P(X = c) = 0 for any real number c.
Thus, P(α ≤ X ≤ β) = P(α ≤ X < β) = P(α < X ≤ β) = P(α < X < β).
15. Properties of the Probability Density Function:
The p.d.f. of a continuous random variable X has the following properties:
(i) f(x) ≥ 0 (ii) ∫_(−∞)^∞ f(x) dx = 1.

16. Probability Distribution Function of a Continuous Random Variable:
The probability distribution function or the cumulative distribution function of a
continuous random variable X, denoted by FX(x) or simply by F(x), is defined by
FX(x) = P(X ≤ x) = ∫_(−∞)^x f(t) dt, where −∞ < x < ∞ and f is the p.d.f. of X.

17. Properties of the Probability Distribution Function:
(i) For −∞ < x < ∞, 0 ≤ F(x) ≤ 1.
(ii) F(−∞) = lim_(x→−∞) F(x) = 0.
(iii) F(∞) = lim_(x→∞) F(x) = 1.
(iv) F′(x) = f(x).
(v) F(b) − F(a) = ∫_a^b f(x) dx.

18. Expectation of a Continuous Random Variable:
The expectation of a continuous random variable X, with p.d.f. f(x), denoted by E(X), is
defined by
E(X) = ∫_(−∞)^∞ x f(x) dx,   (2)
provided the integral exists.
Theorem: If g(X) is a function of a continuous random variable X, with p.d.f. f(x), then
the expectation of g(X) is given by
E[g(X)] = ∫_(−∞)^∞ g(x) f(x) dx.   (3)

19. Properties of Expectation of a Continuous Random Variable:
If X is a continuous random variable and c is a constant, then
(a) E(c)=c
(b) E(cX)=cE(X)
(c) E[c g(X)]=c E[g(X)] where g(X) is any function of X.
(d) If X and Y are continuous random variables then E(X+Y)=E(X)+E(Y).
More generally, if X1, X2, . . . , Xn are continuous random variables and a, c1, c2, . . . , cn
are constants then
E(a + c1X1 + c2X2 + · · · + cnXn) = a + c1E(X1) + c2E(X2) + · · · + cnE(Xn).
(e) If g1 (X), g2 (X), . . . , gn (X) are functions of X then
E(g1 (X) + g2 (X) + · · · + gn (X)) = E(g1 (X)) + E(g2 (X)) + · · · + E(gn (X)).
20. Normal Distribution:
A continuous random variable X is said to have Normal Distribution, with parameters µ
and σ, if its probability density function (p.d.f.) is given by
f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)), where −∞ < x < ∞, −∞ < µ < ∞, σ > 0.   (4)

21. Properties of Normal Distribution:
(a) The graph of the Normal distribution is a bell shaped curve that extends indefinitely
far in both directions.
(b) If a random variable X has Normal distribution then its p.d.f. is given by Equation (4).
It involves two parameters µ and σ, with E(X) = µ and S.D.(X) = σ. If a random variable
X has Normal distribution with parameters µ and σ then we write X ~ N(µ, σ).
(c) f(µ + h) = f(µ − h) = (1/(σ√(2π))) e^(−h²/(2σ²)). This shows the Normal curve is
symmetrical about x = µ.
(d) For the Normal distribution Mean = Median = Mode.
(e) The maximum value of f(x) occurs at x = µ and is equal to 1/(σ√(2π)).
(f) The x-axis is a horizontal asymptote of f(x): f(x) ≠ 0 for any x, but f(x) → 0 as
x → ±∞.
(g) (i) The total area under the normal curve is 1.
(ii) P(µ − σ < X < µ + σ) = Area under the normal curve between the ordinates
µ − σ and µ + σ = 0.6827.
(iii) P(µ − 2σ < X < µ + 2σ) = 0.9545.
(iv) P(µ − 3σ < X < µ + 3σ) = 0.9973.
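The 0.6827 / 0.9545 / 0.9973 areas in (g) can be reproduced from the standard normal c.d.f., which is expressible via the error function in the standard library; a sketch:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal c.d.f., Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(mu - k*sigma < X < mu + k*sigma) = Phi(k) - Phi(-k), independent of mu, sigma.
within = {k: Phi(k) - Phi(-k) for k in (1, 2, 3)}
```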

22. Let X ~ N(µ, σ). Let Z = (X − µ)/σ. Then the random variable Z has normal distribution
with mean 0 and S.D. 1. Z is called a standard normal variate. The p.d.f. of Z is given
by
φ(z) = (1/√(2π)) e^(−z²/2)   (5)

The cumulative distribution function of Z is denoted by Φ(z) and is given by
Φ(z) = P(Z ≤ z) = ∫_(−∞)^z φ(u) du = (1/√(2π)) ∫_(−∞)^z e^(−u²/2) du.
Remark: Any normal distribution can be converted into a standard normal distribution
using the transformation Z = (X − µ)/σ. The area under a standard normal curve between
the ordinates 0 and z, for z > 0, as well as the values of Φ(z), which give the area under a
standard normal curve to the left of the ordinate at z, are available in statistical tables and
are used in the calculation of probabilities.
23. Given P(0 ≤ Z ≤ z) = Area under a standard normal curve between the ordinates 0 and z,
for z ≥ 0, the following probabilities can be obtained:
(a) P(Z > 0) = P(Z < 0) = 0.5.
(b) For b > a > 0, P (a ≤ Z ≤ b) = P (0 ≤ Z ≤ b) − P (0 ≤ Z ≤ a).
(c) For a > 0, P (−a ≤ Z ≤ 0) = P (0 ≤ Z ≤ a).
(d) For a > 0, b > 0, P (−a ≤ Z ≤ b) = P (0 ≤ Z ≤ a) + P (0 ≤ Z ≤ b).
(e) For b > a > 0, P (−b ≤ Z ≤ −a) = P (a ≤ Z ≤ b) = P (0 ≤ Z ≤ b) − P (0 ≤ Z ≤ a).
(f) For a > 0, P (−a ≤ Z ≤ a) = 2 P (0 ≤ Z ≤ a).
(g) For a > 0, P (Z ≥ a) = P (Z ≤ −a) = 0.5 − P (0 ≤ Z ≤ a).
(h) For a > 0, P (Z ≥ −a) = P (Z ≤ a) = 0.5 + P (0 ≤ Z ≤ a).
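Each of the table rules (a)-(h) is an identity in Φ, so they can be verified numerically; a sketch checking three of them at the arbitrary points a = 0.7, b = 1.8:

```python
from math import erf, sqrt

def Phi(z):
    """Phi(z) = P(Z <= z) for a standard normal variate Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def table(z):
    """The tabulated value P(0 <= Z <= z) = Phi(z) - 0.5."""
    return Phi(z) - 0.5

a, b = 0.7, 1.8
# (b) P(a <= Z <= b) = P(0 <= Z <= b) - P(0 <= Z <= a)
assert abs((Phi(b) - Phi(a)) - (table(b) - table(a))) < 1e-12
# (f) P(-a <= Z <= a) = 2 P(0 <= Z <= a)
assert abs((Phi(a) - Phi(-a)) - 2 * table(a)) < 1e-12
# (g) P(Z >= a) = 0.5 - P(0 <= Z <= a)
assert abs((1 - Phi(a)) - (0.5 - table(a))) < 1e-12
```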
Joint Distribution of Two Discrete Random Variables:

1. Let X and Y be random variables associated with the same random experiment. Then the
pair (X,Y) is called a bivariate random variable. If both the random variables are
discrete, then (X,Y) is called a discrete bivariate random variable. The set of possible
values of (X,Y) will be called the range of (X,Y). If both the random variables are
continuous, then (X,Y) is called a continuous bivariate random variable.

2. Joint probability mass function:
Let (X,Y) be a discrete bivariate random variable. The joint (or bivariate) probability
mass function (pmf ) of X and Y is defined by

pXY (xi , yj ) = P (X = xi , Y = yj ), (6)

where X takes values x1 , x2 , x3 , . . . and Y takes values y1 , y2 , y3 , . . .


Properties of joint probability mass function:
(i) 0 ≤ pXY(xi, yj) ≤ 1
(ii) Σ_(xi) Σ_(yj) pXY(xi, yj) = 1.

3. Marginal probability mass function:
Let (X,Y) be a discrete bivariate random variable with joint pmf pXY(xi, yj). Then the
marginal pmf’s are given by:
pX(xi) = P(X = xi) = Σ_(yj) pXY(xi, yj)   (7)
pY(yj) = P(Y = yj) = Σ_(xi) pXY(xi, yj)   (8)
pX(xi) and pY(yj) are called the marginal pmf’s of X and Y respectively.
Suppose a random variable X takes three possible values x1, x2, x3 and a random variable Y
takes three values y1, y2, y3. Let pij = P(X = xi, Y = yj). The joint probability distribution
of (X,Y) is described by the following table, called the joint probability table:
y1 y2 y3 Row sum
x1 p11 p12 p13 pX (x1 )
x2 p21 p22 p23 pX (x2 )
x3 p31 p32 p33 pX (x3 )
Column Sum pY (y1 ) pY (y2 ) pY (y3 ) 1

Thus pX (xi ) = Sum of entries in row i, and pY (yj ) = Sum of entries in column j.
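Representing the joint probability table as a dictionary makes the row-sum and column-sum marginals one-line computations. A sketch with a hypothetical joint pmf (the six probabilities are invented for illustration):

```python
# Hypothetical joint pmf of (X, Y); X takes values 0, 1 and Y takes 0, 1, 2.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

p_X = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # row sums
p_Y = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # column sums

assert abs(sum(joint.values()) - 1) < 1e-12             # property (ii) of a joint pmf
assert abs(sum(p_X.values()) - 1) < 1e-12               # each marginal is itself a pmf
```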
4. Independent Random Variables:
Let (X,Y) be a discrete bivariate random variable. If for all (xi , yj ) in the range of (X,Y)
the events (X = xi ) and (Y = yj ) are independent, then we say X and Y are independent
random variables. Thus, a pair of discrete random variables X and Y are said to be
independent if and only if
pXY (xi , yj ) = pX (xi ) pY (yj ) (9)
for all (xi , yj ) in the range of (X,Y).
5. Expectation, Variance, Standard Deviation, Covariance:
Let (X,Y) be a discrete bivariate random variable with joint pmf pXY(xi, yj), then
(a) E(X) = µX = Σ_(xi) xi pX(xi) = Σ_(xi) xi [Σ_(yj) pXY(xi, yj)] = Σ_(xi) Σ_(yj) xi pXY(xi, yj).
Also, E(X²) = Σ_(xi) Σ_(yj) xi² pXY(xi, yj) = Σ_(xi) xi² pX(xi),
Var(X) = σX² = E(X²) − [E(X)]².
(b) E(Y) = µY = Σ_(yj) yj pY(yj) = Σ_(yj) yj [Σ_(xi) pXY(xi, yj)] = Σ_(xi) Σ_(yj) yj pXY(xi, yj),
Also, E(Y²) = Σ_(xi) Σ_(yj) yj² pXY(xi, yj) = Σ_(yj) yj² pY(yj),
Var(Y) = σY² = E(Y²) − [E(Y)]².
(c) The standard deviation of X, denoted by σX, is the positive square root of the
variance.
(d) The covariance of X, Y, denoted by Cov(X,Y), is defined by
Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − E(X)E(Y), where
E(XY) = Σ_(xi) Σ_(yj) xi yj pXY(xi, yj).

(e) If Cov(X, Y ) = 0 then X and Y are said to be uncorrelated. Thus if X and Y are
uncorrelated then E(XY ) = E(X)E(Y ).
Theorem: (i) If X and Y are independent they are uncorrelated, but the converse is
not true in general.
(ii) If (X,Y) is a discrete bivariate random variable, then
(a) E(X + Y ) = E(X) + E(Y )
(b) V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ).
(c) If X and Y are independent then V ar(X + Y ) = V ar(X) + V ar(Y ).
(f) The correlation coefficient of X and Y is denoted by ρXY or by ρ and is defined by
ρ = Cov(X, Y)/(σX σY).   (10)
Theorem: −1 ≤ ρ ≤ 1. Equality holds when there is an exact linear relation
between X and Y.
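All the quantities in (a)-(f), together with the identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), can be computed from a single joint pmf; a sketch with a hypothetical joint pmf (the probabilities are invented for illustration):

```python
from math import sqrt

# Hypothetical joint pmf of (X, Y); X takes values 0, 1 and Y takes 0, 1, 2.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}

def E(f):
    """E[f(X, Y)] = sum of f(x, y) p_XY(x, y) over the range of (X, Y)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: x * x) - mu_x ** 2
var_y = E(lambda x, y: y * y) - mu_y ** 2
cov = E(lambda x, y: x * y) - mu_x * mu_y       # Cov(X,Y) = E(XY) - E(X)E(Y)
rho = cov / (sqrt(var_x) * sqrt(var_y))         # correlation coefficient

assert -1 <= rho <= 1
# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
var_sum = E(lambda x, y: (x + y) ** 2) - (mu_x + mu_y) ** 2
assert abs(var_sum - (var_x + var_y + 2 * cov)) < 1e-9
```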
(g) Conditional Distribution:
Let X and Y be discrete random variables with joint pmf pXY(xi, yj) and marginal
pmf’s pX(xi) and pY(yj) respectively, then the conditional pmf of X given Y = yj
is denoted by pX/yj(xi) and is defined by
pX/yj(xi) = P(X = xi/Y = yj) = P(X = xi, Y = yj)/pY(yj) = pXY(xi, yj)/pY(yj)   (11)
provided pY(yj) > 0. Similarly, the conditional pmf of Y given X = xi is denoted
by pY/xi(yj) and is defined by
pY/xi(yj) = pXY(xi, yj)/pX(xi)   (12)
provided pX(xi) > 0.
Thus,
pXY(xi, yj) = pX(xi) pY/xi(yj) = pY(yj) pX/yj(xi).   (13)
Theorem: If X and Y are independent then
pY/xi(yj) = pY(yj) and pX/yj(xi) = pX(xi)   (14)

6. Conditional Mean and Variance
Let X and Y be discrete random variables. The conditional expectation (mean) of Y
given X = xi is denoted by E(Y/xi) or µY/xi and is defined by
E(Y/xi) = Σ_(yj) yj pY/xi(yj).   (15)
The conditional variance of Y given X = xi is denoted by Var(Y/xi) and is defined by
Var(Y/xi) = E(Y²/xi) − [E(Y/xi)]² = Σ_(yj) yj² pY/xi(yj) − µ²Y/xi   (16)
The conditional expectation and variance of X given Y = yj are defined similarly.


Joint Distribution of Two Continuous Random Variables:

1. Let X and Y be continuous random variables associated with the same random experiment.
If there exists a function fXY(x, y) such that:
(i) fXY(x, y) ≥ 0,
(ii) fXY(x, y) is continuous for all (x, y) except possibly a finite set,
(iii) For any region A of the two-dimensional space, P((X, Y) ∈ A) = ∬_A fXY(x, y) dx dy,
(iv) ∫_(−∞)^∞ ∫_(−∞)^∞ fXY(x, y) dx dy = 1,
(v) P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_(x=a)^b ∫_(y=c)^d fXY(x, y) dy dx,

Then X and Y are called jointly continuous random variables and fXY (x, y) is called
the joint (or bivariate) probability density function of the continuous bivariate
random variable (X,Y).
Remark: (i) Graphically z = fXY (x, y) represents a surface called the probability surface.
The total volume bounded by this surface and the x-y plane is 1.
(ii) It is possible to have continuous bivariate random variables X, Y that are not jointly
continuous i.e. X has probability density function fX (x), Y has probability density function
fY (y), but there is no joint density function fXY (x, y).
(iii) fXY (a, b) does not represent probability of anything.

2. Marginal probability density function:
Let X, Y be jointly continuous random variables with joint probability density function
fXY(x, y). Then the marginal probability density functions of X and Y are given by:
fX(x) = ∫_(y=−∞)^∞ fXY(x, y) dy   (17)
fY(y) = ∫_(x=−∞)^∞ fXY(x, y) dx   (18)
Note: P(a ≤ X ≤ b) = ∫_(x=a)^b fX(x) dx = ∫_(x=a)^b ∫_(y=−∞)^∞ fXY(x, y) dy dx.

3. Independent Random Variables:
Let X, Y be jointly continuous random variables with joint probability density function
fXY (x, y) and marginal probability density functions fX (x) and fY (y) then we say X and Y
are independent random variables if fXY (x, y) = fX (x)fY (y) for all pairs of real numbers
(x,y). If X and Y are not independent then they are said to be dependent.

4. Expectation, Variance, Standard Deviation, Covariance:
Let X, Y be jointly continuous random variables with joint probability density function
fXY(x, y) and marginal probability density functions fX(x) and fY(y).
(a) E(X) = µX = ∫_(−∞)^∞ x fX(x) dx = ∫_(−∞)^∞ x [∫_(−∞)^∞ fXY(x, y) dy] dx =
∫_(−∞)^∞ ∫_(−∞)^∞ x fXY(x, y) dy dx
Also, E(X²) = ∫_(−∞)^∞ x² fX(x) dx = ∫_(−∞)^∞ ∫_(−∞)^∞ x² fXY(x, y) dy dx
Var(X) = σX² = E(X²) − [E(X)]².
(b) The standard deviation of X, denoted by σX, is the positive square root of the
variance.
(c) The covariance of X, Y, denoted by Cov(X,Y), is defined by
Cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − E(X)E(Y), where
E(XY) = ∫_(−∞)^∞ ∫_(−∞)^∞ x y fXY(x, y) dy dx

(d) If Cov(X, Y ) = 0 then X and Y are said to be uncorrelated. Thus if X and Y are
uncorrelated then E(XY ) = E(X)E(Y ).
Theorem: (i) If X and Y are independent they are uncorrelated, but the converse is
not true in general.
(ii) If (X,Y) is a continuous bivariate random variable, then
(a) E(X + Y ) = E(X) + E(Y )
(b) V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ).
(c) If X and Y are independent then V ar(X + Y ) = V ar(X) + V ar(Y ).
(e) The correlation coefficient of X and Y is denoted by ρXY or by ρ and is defined by
ρ = Cov(X, Y)/(σX σY).   (19)
Theorem: −1 ≤ ρ ≤ 1. Equality holds when there is an exact linear relation
between X and Y.
(f) Conditional Distribution:
Let X, Y be jointly continuous random variables with joint probability density
function fXY(x, y) and marginal probability density functions fX(x) and fY(y), then
for any real number y such that fY(y) > 0, the conditional probability density
function of X given Y = y is denoted by fX/y(x) and is defined by
fX/y(x) = fXY(x, y)/fY(y).   (20)
Similarly, for any real number x such that fX(x) > 0, the conditional probability
density function of Y given X = x is denoted by fY/x(y) and is defined by
fY/x(y) = fXY(x, y)/fX(x).   (21)
Thus,
fXY(x, y) = fX(x) fY/x(y) = fY(y) fX/y(x).   (22)
Theorem: If X and Y are independent then
fY/x(y) = fY(y) for all x, y such that fY(y) > 0   (23)
fX/y(x) = fX(x) for all y, x such that fX(x) > 0   (24)
5. Conditional Mean and Variance
Let X, Y be jointly continuous random variables. The conditional expectation (mean)
of Y given X = x is denoted by E(Y/x) or µY/x and is defined by
E(Y/x) = ∫_(y=−∞)^∞ y fY/x(y) dy.   (25)
The conditional variance of Y given X = x is denoted by Var(Y/x) and is defined by
Var(Y/x) = E(Y²/x) − [E(Y/x)]² = ∫_(y=−∞)^∞ y² fY/x(y) dy − µ²Y/x   (26)
The conditional expectation and variance of X given Y = y are defined similarly.
