Mathematical Statistics (Sample Theory)
1. MOMENTS
The rth moment of a random variable X about the mean μ, also called the rth central moment, is defined as

μ_r = E[(X − μ)^r]    ...(3)

where r = 0, 1, 2, .... It follows that μ_0 = 1, μ_1 = 0, and μ_2 = σ², i.e., the second central moment, or second moment about the mean, is the variance. Assuming absolute convergence, we have

μ_r = ∫_{−∞}^{∞} (x − μ)^r f(x) dx    (continuous variable)    ...(5)
The rth moment of X about the origin, also called the rth raw moment, is defined as

μ'_r = E(X^r)

2. MOMENT GENERATING FUNCTIONS

The moment generating function of X is defined by M_X(t) = E(e^{tX}), so that

M_X(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx    (continuous variable)    ...(11)

and

μ'_r = [d^r/dt^r M_X(t)]_{t=0}    ...(13)

i.e., μ'_r is the rth derivative of M_X(t) evaluated at t = 0. Where no confusion can result, we often write M(t) instead of M_X(t).
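To see relation (13) in action, the following sketch (not part of the original text) uses sympy to differentiate the known m.g.f. M_X(t) = λ/(λ − t) of an exponential variable, here with the assumed rate λ = 2, and compares the derivatives at t = 0 with the known raw moments μ'_r = r!/λ^r.

    import sympy as sp

    t = sp.symbols('t')
    lam = 2                                   # assumed rate of an Exponential(2) variable
    M = lam / (lam - t)                       # its known m.g.f., valid for t < lam

    for r in range(1, 5):
        from_mgf = sp.diff(M, t, r).subs(t, 0)    # relation (13)
        direct = sp.factorial(r) / lam**r         # known raw moment r!/lam^r
        print(r, from_mgf, direct)                # the two columns agree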
3. CHARACTERISTIC FUNCTIONS
If we let t = i, where i is the imaginary unit, in the moment generating function we obtain an
important function called the characteristic function. We denote this by
X() = MX(i) = E(eiX) ...(16)
It follows that
X () eiX f(x)dx (continuous variable) ...(18)
Since |eiX| = 1, the series and the integral always converge absolutely.
The corresponding results (12) and (13) become
2 r
X () 1 i 2 ... ir r ... ...(19)
2! r!
dr
where r ( 1)r ir X () ...(20)
dr 0
( X a) / b () eai / b x ...(21)
b
Theorem. If X and Y are independent random variables having characteristic functions φ_X(ω) and φ_Y(ω), respectively, then

φ_{X+Y}(ω) = φ_X(ω) φ_Y(ω)    ...(22)
More generally, the characteristic function of a sum of independent random variables is equal
to the product of their characteristic functions.
Theorem. (Uniqueness Theorem) Suppose that X and Y are random variables having characteristic functions φ_X(ω) and φ_Y(ω), respectively. Then X and Y have the same probability distribution if and only if φ_X(ω) = φ_Y(ω) identically.
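The product rule (22) can be checked numerically. The sketch below (an added illustration, not from the text) compares the empirical characteristic function of X + Y with the product of those of X and Y for an assumed pair of independent variables, X ~ Exp(1) and Y ~ U(0, 1); the two agree up to Monte Carlo error.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    X = rng.exponential(1.0, n)           # assumed X ~ Exp(1)
    Y = rng.uniform(0.0, 1.0, n)          # assumed Y ~ U(0, 1), independent of X

    def ecf(sample, w):
        """Empirical characteristic function: the sample mean of e^{i w X}."""
        return np.exp(1j * w * sample).mean()

    for w in (0.5, 1.0, 2.0):
        lhs = ecf(X + Y, w)               # phi_{X+Y}(w)
        rhs = ecf(X, w) * ecf(Y, w)       # phi_X(w) * phi_Y(w)
        print(w, abs(lhs - rhs))          # small: Monte Carlo error only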
An important reason for introducing the characteristic function is that (18) represents the Fourier
transform of the density function f(x). From the theory of Fourier transforms, we can easily determine
the density function from the characteristic function. In fact,
f(x) = (1/2π) ∫_{−∞}^{∞} e^{−iωx} φ_X(ω) dω    ...(23)
which is often called an inversion formula, or inverse Fourier transform. In a similar manner we
can show in the discrete case that the probability function f(x) can be obtained from (17) by use of
Fourier series, which is the analog of the Fourier integral for the discrete case.
Another reason for using the characteristic function is that it always exists whereas the moment
generating function may not exist.
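As a rough illustration of the inversion formula (23), the sketch below (assuming the standard normal case, where φ_X(ω) = e^{−ω²/2}) evaluates (23) by numerical quadrature and compares the result with the known normal density.

    import numpy as np

    w = np.linspace(-10, 10, 20_001)      # quadrature grid for omega
    phi = np.exp(-w**2 / 2)               # c.f. of a standard normal variable

    def density_from_cf(x):
        # inversion formula (23): f(x) = (1/2 pi) * integral of e^{-i w x} phi(w) dw
        return np.trapz(np.exp(-1j * w * x) * phi, w).real / (2 * np.pi)

    for x in (0.0, 1.0, 2.0):
        exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
        print(x, density_from_cf(x), exact)   # the two columns agree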
4. CHEBYSHEV’S INEQUALITY
An important theorem in probability and statistics that reveals a general property of discrete or
continuous random variables having finite mean and variance is known under the name of Chebyshev’s
inequality.
Theorem 3.18 (Chebyshev's Inequality). Suppose that X is a random variable (discrete or continuous) having mean μ and variance σ², which are finite. Then if ε is any positive number,

P(|X − μ| ≥ ε) ≤ σ²/ε²    ...(1)

or, with ε = kσ,

P(|X − μ| ≥ kσ) ≤ 1/k²    ...(2)
Proof. Case (i): X is a continuous r.v. By definition,

σ² = σ_X² = E[X − E(X)]² = E[(X − μ)²]

= ∫_{−∞}^{∞} (x − μ)² f(x) dx, where f(x) is the p.d.f. of X

= ∫_{−∞}^{μ−kσ} (x − μ)² f(x) dx + ∫_{μ−kσ}^{μ+kσ} (x − μ)² f(x) dx + ∫_{μ+kσ}^{∞} (x − μ)² f(x) dx

≥ ∫_{−∞}^{μ−kσ} (x − μ)² f(x) dx + ∫_{μ+kσ}^{∞} (x − μ)² f(x) dx    ...(*)

We know that:

x ≤ μ − kσ and x ≥ μ + kσ  ⟺  |x − μ| ≥ kσ  ⟹  (x − μ)² ≥ k²σ²    ...(**)

Substituting (**) in (*), we get

σ² ≥ k²σ² [∫_{−∞}^{μ−kσ} f(x) dx + ∫_{μ+kσ}^{∞} f(x) dx] = k²σ² P(|X − μ| ≥ kσ)

⟹ P(|X − μ| ≥ kσ) ≤ 1/k²

Writing c = kσ, this may be restated as

P(|X − μ| ≥ c) ≤ σ²/c²  and  P(|X − μ| < c) ≥ 1 − σ²/c²

or, equivalently,

P(|X − E(X)| ≥ c) ≤ Var(X)/c²  and  P(|X − E(X)| < c) ≥ 1 − Var(X)/c²    ...(3)
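A quick Monte Carlo check of (2) (an added illustration, not part of the proof): for an assumed example X ~ Exp(1), so that μ = σ = 1, the observed frequency of |X − μ| ≥ kσ stays below 1/k².

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.exponential(1.0, 1_000_000)   # assumed X ~ Exp(1): mu = 1, sigma = 1
    mu, sigma = 1.0, 1.0

    for k in (1.5, 2.0, 3.0):
        freq = np.mean(np.abs(X - mu) >= k * sigma)   # P(|X - mu| >= k sigma)
        print(k, freq, 1 / k**2)                      # freq never exceeds 1/k^2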
Generalised Form of Bienaymé-Chebyshev's Inequality. Let g(X) be a non-negative function of a random variable X. Then for every k > 0, we have

P{g(X) ≥ k} ≤ E{g(X)}/k    ...(1)

Proof. Here we shall prove the theorem for a continuous random variable. The proof can be adapted to the case of a discrete random variable on replacing integration by summation over the given range of the variable.

Let S be the set of all x where g(x) ≥ k, i.e.,

S = {x : g(x) ≥ k}

Then, since g is non-negative,

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx ≥ ∫_S g(x) f(x) dx ≥ k ∫_S f(x) dx = k P(X ∈ S) = k P[g(X) ≥ k]

⟹ P[g(X) ≥ k] ≤ E[g(X)]/k
Remarks. 1. If we take g(X) = {X − E(X)}² = (X − μ)² and replace k by k²σ² in (1), we get

P{(X − μ)² ≥ k²σ²} ≤ E(X − μ)²/(k²σ²) = σ²/(k²σ²) = 1/k²

i.e., P{|X − μ| ≥ kσ} ≤ 1/k²,    ...(2)

which is Chebyshev's inequality.
2. Markov's Inequality. Taking g(X) = |X| in (1) we get, for any k > 0,

P[|X| ≥ k] ≤ E|X|/k,    ...(3)

which is Markov's inequality.

Further, taking g(X) = |X|^r and replacing k by k^r in (1), we get a more generalised form of Markov's inequality, viz.,

P[|X|^r ≥ k^r] ≤ E|X|^r / k^r    ...(4)
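Markov's inequality (3) is also easy to check by simulation. The sketch below (an added illustration, with an assumed example X = |Z| for Z standard normal) compares the observed tail frequency with the bound E|X|/k.

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.abs(rng.normal(size=1_000_000))    # assumed X = |Z|, Z standard normal

    for k in (1.0, 2.0, 3.0):
        print(k, np.mean(X >= k), X.mean() / k)   # P(|X| >= k) vs the bound E|X|/k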
3. If we assume the existence of only second-order moments of X, then we cannot do better than Chebyshev's inequality. However, we can sometimes improve upon the results of Chebyshev's inequality if we assume the existence of higher-order moments. We give below (without proof) one such inequality, which assumes the existence of moments of 4th order.

Theorem 6.31a. If E|X|⁴ < ∞, E(X) = 0 and E(X²) = σ², then

P{|X| ≥ kσ} ≤ (μ₄ − σ⁴)/(μ₄ + σ⁴k⁴ − 2k²σ⁴)    ...(5)
If X ~ U[0, 1] [cf. Chapter 8], with p.d.f. p(x) = 1 for 0 < x < 1 and = 0 otherwise, then

E(X^r) = 1/(r + 1); (r = 1, 2, 3, 4)

⟹ E(X) = 1/2, E(X²) = 1/3, E(X³) = 1/4, E(X⁴) = 1/5    ...(*)

Var(X) = E(X²) − [E(X)]² = 1/3 − 1/4 = 1/12, so σ² = 1/12 and σ⁴ = 1/144

μ₄ = E(X − μ)⁴ = E(X − 1/2)⁴ = 1/80

Chebyshev's inequality with k = 2 gives:

P{|X − 1/2| < 2σ} ≥ 1 − 1/2² = 1 − 1/4 = 0.75

Inequality (5) with k = 2 gives:

P{|X − 1/2| ≥ 2σ} ≤ (1/80 − 1/144)/(1/80 + 16/144 − 8/144) = (4/720)/(49/720) = 4/49

⟹ P{|X − 1/2| < 2σ} ≥ 1 − 4/49 = 45/49 ≈ 0.92,

which is a much better lower bound than the lower bound given by Chebyshev's inequality.
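The exact fractions in this example can be verified mechanically. The sketch below (not from the text) recomputes μ₄ by the binomial expansion of E(X − μ)⁴, evaluates bound (5) with k = 2, and recovers the lower bound 45/49, all in exact rational arithmetic.

    from fractions import Fraction as F

    mu = F(1, 2)
    sigma2 = F(1, 3) - mu**2                       # Var(X) = 1/12
    # mu4 = E(X^4) - 4 mu E(X^3) + 6 mu^2 E(X^2) - 3 mu^4, with E(X^r) = 1/(r+1)
    mu4 = F(1, 5) - 4*mu*F(1, 4) + 6*mu**2*F(1, 3) - 3*mu**4
    k = 2
    bound = (mu4 - sigma2**2) / (mu4 + sigma2**2*k**4 - 2*k**2*sigma2**2)
    print(mu4, bound, 1 - bound)                   # 1/80, 4/49, 45/49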
6.14. Convergence in probability. We shall now introduce a new concept of convergence, viz., convergence in probability or stochastic convergence, which is defined as follows:

A sequence of random variables X_1, X_2, ..., X_n, ... is said to converge in probability to a constant a, if for any ε > 0,

lim_{n→∞} P(|X_n − a| < ε) = 1    ...(1)

or, equivalently,

lim_{n→∞} P(|X_n − a| ≥ ε) = 0    ...(2)

and we write

X_n →^p a as n → ∞    ...(3)

If there exists a random variable X such that X_n − X →^p 0 as n → ∞, then we say that the sequence {X_n} converges in probability to the random variable X.

Remarks. 1. If a sequence of constants a_n → a as n → ∞, then, regarding each constant a_n as a random variable having a one-point distribution at that point, we can say that a_n →^p a as n → ∞.
2. Although the concept of convergence in probability is basically different from that of ordinary convergence of a sequence of numbers, it can be easily verified that the following simple rules hold for convergence in probability as well.

If X_n →^p α and Y_n →^p β as n → ∞, then

(i) X_n ± Y_n →^p α ± β as n → ∞

(ii) X_n Y_n →^p αβ as n → ∞

(iii) X_n / Y_n →^p α/β as n → ∞, provided β ≠ 0.
In particular, if it can be shown that P(|X_n| ≥ ε) → 0 as n → ∞, then X_n →^p 0 as n → ∞.
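A standard illustration of convergence in probability (added here; the example is assumed, not from the text) is the sample mean of U(0, 1) draws: X̄_n →^p 1/2, so the frequency of |X̄_n − 1/2| ≥ ε shrinks as n grows.

    import numpy as np

    rng = np.random.default_rng(3)
    eps, reps = 0.02, 2_000
    for n in (10, 100, 1_000, 10_000):
        xbar = rng.uniform(size=(reps, n)).mean(axis=1)   # sample means of size n
        print(n, np.mean(np.abs(xbar - 0.5) >= eps))      # tends to 0 as n grows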
Example 3.6. Letting k = 2 in Chebyshev's inequality (2), we see that

P(|X − μ| ≥ 2σ) ≤ 0.25  or  P(|X − μ| < 2σ) ≥ 0.75
In words, the probability of X differing from its mean by more than 2 standard deviations is less
than or equal to 0.25; equivalently, the probability that X will lie within 2 standard deviations of its mean
is greater than or equal to 0.75. This is quite remarkable in view of the fact that we have not even
specified the probability distribution of X.
Example. Roll a single fair die and let X be the outcome. Then, E(X) = 3.5 and Var(X) = 35/12. (Make sure you can compute this!) Suppose we want to compute p = P(X ≥ 6).

(a) Exact: We easily see that p = P(X ≥ 6) = P(X = 6) = 1/6 ≈ 0.167.

(b) By Markov's inequality, we get:

P(X ≥ 6) ≤ E(X)/6 = (21/6)/6 = 21/36 ≈ 0.583

(c) By the usual (two-sided) Chebyshev inequality, we can obtain a stronger bound on p:

P(X ≥ 6) ≤ P(X ≥ 6 or X ≤ 1) = P(|X − 3.5| ≥ 2.5) ≤ (35/12)/(2.5)² = 7/15 ≈ 0.467

(d) By using the one-sided Chebyshev inequality, we can obtain an even stronger bound on p:

P(X ≥ 6) = P(X − 3.5 ≥ 2.5) ≤ (35/12)/((35/12) + (2.5)²) = 7/22 ≈ 0.318
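The four values in this example are easy to recompute; a minimal sketch with numpy:

    import numpy as np

    faces = np.arange(1, 7)
    mu, var = faces.mean(), faces.var()       # 3.5 and 35/12
    exact = np.mean(faces >= 6)               # 1/6
    markov = mu / 6                           # 21/36
    chebyshev = var / 2.5**2                  # 7/15
    one_sided = var / (var + 2.5**2)          # 7/22
    print(exact, markov, chebyshev, one_sided)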
Example. A post office handles, on average, 10,000 letters a day. What can be said about the probability that it will handle at least 15,000 letters tomorrow?

Solution. Let X be the number of letters handled tomorrow. Then E(X) = 10,000. Further, by Markov's inequality we get

P(X ≥ 15,000) ≤ E(X)/15,000 = 10,000/15,000 = 2/3
Example. Let X be uniform on (0, 1). Then for any a > 0, Chebyshev's inequality gives

P(|X − 1/2| ≥ a/√12) ≤ 1/a²

whereas, exactly,

P(|X − 1/2| ≥ a/√12) = 1 − P(1/2 − a/√12 < X < 1/2 + a/√12)

= 1 − ∫_{1/2 − a/√12}^{1/2 + a/√12} dx = 1 − 2a/√12 = 1 − a/√3, for 0 ≤ a ≤ √3.

One can show that 1 − a/√3 ≤ 1/a² for any a > 0. Moreover, f(a) = 1/a² − (1 − a/√3) attains its minimum on (0, ∞) at a = ⁶√12 = 1.513... with

f(⁶√12) = 0.3103...
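The stated minimiser can be confirmed numerically; the sketch below (an added check) evaluates f(a) = 1/a² + a/√3 − 1 at a = 12^(1/6) and compares with a grid search.

    import numpy as np

    f = lambda a: 1 / a**2 + a / np.sqrt(3) - 1
    a_star = 12 ** (1 / 6)
    grid = np.linspace(0.1, 10, 1_000_000)
    print(a_star, f(a_star))                  # 1.5131..., 0.3103...
    print(grid[np.argmin(f(grid))])           # grid minimiser agrees with a_star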
Example. Let X be the outcome of a roll of a die. We know that

E[X] = 21/6 = 3.5 and Var(X) = 35/12

Thus, by Markov,

P(X ≥ 6) ≤ E[X]/6 = (21/6)/6 = 21/36 ≈ 0.583, whereas the exact value is 1/6 ≈ 0.167.

Chebyshev gives

P(X ≤ 2 or X ≥ 5) = P(|X − 3.5| ≥ 1.5) ≤ (35/12)/1.5² ≈ 1.296, whereas the exact value is 2/3;

here the bound exceeds 1, so Chebyshev's inequality gives no information.
SAMPLE QUESTIONS
SECTION-(A) MULTIPLE CHOICE QUESTIONS (MCQ)
2. Let X be a continuous random variable with p.d.f.

f(x) = (1/4) x e^{−x/2}, x ≥ 0
     = 0, otherwise

Find E(X).

(A) 1    (B) 2
(C) 3    (D) 4
3. Let X be a continuous random variable with p.d.f.

f(x) = 5/x², x > 5
     = 0, x ≤ 5

Find E(X).

(A) 2    (B) 0
(C) 5    (D) None of the above
4. Let X be a discrete random variable with values x = 0,1,2, and probabilities P(X = 0) = 0.25,
P(X = 1) = 0.50, and P(X = 2) = 0.25, respectively. Find E(X).
(A) 1 (B) 0
(C) 4 (D) 8
5. Y is an exponential random variable with variance Var[Y] = 25. What is E[Y²]?
(A) 10 (B) 50
(C) 30 (D) 20
SECTION-(B) MULTIPLE SELECT QUESTIONS (MSQ)

1. If the moments of a variate X are defined by E(X^r) = 0.6, r = 1, 2, 3, ..., then which of the following holds?

(A) P(X = 0) = 0.4    (B) P(X = 1) = 0.6
(C) P(X ≥ 2) = 0    (D) P(X = 3) = 0.8
3. Let X have the p.m.f. f(x) = x/6, x = 1, 2, 3. Then which of the following are true?

(A) mean = 7/3    (B) variance = 5/9
(C) E(X) = 0.6    (D) E(X²) = 5
4. If X is a random variable and 'a' is a constant, then which of the following holds?
SECTION-(C) NUMERICAL ANSWER TYPE QUESTIONS (NAT)

1. Let X have the p.m.f. f(x) = x/10, x = 1, 2, 3, 4. Find E(X).
4. Two unbiased dice are thrown. Find the expected value of the sum of the numbers of points on them.
5. In four tosses of a coin, let X be the number of heads. Tabulate the 16 possible outcomes with
the corresponding values of X. By simple counting, derive the distribution of X and hence
calculate the expected value of X.
ANSWER KEY
SOLUTIONS
SECTION-(A) MULTIPLE CHOICE QUESTIONS (MCQ)
1. Using the definition of expectation, we find E(X^n) by simply replacing the stray x in the integral by x^n:

E(X^n) = ∫₀¹ x^n dx = [x^{n+1}/(n + 1)]₀¹ = 1/(n + 1)
2. (D) For the p.d.f. f(x) = (1/4) x e^{−x/2}, x ≥ 0,

E(X) = ∫₀^∞ x · (1/4) x e^{−x/2} dx = (1/4) ∫₀^∞ x² e^{−x/2} dx

First, let y = x/2. Then x = 2y and dx = 2 dy. The 4 from x² = (2y)² = 4y² cancels the 1/4 in front of the integral, and we must carry the 2 from dx outside the integral to get

E(X) = (1/4) ∫₀^∞ x² e^{−x/2} dx = (1/4) ∫₀^∞ (2y)² e^{−y} (2 dy) = 2 ∫₀^∞ y² e^{−y} dy

Recall that when we have an integral of the form ∫ u(y) v'(y) dy we use integration by parts: we choose values for u and dv, then compute du and v and use ∫ u dv = uv − ∫ v du. Take u = y², dv = e^{−y} dy, so that du = 2y dy, v = −e^{−y}:

2 ∫₀^∞ y² e^{−y} dy = 2 [−y² e^{−y} − ∫ −e^{−y} (2y) dy]

= 2 [−y² e^{−y} + 2 ∫ y e^{−y} dy]

Integrate by parts again. Use: u = y, dv = e^{−y} dy, thus du = dy, v = −e^{−y}:

= 2 [−y² e^{−y} + 2(−y e^{−y} + ∫ e^{−y} dy)]

= 2 [−y² e^{−y} − 2y e^{−y} − 2e^{−y}]

= [−2e^{−y} (y² + 2y + 2)]₀^∞

= −2 lim_{y→∞} (y² + 2y + 2)/e^y + 2e⁰(0 + 0 + 2)

The limit evaluates to ∞/∞, so use l'Hôpital's Rule twice:

lim_{y→∞} (y² + 2y + 2)/e^y = lim_{y→∞} (2y + 2)/e^y = lim_{y→∞} 2/e^y = 0

Hence E(X) = 0 + 4 = 4, and the correct option is (D).
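A one-line symbolic check of this result (added; not part of the original solution) with sympy:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    print(sp.integrate(x * sp.Rational(1, 4) * x * sp.exp(-x / 2), (x, 0, sp.oo)))  # 4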
3. (D) E(X) = ∫₅^∞ x · (5/x²) dx

= ∫₅^∞ (5/x) dx

= 5 ∫₅^∞ (1/x) dx

= 5 [ln x]₅^∞ = 5 (lim_{x→∞} ln x − ln 5) = ∞

Since the integral diverges, E(X) does not exist; hence the answer is (D).
5. (B) We observe that an exponential random variable Y with parameter λ > 0 has p.d.f.

f_Y(y) = λ e^{−λy}, y ≥ 0
       = 0, otherwise    ...(1)

with E[Y] = 1/λ and Var[Y] = 1/λ². Since Var[Y] = 25, we get E[Y] = 5, and hence

E[Y²] = Var[Y] + (E[Y])² = 25 + 25 = 50.
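A quick simulation check of E[Y²] = 50 (an added illustration; here λ = 1/5, so that E[Y] = 5 and Var[Y] = 25):

    import numpy as np

    Y = np.random.default_rng(4).exponential(5.0, 1_000_000)  # mean 5, variance 25
    print(Y.var(), (Y**2).mean())                             # ~25 and ~50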
1. (A, B, C)

The m.g.f. of the variate X is:

M_X(t) = Σ_{r=0}^∞ (t^r/r!) μ'_r = 1 + Σ_{r=1}^∞ (t^r/r!) (0.6)

= 0.4 + 0.6 Σ_{r=0}^∞ t^r/r! = 0.4 + 0.6 e^t    ...(i)

But M_X(t) = E(e^{tX}) = Σ_x e^{tx} P(X = x)

= P(X = 0) + e^t · P(X = 1) + Σ_{x≥2} e^{tx} · P(X = x)    ...(ii)

Comparing (i) and (ii), we get P(X = 0) = 0.4, P(X = 1) = 0.6 and P(X ≥ 2) = 0.
2. (A, D) φ_n(t) = ∫_{−n}^{n} e^{itx} (1/2n) dx    (∵ f_n(x) = F_n′(x) = 1/2n, −n < x < n)

= (1/2n) [e^{itx}/(it)]_{−n}^{n} = (e^{int} − e^{−int})/(2int) = (sin nt)/(nt)

φ(t) = lim_{n→∞} φ_n(t) = lim_{n→∞} (sin nt)/(nt)

= 1, if t = 0
= 0, if t ≠ 0

i.e., φ(t) is discontinuous at t = 0,

and lim_{n→∞} F_n(x) = 1/2

Hence F(x) is not a distribution function.
3. (A, B) E(X) = Σ_{x=1}^{3} x f(x) = Σ_{x=1}^{3} x · (x/6) = (1 + 4 + 9)/6 = 14/6 = 7/3

E(X²) = Σ_{x=1}^{3} x² f(x) = Σ_{x=1}^{3} x² · (x/6) = (1 + 8 + 27)/6 = 36/6 = 6

Variance = E(X²) − (E(X))² = 6 − (7/3)² = 6 − 49/9 = 5/9, and σ = √5/3.

Hence the mean is 7/3 and the variance is 5/9.
4. For a non-negative function ψ(X) with ψ(x) ≥ a on the range of integration,

E[ψ(X)] = ∫ ψ(x) f(x) dx ≥ a ∫ f(x) dx = a, since ∫ f(x) dx = 1.

Similarly, if X ≥ 0, then p(x) = 0 for x < 0, so E(X) = ∫ x p(x) dx = ∫₀^∞ x p(x) dx ≥ 0.
1. 3  E(X) = Σ_i x_i f(x_i) = Σ_{x=1}^{4} x · (x/10) = (1 + 4 + 9 + 16)/10 = 30/10 = 3
3. 3.5  Let X be the random variable representing the number on a die when thrown. Then X can take any one of the values 1, 2, 3, ..., 6, each with equal probability 1/6. Hence

E(X) = 1 · (1/6) + 2 · (1/6) + 3 · (1/6) + ... + 6 · (1/6)

= (1/6) (1 + 2 + 3 + ... + 6)

= (1/6) · (6 × 7)/2 = 7/2 = 3.5
4. 7  The probability function of X (the sum of the numbers obtained on the two dice) is

Value of X : x    2     3     4     5     6     7     8     9     10    11    12
Probability p(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

E(X) = Σ_i p_i x_i

= 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36)

= (1/36) (2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12)

= (1/36) × 252 = 7
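Enumerating the 36 equally likely outcomes confirms the answer; a minimal sketch:

    from itertools import product

    sums = [a + b for a, b in product(range(1, 7), repeat=2)]  # all 36 outcomes
    print(sum(sums) / 36)                                      # 7.0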
5. 2 Let H represent a head, T a tail and X, the random variable denoting the number of
heads.
S. No.   Outcome   No. of Heads (X)
1 HHHH 4
2 HHHT 3
3 HHTH 3
4 HTHH 3
5 THHH 3
6 HHTT 2
7 HTTH 2
8 TTHH 2
9 HTHT 2
10 THTH 2
11 THHT 2
12 HTTT 1
13 THTT 1
14 TTHT 1
15 TTTH 1
16 TTTT 0
The random variable X takes the values 0, 1, 2, 3, and 4. Since, from the above table, we find that the numbers of cases favourable to the occurrence of 0, 1, 2, 3 and 4 heads are 1, 4, 6, 4 and 1 respectively, we have:

P(X = 0) = 1/16, P(X = 1) = 4/16 = 1/4, P(X = 2) = 6/16 = 3/8,

P(X = 3) = 4/16 = 1/4 and P(X = 4) = 1/16

Thus the probability distribution of X can be summarised as follows:

x :    0     1    2    3    4
p(x) : 1/16  1/4  3/8  1/4  1/16

E(X) = Σ_{x=0}^{4} x p(x)

= 0 · (1/16) + 1 · (1/4) + 2 · (3/8) + 3 · (1/4) + 4 · (1/16)

= 1/4 + 3/4 + 3/4 + 1/4 = 2
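The same distribution and expectation can be confirmed by enumerating the 16 outcomes; a minimal sketch:

    from itertools import product

    counts = [seq.count('H') for seq in product('HT', repeat=4)]  # heads per outcome
    print(sum(counts) / 16)                                       # 2.0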