Chapter 6: Order Statistics
Suppose that $X_1, X_2, \ldots, X_n$ are independent, and define
$$
\begin{aligned}
X_{(1)} &= \min\{X_1, X_2, \ldots, X_n\},\\
X_{(2)} &= \min\bigl(\{X_1, X_2, \ldots, X_n\} - \{X_{(1)}\}\bigr),\\
&\;\;\vdots\\
X_{(n-1)} &= \min\bigl(\{X_1, X_2, \ldots, X_n\} - \{X_{(1)}, X_{(2)}, \ldots, X_{(n-2)}\}\bigr),\\
X_{(n)} &= \max\{X_1, X_2, \ldots, X_n\}.
\end{aligned}
$$
Then
$$X_{(1)} < X_{(2)} < \cdots < X_{(n)}$$
denotes the original random sample after arrangement in increasing order of magnitude; these are collectively termed the order statistics of the random sample $X_1, X_2, \ldots, X_n$.
The $r$th smallest, $1 \le r \le n$, of the ordered $X$'s, $X_{(r)}$, is called the $r$th order statistic. Some authors use alternative notation to denote the order statistics. Further, by taking $Y_i = X_{(i)}$, $i = 1, \ldots, n$, one can also define the order statistics as
$$Y_1 < Y_2 < \cdots < Y_n.$$
Some familiar applications of order statistics, which are obvious on reflection, are as
follows:
1. $X_{(n)}$, the maximum (largest) value in the sample, is of interest in the study of floods and other extreme meteorological phenomena.
2. X(1) , the minimum (smallest) value, is useful for phenomena where, for example,
the strength of a chain depends on the weakest link.
3. The sample median, defined as $X_{([n+1]/2)}$ for $n$ odd and as any number between $X_{(n/2)}$ and $X_{(n/2+1)}$ for $n$ even, is a measure of location and an estimate of the population central tendency.
4. The sample midrange, defined as (X(1) +X(n) )/2, is also a measure of central tendency.
5. In some experiments, the sampling process ceases after collecting $r$ of the observations. For example, in life-testing electric light bulbs, one may start with a group of $n$ bulbs but stop taking observations after the $r$th bulb burns out. Then information is available only on the first $r$ ordered "lifetimes" $X_{(1)} < X_{(2)} < \cdots < X_{(r)}$, where $r \le n$. This type of data is often referred to as censored data.
6. Order statistics are used to study outliers or extreme observations, e.g., when so-called dirty data are suspected.
For example, if a random sample of five light bulbs is tested, the observed failure times might be (in months) $(x_1, \ldots, x_5) = (5, 11, 4, 100, 17)$. Now, the actual observations would have taken place in the order $x_3 = 4$, $x_1 = 5$, $x_2 = 11$, $x_5 = 17$, and $x_4 = 100$. But the "ordered" random sample in this case is $(y_1, \ldots, y_5) = (4, 5, 11, 17, 100)$.
The joint distribution of the ordered variables (that is, $Y_1, \ldots, Y_n$) is not the same as the joint distribution of the unordered variables (that is, $X_1, X_2, \ldots, X_n$). Note that, because of the ordering, $P[Y_1 \le Y_2 \le \cdots \le Y_n] = 1$, so the $Y_i$ are necessarily dependent.

Let us do some mathematics to compute the joint density of $Y_1, Y_2, Y_3$. Note that whenever I do mathematics I use lowercase letters. Observe that $(y_1, y_2, y_3)$ must be a permutation of $(x_1, x_2, x_3)$, which gives ($3! = 6$) six different (distinct) possibilities for substitution, namely,
$$y_1 = x_1, \quad y_2 = x_2, \quad y_3 = x_3,$$
$$y_1 = x_2, \quad y_2 = x_1, \quad y_3 = x_3,$$
$$y_1 = x_1, \quad y_2 = x_3, \quad y_3 = x_2,$$
$$y_1 = x_2, \quad y_2 = x_3, \quad y_3 = x_1,$$
$$y_1 = x_3, \quad y_2 = x_1, \quad y_3 = x_2,$$
$$y_1 = x_3, \quad y_2 = x_2, \quad y_3 = x_1.$$
It is clear that the above defined transformation is not one-to-one, but it may be carried
out by partitioning the domain into subsets A1 , A2 , . . . , A6 such that the transformation is
one-to-one on each subset. Let
$$(x_1, x_2, x_3) \in A_1 \text{ if and only if } x_1 < x_2 < x_3,$$
and
$$(x_1, x_2, x_3) \in A_2 \text{ if and only if } x_2 < x_1 < x_3,$$
and so on for $A_3, \ldots, A_6$, one subset for each of the six orderings above. Since the conditions $x_1 < x_2$ and $x_1 > x_2$ are contradictory, the triplet $(x_1, x_2, x_3)$ cannot belong to $A_1$ and $A_2$ simultaneously. Hence $A_1$ and $A_2$ are disjoint. Further, observe that $S = A_1 \cup A_2 \cup \cdots \cup A_6$ and the range of the transformation is $B = \{(y_1, y_2, y_3) : a < y_1 < y_2 < y_3 < b\}$.
Now, writing $x_1, x_2, x_3$ in terms of $y_1, y_2, y_3$ (see the definitions of $A_1, A_2, \ldots, A_6$), we have six possible inverse transformations:
$$x_1 = y_1, \quad x_2 = y_2, \quad x_3 = y_3,$$
$$x_1 = y_2, \quad x_2 = y_1, \quad x_3 = y_3,$$
$$x_1 = y_1, \quad x_2 = y_3, \quad x_3 = y_2,$$
$$x_1 = y_3, \quad x_2 = y_1, \quad x_3 = y_2,$$
$$x_1 = y_2, \quad x_2 = y_3, \quad x_3 = y_1,$$
$$x_1 = y_3, \quad x_2 = y_2, \quad x_3 = y_1.$$
Recall that the Jacobian of a transformation $(u, v, w) \mapsto (x, y, z)$ is
$$
J(x, y, z \to u, v, w) =
\begin{vmatrix}
\dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w}\\[4pt]
\dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w}\\[4pt]
\dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v} & \dfrac{\partial z}{\partial w}
\end{vmatrix}.
$$
For each of the six inverse transformations the Jacobian has absolute value one:
$$|J(x_1, x_2, x_3 \to y_1, y_2, y_3)| = 1, \quad |J(x_1, x_2, x_3 \to y_2, y_1, y_3)| = 1,$$
$$|J(x_1, x_2, x_3 \to y_1, y_3, y_2)| = 1, \quad |J(x_1, x_2, x_3 \to y_3, y_1, y_2)| = 1,$$
$$|J(x_1, x_2, x_3 \to y_2, y_3, y_1)| = 1, \quad |J(x_1, x_2, x_3 \to y_3, y_2, y_1)| = 1.$$
$$
\begin{aligned}
f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3)
&= f_X(y_1) f_X(y_2) f_X(y_3)(1) + f_X(y_2) f_X(y_1) f_X(y_3)(1)\\
&\quad + f_X(y_1) f_X(y_3) f_X(y_2)(1) + f_X(y_3) f_X(y_1) f_X(y_2)(1)\\
&\quad + f_X(y_2) f_X(y_3) f_X(y_1)(1) + f_X(y_3) f_X(y_2) f_X(y_1)(1)\\
&= 6 f_X(y_1) f_X(y_2) f_X(y_3)\\
&= 3! \, f_X(y_1) f_X(y_2) f_X(y_3), \quad a < y_1 < y_2 < y_3 < b,
\end{aligned}
$$
where the last line has been obtained because multiplication is commutative.
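The factor $3!$ appears because, for iid continuous random variables, every ordering of $(X_1, X_2, X_3)$ is equally likely. A small simulation (an illustrative sketch of mine, assuming NumPy; not part of the derivation) makes this plain: each of the six sorting permutations occurs with relative frequency close to $1/6$.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
counts = Counter()
trials = 60_000
for _ in range(trials):
    x = rng.random(3)                    # iid continuous sample of size 3
    counts[tuple(np.argsort(x))] += 1    # permutation that sorts the sample

# Each of the 3! = 6 permutations should appear with frequency ~ 1/6 = 0.167.
for perm in sorted(counts):
    print(perm, counts[perm] / trials)
```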
Note that we can write the joint density of $Y_1, Y_2, Y_3$ as a product of a function of $y_1$ alone, a function of $y_2$ alone, and a function of $y_3$ alone, for example
$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = [3! f_X(y_1)] \, [f_X(y_2)] \, [f_X(y_3)].$$
But factorization of the joint density alone is not sufficient for independence. You also have to look at the support set of $f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3)$. If the support set of the joint density is not a Cartesian product, the random variables are not independent. In this case the support set $\{(y_1, y_2, y_3) : a < y_1 < y_2 < y_3 < b\}$ cannot be written as a Cartesian product, and therefore $Y_1, Y_2, Y_3$ are not independent.
As a concrete example, suppose $f_X(x) = 2x$ for $0 < x < 1$ and $n = 3$, so that
$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = 3!(2y_1)(2y_2)(2y_3) = 48 y_1 y_2 y_3, \quad 0 < y_1 < y_2 < y_3 < 1.$$
Integrating out $y_3$ gives the joint density of $Y_1$ and $Y_2$:
$$
f_{Y_1,Y_2}(y_1, y_2) = \int_{y_2}^{1} 48 y_1 y_2 y_3 \, dy_3
= 48 y_1 y_2 \left[\frac{y_3^2}{2}\right]_{y_2}^{1}
= 24 y_1 y_2 (1 - y_2^2),
$$
where $0 < y_1 < y_2 < 1$. Now, integrate out $y_2$ in the joint density $f_{Y_1,Y_2}(y_1, y_2)$ to get
$$
f_{Y_1}(y_1) = \int_{y_1}^{1} f_{Y_1,Y_2}(y_1, y_2) \, dy_2
= \int_{y_1}^{1} 24 y_1 y_2 (1 - y_2^2) \, dy_2
= 6 y_1 (1 - y_1^2)^2, \quad 0 < y_1 < 1.
$$
If we want to know the probability that the smallest observation is below some value, say 0.1, it follows that
$$P[Y_1 \le 0.1] = \int_0^{0.1} 6 y_1 (1 - y_1^2)^2 \, dy_1 = 1 - (1 - 0.1^2)^3 \approx 0.03.$$
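This value can be double-checked by Monte Carlo (a sketch of mine, assuming NumPy). Since $F_X(x) = x^2$ on $(0, 1)$ in this example, draws from $f_X$ can be generated by the inverse-CDF method as $X = \sqrt{U}$ with $U$ uniform:

```python
import numpy as np

rng = np.random.default_rng(1)

# F_X(x) = x^2 on (0, 1), so the inverse-CDF method gives X = sqrt(U).
samples = np.sqrt(rng.random((200_000, 3)))
y1 = samples.min(axis=1)              # smallest order statistic for n = 3

print(np.mean(y1 <= 0.1))             # ~ 0.0297, matching 1 - (0.99)^3
```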
To derive the marginal pdf of the largest order statistic, $Y_3$, we proceed as follows: first, by integrating out $y_1$ in the joint density $f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3)$ we get the marginal density $f_{Y_2,Y_3}(y_2, y_3)$, and then by integrating out $y_2$ in $f_{Y_2,Y_3}(y_2, y_3)$ we get the marginal density $f_{Y_3}(y_3)$. Let us integrate out $y_1$ to get
$$
f_{Y_2,Y_3}(y_2, y_3) = \int_0^{y_2} f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) \, dy_1
= \int_0^{y_2} 48 y_2 y_3 y_1 \, dy_1
= 48 y_2 y_3 \left[\frac{y_1^2}{2}\right]_0^{y_2}
= 24 y_2^3 y_3,
$$
where $0 < y_2 < y_3 < 1$. Now, integrate out $y_2$ in the joint density $f_{Y_2,Y_3}(y_2, y_3)$ to get
$$
f_{Y_3}(y_3) = \int_0^{y_3} f_{Y_2,Y_3}(y_2, y_3) \, dy_2
= \int_0^{y_3} 24 y_3 y_2^3 \, dy_2
= 24 y_3 \left[\frac{y_2^4}{4}\right]_0^{y_3}
= 6 y_3^5, \quad 0 < y_3 < 1.
$$
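Integrating this density gives $F_{Y_3}(y_3) = y_3^6$, consistent with the CDF technique discussed later, since $[F_X(y)]^3 = (y^2)^3 = y^6$. A simulation check (again my own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
samples = np.sqrt(rng.random((200_000, 3)))   # X = sqrt(U) has f_X(x) = 2x
y3 = samples.max(axis=1)                      # largest order statistic

# f_{Y3}(y) = 6 y^5 integrates to F_{Y3}(y) = y^6; compare at a few points.
for t in (0.5, 0.8, 0.9):
    print(t, np.mean(y3 <= t), t ** 6)
```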
To compute the marginal density of $Y_2$ we have two choices: we may integrate out $y_3$ from $f_{Y_2,Y_3}(y_2, y_3)$, or we may integrate out $y_1$ from $f_{Y_1,Y_2}(y_1, y_2)$. Taking the first route, the marginal density of $Y_2$ is
$$
f_{Y_2}(y_2) = \int_{y_2}^{1} f_{Y_2,Y_3}(y_2, y_3) \, dy_3
= \int_{y_2}^{1} 24 y_2^3 y_3 \, dy_3
= 12 y_2^3 (1 - y_2^2), \quad 0 < y_2 < 1.
$$
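As a sanity check (my own addition), the mean of the sample median of three such draws should match $E[Y_2] = \int_0^1 y \cdot 12 y^3 (1 - y^2) \, dy = 24/35 \approx 0.686$:

```python
import numpy as np

rng = np.random.default_rng(3)
samples = np.sqrt(rng.random((200_000, 3)))   # X = sqrt(U) has f_X(x) = 2x
y2 = np.median(samples, axis=1)               # middle order statistic, n = 3

print(y2.mean(), 24 / 35)                     # both ~ 0.686
```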
It is possible to derive an explicit general formula for the distribution of the $k$th order statistic in terms of the pdf, $f_X(x)$, and CDF, $F_X(x)$, of the population random variable $X$. If $X$ is a continuous random variable with $f_X(x) > 0$ on $a < x < b$ ($a$ may be $-\infty$ and $b$ may be $+\infty$), then, for example, for $n = 3$,
"∫ #
∫ b b
f y1 (y1 ) 6 f X (y1 ) f X (y2 ) f X (y3 ) dy3 dy2
y1 y2
"∫ #
∫ b b
6 f X (y1 ) f X (y2 ) f X (y3 ) dy3 dy2 .
y1 y2
Now, by using the definition of the CDF, the integral in the square bracket is evaluated as
$$
\int_{y_2}^{b} f_X(y_3) \, dy_3 = \Bigl[F_X(y_3)\Bigr]_{y_2}^{b} = F_X(b) - F_X(y_2) = 1 - F_X(y_2).
$$
Hence
$$
f_{Y_1}(y_1) = 6 f_X(y_1) \int_{y_1}^{b} f_X(y_2) \, [1 - F_X(y_2)] \, dy_2
= 3 f_X(y_1) [1 - F_X(y_1)]^2, \quad a < y_1 < b,
$$
where the last line has been obtained by substituting $z = 1 - F_X(y_2)$ and evaluating the resulting integral.
Similarly,
$$
f_{Y_2}(y_2) = \int_{a}^{y_2} \int_{y_2}^{b} 6 f_X(y_1) f_X(y_2) f_X(y_3) \, dy_3 \, dy_1
= 6 f_X(y_2) \left[\int_{a}^{y_2} f_X(y_1) \, dy_1\right] \left[\int_{y_2}^{b} f_X(y_3) \, dy_3\right]
= 6 F_X(y_2) [1 - F_X(y_2)] f_X(y_2), \quad a < y_2 < b.
$$
Next, consider the joint density of Y1 , . . . , Yn and integrate it with respect to y1 to get
the joint density of Y2 , . . . , Yn as
$$
f_{Y_2,\ldots,Y_n}(y_2, \ldots, y_n) = n! \, f_X(y_2) \cdots f_X(y_n) \int_{a}^{y_2} f_X(y_1) \, dy_1
= n! \, f_X(y_2) \cdots f_X(y_n) \, \frac{F_X(y_2)}{1}, \quad a < y_2 < y_3 < \cdots < y_n < b.
$$
Next, integrating y2 in the above density, the joint density of Y3 , . . . , Yn is derived as
$$
f_{Y_3,\ldots,Y_n}(y_3, \ldots, y_n) = n! \, f_X(y_3) \cdots f_X(y_n) \int_{a}^{y_3} \frac{F_X(y_2)}{1} \, f_X(y_2) \, dy_2
= n! \, f_X(y_3) \cdots f_X(y_n) \, \frac{[F_X(y_3)]^2}{(1)(2)}, \quad a < y_3 < y_4 < \cdots < y_n < b.
$$
Further, integrating y3 in the above density, the joint density of Y4 , . . . , Yn is derived as
$$
f_{Y_4,\ldots,Y_n}(y_4, \ldots, y_n) = n! \, f_X(y_4) \cdots f_X(y_n) \int_{a}^{y_4} \frac{[F_X(y_3)]^2}{(1)(2)} \, f_X(y_3) \, dy_3
= n! \, f_X(y_4) \cdots f_X(y_n) \, \frac{[F_X(y_4)]^3}{(1)(2)(3)}, \quad a < y_4 < y_5 < \cdots < y_n < b.
$$
Similarly, one-by-one, we can integrate out $y_4, y_5, \ldots, y_{j-1}$ (in this order) to get
$$
f_{Y_j,\ldots,Y_n}(y_j, \ldots, y_n) = n! \, f_X(y_j) \cdots f_X(y_n) \, \frac{[F_X(y_j)]^{j-1}}{(1)(2)(3)\cdots(j-1)}, \quad a < y_j < \cdots < y_n < b.
$$
Next, let us start integrating variables from the right. Integrating out $y_n$ in the above density using the result
$$
\int_{y_{n-1}}^{b} f_X(y_n) \, dy_n = \frac{1 - F_X(y_{n-1})}{1},
$$
we get
$$
f_{Y_j,\ldots,Y_{n-1}}(y_j, \ldots, y_{n-1}) = n! \, f_X(y_j) \cdots f_X(y_{n-1}) \, \frac{[F_X(y_j)]^{j-1}}{(j-1)!} \cdot \frac{1 - F_X(y_{n-1})}{1}, \quad a < y_j < \cdots < y_{n-1} < b.
$$
Similarly, one-by-one, we can integrate out $y_{n-2}, y_{n-3}, \ldots, y_{k+1}$, $k \ge j$, (in this order) to get
$$
f_{Y_j,\ldots,Y_k}(y_j, \ldots, y_k) = n! \, f_X(y_j) \cdots f_X(y_k) \, \frac{[F_X(y_j)]^{j-1}}{(j-1)!} \cdot \frac{[1 - F_X(y_k)]^{n-k}}{(n-k)!}, \quad a < y_j < \cdots < y_k < b.
$$
Now, substituting $j = k$ in the above density we get Theorem 6.5.2 of the book:
$$
f_{Y_k}(y_k) = \frac{n!}{(k-1)! \, (n-k)!} \, [F_X(y_k)]^{k-1} [1 - F_X(y_k)]^{n-k} f_X(y_k), \quad a < y_k < b.
$$
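This formula is easy to code directly. The sketch below (mine, assuming NumPy and SciPy are available; the function name is hypothetical) implements it for arbitrary $f_X$ and $F_X$, and checks the well-known special case that for a Uniform$(0,1)$ sample the $k$th order statistic has a Beta$(k, n-k+1)$ distribution:

```python
import numpy as np
from math import factorial
from scipy import stats

def order_stat_pdf(y, n, k, f, F):
    """Density of the kth order statistic from an iid sample of size n."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return c * F(y) ** (k - 1) * (1.0 - F(y)) ** (n - k) * f(y)

# For Uniform(0, 1): f = 1, F(y) = y, and Y_k ~ Beta(k, n - k + 1).
n, k = 5, 2
y = np.linspace(0.01, 0.99, 50)
ours = order_stat_pdf(y, n, k, f=lambda t: np.ones_like(t), F=lambda t: t)
beta = stats.beta.pdf(y, k, n - k + 1)
print(np.allclose(ours, beta))        # True
```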
For continuous random variables, the pdf's of the minimum and maximum, $Y_1$ and $Y_n$, which are the special cases $k = 1$ and $k = n$ of the above result, are
$$f_{Y_1}(y_1) = n [1 - F_X(y_1)]^{n-1} f_X(y_1), \quad a < y_1 < b,$$
and
$$f_{Y_n}(y_n) = n [F_X(y_n)]^{n-1} f_X(y_n), \quad a < y_n < b,$$
respectively.
The densities of $Y_1$ and $Y_n$ can also be obtained by differentiating the corresponding CDFs. For discrete and continuous random variables, the CDF of the minimum or maximum of the sample can be derived directly by the CDF technique. For the minimum,
$$F_{Y_1}(y_1) = P[Y_1 \le y_1] = 1 - P[Y_1 > y_1].$$
The next step needs some explanation. Note that $Y_1$ is the minimum (smallest among $X_1, \ldots, X_n$), and therefore if $Y_1 > y_1$ then $X_i > y_1$ for all $i = 1, \ldots, n$. Conversely, if $X_i > y_1$ for all $i = 1, \ldots, n$, then $Y_1 > y_1$. Now,
$$
P[Y_1 > y_1] = P[X_1 > y_1, \ldots, X_n > y_1] = \prod_{i=1}^{n} P[X_i > y_1] = [1 - F_X(y_1)]^n,
$$
so that $F_{Y_1}(y_1) = 1 - [1 - F_X(y_1)]^n$.
If $F_X$ is differentiable, then
$$
f_{Y_1}(y_1) = \frac{d}{dy_1} F_{Y_1}(y_1)
= \frac{d}{dy_1} \bigl\{1 - [1 - F_X(y_1)]^n\bigr\}
= n [1 - F_X(y_1)]^{n-1} f_X(y_1), \quad a < y_1 < b.
$$
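A classical consequence worth noting: if $F_X(y) = 1 - e^{-\lambda y}$ (Exponential with rate $\lambda$), then $F_{Y_1}(y) = 1 - e^{-n\lambda y}$, i.e., the minimum of $n$ iid exponentials is again exponential, with rate $n\lambda$. A quick simulation check (my own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n = 2.0, 5

# Minimum of n iid Exponential(rate=lam) draws; NumPy parameterizes by
# scale = 1/rate.  Theory: Y1 ~ Exponential(rate = n*lam), mean 1/(n*lam).
y1 = rng.exponential(scale=1 / lam, size=(200_000, n)).min(axis=1)

print(y1.mean(), 1 / (n * lam))       # both ~ 0.1
```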
For the maximum,
$$
\begin{aligned}
F_{Y_n}(y_n) &= P[Y_n \le y_n]\\
&= P[X_1 \le y_n, \ldots, X_n \le y_n]\\
&= P[X_1 \le y_n] \, P[X_2 \le y_n] \cdots P[X_n \le y_n] \quad (X_1, \ldots, X_n \text{ are independent})\\
&= \{P[X \le y_n]\}^n \quad (X_1, \ldots, X_n \text{ have identical distributions})\\
&= [F_X(y_n)]^n.
\end{aligned}
$$
If $F_X$ is differentiable, then
$$
f_{Y_n}(y_n) = \frac{d}{dy_n} F_{Y_n}(y_n) = n [F_X(y_n)]^{n-1} f_X(y_n), \quad a < y_n < b.
$$
Next, we will derive the joint density of $Y_j$ and $Y_k$, $1 \le j < k \le n$. To obtain it, we integrate out $y_{j+1}, \ldots, y_{k-1}$ in $f_{Y_j,\ldots,Y_k}(y_j, \ldots, y_k)$. In the following expression of the joint density, the middle variables and factors are the ones to be integrated:
$$
f_{Y_j,\ldots,Y_k}(y_j, \ldots, y_k) = \frac{n!}{(j-1)! \, (n-k)!} \, [F_X(y_j)]^{j-1} [1 - F_X(y_k)]^{n-k} f_X(y_j) \, f_X(y_{j+1}) \cdots f_X(y_{k-1}) \, f_X(y_k). \tag{A}
$$
Integrating one variable at a time, starting from $y_{j+1}$,
$$
\int_{y_j}^{y_{j+2}} f_X(y_{j+1}) \, dy_{j+1} = \frac{F_X(y_{j+2}) - F_X(y_j)}{1},
$$
then
$$
\int_{y_j}^{y_{j+3}} \frac{F_X(y_{j+2}) - F_X(y_j)}{1} \, f_X(y_{j+2}) \, dy_{j+2} = \frac{[F_X(y_{j+3}) - F_X(y_j)]^2}{(1)(2)},
$$
and
$$
\int_{y_j}^{y_{j+4}} \frac{[F_X(y_{j+3}) - F_X(y_j)]^2}{2} \, f_X(y_{j+3}) \, dy_{j+3} = \frac{[F_X(y_{j+4}) - F_X(y_j)]^3}{(2)(3)}.
$$
Now, observing the trend of the above three integrals, we can safely conclude that
$$
\int_{y_j}^{y_{j+i+1}} \frac{[F_X(y_{j+i}) - F_X(y_j)]^{i-1}}{(i-1)!} \, f_X(y_{j+i}) \, dy_{j+i} = \frac{[F_X(y_{j+i+1}) - F_X(y_j)]^{i}}{i!}, \tag{B}
$$
for $i = 1, 2, \ldots, k - j - 1$.
Finally, integrating out $y_{j+1}, \ldots, y_{k-1}$ in (A) by using (B), we get the joint density of $Y_j$ and $Y_k$, $1 \le j < k \le n$, as
$$
f_{Y_j,Y_k}(y_j, y_k) = \frac{n!}{(j-1)! \, (k-j-1)! \, (n-k)!} \, [F_X(y_j)]^{j-1} [F_X(y_k) - F_X(y_j)]^{k-j-1} [1 - F_X(y_k)]^{n-k} f_X(y_j) f_X(y_k),
$$
$$a < y_j < y_k < b.$$
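As a numerical sanity check (my own sketch, assuming SciPy; the function name is hypothetical), the formula can be coded for a Uniform$(0,1)$ sample, where $F_X(y) = y$ and $f_X(y) = 1$, and integrated over $0 < y_j < y_k < 1$; the result should be $1$:

```python
from math import factorial
from scipy import integrate

def joint_pdf(yj, yk, n, j, k):
    """Joint density of (Y_j, Y_k) for a Uniform(0, 1) sample of size n."""
    c = factorial(n) / (factorial(j - 1) * factorial(k - j - 1) * factorial(n - k))
    return c * yj ** (j - 1) * (yk - yj) ** (k - j - 1) * (1 - yk) ** (n - k)

# Integrate over the region 0 < yj < yk < 1 (dblquad lists the inner
# variable first in the integrand's signature).
n, j, k = 6, 2, 5
val, err = integrate.dblquad(lambda yk, yj: joint_pdf(yj, yk, n, j, k),
                             0, 1, lambda yj: yj, lambda yj: 1)
print(val)                            # ~ 1.0
```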
The smallest and largest order statistics are of special importance, as are certain functions of the order statistics known as the sample median and the sample range. If $n$ is odd, the sample median is the middle observation $Y_k$, where $k = (n+1)/2$; if $n$ is even, it is any value between the two middle observations $Y_k$ and $Y_{k+1}$, where $k = n/2$, although it is often taken to be their average. The sample range is the difference between the largest and the smallest, $R = Y_n - Y_1$.
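For the light-bulb data used earlier, these quantities are immediate (a brief illustration of mine, assuming NumPy):

```python
import numpy as np

x = np.array([5, 11, 4, 100, 17])     # light-bulb failure times (months)
y = np.sort(x)

print(np.median(x))                   # 11.0, the middle order statistic (n = 5, odd)
print(y[-1] - y[0])                   # 96, the sample range R = Y_n - Y_1
```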
Chapter 6: Formulas
1 Joint density of $Y_1, \ldots, Y_n$
$$
f_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n) = n! \, f_X(y_1) f_X(y_2) \cdots f_X(y_n),
$$
where $a < y_1 < y_2 < \cdots < y_n < b$.
2 Marginal density of Yk
$$
f_{Y_k}(y_k) = \frac{n!}{(k-1)! \, (n-k)!} \, [F_X(y_k)]^{k-1} [1 - F_X(y_k)]^{n-k} f_X(y_k), \quad a < y_k < b.
$$
For continuous random variables, the pdf's of the minimum and maximum, $Y_1$ and $Y_n$, which are special cases of the above result, are
$$f_{Y_1}(y_1) = n [1 - F_X(y_1)]^{n-1} f_X(y_1)$$
and
$$f_{Y_n}(y_n) = n [F_X(y_n)]^{n-1} f_X(y_n),$$
respectively.
3 CDF of Y1 and Yn
For discrete and continuous random variables, the CDF of the minimum of the sample is
$$F_{Y_1}(y_1) = 1 - [1 - F_X(y_1)]^n,$$
and the CDF of the maximum is
$$F_{Y_n}(y_n) = [F_X(y_n)]^n.$$
4 Joint density of Yj and Yk
$$
f_{Y_j,Y_k}(y_j, y_k) = \frac{n!}{(j-1)! \, (k-j-1)! \, (n-k)!} \, [F_X(y_j)]^{j-1} [F_X(y_k) - F_X(y_j)]^{k-j-1} [1 - F_X(y_k)]^{n-k} f_X(y_j) f_X(y_k),
$$
where $a < y_j < y_k < b$.