
Statistics & Probability Letters 19 (1994) 161-165 27 January 1994

North-Holland

Moments of order statistics of the Cantor distribution
J.R.M. Hosking
IBM Research Division, Yorktown Heights, NY, USA

Received March 1993


Revised June 1993

Abstract: We obtain recursive formulas for the expected values of order statistics from the Cantor distribution defined by Lad and
Taylor (Statist. Probab. Lett. 13 (1992) 307-310), and for a skew generalization of it. The formulas are used to calculate the
L-moments of the distribution.

Keywords: Cantor distribution; L-moments; order statistics; self-similarity.

1. The Cantor distribution

Lad and Taylor (1992) have defined a probability distribution based on the Cantor set. The distribution is
that of a random variable $X$ defined by

$$X = \sum_{i=1}^{\infty} \phi^{i-1} Z_i \tag{1}$$

where the $Z_i$ are independent and identically distributed with $\Pr[Z_i = 0] = \tfrac12$,
$\Pr[Z_i = 1 - \phi] = \tfrac12$.
Intuitively the distribution may be constructed as follows.
(1) Start with the unit interval $[0, 1]$, and unit probability mass uniformly distributed over it.
(2) Delete the interval $(\phi, 1 - \phi)$ and rescale so that the total probability mass is still 1.
(3) Continue ad infinitum. At the $(r + 1)$th stage the probability mass is uniformly distributed over $2^r$
closed intervals each of length $\phi^r$: from each interval $[p, q]$ delete the central $1 - 2\phi$ proportion of the
interval, i.e. the subinterval $(p + \phi(q - p),\, q - \phi(q - p))$, and rescale so that the total probability mass is
still 1.
When $\phi = \tfrac13$ this procedure yields the standard Cantor distribution, in which unit probability mass is
concentrated on those points in $[0, 1]$ whose ternary expansion contains only the digits 0 and 2.
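
Definition (1) can be simulated directly by truncating the infinite sum. The following is a minimal sketch in Python and is not taken from the paper: the function name `sample_cantor`, the truncation length `n_terms` and the fixed seed are all assumptions made for illustration.

```python
import random

def sample_cantor(phi, n_terms=60, rng=random):
    """Draw one variate by truncating X = sum_{i>=1} phi**(i-1) * Z_i."""
    x = 0.0
    weight = 1.0  # holds phi**(i-1) for the current term
    for _ in range(n_terms):
        # Z_i = 1 - phi with probability 1/2, otherwise Z_i = 0
        if rng.random() < 0.5:
            x += weight * (1.0 - phi)
        weight *= phi
    return x

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_cantor(1 / 3, rng=rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

With $\phi = \tfrac13$ no draw should land in the deleted middle third, and the sample mean should be close to $\tfrac12$; the truncation error is at most $\phi^{60}$ and is negligible in double precision.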

2. Expected order statistics and L-moments

The cumulative distribution function of the Cantor distribution is mathematically awkward: it is
continuous and almost everywhere differentiable with derivative zero, but not constant. However, some
properties of the distribution can be derived quite simply. Lad and Taylor (1992) obtained expressions
for the moments of the Cantor distribution. Here we derive expressions for the expected values of the
order statistics of the distribution. From these the L-moments of the distribution can be calculated.
L-moments are quantities analogous to the moments but derived from expected values of linear
combinations of order statistics (Hosking, 1990).

Correspondence to: Dr. J.R.M. Hosking, Mathematical Sciences Department, Thomas J. Watson Research Center, P.O. Box 218,
Yorktown Heights NY 10598, USA.

0167-7152/94/$07.00 © 1994 - Elsevier Science B.V. All rights reserved
SSDI 0167-7152(93)E0101-X

As noted by Lad and Taylor (1992), the Cantor distribution with parameter $\phi$ is self-similar in that the
distribution on each subinterval $[0, \phi]$ and $[1 - \phi, 1]$ is a copy, at reduced scale, of the entire distribution.
Thus

$$X = \begin{cases} \phi Y & \text{with probability } \tfrac12, \\ \phi Y + 1 - \phi & \text{with probability } \tfrac12, \end{cases} \tag{2}$$

where the distribution of $Y$ is identical to that of $X$. A sample of size $r$ drawn from the distribution will
be divided with $k$ elements ($0 \le k \le r$) falling in the subinterval $[0, \phi]$ and $r - k$ in the subinterval
$[1 - \phi, 1]$. Thus the smallest element of the sample, $X_{1:r}$, is distributed as

$$X_{1:r} = \begin{cases} \phi Y_{1:k} & \text{with probability } 2^{-r}\binom{r}{k}, \quad k = 1, 2, \ldots, r, \\ \phi Y_{1:r} + 1 - \phi & \text{with probability } 2^{-r}. \end{cases} \tag{3}$$
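Thus

$$E X_{1:r} = 2^{-r} \left\{ \phi \sum_{k=1}^{r} \binom{r}{k} E Y_{1:k} + \phi\, E Y_{1:r} + 1 - \phi \right\}, \tag{4}$$

whence from the identity of the distributions of $X$ and $Y$ we obtain

$$(2^r - 2\phi)\, E X_{1:r} = 1 - \phi + \phi \sum_{k=1}^{r-1} \binom{r}{k} E X_{1:k}. \tag{5}$$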

This equation is valid for $r = 1$, the sum being then vacuous, and yields $E X_{1:1} = \tfrac12$. It can be applied
recursively to obtain $E X_{1:r}$ for any $r$. We can further calculate the expected value of any order statistic
$X_{k:r}$, by successive application of the standard result

$$r\, E X_{k:r-1} = (r - k)\, E X_{k:r} + k\, E X_{k+1:r} \tag{6}$$

(David, 1981, p. 46). Recursions similar to (5) and (6) can also be found for higher moments of $X_{1:r}$.
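
Recursions (5) and (6) translate directly into a short program. The following is a minimal sketch in Python, not code from the paper: the function names `expected_minima` and `expected_order_stats` are illustrative, and the use of exact rational arithmetic through `fractions.Fraction` is an implementation choice assumed here.

```python
from fractions import Fraction
from math import comb

def expected_minima(phi, r_max):
    """Return a list e with e[r] = E[X_{1:r}], r = 1..r_max, from recursion (5)."""
    phi = Fraction(phi)
    e = [None] * (r_max + 1)  # index 0 is unused
    for r in range(1, r_max + 1):
        # sum over k = 1..r-1; empty (zero) when r = 1
        s = sum(comb(r, k) * e[k] for k in range(1, r))
        e[r] = (1 - phi + phi * s) / (2**r - 2 * phi)
    return e

def expected_order_stats(phi, n):
    """Return t with t[r][k] = E[X_{k:r}] for 1 <= k <= r <= n, using (5) then (6)."""
    e = expected_minima(phi, n)
    t = {r: {1: e[r]} for r in range(1, n + 1)}
    for r in range(2, n + 1):
        for k in range(1, r):
            # (6) solved for E[X_{k+1:r}]
            t[r][k + 1] = (r * t[r - 1][k] - (r - k) * t[r][k]) / k
    return t

table = expected_order_stats(Fraction(1, 3), 4)
```

For the standard Cantor distribution this gives, for example, $E X_{1:2} = 3/10$ and $E X_{2:2} = 7/10$, and the table satisfies $E X_{k:r} + E X_{r+1-k:r} = 1$, as the symmetry of the distribution requires.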
Equations (5) and (6) can be used to calculate the L-moments, expectations of linear combinations of
order statistics that summarize the shape of a probability distribution analogously to the ordinary
moments (Hosking, 1990). The first four L-moments, which measure location, scale, skewness and
kurtosis respectively, are

$$\begin{aligned}
\lambda_1 &= E X_{1:1} = \tfrac12, \\
\lambda_2 &= \tfrac12 E(X_{2:2} - X_{1:2}) = (1 - \phi)/(4 - 2\phi), \\
\lambda_3 &= \tfrac13 E(X_{3:3} - 2X_{2:3} + X_{1:3}) = 0, \\
\lambda_4 &= \tfrac14 E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}) = -(1 - \phi)(1 - 2\phi)/\{(2 - \phi)(8 - \phi)\}.
\end{aligned} \tag{7}$$

The skewness measure $\lambda_3$ is zero since the distribution is symmetric. The dimensionless kurtosis measure
is $\tau_4 = \lambda_4/\lambda_2 = -2(1 - 2\phi)/(8 - \phi)$. In the limiting case $\phi \to \tfrac12$, we have $\tau_4 \to 0$: the Cantor distribution
tends to the uniform distribution, which has zero L-kurtosis. As $\phi \to 0$, we have $\tau_4 \to -\tfrac14$: the Cantor
distribution approaches the distribution that has probability mass $\tfrac12$ at the points 0 and 1. For the
standard Cantor distribution with $\phi = \tfrac13$ we have $\tau_4 = -2/23 = -0.086957$.
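
As a worked check of the expression for $\lambda_2$ in (7), the two recursions can be applied by hand for $r = 2$. The steps below use only (5), (6) and $E X_{1:1} = \tfrac12$; the layout is ours and is offered as an illustration rather than as part of the original derivation.

```latex
\begin{align*}
(4 - 2\phi)\, E X_{1:2} &= 1 - \phi + \phi \binom{2}{1} E X_{1:1} = 1
  && \text{from (5) with } r = 2, \\
E X_{2:2} &= 2\, E X_{1:1} - E X_{1:2} = 1 - \frac{1}{4 - 2\phi}
  && \text{from (6) with } r = 2,\ k = 1, \\
\lambda_2 &= \tfrac12 \left( E X_{2:2} - E X_{1:2} \right)
  = \tfrac12 \left( 1 - \frac{2}{4 - 2\phi} \right)
  = \frac{1 - \phi}{4 - 2\phi}.
\end{align*}
```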


Fig. 1. L-moments of the Cantor($\tfrac13$, $\tfrac13$) distribution.

As an example we have computed the first 50 L-moments of the standard Cantor distribution ($\phi = \tfrac13$).
Figure 1 shows the L-moment ratios $\tau_r = \lambda_r/\lambda_2$, $r = 3, 4, \ldots, 50$.
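
A computation of this kind can be sketched as follows. This is a hedged illustration in Python, not the author's program. It assumes the standard representation of L-moments through the quantities $\alpha_k = E[X\{1 - F(X)\}^k] = E X_{1:k+1}/(k+1)$, namely $\lambda_{r+1} = \sum_{k=0}^{r} (-1)^k \binom{r}{k}\binom{r+k}{k} \alpha_k$, which is not stated in this paper; the function name `cantor_l_moments` is hypothetical.

```python
from fractions import Fraction
from math import comb

def cantor_l_moments(phi, n):
    """Return [lambda_1, ..., lambda_n] for the Cantor distribution with parameter phi."""
    phi = Fraction(phi)
    e = [None]  # e[r] = E[X_{1:r}] from recursion (5); index 0 unused
    for r in range(1, n + 1):
        s = sum(comb(r, k) * e[k] for k in range(1, r))
        e.append((1 - phi + phi * s) / (2**r - 2 * phi))
    # alpha_k = E[X_{1:k+1}] / (k + 1)  (assumed standard identity)
    alpha = [e[k + 1] / (k + 1) for k in range(n)]
    return [
        sum((-1)**k * comb(r, k) * comb(r + k, k) * alpha[k] for k in range(r + 1))
        for r in range(n)
    ]

lam = cantor_l_moments(Fraction(1, 3), 50)
# L-moment ratios tau_r = lambda_r / lambda_2 for r = 3..50 (lam is 0-indexed)
tau = {r: lam[r - 1] / lam[1] for r in range(3, 51)}
```

Exact rational arithmetic is used because the alternating sum has large binomial coefficients and would lose accuracy in floating point for $r$ near 50. Here `tau[4]` equals $-2/23$, in agreement with the value given after (7), and every odd-order ratio is exactly zero.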

3. A skew Cantor distribution

We can define a family of skew Cantor distributions by generalizing the constructions given above.
The intuitive construction is as before but at each stage the subinterval deleted is not the central
symmetric interval $(\phi, 1 - \phi)$ but the asymmetric interval $(\alpha, 1 - \beta)$. At the $(r + 1)$th stage the probability
mass is uniformly distributed over $2^r$ closed intervals, of which $\binom{r}{k}$ have length $\alpha^k \beta^{r-k}$, $k = 0, 1, \ldots, r$:
from each interval $[p, q]$ delete the $(\alpha, 1 - \beta)$ proportion of the interval, i.e. the subinterval
$(p + \alpha(q - p),\, q - \beta(q - p))$, and rescale so that the total probability mass is still 1. The successive stages of the
construction are illustrated in Figure 2, for the case $\alpha =$ [illegible], $\beta =$ [illegible].
Fig. 2. Successive approximations to the set on which the probability mass of the Cantor($\alpha$, $\beta$) distribution is concentrated.

Formally we define the Cantor($\alpha$, $\beta$) distribution, with parameters $\alpha$ and $\beta$ satisfying $\alpha > 0$, $\beta > 0$,
$\alpha + \beta < 1$, to be that of a random variable $X$ defined by

$$X = \sum_{i=1}^{\infty} W_i Z_i \tag{8}$$

where the $Z_i$ are independent and identically distributed with $\Pr[Z_i = 0] = \alpha/(\alpha + \beta)$,
$\Pr[Z_i = 1 - \beta] = \beta/(\alpha + \beta)$, and the $W_i$ are defined by $W_1 = 1$,

$$W_{i+1} = \begin{cases} \alpha W_i & \text{if } Z_i = 0, \\ \beta W_i & \text{if } Z_i = 1 - \beta. \end{cases} \tag{9}$$

The Cantor($\phi$, $\phi$) distribution is the symmetric Cantor distribution defined in Section 1.
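
The formal definition of the Cantor($\alpha$, $\beta$) distribution also gives a simple way to simulate it: generate the $Z_i$ in turn and update the weight $W_i$ after each one. The sketch below is an assumption-laden illustration in Python, not code from the paper; the name `sample_skew_cantor`, the truncation length `n_terms`, the seed and the parameter values $\alpha = 0.5$, $\beta = 0.2$ are all chosen here for the example.

```python
import random

def sample_skew_cantor(alpha, beta, n_terms=100, rng=random):
    """Draw one Cantor(alpha, beta) variate from a truncated sum of W_i * Z_i."""
    x = 0.0
    w = 1.0  # current weight W_i, starting from W_1 = 1
    p_zero = alpha / (alpha + beta)  # Pr[Z_i = 0]
    for _ in range(n_terms):
        if rng.random() < p_zero:
            # Z_i = 0: nothing is added, and W_{i+1} = alpha * W_i
            w *= alpha
        else:
            # Z_i = 1 - beta: add W_i * Z_i, and W_{i+1} = beta * W_i
            x += w * (1.0 - beta)
            w *= beta
    return x

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_skew_cantor(0.5, 0.2, rng=rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

With these parameter values no draw should fall in the deleted interval $(0.5, 0.8)$.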


Expected order statistics can be found for the skew Cantor distribution. The same argument as before
yields the recursion

$$\{(\alpha + \beta)^r - \alpha^{r+1} - \beta^{r+1}\}\, E X_{1:r} = \beta^r (1 - \beta) + \sum_{k=1}^{r-1} \binom{r}{k} \alpha^{k+1} \beta^{r-k}\, E X_{1:k}. \tag{10}$$

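The first two L-moments of the distribution are

$$\begin{aligned}
\lambda_1 &= \frac{\beta(1 - \beta)}{\alpha(1 - \alpha) + \beta(1 - \beta)}, \\
\lambda_2 &= \frac{\alpha(1 - \alpha)\,\beta(1 - \beta)}{\{\alpha(1 - \alpha) + \beta(1 - \beta)\}\{\alpha(1 - \alpha) + \beta(1 - \beta) + \alpha\beta\}}.
\end{aligned} \tag{11}$$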
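
The closed forms for $\lambda_1$ and $\lambda_2$ can be checked against the recursion for $E X_{1:r}$ in the skew case. The following Python sketch is illustrative only: the function name `skew_expected_minima` and the test values $\alpha = \tfrac12$, $\beta = \tfrac15$ are assumptions, and $\lambda_2$ is obtained as $E X_{1:1} - E X_{1:2}$, which follows from (6) with $r = 2$, $k = 1$.

```python
from fractions import Fraction
from math import comb

def skew_expected_minima(alpha, beta, r_max):
    """Return e with e[r] = E[X_{1:r}] for Cantor(alpha, beta), r = 1..r_max."""
    a, b = Fraction(alpha), Fraction(beta)
    e = [None]  # index 0 unused
    for r in range(1, r_max + 1):
        s = sum(comb(r, k) * a**(k + 1) * b**(r - k) * e[k] for k in range(1, r))
        e.append((b**r * (1 - b) + s) / ((a + b)**r - a**(r + 1) - b**(r + 1)))
    return e

a, b = Fraction(1, 2), Fraction(1, 5)
e = skew_expected_minima(a, b, 2)
A, B = a * (1 - a), b * (1 - b)
lambda1 = e[1]
lambda2 = e[1] - e[2]  # equals (1/2) E(X_{2:2} - X_{1:2})
```

For these parameter values the recursion gives $\lambda_1 = 16/41$, and both quantities agree exactly with the closed-form expressions.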

Higher-order L-moments do not have a simple algebraic form. They can easily be calculated for given
values of $\alpha$ and $\beta$, however. The L-skewness, $\tau_3 = \lambda_3/\lambda_2$, and L-kurtosis $\tau_4$ have been calculated for
1000 $(\alpha, \beta)$ pairs chosen at random from the set $\alpha \ge \beta > 0$, $\alpha + \beta < 1$. The condition $\alpha \ge \beta$ means that
only distributions with positive skewness were simulated: this involves no loss of generality since if
$X \sim$ Cantor($\alpha$, $\beta$) then $1 - X \sim$ Cantor($\beta$, $\alpha$). The $\tau_3$ and $\tau_4$ values are plotted in Figure 3. It can be
seen that the values fall between two bounding curves. The lower curve is $\tau_4 = (5\tau_3^2 - 1)/4$, the lower
bound on $\tau_4$ for all distributions (Hosking, 1990, (2.7)). The upper curve is $\tau_4 = \tau_3(5\tau_3 - 1)/(5 - \tau_3)$, the
L-skewness-L-kurtosis relationship of the family of distributions with quantile function $x(u) = u^p$, $p > 1$.
This is apparently the upper bound for the possible $(\tau_3, \tau_4)$ values of the Cantor($\alpha$, $\beta$) distribution,
though we do not have a formal proof of this conjecture.
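
An experiment of this kind can be reproduced in outline as follows. This Python sketch is not the author's program and makes several assumptions: parameters are drawn from a grid of hundredths rather than a continuous distribution, only 200 pairs are used, the L-moments are formed from the standard representation $\lambda_{r+1} = \sum_{k=0}^{r} (-1)^k \binom{r}{k}\binom{r+k}{k}\, E X_{1:k+1}/(k+1)$, and the name `skew_l_ratios` is hypothetical.

```python
import random
from fractions import Fraction
from math import comb

def skew_l_ratios(alpha, beta):
    """Return (tau_3, tau_4) for Cantor(alpha, beta), via recursion (10)."""
    a, b = Fraction(alpha), Fraction(beta)
    e = [None]  # e[r] = E[X_{1:r}], r = 1..4
    for r in range(1, 5):
        s = sum(comb(r, k) * a**(k + 1) * b**(r - k) * e[k] for k in range(1, r))
        e.append((b**r * (1 - b) + s) / ((a + b)**r - a**(r + 1) - b**(r + 1)))
    al = [e[k + 1] / (k + 1) for k in range(4)]  # assumed standard identity
    lam = [sum((-1)**k * comb(r, k) * comb(r + k, k) * al[k] for k in range(r + 1))
           for r in range(4)]
    return lam[2] / lam[1], lam[3] / lam[1]

rng = random.Random(1)  # fixed seed so the run is repeatable
pairs = []
while len(pairs) < 200:
    a = Fraction(rng.randint(1, 99), 100)
    b = Fraction(rng.randint(1, 99), 100)
    if a >= b and a + b < 1:
        pairs.append((a, b))

results = [skew_l_ratios(a, b) for a, b in pairs]
# Count points on or above the lower curve and on or below the upper curve
above_lower = sum(t4 >= (5 * t3**2 - 1) / 4 for t3, t4 in results)
below_upper = sum(t4 <= t3 * (5 * t3 - 1) / (5 - t3) for t3, t4 in results)
print(above_lower, below_upper, len(results))
```

The first count must equal the number of pairs, since the lower curve bounds all distributions. The second count tests the conjectured upper bound on the sampled grid only and proves nothing beyond it.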

Fig. 3. L-skewness and L-kurtosis values of 1000 skew Cantor distributions. The bounding curves are $\tau_4 = \tau_3(5\tau_3 - 1)/(5 - \tau_3)$ and
$\tau_4 = (5\tau_3^2 - 1)/4$.


References

David, H.A. (1981), Order Statistics (Wiley, New York, 2nd ed.).
Hosking, J.R.M. (1990), L-moments: analysis and estimation of distributions using linear combinations of order
statistics, J. Roy. Statist. Soc. Ser. B 52, 105-124.
Lad, F.R. and W.F.C. Taylor (1992), The moments of the Cantor distribution, Statist. Probab. Lett. 13, 307-310.
