North-Holland
Abstract: We obtain recursive formulas for the expected values of order statistics from the Cantor distribution defined by Lad and
Taylor (Statist. Probab. Lett. 13 (1992) 307-310), and for a skew generalization of it. The formulas are used to calculate the
L-moments of the distribution.
Lad and Taylor (1992) have defined a probability distribution based on the Cantor set. The distribution is
that of a random variable X defined by
$$X = \sum_{i=1}^{\infty} \phi^{i-1} Z_i \tag{1}$$
where the $Z_i$ are independent and identically distributed with $\Pr[Z_i = 0] = \tfrac12$, $\Pr[Z_i = 1 - \phi] = \tfrac12$.
Intuitively the distribution may be constructed as follows.
(1) Start with the unit interval $[0, 1]$, and unit probability mass uniformly distributed over it.
(2) Delete the interval $(\phi, 1 - \phi)$ and rescale so that the total probability mass is still 1.
(3) Continue ad infinitum. At the $(r + 1)$th stage the probability mass is uniformly distributed over $2^r$ closed intervals each of length $\phi^r$: from each interval $[p, q]$ delete the central $1 - 2\phi$ proportion of the interval, i.e. the subinterval $(p + \phi(q - p),\ q - \phi(q - p))$, and rescale so that the total probability mass is still 1.
When $\phi = \tfrac13$ this procedure yields the standard Cantor distribution, in which unit probability mass is concentrated on those points in $[0, 1]$ whose ternary expansion contains only the digits 0 and 2.
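The defining sum (1) also gives a direct way to simulate the distribution. The following is a minimal Python sketch, not part of the original paper: the function name `sample_cantor`, the truncation of the infinite sum after `n_terms` terms, and the restriction $0 < \phi < \tfrac12$ are assumptions made for illustration.

```python
import random


def sample_cantor(phi, n_terms=60, rng=random):
    """Draw one approximate variate X = sum_{i>=1} phi**(i-1) * Z_i.

    The infinite sum is truncated after n_terms terms (an assumption of
    this sketch); the neglected tail is at most phi**n_terms.
    """
    if not 0.0 < phi < 0.5:
        raise ValueError("phi is assumed to lie strictly between 0 and 1/2")
    x = 0.0
    weight = 1.0  # holds phi**(i-1) for the current term
    for _ in range(n_terms):
        # Z_i equals 0 or 1 - phi, each with probability 1/2.
        if rng.random() < 0.5:
            x += weight * (1.0 - phi)
        weight *= phi
    return x


# Usage: 10000 draws from the standard Cantor distribution (phi = 1/3).
rng = random.Random(12345)
draws = [sample_cantor(1 / 3, rng=rng) for _ in range(10000)]
sample_mean = sum(draws) / len(draws)
```

With $\phi = \tfrac13$ no draw should fall in the deleted middle third, and the sample mean should be close to $\tfrac12$.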
Correspondence to: Dr. J.R.M. Hosking, Mathematical Sciences Department, Thomas J. Watson Research Center, P.O. Box 218,
Yorktown Heights NY 10598, USA.
properties of the distribution can be derived quite simply. Lad and Taylor (1992) obtained expressions
for the moments of the Cantor distribution. Here we derive expressions for the expected values of the
order statistics of the distribution. From these the L-moments of the distribution can be calculated.
L-moments are quantities analogous to the moments but derived from expected values of linear
combinations of order statistics (Hosking, 1990).
As noted by Lad and Taylor (1992), the Cantor distribution with parameter $\phi$ is self-similar in that the distribution on each subinterval $[0, \phi]$ and $[1 - \phi, 1]$ is a copy, at reduced scale, of the entire distribution. Thus
$$X = \begin{cases} \phi Y & \text{with probability } \tfrac12, \\ 1 - \phi + \phi Y & \text{with probability } \tfrac12, \end{cases} \tag{2}$$
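where the distribution of $Y$ is identical to that of $X$. A sample of size $r$ drawn from the distribution will be divided with $k$ elements ($0 \le k \le r$) falling in the subinterval $[0, \phi]$ and $r - k$ in the subinterval $[1 - \phi, 1]$. Thus the smallest element of the sample, $X_{1:r}$, is distributed as
$$X_{1:r} = \begin{cases} \phi Y_{1:k} & \text{with probability } 2^{-r}\binom{r}{k},\ k = 1, \ldots, r, \\ 1 - \phi + \phi Y_{1:r} & \text{with probability } 2^{-r}. \end{cases}$$
Taking expectations and solving for $EX_{1:r}$ gives
$$(2^r - 2\phi)\, EX_{1:r} = 1 - \phi + \phi \sum_{k=1}^{r-1} \binom{r}{k} EX_{1:k}.$$
This equation is valid for $r = 1$, the sum being then vacuous, and yields $EX_{1:1} = \tfrac12$. It can be applied recursively to obtain $EX_{1:r}$ for any $r$. We can further calculate the expected value of any order statistic $X_{k:r}$ by successive application of the standard result (David, 1981)
$$(r - k)\, EX_{k:r} + k\, EX_{k+1:r} = r\, EX_{k:r-1}.$$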
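As a minimal sketch, the two recursions can be coded as follows. The explicit forms used, $(2^r - 2\phi)EX_{1:r} = 1 - \phi + \phi\sum_{k=1}^{r-1}\binom{r}{k}EX_{1:k}$ and $(r-k)EX_{k:r} + kEX_{k+1:r} = rEX_{k:r-1}$, are reconstructed here from the self-similarity relation (2) and from the standard order-statistics recurrence. The function names and the use of exact rational arithmetic are choices made for this illustration, not part of the original paper.

```python
from fractions import Fraction
from math import comb


def expected_minima(phi, r_max):
    """Return a list e_min with e_min[r] = E X_{1:r}, r = 1..r_max."""
    phi = Fraction(phi)
    e_min = [None]  # index 0 is unused so that indices match sample sizes
    for r in range(1, r_max + 1):
        # The sum over k = 1..r-1 is empty (zero) when r = 1.
        s = sum(comb(r, k) * e_min[k] for k in range(1, r))
        e_min.append((1 - phi + phi * s) / (2**r - 2 * phi))
    return e_min


def expected_order_stats(phi, r_max):
    """Return a dict {(k, r): E X_{k:r}} for 1 <= k <= r <= r_max."""
    e_min = expected_minima(phi, r_max)
    e = {}
    for r in range(1, r_max + 1):
        e[(1, r)] = e_min[r]
        # Rearranged recurrence:
        # E X_{k+1:r} = (r E X_{k:r-1} - (r - k) E X_{k:r}) / k
        for k in range(1, r):
            e[(k + 1, r)] = (r * e[(k, r - 1)] - (r - k) * e[(k, r)]) / k
    return e


# Usage: expected order statistics of the standard Cantor distribution.
table = expected_order_stats(Fraction(1, 3), 4)
```

Exact fractions are used because the second recurrence involves repeated subtraction, which may lose accuracy in floating point when $r$ is large.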
The first four L-moments are
$$\begin{aligned}
\lambda_1 &= EX_{1:1} = \tfrac12, \\
\lambda_2 &= \tfrac12 E(X_{2:2} - X_{1:2}) = (1 - \phi)/(4 - 2\phi), \\
\lambda_3 &= \tfrac13 E(X_{3:3} - 2X_{2:3} + X_{1:3}) = 0, \\
\lambda_4 &= \tfrac14 E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}) = -(1 - \phi)(1 - 2\phi)/\{(2 - \phi)(8 - \phi)\}.
\end{aligned} \tag{7}$$
The skewness measure $\lambda_3$ is zero since the distribution is symmetric. The dimensionless kurtosis measure is $\tau_4 = \lambda_4/\lambda_2 = -2(1 - 2\phi)/(8 - \phi)$. In the limiting case $\phi \to \tfrac12$, we have $\tau_4 \to 0$: the Cantor distribution tends to the uniform distribution, which has zero L-kurtosis. As $\phi \to 0$, we have $\tau_4 \to -\tfrac14$: the Cantor distribution approaches the distribution that has probability mass $\tfrac12$ at the points 0 and 1. For the standard Cantor distribution with $\phi = \tfrac13$ we have $\tau_4 = -2/23 = -0.086957$.
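The closed forms above can be checked numerically. The sketch below is an illustration rather than the paper's own code: it assumes the recursion $(2^r - 2\phi)EX_{1:r} = 1 - \phi + \phi\sum_{k=1}^{r-1}\binom{r}{k}EX_{1:k}$ for the expected minima, the standard recurrence $(r-k)EX_{k:r} + kEX_{k+1:r} = rEX_{k:r-1}$, and the general definition $\lambda_r = r^{-1}\sum_{j=0}^{r-1}(-1)^j\binom{r-1}{j}EX_{r-j:r}$, of which the four expressions above are the cases $r = 1, \ldots, 4$. The function name `cantor_l_moments` is hypothetical.

```python
from fractions import Fraction
from math import comb


def cantor_l_moments(phi, n):
    """Return [lambda_1, ..., lambda_n] for the symmetric Cantor(phi) law."""
    phi = Fraction(phi)
    e = {}  # e[(k, r)] = E X_{k:r}
    for r in range(1, n + 1):
        # Expected minimum of a sample of size r.
        s = sum(comb(r, k) * e[(1, k)] for k in range(1, r))
        e[(1, r)] = (1 - phi + phi * s) / (2**r - 2 * phi)
        # Remaining order statistics from the standard recurrence.
        for k in range(1, r):
            e[(k + 1, r)] = (r * e[(k, r - 1)] - (r - k) * e[(k, r)]) / k
    lam = []
    for r in range(1, n + 1):
        total = sum((-1) ** j * comb(r - 1, j) * e[(r - j, r)] for j in range(r))
        lam.append(total / r)
    return lam


# Usage: first four L-moments and L-kurtosis for phi = 1/3.
lam = cantor_l_moments(Fraction(1, 3), 4)
tau4 = lam[3] / lam[1]
```

For $\phi = \tfrac13$ this reproduces $\lambda_3 = 0$ and $\tau_4 = -2/23$ exactly.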
Volume 19, Number 2 STATISTICS & PROBABILITY LETTERS 27 January 1994
Fig. 1. L-moments of the Cantor$(\tfrac13, \tfrac13)$ distribution.
As an example we have computed the first 50 L-moments of the standard Cantor distribution ($\phi = \tfrac13$). Figure 1 shows the L-moment ratios $\tau_r = \lambda_r/\lambda_2$, $r = 3, 4, \ldots, 50$.
We can define a family of skew Cantor distributions by generalizing the constructions given above. The intuitive construction is as before but at each stage the subinterval deleted is not the central symmetric interval $(\phi, 1 - \phi)$ but the asymmetric interval $(\alpha, 1 - \beta)$. At the $(r + 1)$th stage the probability mass is uniformly distributed over $2^r$ closed intervals, of which $\binom{r}{k}$ have length $\alpha^k \beta^{r-k}$, $k = 0, 1, \ldots, r$: from each interval $[p, q]$ delete the $(\alpha, 1 - \beta)$ proportion of the interval, i.e. the subinterval $(p + \alpha(q - p),\ q - \beta(q - p))$, and rescale so that the total probability mass is still 1. The successive stages of the construction are illustrated in Figure 2 for one particular choice of $\alpha$ and $\beta$.
Formally we define the Cantor$(\alpha, \beta)$ distribution, with parameters $\alpha$ and $\beta$ satisfying $\alpha > 0$, $\beta > 0$, $\alpha + \beta < 1$, to be that of a random variable $X$ defined by
$$X = \sum_{i=1}^{\infty} W_i Z_i \tag{8}$$
where the $Z_i$ are independent and identically distributed with $\Pr[Z_i = 0] = \alpha/(\alpha + \beta)$, $\Pr[Z_i = 1 - \beta] = \beta/(\alpha + \beta)$, and the $W_i$ are defined by $W_1 = 1$,
$$W_{i+1} = \begin{cases} \alpha W_i & \text{if } Z_i = 0, \\ \beta W_i & \text{if } Z_i = 1 - \beta. \end{cases} \tag{9}$$
Fig. 2. Successive approximations to the set on which the probability mass of the Cantor$(\alpha, \beta)$ distribution is concentrated.
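$$\begin{aligned}
\lambda_1 &= \frac{\beta(1 - \beta)}{\alpha(1 - \alpha) + \beta(1 - \beta)}, \\
\lambda_2 &= \frac{\alpha(1 - \alpha)\beta(1 - \beta)}{\{\alpha(1 - \alpha) + \beta(1 - \beta)\}\{\alpha(1 - \alpha) + \beta(1 - \beta) + \alpha\beta\}}.
\end{aligned} \tag{11}$$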
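The expressions for $\lambda_1$ and $\lambda_2$ can be checked against a recursion for the expected minima. The sketch below assumes the skew analogue of the symmetric argument: each sample point falls in $[0, \alpha]$ with probability $\alpha/(\alpha+\beta)$, which leads to $\{(\alpha+\beta)^r - \alpha^{r+1} - \beta^{r+1}\}EX_{1:r} = \beta^r(1-\beta) + \alpha\sum_{k=1}^{r-1}\binom{r}{k}\alpha^k\beta^{r-k}EX_{1:k}$. This recursion is derived here for illustration and is not quoted from the paper; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def skew_expected_minima(alpha, beta, r_max):
    """Return a list e_min with e_min[r] = E X_{1:r} for Cantor(alpha, beta)."""
    a, b = Fraction(alpha), Fraction(beta)
    e_min = [None]  # index 0 unused
    for r in range(1, r_max + 1):
        s = sum(comb(r, k) * a**k * b**(r - k) * e_min[k] for k in range(1, r))
        denom = (a + b) ** r - a ** (r + 1) - b ** (r + 1)
        e_min.append((b**r * (1 - b) + a * s) / denom)
    return e_min


def skew_l_moments(alpha, beta):
    """Return (lambda_1, lambda_2, lambda_3) from the expected minima."""
    m = skew_expected_minima(alpha, beta, 3)
    lam1 = m[1]
    lam2 = m[1] - m[2]                  # (1/2) E(X_{2:2} - X_{1:2})
    lam3 = m[1] - 3 * m[2] + 2 * m[3]   # (1/3) E(X_{3:3} - 2 X_{2:3} + X_{1:3})
    return lam1, lam2, lam3


def closed_form(alpha, beta):
    """lambda_1 and lambda_2 as given in equation (11)."""
    a, b = Fraction(alpha), Fraction(beta)
    d = a * (1 - a) + b * (1 - b)
    return b * (1 - b) / d, a * (1 - a) * b * (1 - b) / (d * (d + a * b))


# Usage: compare recursion and closed form for alpha = 1/2, beta = 1/4.
lam1, lam2, lam3 = skew_l_moments(Fraction(1, 2), Fraction(1, 4))
tau3 = lam3 / lam2
```

For $\alpha = \tfrac12$, $\beta = \tfrac14$ the recursion and the closed forms agree, and swapping $\alpha$ and $\beta$ changes the sign of $\lambda_3$ while leaving $\lambda_2$ unchanged, consistent with the reflection $X \mapsto 1 - X$.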
Higher-order L-moments do not have a simple algebraic form. They can easily be calculated for given values of $\alpha$ and $\beta$, however. The L-skewness, $\tau_3 = \lambda_3/\lambda_2$, and L-kurtosis $\tau_4$ have been calculated for 1000 $(\alpha, \beta)$ pairs chosen at random from the set $\alpha \ge \beta > 0$, $\alpha + \beta < 1$. The condition $\alpha \ge \beta$ means that only distributions with positive skewness were simulated: this involves no loss of generality since if $X \sim$ Cantor$(\alpha, \beta)$ then $1 - X \sim$ Cantor$(\beta, \alpha)$. The $\tau_3$ and $\tau_4$ values are plotted in Figure 3. It can be seen that the values fall between two bounding curves. The lower curve is $\tau_4 = (5\tau_3^2 - 1)/4$, the lower bound on $\tau_4$ for all distributions (Hosking, 1990, (2.7)). The upper curve is $\tau_4 = \tau_3(5\tau_3 - 1)/(5 - \tau_3)$, the L-skewness-L-kurtosis relationship of the family of distributions with quantile function $x(u) = u^p$, $p > 1$. This is apparently the upper bound for the possible $(\tau_3, \tau_4)$ values of the Cantor$(\alpha, \beta)$ distribution, though we do not have a formal proof of this conjecture.
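The two bounding curves are simple enough to evaluate directly. The following sketch is an illustration with hypothetical function names. The closed forms $\tau_3 = (p-1)/(p+3)$ and $\tau_4 = (p-1)(p-2)/\{(p+3)(p+4)\}$ for the family $x(u) = u^p$ are worked out here from the probability-weighted moments $\int_0^1 u^{p+r}\,du = 1/(p+r+1)$ and are not quoted from the paper.

```python
from fractions import Fraction


def tau4_lower_bound(tau3):
    """Lower bound on L-kurtosis for all distributions: (5 tau3^2 - 1) / 4."""
    return (5 * tau3**2 - 1) / 4


def tau4_upper_curve(tau3):
    """Conjectured upper curve: tau3 (5 tau3 - 1) / (5 - tau3)."""
    return tau3 * (5 * tau3 - 1) / (5 - tau3)


def power_family_ratios(p):
    """(tau3, tau4) for the quantile function x(u) = u**p (derived here)."""
    p = Fraction(p)
    tau3 = (p - 1) / (p + 3)
    tau4 = (p - 1) * (p - 2) / ((p + 3) * (p + 4))
    return tau3, tau4


def between_curves(tau3, tau4):
    """True if (tau3, tau4) lies on or between the two bounding curves."""
    return tau4_lower_bound(tau3) <= tau4 <= tau4_upper_curve(tau3)


# Usage: the point for x(u) = u**3 lies on the upper curve, and the
# standard Cantor distribution (tau3 = 0, tau4 = -2/23) lies between the curves.
t3, t4 = power_family_ratios(3)
on_upper = tau4_upper_curve(t3) == t4
cantor_ok = between_curves(Fraction(0), Fraction(-2, 23))
```

At $\tau_3 = 0$ the curves give $-\tfrac14$ and $0$, which are the limiting L-kurtosis values of the symmetric Cantor distribution as $\phi \to 0$ and $\phi \to \tfrac12$.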
Fig. 3. L-skewness and L-kurtosis values of 1000 skew Cantor distributions (horizontal axis: L-skewness, $\tau_3$). The bounding curves are $\tau_4 = \tau_3(5\tau_3 - 1)/(5 - \tau_3)$ and $\tau_4 = (5\tau_3^2 - 1)/4$.
References
David, H.A. (1981), Order Statistics (Wiley, New York, 2nd ed.).
Hosking, J.R.M. (1990), L-moments: analysis and estimation of distributions using linear combinations of order statistics, J. Roy. Statist. Soc. Ser. B 52, 105-124.
Lad, F.R. and W.F.C. Taylor (1992), The moments of the Cantor distribution, Statist. Probab. Lett. 13, 307-310.