Math. Nachr.
183 (1997), 113-133
The Quantization of the Cantor Distribution

By S. GRAF of Passau, and H. LUSCHGY of Trier
(Received October 26, 1994)
Abstract. For a real-valued random variable whose distribution is the classical Cantor probability,
the n-quantization error and the n-optimal quantization rules are calculated for every natural
number n. Moreover, the connection between the rate of convergence of the logarithms of the
quantization errors for n going to infinity and the Hausdorff dimension of the Cantor set is indicated.
1. Introduction
Let $P$ denote a probability measure on $\mathbb{R}^d$. Fix $n \in \mathbb{N}$ and let $\mathcal{F}_n$ be the set
of all $n$-quantizers, i.e., the set of all Borel measurable maps $f : \mathbb{R}^d \to \mathbb{R}^d$ with
$\mathrm{card}(f(\mathbb{R}^d)) \le n$. If $X$ is a random variable in $\mathbb{R}^d$ with distribution $P$ then, for each
$f \in \mathcal{F}_n$, $f(X)$ gives a quantized version of $X$. The quantization error is defined by
\[ V_n = \inf_{f \in \mathcal{F}_n} E\|X - f(X)\|^2 . \]
A quantizer $f \in \mathcal{F}_n$ is called an optimal $n$-quantizer for $P$ if
\[ V_n = E\|X - f(X)\|^2 . \]
The quantization problem is to find an optimal $n$-quantizer for $P$ or at least an
asymptotically optimal sequence of $n$-quantizers as $n \to \infty$. This problem arose in
the context of coding speech and visual signals effectively (see ABUT [1] for an overview
of the relevant research articles). Most of the mathematical theory of quantization was
developed for probability measures $P$ which are absolutely continuous (with respect to
Lebesgue measure) or discrete. In the present article we want to study the quantization
problem for the canonical probability measure on the classical Cantor set $C \subset \mathbb{R}$,
which is a rather simple example of a continuous singular distribution. We will calculate
the values $V_n$ and determine the optimal quantizers for the Cantor distribution.

1991 Mathematics Subject Classification. 28 A 80.
Keywords and phrases. Quantization, Cantor distribution.

For probability measures $P$ on $\mathbb{R}^d$ with non-vanishing absolutely continuous part,
$\lim_{n\to\infty} n^{2/d} V_n$ is finite and strictly positive (see BUCKLEW-WISE [2]). For the Cantor
measure we show that the sequence $(n^{2/\beta} V_n)_{n\in\mathbb{N}}$ is still bounded and bounded away
from 0, where $\beta = \log 2/\log 3$ denotes the Hausdorff dimension of the Cantor set $C$.
But $(n^{2/\beta} V_n)_{n\in\mathbb{N}}$ does not converge: it has a non-degenerate interval of
accumulation points (see Theorem 6.3). It is an interesting question whether this last
result is characteristic of (continuous) singular probability measures.
The considerations described above also lead to a new dimension formula for the
Cantor set (see Theorem 6.6), namely
\[ \lim_{n\to\infty} \frac{\log V_n}{\log n} = -\frac{2}{\beta} . \]
This is a natural analogue to the result for probability measures with non-vanishing
absolutely continuous part, where the corresponding formula with $\beta$ replaced by $d$
follows immediately from $\lim_{n\to\infty} n^{2/d} V_n \in \,]0, +\infty[$.
2. Some basic facts about quantization

In this section $P$ is an arbitrary probability measure on $\mathbb{R}^d$ and $\|\cdot\|$ denotes the
Euclidean norm on $\mathbb{R}^d$.
Proposition 2.1. If $\int \|x\|^2 \, dP(x) < \infty$, then
\[ V_n = \inf_{\alpha} \int \min_{a \in \alpha} \|x - a\|^2 \, dP(x) , \]
where the infimum is taken over all subsets $\alpha$ of $\mathbb{R}^d$ with $\mathrm{card}\,\alpha \le n$.

Proof. See, for instance, [3]. □
Definition 2.2. A subset $\alpha$ of $\mathbb{R}^d$ is called an optimal set of $n$-means for $P$ if
$\mathrm{card}\,\alpha \le n$ and
\[ V_n = \int \min_{a \in \alpha} \|x - a\|^2 \, dP(x) . \]
Proposition 2.3. For every $n \ge 1$, there is an optimal set of $n$-means for $P$.

Proof. See, for instance, [3]. □
Definition 2.4. Given a finite subset $\alpha = \{a_1, \dots, a_m\} \subset \mathbb{R}^d$ we call a Borel
measurable partition $A_1, \dots, A_m$ of $\mathbb{R}^d$ a Voronoi partition w.r.t. $\alpha$ if
\[ A_i \subset \{ x \in \mathbb{R}^d \mid \|x - a_i\| = \min_{a \in \alpha} \|x - a\| \} \]
for every $i \in \{1, \dots, m\}$.

Remark 2.5. As a consequence of Proposition 2.1 the problem of finding an optimal
$n$-quantizer for $P$ is equivalent to the problem of finding an optimal set of
$n$-means for $P$: If $\alpha = \{a_1, \dots, a_n\}$ is an optimal set of $n$-means for $P$ and
$A_1, \dots, A_n$ is a Voronoi partition w.r.t. $\alpha$ then $f = \sum_{i=1}^{n} a_i 1_{A_i}$ is an optimal
$n$-quantizer. Conversely, if $f \in \mathcal{F}_n$ is an optimal $n$-quantizer then $\alpha = f(\mathbb{R}^d)$ is an
optimal set of $n$-means for $P$.
It can be shown that for a continuous probability $P$ an optimal set of $n$-means
always has exactly $n$ elements. For $d = 1$ and a finite set $\alpha = \{a_1, \dots, a_n\} \subset \mathbb{R}$
a Voronoi partition $A_1, \dots, A_n$ with respect to $\alpha$ always consists of intervals and
half-lines. Thus $\sum_{i=1}^{n} a_i 1_{A_i}$ is a monotone step function. The $n$-quantization problem on
$\mathbb{R}$ is, therefore (up to modifications on $P$-nullsets), equivalent to finding all monotone
step functions with at most $n$ values that optimally approximate the identity of $\mathbb{R}$ in
$L^2(P)$.
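As an informal illustration of the last observation (not part of the original text), a one-dimensional quantizer can be sketched as a monotone step function whose jumps sit at the midpoints between neighbouring means. The function name `quantize` and the tie-breaking rule (ties go to the left cell) are assumptions made for this sketch only.

```python
from bisect import bisect_left

def quantize(x, means):
    """Map x to its nearest element of `means` (ties go to the smaller mean)."""
    means = sorted(means)
    # Cell boundaries of a Voronoi partition on the real line are midpoints
    # between consecutive means.
    cuts = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return means[bisect_left(cuts, x)]
```

For instance, with the two means 1/6 and 5/6 every point of [0, 1/3] is sent to 1/6 and every point of [2/3, 1] to 5/6.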
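Theorem 2.6. Let $\alpha$ be an optimal set of $n$-means and, for $a \in \alpha$, define
\[ M(a) = \{ x \in \mathbb{R}^d \mid \|x - a\| = \min_{b \in \alpha} \|x - b\| \} . \]
Then, for every $a \in \alpha$,

(i) $P(M(a)) > 0$,
(ii) $P(\partial M(a)) = 0$,
(iii) $a = E(X \mid X \in M(a))$.
Proof. See [3]. □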
3. An upper bound for the quantization error

From now on $P$ denotes the Cantor probability measure. The following is a well-known
characterization of $P$ (see HUTCHINSON [5], Theorem 4.4(1) for a generalization
to self-similar measures) which provides the key tool for the results in the rest of the
paper.
Theorem 3.1. Let $S_1, S_2 : \mathbb{R} \to \mathbb{R}$ be defined by
\[ S_1 x = \frac{1}{3} x \quad \text{and} \quad S_2 x = \frac{1}{3} x + \frac{2}{3} . \]
Then $P$ is the unique Borel probability on $\mathbb{R}$ such that
\[ P = \frac{1}{2} P \circ S_1^{-1} + \frac{1}{2} P \circ S_2^{-1} , \]
where $P \circ S_i^{-1}$ denotes the image measure of $P$ with respect to $S_i$ ($i = 1, 2$).
Remark 3.2. Let $S_1$ and $S_2$ be as in Theorem 3.1. For $\sigma \in \{1,2\}^k$ with
$\sigma = (\sigma_1, \dots, \sigma_k)$ set $S_\sigma = S_{\sigma_1} \circ \dots \circ S_{\sigma_k}$, $J_\sigma = S_\sigma([0,1])$ and
let $a_\sigma$ be the midpoint of $J_\sigma$. The
sets $(J_\sigma)_{\sigma \in \{1,2\}^k}$ are just the $2^k$ intervals of length $1/3^k$ in the $k$-th level of the
classical Cantor construction. The intervals $J_{\sigma*1}, J_{\sigma*2}$ into which $J_\sigma$ is split up at the
$(k+1)$-th level are called the children of $J_\sigma$. The set
$C = \bigcap_{k \in \mathbb{N}} \bigcup_{\sigma \in \{1,2\}^k} J_\sigma$ is the
Cantor set and equals the support of the Cantor measure $P$.
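The indexing of this remark can be made concrete with a short sketch in exact rational arithmetic. It is an illustration only; the function names `interval`, `midpoint` and `level` are chosen here and do not appear in the paper.

```python
from fractions import Fraction
from itertools import product

def interval(sigma):
    """Return J_sigma = S_sigma([0, 1]) as (left, right) for a word sigma over {1, 2}."""
    left, right = Fraction(0), Fraction(1)
    # S_sigma = S_{sigma_1} o ... o S_{sigma_k}: the innermost map acts first.
    for s in reversed(sigma):
        shift = Fraction(2, 3) if s == 2 else Fraction(0)
        left, right = left / 3 + shift, right / 3 + shift
    return left, right

def midpoint(sigma):
    """Midpoint a_sigma of J_sigma."""
    left, right = interval(sigma)
    return (left + right) / 2

def level(k):
    """All 2^k intervals of the k-th construction level, ordered left to right."""
    return [interval(s) for s in product((1, 2), repeat=k)]
```

The children of $J_\sigma$ are obtained by appending 1 or 2 to the word, e.g. `interval(sigma + (1,))` for the left child.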
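Corollary 3.3. Let $f : \mathbb{R} \to \mathbb{R}_+$ be Borel measurable and $k \in \mathbb{N}$. Then
\[ \int f \, dP = \frac{1}{2^k} \sum_{\sigma \in \{1,2\}^k} \int f \circ S_\sigma \, dP . \]
Proof. The result follows from Theorem 3.1 by induction on $k$. □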
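For later use we will need the expectation $E(P)$ and variance $V(P)$ of $P$.
Lemma 3.4.
\[ E(P) = \frac{1}{2}, \qquad V(P) = V_1 = \frac{1}{8}, \]
and for fixed $x_0 \in \mathbb{R}$,
\[ \int (x - x_0)^2 \, dP(x) = \Big( x_0 - \frac{1}{2} \Big)^2 + \frac{1}{8} . \]
Proof. (i) It follows from Theorem 3.1 that
\[ E(P) = \int x \, dP(x) = \frac{1}{2} \int \frac{1}{3} x \, dP(x) + \frac{1}{2} \int \Big( \frac{1}{3} x + \frac{2}{3} \Big) dP(x) = \frac{1}{3} E(P) + \frac{1}{3} . \]
This equation implies
\[ E(P) = \frac{1}{2} . \]
(ii) First we will calculate $\int x^2 \, dP(x)$.
Using 3.1 we obtain
\[ \int x^2 \, dP(x) = \frac{1}{2} \int \frac{1}{9} x^2 \, dP(x) + \frac{1}{2} \int \Big( \frac{1}{3} x + \frac{2}{3} \Big)^2 dP(x) = \frac{1}{9} \int x^2 \, dP(x) + \frac{2}{9} E(P) + \frac{2}{9} . \]
This equation implies
\[ \int x^2 \, dP(x) = \frac{3}{8} . \]
(iii) Using (i) and (ii) we deduce
\[ V(P) = \int x^2 \, dP(x) - E(P)^2 = \frac{3}{8} - \frac{1}{4} = \frac{1}{8} . \]
(iv) Since $\int (x - x_0)^2 \, dP(x) = V(P) + (E(P) - x_0)^2$ the last equation in the lemma
holds. □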
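The two linear equations used in this proof can be checked mechanically. The following sketch solves them with exact fractions; the names `cantor_moments` and `shifted_second_moment` are illustrative and not taken from the paper.

```python
from fractions import Fraction

def cantor_moments():
    """Solve the self-similarity equations for mean, second moment and variance."""
    # E = (1/2) E[X/3] + (1/2) E[X/3 + 2/3] = E/3 + 1/3
    mean = Fraction(1, 3) / (1 - Fraction(1, 3))
    # M2 = (1/2) E[(X/3)^2] + (1/2) E[(X/3 + 2/3)^2] = M2/9 + (2/9) E + 2/9
    second = (Fraction(2, 9) * mean + Fraction(2, 9)) / (1 - Fraction(1, 9))
    return mean, second, second - mean ** 2

def shifted_second_moment(x0):
    """Integral of (x - x0)^2 dP(x), computed as V(P) + (E(P) - x0)^2."""
    mean, _, var = cantor_moments()
    return var + (mean - Fraction(x0)) ** 2
```

Calling `cantor_moments()` reproduces the values 1/2, 3/8 and 1/8 of steps (i) to (iii).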
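Definition 3.5. For $n \in \mathbb{N}$ with $n \ge 1$ let $l(n)$ be the unique natural number with
$2^{l(n)} \le n < 2^{l(n)+1}$. For $I \subset \{1,2\}^{l(n)}$ with $\mathrm{card}(I) = n - 2^{l(n)}$ let $\alpha_n(I)$ be the set
consisting of all midpoints $a_\sigma$ of intervals $J_\sigma$ with $\sigma \in \{1,2\}^{l(n)} \setminus I$ and of all midpoints
$a_{\sigma*1}, a_{\sigma*2}$ of the children of $J_\sigma$ with $\sigma \in I$. Formally,
\[ \alpha_n(I) = \{ a_\sigma \mid \sigma \in \{1,2\}^{l(n)} \setminus I \} \cup \{ a_{\sigma*1} \mid \sigma \in I \} \cup \{ a_{\sigma*2} \mid \sigma \in I \} . \]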

Remark 3.6. It is plausible and will be shown later that the sets $\alpha_n(I)$ are exactly
the optimal sets of $n$-means for $P$.
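A small sketch may help to see what the sets $\alpha_n(I)$ look like and what their quantization error is. It is an illustration under stated assumptions: words are Python tuples over {1, 2}, the helper names are invented here, and `distortion` is exact only when no Voronoi boundary of the given points falls inside a level-$m$ interval (which holds for $\alpha_n(I)$ once $m \ge l(n)+1$). The correction term uses $\int_{J_\sigma} (x-a)^2\,dP = 2^{-m}\big((a_\sigma - a)^2 + 9^{-m}/8\big)$, a consequence of Corollary 3.3 and Lemma 3.4.

```python
from fractions import Fraction
from itertools import product

def midpoint(sigma):
    """Midpoint a_sigma of J_sigma = S_sigma([0, 1])."""
    left, right = Fraction(0), Fraction(1)
    for s in reversed(sigma):
        shift = Fraction(2, 3) if s == 2 else Fraction(0)
        left, right = left / 3 + shift, right / 3 + shift
    return (left + right) / 2

def alpha(n, I):
    """Build alpha_n(I) for a collection I of words of length l(n)."""
    l = n.bit_length() - 1                     # 2**l <= n < 2**(l + 1)
    I = set(I)
    assert len(I) == n - 2 ** l
    points = []
    for sigma in product((1, 2), repeat=l):
        if sigma in I:                         # split J_sigma into its children
            points += [midpoint(sigma + (1,)), midpoint(sigma + (2,))]
        else:
            points.append(midpoint(sigma))
    return sorted(points)

def distortion(points, m):
    """Integral of min_a (x - a)^2 dP over a in points (see the assumption above)."""
    mids = [midpoint(s) for s in product((1, 2), repeat=m)]
    discrete = sum(min((x - a) ** 2 for a in points) for x in mids) / len(mids)
    return discrete + Fraction(1, 8) / 9 ** m  # variance inside each level-m interval
```

For example, `alpha(3, {(1,)})` returns the midpoints of [0, 1/9], [2/9, 1/3] and [2/3, 1].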
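Proposition 3.7. For every $n \ge 1$ and every $I$ as in Definition 3.5,
\[ \int \min_{a \in \alpha_n(I)} (x - a)^2 \, dP(x) = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big( 2^{l(n)+1} - n + \frac{1}{9} \big( n - 2^{l(n)} \big) \Big) . \]
Proof. Obviously we have
\[ \int \min_{a \in \alpha_n(I)} (x - a)^2 \, dP(x) = \sum_{\sigma \in \{1,2\}^{l(n)} \setminus I} \int_{J_\sigma} (x - a_\sigma)^2 \, dP(x) + \sum_{\sigma \in I} \Big( \int_{J_{\sigma*1}} (x - a_{\sigma*1})^2 \, dP(x) + \int_{J_{\sigma*2}} (x - a_{\sigma*2})^2 \, dP(x) \Big) . \]
It follows from Corollary 3.3 and Lemma 3.4 that, for every $k \in \mathbb{N}$ and every
$\tau \in \{1,2\}^k$,
\[ \int_{J_\tau} (x - a_\tau)^2 \, dP(x) = \frac{1}{2^k} \int (S_\tau x - a_\tau)^2 \, dP(x) = \frac{1}{18^k} \int \Big( x - \frac{1}{2} \Big)^2 dP(x) = \frac{1}{18^k} \cdot \frac{1}{8} . \]
Thus we deduce
\[ \int \min_{a \in \alpha_n(I)} (x - a)^2 \, dP(x) = \frac{1}{8} \cdot \frac{\mathrm{card}(\{1,2\}^{l(n)} \setminus I)}{18^{l(n)}} + \frac{1}{8} \cdot \frac{2\, \mathrm{card}(I)}{18^{l(n)+1}} = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big( 2^{l(n)+1} - n + \frac{1}{9} \big( n - 2^{l(n)} \big) \Big) . \]
□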
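Corollary 3.8. For every $n \in \mathbb{N}$,
\[ V_n \le \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big( 2^{l(n)+1} - n + \frac{1}{9} \big( n - 2^{l(n)} \big) \Big) . \]
Proof. Since $\mathrm{card}(\alpha_n(I)) = n$ it follows from Proposition 2.1 that
\[ V_n \le \int \min_{a \in \alpha_n(I)} (x - a)^2 \, dP(x) . \]
By Proposition 3.7 this implies the corollary. □
Remark 3.9. For integers $n = 2^l$ the upper bound takes the form $\frac{1}{8} (\frac{1}{9})^l$ and
coincides with the upper bound given by BUCKLEW and WISE [2], p. 243.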
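For experimentation, the upper bound can be evaluated exactly. The sketch below assumes that the bound of Corollary 3.8 reads $18^{-l(n)} \cdot \frac{1}{8}\big(2^{l(n)+1} - n + \frac{1}{9}(n - 2^{l(n)})\big)$, which is the form its $n = 3$ instance takes in the proof of Proposition 4.4; the function name `upper_bound` is hypothetical.

```python
from fractions import Fraction

def upper_bound(n):
    """Upper bound for V_n in the assumed form stated in the lead-in."""
    l = n.bit_length() - 1                     # l(n): 2**l <= n < 2**(l + 1)
    bracket = 2 ** (l + 1) - n + Fraction(n - 2 ** l, 9)
    return Fraction(1, 18) ** l * Fraction(1, 8) * bracket
```

For powers of two the bracket reduces to $2^l$, which gives the form $\frac{1}{8}(\frac{1}{9})^l$ mentioned in Remark 3.9.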
4. On optimal sets of 2- and 3-means

In this section we will determine the optimal sets of $n$-means for $n = 2$ and $3$. The
results and proofs for these two cases are the key for understanding the general case
and, moreover, contain the principal part of the proof of the general result. The first
step to this proof is the following lemma.
Lemma 4.1. Let $n \ge 2$ and let $\alpha = \{a_1, \dots, a_n\}$ be an optimal set of $n$-means
for $P$ and assume w.l.o.g. $a_1 < a_2 < \dots < a_n$. Then
\[ a_1 < \frac{1}{3} \quad \text{and} \quad a_n > \frac{2}{3} \]
and, for $j := \max \{ i \mid a_i \le 1/3 \}$,

(i) $a_i > 2/3$ for all $i > j + 1$.
(ii) If $a_{j+1} \in \,]\frac{1}{3}, \frac{2}{3}[$, then $\frac{1}{2}(a_j + a_{j+1}) < 1/3$ and $\frac{1}{2}(a_{j+1} + a_{j+2}) > 2/3$.
In particular, an optimal set of 2-means does not contain a point from $]\frac{1}{3}, \frac{2}{3}[$.
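Proof. Assume $1/3 \le a_1$. Then Corollary 3.8, Theorem 3.1 and Lemma 3.4 imply
\[ \frac{1}{72} \ge V_2 \ge V_n = \int \min_{a \in \alpha} (x - a)^2 \, dP(x) \ge \int_{[0, 1/3]} (x - a_1)^2 \, dP(x) = \frac{1}{18} \int (x - 3a_1)^2 \, dP(x) = \frac{1}{18} \Big( \big( 3a_1 - \frac{1}{2} \big)^2 + \frac{1}{8} \Big) \ge \frac{1}{18} \cdot \frac{3}{8} = \frac{1}{48} . \]
Thus we get a contradiction and $a_1 < 1/3$ is proved. In a similar fashion one can show
that $a_n > 2/3$. By the preceding considerations we get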
we therefore deduce
By moving $a_{j+1}$ to $1/3$ and $a_{j+2}$ to $2/3$ we could strictly reduce the quantization error,
which is a contradiction to the optimality of $\alpha$.
Since $k \ne j$, we have $k = j + 1$ or $k = j + 2$. In any case (i) is proved.
Now suppose that $1/3 < a_{j+1} < 2/3$. Since $P(M(a_{j+1})) > 0$ we know that
$\frac{1}{2}(a_{j+1} + a_j) < 1/3$ or $\frac{1}{2}(a_{j+1} + a_{j+2}) > 2/3$. Assume that only one of these
inequalities holds, say the first one. Then the quantization error can be strictly reduced
by moving $a_{j+1}$ to $1/3$, which yields a contradiction to the optimality of $\alpha$. Similarly
we can proceed in the second case. Thus statement (ii) follows. □
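For the following considerations we will need two more technical lemmas.
Lemma 4.2. Let $F : \,]0,1] \to \mathbb{R}$ be defined by
\[ F(t) = \frac{1}{P([0,t])} \int_{[0,t]} x \, dP(x) . \]
Then $F$ is a non-decreasing continuous function with
(i) \[ F\Big( 1 - \frac{1}{3^n} \Big) = 1 - \frac{1}{2} \cdot \frac{1 - (1/6)^n}{1 - (1/2)^n} \quad \text{for } n \in \mathbb{N} \text{ with } n \ge 1 \]
and
(ii) \[ F\Big( \frac{1}{3} t \Big) = \frac{1}{3} F(t) \quad \text{for all } t \in \,]0,1] . \]
Proof. Since $P$ is a continuous measure, $F$ is continuous. To show that $F$ is
non-decreasing let $t \in \,]0,1]$ and $h > 0$ with $t + h \le 1$ be given. Then
\[ F(t + h) - F(t) = \frac{1}{P([0,t+h])} \Big( \int_{]t,t+h]} x \, dP(x) - P(]t,t+h]) \, F(t) \Big) \ge \frac{P(]t,t+h])}{P([0,t+h])} \big( t - F(t) \big) \ge 0 . \]
Thus $F$ is non-decreasing.

To prove (i) note that
(4.1) \[ P\Big( \Big[ 0, 1 - \frac{1}{3^n} \Big] \Big) = 1 - \frac{1}{2^n} \]
and
(4.2) \[ \int_{[0, 1 - 3^{-n}]} x \, dP(x) = \sum_{k=1}^{n} \int_{[1 - 3^{-(k-1)}, \, 1 - 3^{-k}]} x \, dP(x) \]
for $n \in \mathbb{N}$ with $n \ge 1$.
Set
\[ s_k := \int_{[1 - 3^{-(k-1)}, \, 1 - 3^{-k}]} x \, dP(x) . \]
Using Theorem 3.1 we obtain
(4.3) \[ s_k = \frac{1}{2} \int_{S_1^{-1}([1 - 3^{-(k-1)}, \, 1 - 3^{-k}])} \frac{1}{3} x \, dP(x) + \frac{1}{2} \int_{S_2^{-1}([1 - 3^{-(k-1)}, \, 1 - 3^{-k}])} \Big( \frac{1}{3} x + \frac{2}{3} \Big) dP(x) . \]
For $k = 1$ the second integral equals 0 and by Lemma 3.4 we get
(4.4) \[ s_1 = \frac{1}{2} \int \frac{1}{3} x \, dP(x) = \frac{1}{12} . \]
For $k \ge 2$ the first integral vanishes and
\[ 1 - \frac{1}{3^{k-1}} \le \frac{1}{3} x + \frac{2}{3} \le 1 - \frac{1}{3^k} \]
is equivalent to
\[ 1 - \frac{1}{3^{k-2}} \le x \le 1 - \frac{1}{3^{k-1}} , \]
hence
(4.5) \[ s_k = \frac{1}{2} \int_{[1 - 3^{-(k-2)}, \, 1 - 3^{-(k-1)}]} \Big( \frac{1}{3} x + \frac{2}{3} \Big) dP(x) . \]
This implies
(4.6) \[ s_k = \frac{1}{6} s_{k-1} + \frac{1}{3} P\Big( \Big[ 1 - \frac{1}{3^{k-2}}, \, 1 - \frac{1}{3^{k-1}} \Big] \Big) . \]
It is easy to see that
\[ P\Big( \Big[ 1 - \frac{1}{3^{k-2}}, \, 1 - \frac{1}{3^{k-1}} \Big] \Big) = \Big( \frac{1}{2} \Big)^{k-1} . \]
Thus (4.6) turns to
(4.7) \[ s_k = \frac{1}{6} s_{k-1} + \frac{1}{3} \Big( \frac{1}{2} \Big)^{k-1} . \]
By induction (4.4) and (4.7) lead to
(4.8) \[ s_k = \Big( \frac{1}{2} \Big)^k \Big( 1 - \frac{5}{6} \Big( \frac{1}{3} \Big)^{k-1} \Big) . \]
Using (4.8) and the definition of $s_k$ the identity (4.2) turns to
(4.9) \[ \int_{[0, 1 - 3^{-n}]} x \, dP(x) = \sum_{k=1}^{n} s_k = \sum_{k=1}^{n} \Big( \frac{1}{2} \Big)^k - \frac{5}{12} \sum_{k=1}^{n} \Big( \frac{1}{6} \Big)^{k-1} = \frac{1}{2} \cdot \frac{1 - (1/2)^n}{1 - 1/2} - \frac{5}{12} \cdot \frac{1 - (1/6)^n}{1 - 1/6} = 1 - \Big( \frac{1}{2} \Big)^n - \frac{1}{2} \Big( 1 - \Big( \frac{1}{6} \Big)^n \Big) . \]
Combining (4.9) with (4.1) and the definition of $F$ yields
\[ F\Big( 1 - \frac{1}{3^n} \Big) = 1 - \frac{1}{2} \cdot \frac{1 - (1/6)^n}{1 - (1/2)^n} \]
and (i) is proved.
To prove (ii) note that Theorem 3.1 implies
(4.10) \[ P\Big( \Big[ 0, \frac{1}{3} t \Big] \Big) = \frac{1}{2} P([0,t]) \]
and
(4.11) \[ \int_{[0, t/3]} x \, dP(x) = \frac{1}{6} \int_{[0,t]} x \, dP(x) . \]
Using (4.10) and (4.11) in the definition of $F$ yields
\[ F\Big( \frac{1}{3} t \Big) = \frac{1}{3} F(t) \]
and proves (ii). □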
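Part (i) of the lemma can be cross-checked numerically. The sketch below assumes the recursion $s_1 = 1/12$, $s_k = \frac{1}{6}s_{k-1} + \frac{1}{3}(\frac{1}{2})^{k-1}$ for the integrals of $x$ over $[1 - 3^{-(k-1)}, 1 - 3^{-k}]$ and the mass $P([0, 1 - 3^{-n}]) = 1 - 2^{-n}$; the function names are illustrative only.

```python
from fractions import Fraction

def s(k):
    """s_k, computed from the assumed two-term recursion."""
    value = Fraction(1, 12)                                  # s_1
    for i in range(2, k + 1):
        value = value / 6 + Fraction(1, 3) * Fraction(1, 2) ** (i - 1)
    return value

def F_recursive(n):
    """F(1 - 3^-n) as (sum of s_k) / P([0, 1 - 3^-n])."""
    mass = 1 - Fraction(1, 2) ** n
    return sum(s(k) for k in range(1, n + 1)) / mass

def F_closed(n):
    """Closed form claimed in part (i) of the lemma."""
    return 1 - Fraction(1, 2) * (1 - Fraction(1, 6) ** n) / (1 - Fraction(1, 2) ** n)
```

Both functions agree for every $n$ tried; for instance $F(2/3) = 1/6$, the midpoint of $[0, 1/3]$.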
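Lemma 4.3. Let $f : \,]-\frac{1}{2}, \frac{3}{2}] \to \mathbb{R}$ be defined by
\[ f(t) = F\Big( \frac{1}{4} + \frac{1}{2} t \Big) . \]
Then $f(t) > t$ for all $t \in \,]-\frac{1}{2}, \frac{1}{6} - \frac{2}{3^4}[$.

Proof. For $t < 0$ the inequality $t < f(t)$ is obvious.
For $0 \le t < 1/18$ Lemma 4.2 implies
\[ f(t) \ge F\Big( \frac{1}{4} \Big) \ge F\Big( \frac{1}{9} \Big) = \frac{1}{9} F(1) = \frac{1}{18} > t . \]
For $1/18 \le t < 19/162$ Lemma 4.2 yields
\[ f(t) \ge F\Big( \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{18} \Big) = F\Big( \frac{5}{18} \Big) . \]
Using $C \cap \,]\frac{1}{3} - \frac{2}{27}, \frac{1}{3} - \frac{1}{27}[ \, = \emptyset$ and Lemma 4.2 we get
\[ F\Big( \frac{5}{18} \Big) = F\Big( \frac{1}{3} - \frac{1}{27} \Big) = \frac{1}{3} F\Big( 1 - \frac{1}{9} \Big) = \frac{19}{162} > t . \]
For $19/162 \le t < 1/6 - 2/3^4$ Lemma 4.2 implies
\[ f(t) \ge F\Big( \frac{1}{4} + \frac{1}{2} \cdot \frac{19}{162} \Big) = F\Big( \frac{25}{81} \Big) = F\Big( \frac{1}{3} - \frac{2}{81} \Big) = F\Big( \frac{1}{3} - \frac{1}{3^4} \Big) = \frac{1}{3} F\Big( 1 - \frac{1}{3^3} \Big) = \frac{163}{1134} > \frac{161}{1134} = \frac{23}{162} = \frac{1}{6} - \frac{2}{3^4} > t . \]
Thus the lemma is proved. □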
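Proposition 4.4. If $\alpha$ is an optimal set of 3-means then $\alpha \cap \,]\frac{1}{3}, \frac{2}{3}[ \, = \emptyset$.

Proof. Using Remark 2.5 we know that $\alpha = \{a_1, a_2, a_3\}$ with $a_1 < a_2 < a_3$. By
Lemma 4.1 we have $a_1 \in [0, \frac{1}{3}]$ and $a_3 \in [\frac{2}{3}, 1]$. We will show that $a_2 \notin \,]\frac{1}{3}, \frac{2}{3}[$.
Assume the contrary.
Case 1: $a_2 \in [\frac{1}{2}, \frac{2}{3}[$.
Lemma 4.1 implies
(4.12) \[ \frac{1}{2}(a_1 + a_2) < \frac{1}{3} , \]
hence
(4.13) \[ a_1 < \frac{2}{3} - a_2 \le \frac{1}{6} . \]
Since $M(a_1) \cap [0,1] = [0, \frac{1}{2}(a_1 + a_2)]$ it follows from Theorem 2.6(iii) and Lemma 4.2
that
\[ a_1 = \frac{1}{P([0, \frac{1}{2}(a_1 + a_2)])} \int_{[0, \frac{1}{2}(a_1 + a_2)]} x \, dP(x) = F\Big( \frac{1}{2}(a_1 + a_2) \Big) \ge F\Big( \frac{1}{4} + \frac{1}{2} a_1 \Big) = f(a_1) . \]
By Lemma 4.3 this implies
(4.14) \[ a_1 \ge \frac{1}{6} - \frac{2}{3^4} . \]
Using (4.12) we deduce
(4.15) \[ a_2 < \frac{2}{3} - a_1 \le \frac{1}{2} + \frac{2}{3^4} . \]
By Lemma 4.1 we have $\frac{1}{2}(a_2 + a_3) > 2/3$, hence
(4.16) \[ a_3 > \frac{4}{3} - a_2 \ge \frac{5}{6} - \frac{2}{3^4} . \]
Let $h : [0,1] \to \mathbb{R}$ be defined by
\[ h(x) = \min_{1 \le i \le 3} |x - a_i| . \]
For $0 \le x \le 1/9$ we have
(4.17) \[ h(x) = |x - a_1| \ge \frac{1}{6} - \frac{2}{3^4} - x \ge 0 . \]
For $2/9 \le x \le 1/3$ we have
(4.18) \[ h(x) = \min(|x - a_1|, |x - a_2|) \ge x - \frac{1}{6} \ge 0 . \]
For $2/3 \le x \le 7/9$ we have
(4.19) \[ h(x) = \min(|x - a_2|, |x - a_3|) \ge \frac{5}{6} - \frac{2}{3^4} - x \ge 0 \]
because (4.16) implies $a_3 - x \ge \frac{5}{6} - \frac{2}{3^4} - x \ge 0$ and (4.15) implies
\[ x - a_2 \ge x - \frac{1}{2} - \frac{2}{3^4} \ge \frac{5}{6} - \frac{2}{3^4} - x . \]
Since $\alpha = \{a_1, a_2, a_3\}$ is optimal, we have (using (4.17), (4.18) and (4.19) for the last
inequality)
\[ V_3 = \int h(x)^2 \, dP(x) \ge \int_{[0, 1/9]} \Big( \frac{1}{6} - \frac{2}{3^4} - x \Big)^2 dP(x) + \int_{[2/9, 1/3]} \Big( x - \frac{1}{6} \Big)^2 dP(x) + \int_{[2/3, 7/9]} \Big( \frac{5}{6} - \frac{2}{3^4} - x \Big)^2 dP(x) . \]
Applying Corollary 3.3 to these integrals yields
\[ V_3 \ge \frac{1}{18^2} \int \Big( x - \frac{23}{18} \Big)^2 dP(x) + \frac{1}{18^2} \int \Big( x + \frac{1}{2} \Big)^2 dP(x) + \frac{1}{18^2} \int \Big( x - \frac{23}{18} \Big)^2 dP(x) . \]
We evaluate these integrals by Lemma 3.4 and obtain
(4.20) \[ V_3 \ge \frac{1}{18^2} \Big[ \Big( \frac{23}{18} - \frac{1}{2} \Big)^2 + \frac{1}{8} + \Big( -\frac{1}{2} - \frac{1}{2} \Big)^2 + \frac{1}{8} + \Big( \frac{23}{18} - \frac{1}{2} \Big)^2 + \frac{1}{8} \Big] = \frac{1}{18^2} \Big[ \frac{3}{8} + \Big( \frac{7}{9} \Big)^2 + 1 + \Big( \frac{7}{9} \Big)^2 \Big] = \frac{1}{18^2} \cdot \frac{1675}{648} > \frac{1}{18^2} \cdot \frac{5}{2} . \]
By Corollary 3.8 we have
\[ V_3 \le \frac{1}{18} \cdot \frac{1}{8} \Big( 4 - 3 + \frac{1}{9}(3 - 2) \Big) = \frac{1}{18^2} \cdot \frac{5}{2} \]
so that (4.20) is a contradiction.
Case 2: $a_2 \in \,]\frac{1}{3}, \frac{1}{2}[$.
This leads to a contradiction in a similar way as case 1. □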
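Lemma 4.5. Let $n \ge 2$ and let $\alpha$ be an optimal set of $n$-means with $\alpha \cap \,]\frac{1}{3}, \frac{2}{3}[ \, = \emptyset$.
Set $\alpha_1 = \alpha \cap [0, \frac{1}{3}]$, $\alpha_2 = \alpha \cap [\frac{2}{3}, 1]$, and $j = \mathrm{card}(\alpha_1)$. Then $3\alpha_1$ $(= S_1^{-1}(\alpha_1))$ is an
optimal set of $j$-means and $3\alpha_2 - 2$ $(= S_2^{-1}(\alpha_2))$ is an optimal set of $(n - j)$-means.
Moreover,
\[ V_n = \frac{1}{18} (V_j + V_{n-j}) . \]
Proof. By Lemma 4.1 we have $\alpha_1 \ne \emptyset$ and $\alpha_2 \ne \emptyset$. Since $\alpha = \alpha_1 \cup \alpha_2$ we deduce
(4.21) \[ V_n = \int_{[0, 1/3]} \min_{a \in \alpha_1} (x - a)^2 \, dP(x) + \int_{[2/3, 1]} \min_{a \in \alpha_2} (x - a)^2 \, dP(x) . \]
By the scaling properties of $P$ (Theorem 3.1) the restriction of $P$ to $[0, \frac{1}{3}]$ equals
$\frac{1}{2} P \circ S_1^{-1}$ and the restriction of $P$ to $[\frac{2}{3}, 1]$ equals $\frac{1}{2} P \circ S_2^{-1}$. Using the definitions of
$S_1$ and $S_2$ this implies
(4.22) \[ V_n = \frac{1}{18} \int \min_{a \in \alpha_1} (x - 3a)^2 \, dP(x) + \frac{1}{18} \int \min_{a \in \alpha_2} (x - (3a - 2))^2 \, dP(x) . \]
If $3\alpha_1$ is not an optimal set of $j$-means, then we could find a set $\beta \subset \mathbb{R}$ with $\mathrm{card}(\beta) = j$
and $\int \min_{b \in \beta} (x - b)^2 \, dP(x) < \int \min_{a \in \alpha_1} (x - 3a)^2 \, dP(x)$. But then $(\frac{1}{3}\beta) \cup \alpha_2$ is a set of
cardinality $n$ with $\int \min_{a \in (\frac{1}{3}\beta) \cup \alpha_2} (x - a)^2 \, dP(x) < \int \min_{a \in \alpha} (x - a)^2 \, dP(x)$, which contradicts
the optimality of $\alpha$. Similarly $3\alpha_2 - 2$ can be shown to be an optimal set of $(n - j)$-means.
Thus (4.22) implies $V_n = \frac{1}{18}(V_j + V_{n-j})$. □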
Proposition 4.6. (i) The only optimal set of 1-means is $\{\frac{1}{2}\}$. Moreover, $V_1 = \frac{1}{8}$.
(ii) The only optimal set of 2-means consists of the midpoints of $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$.
Moreover, $V_2 = \frac{1}{9} V_1$.
(iii) There are two optimal sets of 3-means. One consists of the midpoints
of $[0, \frac{1}{9}]$, $[\frac{2}{9}, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$, the other of the midpoints of $[0, \frac{1}{3}]$, $[\frac{2}{3}, \frac{7}{9}]$, and $[\frac{8}{9}, 1]$.
Moreover, $V_3 = \frac{1}{18}(V_1 + V_2)$.
Proof. (i) is true since $1/2$ is the expected value of $P$ (Lemma 3.4).
(ii) An optimal set of 2-means does not contain a point from $]\frac{1}{3}, \frac{2}{3}[$ (Lemma 4.1).
Thus Lemma 4.5 together with (i) yields that the set of midpoints of $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$
is the only candidate for an optimal set of 2-means. By the existence result for
$n$-means (Proposition 2.3) it, therefore, is the only optimal set of 2-means. Lemma 4.5 also
implies $V_2 = \frac{1}{18}(V_1 + V_1) = \frac{1}{9} V_1$.
(iii) By Proposition 4.4 an optimal set $\alpha$ of 3-means does not contain a point from
$]\frac{1}{3}, \frac{2}{3}[$. Set $\alpha_1 = \alpha \cap [0, \frac{1}{3}]$ and $\alpha_2 = \alpha \cap [\frac{2}{3}, 1]$. If $\mathrm{card}(\alpha_1) = 2$ then $3\alpha_1$ is an optimal
set of 2-means (Lemma 4.5). Hence (ii) implies that $\alpha_1$ consists of the midpoints of
$[0, \frac{1}{9}]$ and $[\frac{2}{9}, \frac{1}{3}]$. Similarly, $\alpha_2$ consists of the midpoint of $[\frac{2}{3}, 1]$. If $\mathrm{card}(\alpha_1) = 1$ then
we get the set of midpoints of $[0, \frac{1}{3}]$, $[\frac{2}{3}, \frac{7}{9}]$, and $[\frac{8}{9}, 1]$ for $\alpha$. By Lemma 4.5 there
are no other possibilities for an optimal set of 3-means and $V_3 = \frac{1}{18}(V_1 + V_2)$. Using
the last equality and Proposition 3.7 we see that the two sets are indeed optimal sets
of 3-means. □
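The centroid condition of Theorem 2.6(iii) suggests a simple numerical experiment. The following sketch runs Lloyd's iteration (a standard heuristic, not the method of this paper) on the uniform measure on the $2^m$ level-$m$ midpoints, used here as a stand-in for $P$. It only finds stationary configurations and proves nothing about optimality; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def level_midpoints(m):
    """Midpoints of the 2^m level-m intervals, built from ternary digits 0 and 2."""
    mids = []
    for digits in product((0, 2), repeat=m):
        left = sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))
        mids.append(left + Fraction(1, 2 * 3 ** m))
    return mids

def lloyd(points, m, steps=20):
    """Alternate nearest-point assignment and centroid update until nothing changes."""
    xs = level_midpoints(m)
    points = sorted(Fraction(p) for p in points)
    for _ in range(steps):
        cells = [[] for _ in points]
        for x in xs:
            i = min(range(len(points)), key=lambda j: (x - points[j]) ** 2)
            cells[i].append(x)
        # An empty cell keeps its old point; otherwise move to the cell mean.
        new = sorted(sum(c) / len(c) if c else p for c, p in zip(cells, points))
        if new == points:
            break
        points = new
    return points
```

Started from (1/5, 7/10) the iteration stops at the midpoints 1/6 and 5/6 of part (ii); started from (1/10, 3/10, 4/5) it stops at the first of the two sets in part (iii).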
5. Determination of the quantization error and the optimal sets of n-means in the general case

In this section we will compute $V_n$ and determine all optimal sets of $n$-means for
the Cantor probability $P$.
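Proposition 5.1. Let $n \ge 2$ and let $\alpha$ be an optimal set of $n$-means. Then
\[ \alpha \cap \,]\tfrac{1}{3}, \tfrac{2}{3}[ \, = \emptyset . \]
Proof. For $n < 4$ this follows from Lemma 4.1 and Proposition 4.4. For $n \ge 4$ let
$j$ be as in Lemma 4.1. Assume $a_{j+1} \in \,]\frac{1}{3}, \frac{2}{3}[$.
Case 1: $a_{j+1} \ge 1/2$.
Then
(5.1) \[ V_n \ge \int_{[2/9, 1/3]} \min_{a \in \alpha} (x - a)^2 \, dP(x) . \]
Since $\frac{1}{2}(a_j + a_{j+1}) < 1/3$ by Lemma 4.1 we obtain
\[ a_j < \frac{2}{3} - a_{j+1} \le \frac{1}{6} . \]
For $x \in [\frac{2}{9}, \frac{1}{3}]$ this implies
(5.2) \[ \min_{a \in \alpha} |x - a| \ge x - \frac{1}{6} \]
because $x - a_j > x - \frac{1}{6} > 0$ and $a_{j+1} - x \ge \frac{1}{2} - \frac{1}{3} = \frac{1}{6} \ge x - \frac{1}{6} \ge 0$.

Combining (5.1) and (5.2) yields
(5.3) \[ V_n \ge \int_{[2/9, 1/3]} \Big( x - \frac{1}{6} \Big)^2 dP(x) . \]
Applying Corollary 3.3 to the last integral we get
\[ V_n \ge \frac{1}{18^2} \int \Big( x + \frac{1}{2} \Big)^2 dP(x) = \frac{1}{18^2} \cdot \frac{9}{8} . \]
By Proposition 3.7 we have the contradiction $\frac{1}{18^2} \cdot \frac{1}{2} \ge V_4 \ge V_n$.

Case 2: $a_{j+1} < 1/2$.
A contradiction can be obtained in a similar way (reflecting the situation at $1/2$).
□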
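Theorem 5.2. For $n \in \mathbb{N}$ with $n \ge 1$ let $l(n) \in \mathbb{N}$ satisfy $2^{l(n)} \le n < 2^{l(n)+1}$. A set
$\alpha \subset \mathbb{R}$ is an optimal set of $n$-means if and only if there exists a subset $I \subset \{1,2\}^{l(n)}$
with $\mathrm{card}(I) = n - 2^{l(n)}$ and such that $\alpha$ consists of all midpoints of intervals $J_\sigma$ with
$\sigma \in \{1,2\}^{l(n)} \setminus I$ and all midpoints of the children of $J_\sigma$ with $\sigma \in I$ (i.e. $\alpha = \alpha_n(I)$).
Moreover,
\[ V_n = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big( 2^{l(n)+1} - n + \frac{1}{9} \big( n - 2^{l(n)} \big) \Big) . \]
Proof. We will proceed by induction on $n$. For $n < 4$ the statement of the theorem
is proved in Proposition 4.6.
Suppose that the assertion of the theorem holds for all $m < n$, $n \ge 2$. According to
Proposition 5.1 and Lemma 4.5 there exists a $j \in \{1, \dots, n - 1\}$ with
(5.4) \[ V_n = \frac{1}{18} (V_j + V_{n-j}) . \]
Without loss of generality we may assume $j \ge n - j$.
We will show that
(5.5) \[ 2^{l(n)-1} \le n - j \le j \le 2^{l(n)} . \]
Let $k, l \in \mathbb{N}$ be such that
(5.6) \[ 2^k \le j < 2^{k+1} \quad \text{and} \quad 2^l \le n - j < 2^{l+1} . \]
Since $j \ge n - j$ we have $j \ge \frac{1}{2} n \ge 2^{l(n)-1}$ and $n - j \le \frac{1}{2} n < 2^{l(n)}$, hence
(5.7) \[ k \ge l(n) - 1 \quad \text{and} \quad l \le l(n) - 1 . \]
In particular (since $l(n) \ge 2$ for $n \ge 4$) we have
(5.8) \[ k \ge 1 . \]
Using Corollary 3.8 and the induction hypothesis, (5.4) implies
(5.9) \[ \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big( 2^{l(n)+1} - n + \frac{1}{9} \big( n - 2^{l(n)} \big) \Big) \ge \frac{1}{18} \Big[ \frac{1}{18^k} \cdot \frac{1}{8} \Big( 2^{k+1} - j + \frac{1}{9} \big( j - 2^k \big) \Big) + \frac{1}{18^l} \cdot \frac{1}{8} \Big( 2^{l+1} - (n - j) + \frac{1}{9} \big( n - j - 2^l \big) \Big) \Big] . \]
By some easy reformulations this leads to
(5.10) \[ 2 \Big( 17 - 8 \frac{n}{2^{l(n)}} \Big) \ge 9^{l(n)-k-1} \Big( 17 - 8 \frac{j}{2^k} \Big) + 9^{l(n)-l-1} \Big( 17 - 8 \frac{n - j}{2^l} \Big) . \]
Since $n / 2^{l(n)} \ge 1$, $j / 2^k < 2$, and $(n - j) / 2^l < 2$ we deduce
(5.11) \[ 18 > 9^{l(n)-k-1} + 9^{l(n)-l-1} . \]
Since $k \le l(n)$ we deduce
(5.12) \[ l(n) - 2 \le l . \]
Thus the possible choices for $k$ and $l$ are
a) $k = l(n) - 1$ and $l = l(n) - 2$;
b) $k = l(n) - 1$ and $l = l(n) - 1$;
c) $k = l(n)$ and $l = l(n) - 1$;
d) $k = l(n)$ and $l = l(n) - 2$.
For a) we obtain by a reformulation of (5.10)
\[ n - j \ge 2^{l(n)-1} , \]
hence $l \ge l(n) - 1$, a contradiction.
For c) we obtain by a reformulation of (5.10) $j \le 2^{l(n)}$.
Since $2^{l(n)} \le j < 2^{l(n)+1}$, this yields $j = 2^{l(n)}$.
For d) we obtain by a reformulation of (5.10)
\[ \frac{n - j}{2^{l(n)}} \ge \frac{4}{9} + \frac{1}{18} \cdot \frac{j}{2^{l(n)}} . \]
Since $1 \le j / 2^{l(n)}$ we deduce
\[ n - j \ge 2^{l(n)-1} , \]
which is a contradiction, since
\[ 2^{l(n)-2} \le n - j < 2^{l(n)-1} . \]
Thus we have proved (5.5).

From (5.5) we get
(5.13) \[ V_n = \frac{1}{18} (V_j + V_{n-j}) = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big[ 2^{l(n)} - j + \frac{1}{9} \big( j - 2^{l(n)-1} \big) + 2^{l(n)} - (n - j) + \frac{1}{9} \big( n - j - 2^{l(n)-1} \big) \Big] = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Big( 2^{l(n)+1} - n + \frac{1}{9} \big( n - 2^{l(n)} \big) \Big) . \]
Now let $\alpha$ be an optimal set of $n$-means. By Lemma 4.5 $\alpha_1 := S_1^{-1}(\alpha \cap [0, \frac{1}{3}])$ and
$\alpha_2 := S_2^{-1}(\alpha \cap [\frac{2}{3}, 1])$ are optimal sets of $j$- resp. $(n - j)$-means. According to the
induction hypothesis there exists a subset $I_1 \subset \{1,2\}^{l(n)-1}$ with $\mathrm{card}(I_1) = j - 2^{l(n)-1}$
and $I_2 \subset \{1,2\}^{l(n)-1}$ with $\mathrm{card}(I_2) = n - j - 2^{l(n)-1}$ such that $\alpha_1 = \alpha_j(I_1)$ and
$\alpha_2 = \alpha_{n-j}(I_2)$. Then it is easy to check that, for
\[ I = \{ \tau \in \{1,2\}^{l(n)} : (\tau_1 = 1 \wedge (\tau_2, \dots, \tau_{l(n)}) \in I_1) \text{ or } (\tau_1 = 2 \wedge (\tau_2, \dots, \tau_{l(n)}) \in I_2) \} \]
we get
\[ \mathrm{card}\, I = \mathrm{card}\, I_1 + \mathrm{card}\, I_2 = j - 2^{l(n)-1} + n - j - 2^{l(n)-1} = n - 2^{l(n)} \]
and
\[ \alpha = \alpha_n(I) . \]
That every set $\alpha$ as described in the statement of the theorem is an optimal set of
$n$-means follows from (5.13) and Proposition 3.7. Thus Theorem 5.2 is proved. □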
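The interplay between the splitting relation $V_n = \frac{1}{18}(V_j + V_{n-j})$ of Lemma 4.5 and a closed formula for $V_n$ can be explored numerically. The sketch below assumes the closed form $V_n = 18^{-l(n)} \cdot \frac{1}{8}\big(2^{l(n)+1} - n + \frac{1}{9}(n - 2^{l(n)})\big)$ and compares it with the smallest value the splitting relation allows over all $j$; that the minimum over $j$ is the right quantity to compare with is an interpretation made here, and all function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def closed_form(n):
    """Assumed closed formula for V_n (see the lead-in)."""
    l = n.bit_length() - 1                     # l(n): 2**l <= n < 2**(l + 1)
    bracket = 2 ** (l + 1) - n + Fraction(n - 2 ** l, 9)
    return Fraction(1, 18) ** l * Fraction(1, 8) * bracket

@lru_cache(maxsize=None)
def by_recursion(n):
    """W_1 = 1/8 and W_n = min over j of (W_j + W_{n-j}) / 18."""
    if n == 1:
        return Fraction(1, 8)
    return min((by_recursion(j) + by_recursion(n - j)) / 18 for j in range(1, n))

def best_splits(n):
    """All j in {1, ..., n-1} that attain the minimum in the recursion."""
    target = by_recursion(n)
    return [j for j in range(1, n)
            if (by_recursion(j) + by_recursion(n - j)) / 18 == target]
```

For $n = 6$, for example, `best_splits(6)` returns `[2, 3, 4]`, so several ways of distributing the means over the two halves of the Cantor set give the same error.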
6. Asymptotics for $V_n$ and the quantization dimension of the Cantor distribution

In this section we will study the asymptotics of $V_n$. In what follows $\beta = \log 2 / \log 3$ is the
Hausdorff dimension of the Cantor distribution $P$ which is the same as the Hausdorff
dimension of the Cantor set. As pointed out in the introduction we will show that
the sequence $(n^{2/\beta} V_n)_{n \in \mathbb{N}}$ does not converge. This contrasts with the corresponding result
for absolutely continuous distributions, where $\lim_{n \to \infty} n^{2/d} V_n$ exists. But first we will
introduce a new concept of dimension for distributions (see also ZADOR [7]).
Definition 6.1. The quantization dimension of a probability $P$ is defined to be the
number
\[ \lim_{n \to \infty} \frac{\log n}{-\log (V_n^{1/2})} \]
if the limit exists in $\mathbb{R}$.
We will show that the quantization dimension of the Cantor distribution $P$ equals $\beta$.
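Lemma 6.2. Define $f : [1,2] \to \mathbb{R}$ by
\[ f(x) = \frac{1}{72} x^{2/\beta} (17 - 8x) . \]
Then
(i) $\frac{17}{8 + 4\beta} \in [1,2]$,
(ii) $f([1,2]) = \big[ \frac{1}{8}, f\big( \frac{17}{8 + 4\beta} \big) \big]$.
Proof. (i) is obviously true since
\[ 8 + 4\beta \le 12 \le 17 \]
and
\[ 17 \le 2 \cdot (8 + 4\beta) . \]
(ii) A direct calculation shows $f(1) = \frac{1}{8} = f(2)$. Now
\[ f'(x) = \frac{17}{36\beta} x^{2/\beta - 1} - \frac{8 + 4\beta}{36\beta} x^{2/\beta} . \]
Thus
\[ f'(1) = \frac{1}{36\beta} \big( 17 - (8 + 4\beta) \big) = \frac{1}{36\beta} (9 - 4\beta) > 0 \]
and
\[ f'(2) = \frac{1}{36\beta} \cdot \frac{9}{2} \big( 17 - 2(8 + 4\beta) \big) = \frac{153 - 144 - 72\beta}{72\beta} < 0 . \]
Moreover, for $x \in [1,2]$,
\[ f'(x) = 0 \iff \frac{1}{36\beta} x^{2/\beta - 1} \big( 17 - (8 + 4\beta) x \big) = 0 \iff x = \frac{17}{8 + 4\beta} . \]
This implies that $f$ has its maximum on $[1,2]$ at the point $17/(8 + 4\beta)$ and that
$f([1,2])$ is as described in (ii). □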
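A quick floating-point check of this lemma is sketched below. It assumes that $f$ has the form $f(x) = \frac{1}{72} x^{2/\beta}(17 - 8x)$, which is what the derivative computed in the proof corresponds to; the names `BETA`, `f` and `x_star` are chosen for this illustration.

```python
import math

BETA = math.log(2) / math.log(3)        # Hausdorff dimension of the Cantor set

def f(x):
    """Assumed form of the function of Lemma 6.2."""
    return x ** (2 / BETA) * (17 - 8 * x) / 72

x_star = 17 / (8 + 4 * BETA)            # critical point found in the proof
grid_max = max(f(1 + i / 1000) for i in range(1001))
```

Numerically `x_star` is about 1.62 and `f(x_star)` about 0.26, roughly twice the common endpoint value 1/8.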
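Theorem 6.3. The set of accumulation points of the sequence $(n^{2/\beta} V_n)_{n \in \mathbb{N}}$ equals
\[ \Big[ \frac{1}{8}, f\Big( \frac{17}{8 + 4\beta} \Big) \Big] , \]
where $f$ is as in Lemma 6.2.

Proof. Let $y \in \big[ \frac{1}{8}, f\big( \frac{17}{8 + 4\beta} \big) \big]$ be given. Then there exists an $x \in [1,2]$ with $y = f(x)$;
since $f(1) = f(2)$ we may assume $x < 2$.
Set $n_l = [x 2^l]$, where $[x 2^l]$ denotes the greatest integer less than or equal to $x 2^l$. Then
$n_l < n_{l+1}$, $l(n_l) = l$, and there exists an $x_l \in [1,2]$ with $n_l = x_l 2^l$. Thus we obtain
\[ n_l^{2/\beta} V_{n_l} = x_l^{2/\beta} \, 9^l \, \frac{1}{18^l} \cdot \frac{1}{8} \Big( 2^{l+1} - x_l 2^l + \frac{1}{9} \big( x_l 2^l - 2^l \big) \Big) = \frac{1}{8} x_l^{2/\beta} \Big( \frac{17}{9} - \frac{8}{9} x_l \Big) = f(x_l) . \]
Now
\[ x_l 2^l \le x 2^l < x_l 2^l + 1 , \]
hence
\[ x - \frac{1}{2^l} < x_l \le x \]
and, therefore, $\lim_{l \to \infty} x_l = x$.
Since $f$ is continuous this implies
\[ \lim_{l \to \infty} n_l^{2/\beta} V_{n_l} = f(x) = y . \]
Now let $y$ be an accumulation point of the sequence $(n^{2/\beta} V_n)_{n \in \mathbb{N}}$. Then there exists
a subsequence $(n_k^{2/\beta} V_{n_k})_{k \in \mathbb{N}}$ with
\[ \lim_{k \to \infty} n_k^{2/\beta} V_{n_k} = y . \]
Set $l_k = l(n_k)$ and $x_k = n_k / 2^{l_k}$. Then we have $x_k \in [1,2]$ and
\[ n_k^{2/\beta} V_{n_k} = f(x_k) . \]
Now we choose a convergent subsequence $(x_{k_i})$ of $(x_k)$ and we obtain
\[ y = \lim_{i \to \infty} n_{k_i}^{2/\beta} V_{n_{k_i}} = \lim_{i \to \infty} f(x_{k_i}) = f\big( \lim_{i \to \infty} x_{k_i} \big) \in \Big[ \frac{1}{8}, f\Big( \frac{17}{8 + 4\beta} \Big) \Big] . \]
□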
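The oscillation of the normalized errors can be observed directly. The sketch below assumes the closed form $V_n = 18^{-l(n)} \cdot \frac{1}{8}\big(2^{l(n)+1} - n + \frac{1}{9}(n - 2^{l(n)})\big)$ and $f(x) = \frac{1}{72} x^{2/\beta}(17 - 8x)$; it is a floating-point illustration with invented names, not part of the proof.

```python
import math
from fractions import Fraction

BETA = math.log(2) / math.log(3)

def V(n):
    """Assumed closed form of the n-th quantization error."""
    l = n.bit_length() - 1
    bracket = 2 ** (l + 1) - n + Fraction(n - 2 ** l, 9)
    return Fraction(1, 18) ** l * Fraction(1, 8) * bracket

def normalized(n):
    """n^(2/beta) * V_n in floating point."""
    return n ** (2 / BETA) * float(V(n))

def f(x):
    return x ** (2 / BETA) * (17 - 8 * x) / 72

lower, upper = 0.125, f(17 / (8 + 4 * BETA))
```

Along $n = 2^l$ the normalized error stays at the lower endpoint 1/8, while along $n$ close to $2^l \cdot 17/(8 + 4\beta)$ it approaches the upper endpoint, so the sequence cannot converge.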
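Lemma 6.4. For all $n \in \mathbb{N}$,
\[ \frac{1}{72} \le n^{2/\beta} V_n \le \frac{9}{8} . \]
Proof. For $n \in \mathbb{N}$ we have
\[ \frac{1}{8} \Big( \frac{1}{9} \Big)^{l(n)+1} = V_{2^{l(n)+1}} \le V_n \le V_{2^{l(n)}} = \frac{1}{8} \Big( \frac{1}{9} \Big)^{l(n)} \quad \text{and} \quad 9^{l(n)} \le n^{2/\beta} < 9^{l(n)+1} . \]
□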
Remark 6.5. For $k \in \mathbb{N}$, let $\alpha_k = \{ S_\sigma(\frac{1}{2}) : \sigma \in \{1,2\}^k \}$ be the (uniquely
determined) optimal set of $2^k$-means for the Cantor distribution (see Theorem 5.2)
and let
\[ Q_k = \sum_{a \in \alpha_k} P\big( \{ x \in \mathbb{R} : |x - a| = \min_{b \in \alpha_k} |x - b| \} \big) \, \delta_a . \]
Then $(V_{2^k})^{1/2} = l_2(P, Q_k)$, where $l_2$ denotes the minimal $L^2$-metric (see POLLARD
[6], GRAF-LUSCHGY [4]). Thus we obtain from Theorem 6.3
\[ l_2(P, Q_k) = O(3^{-k}) \quad \text{as } k \to \infty . \]
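Theorem 6.6. The quantization dimension of $P$ equals the Hausdorff dimension $\beta$
of the Cantor set.
Proof. By Lemma 6.4 we have
\[ \frac{1}{72} n^{-2/\beta} \le V_n \le \frac{9}{8} n^{-2/\beta} , \]
hence
\[ \frac{1}{2} \log \frac{1}{72} - \frac{1}{\beta} \log n \le \log V_n^{1/2} \le \frac{1}{2} \log \frac{9}{8} - \frac{1}{\beta} \log n . \]
This implies, for all sufficiently large $n$,
\[ \frac{-\log n}{\frac{1}{2} \log \frac{1}{72} - \frac{1}{\beta} \log n} \le \frac{\log n}{-\log (V_n^{1/2})} \le \frac{-\log n}{\frac{1}{2} \log \frac{9}{8} - \frac{1}{\beta} \log n} . \]
Since
\[ \lim_{n \to \infty} \frac{-\log n}{\frac{1}{2} \log \frac{1}{72} - \frac{1}{\beta} \log n} = \beta \]
and
\[ \lim_{n \to \infty} \frac{-\log n}{\frac{1}{2} \log \frac{9}{8} - \frac{1}{\beta} \log n} = \beta \]
we get
\[ \lim_{n \to \infty} \frac{\log n}{-\log (V_n^{1/2})} = \beta \]
and the theorem is proved. □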
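The convergence in this theorem is slow, which the following sketch makes visible along the subsequence $n = 2^k$. It assumes $V_{2^k} = \frac{1}{8} \cdot 9^{-k}$ (the form of the error for powers of two noted in Remark 3.9) and works with logarithms to avoid underflow; the function name `dimension_estimate` is hypothetical.

```python
import math

BETA = math.log(2) / math.log(3)

def dimension_estimate(k):
    """log n / (-log V_n^(1/2)) for n = 2^k, assuming V_n = (1/8) * 9^(-k)."""
    log_n = k * math.log(2)
    log_root_v = 0.5 * (-math.log(8) - k * math.log(9))
    return log_n / (-log_root_v)
```

The estimates increase towards $\beta \approx 0.6309$ from below, with an error of order $1/k$, that is, of order $1/\log n$.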
Acknowledgements
We are indebted to the referees for valuable suggestions concerning an improvement of the
first version of this paper.
References
[1] ABUT, H. (ED.): Vector Quantization. IEEE Press, New York 1990
[2] BUCKLEW, J. A., and WISE, G. L.: Multidimensional Asymptotic Quantization with r-th Power
Distortion Measures, IEEE Trans. Inform. Theory 28 (1982), 239-247
[3] GRAF, S., and LUSCHGY, H.: Foundations of Quantization for Random Vectors, Preprint No.
16, Angew. Mathematik und Informatik, Universität Münster 1994
[4] GRAF, S., and LUSCHGY, H.: Consistent Estimation in the Quantization Problem for Random
Vectors, in: Transactions of the 12th Prague Conference on Information Theory, Statistical
Decision Functions, Random Processes, Acad. Sci. Czech. Republic & Charles Univ. Prague,
Prague 1994, pp. 84-87
[5] HUTCHINSON, J.: Fractals and Self-Similarity. Indiana Univ. Math. J. 30 (1981), 713-747
[6] POLLARD, D.: Quantization and the Method of k-means, IEEE Trans. Inform. Theory 28
(1982), 199-205
[7] ZADOR, P. L.: Asymptotic Quantization Error of Continuous Signals and the Quantization
Dimension, IEEE Trans. Inform. Theory 28 (1982), 139-149
Fakultät für Mathematik und Informatik
Universität Passau
D-94030 Passau
Germany

FB IV - Mathematik
Universität Trier
Einsteinstr. 62
D-54286 Munster
Germany
