Abstract: In Shannon theory, lossless source coding deals with the optimal compression of discrete sources. Compressed sensing is a lossless coding strategy for analog sources by means of multiplication by real-valued matrices. In this paper we study almost lossless analog compression for analog memoryless sources in an information-theoretic framework, in which the compressor or decompressor is constrained by various regularity conditions, in particular linearity of the compressor and Lipschitz continuity of the decompressor. The fundamental limit is shown to be the information dimension proposed by Rényi in 1959.

Index Terms: Analog compression, compressed sensing, information measures, Rényi information dimension, Shannon theory, source coding.

I. INTRODUCTION

A. Motivations From Compressed Sensing

The central problem is to determine how many compressed measurements are sufficient/necessary for recovery with vanishing block error probability as blocklength tends to infinity [2]-[4]. Random coding is employed to show the existence of good linear encoders. In particular, when the random projection matrices follow certain distributions (e.g., standard Gaussian), the restricted isometry property (RIP) is satisfied with overwhelming probability and guarantees exact recovery.

On the other hand, there are also significantly different ingredients in compressed sensing in comparison with information-theoretic setups: sources are not modeled probabilistically, and the fundamental limits are on a worst-case basis rather than on average. Moreover, block error probability is with respect to the distribution of the encoding random matrices.
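The near-isometry behind the RIP claim above can be illustrated numerically: an i.i.d. Gaussian matrix scaled by 1/sqrt(m) approximately preserves the norm of sparse vectors. This is only a toy Monte Carlo sketch; the sizes, sparsity level, and scaling are illustrative choices, not constructions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 400, 200, 10  # ambient dimension, measurements, sparsity (illustrative)

worst = 0.0
for _ in range(200):
    # Columns scaled so that E[||Ax||^2] = ||x||^2 for any fixed x.
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    ratio = np.linalg.norm(A @ x) / np.linalg.norm(x)
    worst = max(worst, abs(ratio - 1.0))

print(worst)  # worst relative norm distortion seen; small for these sizes
```

For these parameters the squared ratio concentrates like a chi-square with m degrees of freedom divided by m, so deviations of order 1/sqrt(m) are expected.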
components in the source does not exceed a certain threshold, while we adopt a statistical model in which compressors work for most source realizations. In this sense, almost lossless analog compression can be viewed as an information-theoretic framework for compressed sensing. Probabilistic modeling provides elegant results in terms of fundamental limits, and also sheds light on constructive schemes for individual sequences. For example, random coding is not only a proof technique in Shannon theory, but also a guiding principle in modern coding theory as well as in compressed sensing.

Recently there have been considerable new developments in using statistical signal models (e.g., mixed distributions) in compressed sensing (e.g., [5]-[8]), where reconstruction performance is evaluated by computing the asymptotic error probability in the large-blocklength limit. As discussed in Section IV-B, the performance of those practical algorithms still lies far from the fundamental limit.

B. Lossless Source Coding for Analog Sources

Discrete sources have been the sole object of lossless data compression theory. The reason is at least twofold. First, nondiscrete sources have infinite entropy, which implies that representation with arbitrarily small block error probability requires arbitrarily large rate. On the other hand, even if we consider encoding analog sources by real numbers, the result is still trivial, as $\mathbb{R}$ and $\mathbb{R}^n$ have the same cardinality. Therefore, a single real number is capable of representing a real vector losslessly, yielding a universal compression scheme for any analog source with zero rate and zero error probability.

However, it is worth pointing out that the compression method proposed above is not robust, because the bijection between $\mathbb{R}^n$ and $\mathbb{R}$ is highly irregular. In fact, neither the encoder nor the decoder can be continuous [9, Exercise 6(c), p. 385]. Therefore, such a compression scheme is useless in the presence of any observation noise, regardless of the signal-to-noise ratio (SNR). This disadvantage motivates us to study how to compress not only losslessly but also gracefully in the real field. In fact, some authors have also noticed the importance of regularity in data compression. In [10], Montanari and Mossel observed that the optimal data compression scheme often exhibits the following inconvenience: codewords tend to depend chaotically on the data; hence, changing a single source symbol leads to a radical change in the codeword. In [10], a source code is said to be smooth (resp. robust) if the encoder (resp. decoder) is Lipschitz (see Definition 6) with respect to the Hamming distance. The fundamental limits of smooth lossless compression are analyzed in [10] for binary sources via sparse graph codes. In this paper, we focus on sources in the real field with general distributions. Introducing a topological structure makes the nature of the problem quite different from traditional formulations in the discrete world, and calls for machinery from dimension theory and geometric measure theory.

C. Operational Characterization of Rényi Information Dimension

In 1959, Alfréd Rényi proposed an information measure for random vectors in Euclidean space named information dimension [11], defined through the normalized entropy of a finely quantized version of the random vector. It characterizes the rate of growth of the information given by successively finer discretizations of the space. Although a fundamental information measure, the Rényi dimension is far less well known than either the Shannon entropy or the Rényi entropy. Rényi showed that under certain conditions the information dimension of an absolutely continuous $n$-dimensional random vector is $n$. Hence, he remarked in [11] that the geometrical (or topological) and information-theoretical concepts of dimension coincide for absolutely continuous probability distributions. However, the operational role of Rényi information dimension has not been addressed before except in the work of Kawabata and Dembo [12], which relates it to the rate-distortion function. It is shown in [12] that when the single-letter distortion function satisfies certain conditions, the rate-distortion function $R(D)$ of a real-valued source scales proportionally to $\log\frac{1}{D}$ as $D \to 0$, with the proportionality constant being the information dimension of the source. This result serves to drop the assumption of continuity in the asymptotic tightness of Shannon's lower bound in the low-distortion regime.

In this paper we give an operational characterization of Rényi information dimension as the fundamental limit of almost lossless data compression for analog sources under various regularity constraints on the encoder/decoder. Moreover, we consider the problem of lossless Minkowski-dimension compression, where the Minkowski dimension of a set measures its degree of fractality. In this setup we study the minimum upper Minkowski dimension of high-probability events of source realizations. This can be seen as a counterpart of lossless source coding, which seeks the smallest cardinality of high-probability events. Rényi information dimension turns out to be the fundamental limit for lossless Minkowski-dimension compression.

D. Organization of the Paper

Notations frequently used throughout the paper are summarized in Section II. Section III gives an overview of Rényi information dimension, gives a new interpretation in terms of entropy rate, and discusses connections with rate-distortion theory. Section IV states the main definitions and results, as well as their connections with compressed sensing. Section V contains definitions and coding theorems for lossless Minkowski-dimension compression, which are important intermediate results for Sections VI and VII. A new type of concentration-of-measure result is proved for memoryless sources, where it is shown that overwhelmingly large probability is concentrated on subsets of low (Minkowski) dimension. Section VI tackles the case of lossless linear compression, where achievability results are given as well as a converse for mixed discrete-continuous sources. Section VII is devoted to lossless Lipschitz decompression, where we establish a general converse in terms of upper information dimension, and its tightness for mixed discrete-continuous and self-similar sources. Some technical lemmas are proved in the Appendixes.

II. NOTATIONS

The major notations adopted in this paper are summarized as follows.
WU AND VERDÚ: RÉNYI INFORMATION DIMENSION: FUNDAMENTAL LIMITS OF ALMOST LOSSLESS ANALOG COMPRESSION 3723
(1)
(2)
(3)
(4)
(10)

Define , which is not a norm since for or . However, it is a valid metric on . Let . For a matrix , denote by the submatrix formed by those columns of whose indices are in . All logarithms in this paper are with respect to base 2.

III. RÉNYI INFORMATION DIMENSION

In this section, we give an overview of Rényi information dimension and its properties. Moreover, we give a novel interpretation in terms of the entropy rate of the dyadic expansion of the random variable. We also discuss the connection between information dimension and rate-distortion theory established in [12].

$$\langle x \rangle_m = \frac{\lfloor m x \rfloor}{m} \qquad (11)$$

Define

$$\underline{d}(X) = \liminf_{m \to \infty} \frac{H(\langle X \rangle_m)}{\log m} \qquad (12)$$

and

$$\bar{d}(X) = \limsup_{m \to \infty} \frac{H(\langle X \rangle_m)}{\log m} \qquad (13)$$

(16)

It should be noted that there exist alternative definitions of information dimension in the literature. For example, in [13], the lower and upper information dimensions are defined by replacing $H(\langle X \rangle_m)$ with the $\epsilon$-entropy with respect to the distance. This definition essentially allows unequal partitions of the whole space and lowers the value of the information dimension, since . However, the resulting definition is equivalent (see Theorem 23). As another example, the following definition is adopted in [14, Def. 4.2]:

$$\lim_{\epsilon \to 0} \frac{\mathbb{E}\left[\log \mu_X(B(X, \epsilon))\right]}{\log \epsilon} \qquad (17)$$

where $\mu_X$ denotes the distribution of $X$ and $B(x, \epsilon)$ is the ball of radius $\epsilon$ centered at $x$. This definition is equivalent to Definition 1, as shown in Appendix I.
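The normalized quantized entropy $H(\langle X \rangle_m)/\log m$ can be computed exactly for simple distributions, which makes the definition concrete. The following toy sketch (illustrative distributions, not from the paper) evaluates the ratio for a uniform and for a two-atom discrete random variable.

```python
import numpy as np

def entropy_bits(probs):
    # Shannon entropy in bits of a probability vector (zeros ignored).
    p = np.asarray([q for q in probs if q > 0], dtype=float)
    return float(-(p * np.log2(p)).sum())

def info_dim_ratio(cell_probs, m):
    # H(<X>_m) / log2(m): the quantity whose m -> infinity limit is d(X).
    return entropy_bits(cell_probs) / np.log2(m)

m = 2 ** 12
# Uniform[0,1): every cell [i/m, (i+1)/m) has probability 1/m  ->  ratio = 1.
r_uniform = info_dim_ratio([1.0 / m] * m, m)
# Two equiprobable atoms at 0 and 1/2: only two cells are hit, H = 1 bit  ->  ratio -> 0.
r_discrete = info_dim_ratio([0.5, 0.5], m)
print(r_uniform, r_discrete)
```

The uniform ratio equals 1 at every resolution, while the discrete ratio decays like $1/\log_2 m$, matching the dimensions 1 and 0.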
3724 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 8, AUGUST 2010
Note that (17) can be generalized to random variables on an arbitrary metric space.

B. Characterizations and Properties

The lower and upper information dimensions of a random variable might not always be finite, because $H(\langle X \rangle_m)$ can be infinite for all $m$. However, as pointed out in [11], if the mild condition $H(\lfloor X \rfloor) < \infty$ is satisfied, we have

(18)

The necessity of this condition is shown in Proposition 1. One sufficient condition for finite information dimension is $\mathbb{E}[\log(1 + |X|)] < \infty$. Consequently, if $\mathbb{E}[|X|^{\alpha}] < \infty$ for some $\alpha > 0$, then the information dimension is finite.

Proposition 1:

(19)
(20)
(21)

If $H(\lfloor X \rfloor) = \infty$, then

(22)

Proof: See Appendix II.

For $\mathbb{R}^n$-valued $X$, (20) can be generalized to $0 \le \underline{d}(X) \le \bar{d}(X) \le n$.

To calculate the information dimension in (12) and (13), it is sufficient to restrict to the exponential subsequence $m = 2^l$, as a result of the following proposition.

Proposition 2:

$$\underline{d}(X) = \liminf_{l \to \infty} \frac{H(\langle X \rangle_{2^l})}{l} \qquad (23)$$

$$\bar{d}(X) = \limsup_{l \to \infty} \frac{H(\langle X \rangle_{2^l})}{l} \qquad (24)$$

Proof: See Appendix II.

Similarly to the approach in Proposition 1, we have the following.

Proposition 3: $\underline{d}(X)$ and $\bar{d}(X)$ are unchanged if rounding or ceiling functions are used in Definition 1.

Proof: See Appendix II.

C. Evaluation of Information Dimension

By the Lebesgue decomposition theorem [15], a probability distribution can be uniquely represented as the mixture

(25)

where the weights are nonnegative and sum to one; the first component is a purely atomic probability measure (discrete part); the second is a probability measure absolutely continuous with respect to Lebesgue measure, i.e., having a probability density function (continuous part^1); and the third is a probability measure singular with respect to Lebesgue measure but with no atoms (singular part).

^1 In measure theory, sometimes a measure is called continuous if it does not have any atoms, and a singular measure is called singularly continuous. Here we say a measure is continuous if and only if it is absolutely continuous.

As shown in [11], the information dimension of a mixture of a discrete and an absolutely continuous distribution can be determined as follows.

Theorem 1 [11]: Let $X$ be a random variable such that $H(\lfloor X \rfloor)$ is finite. Assume the distribution of $X$ can be represented as

$$\mu = (1 - \rho)\,\mu_d + \rho\,\mu_c \qquad (26)$$

where $\mu_d$ is a discrete measure, $\mu_c$ is an absolutely continuous measure, and $0 \le \rho \le 1$. Then

$$d(X) = \rho. \qquad (27)$$

Furthermore, given the finiteness of $H(\mu_d)$ and $h(\mu_c)$, the $d$-dimensional entropy of $X$ admits a simple formula

$$\mathcal{H}(X) = (1 - \rho)\,H(\mu_d) + \rho\,h(\mu_c) + H_b(\rho) \qquad (28)$$

where $H(\mu_d)$ is the Shannon entropy of $\mu_d$, $h(\mu_c)$ is the differential entropy of $\mu_c$, and $H_b(\rho)$ is the binary entropy function.

Proof: See [11, Th. 1 and 3] or [16, Th. 1, pp. 588-592].

Some consequences of Theorem 1 are as follows. As long as $H(\lfloor X \rfloor) < \infty$:
1) $X$ is discrete: $d(X) = 0$, and $\mathcal{H}(X)$ coincides with the Shannon entropy of $X$.
2) $X$ is continuous: $d(X) = 1$, and $\mathcal{H}(X)$ is equal to the differential entropy of $X$.
3) $X$ is discrete-continuous-mixed: $d(X) = \rho$, and $\mathcal{H}(X)$ is the weighted sum of the entropies of the discrete and continuous parts plus a term of $H_b(\rho)$.

For mixtures of countably many distributions, we have the following theorem.

Theorem 2: Let $Y$ be a discrete random variable with $H(Y) < \infty$. If $d(X \mid Y = y)$ exists for all $y$, then $d(X)$ exists and is given by $\sum_y P_Y(y)\, d(X \mid Y = y)$. More generally

$$\underline{d}(X) \ge \sum_y P_Y(y)\, \underline{d}(X \mid Y = y) \qquad (29)$$

$$\bar{d}(X) \le \sum_y P_Y(y)\, \bar{d}(X \mid Y = y) \qquad (30)$$

Proof: For any $y$, the conditional distribution of $\langle X \rangle_m$ given $Y = y$ is the quantization of the conditional distribution of $X$. Then

$$H(\langle X \rangle_m \mid Y) \le H(\langle X \rangle_m) \le H(\langle X \rangle_m \mid Y) + H(Y) \qquad (31)$$

where

$$H(\langle X \rangle_m \mid Y) = \sum_y P_Y(y)\, H(\langle X \rangle_m \mid Y = y). \qquad (32)$$

Since $H(Y) < \infty$, dividing both sides of (31) by $\log m$ and sending $m \to \infty$ yields (29) and (30).
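Theorem 1's conclusion $d(X) = \rho$ can be checked by direct computation for a toy mixture of an atom at 0 and a uniform continuous part (an illustrative example, not taken from the paper): cell 0 of the dyadic quantizer receives the atom plus its share of the continuous mass, every other cell receives an equal share.

```python
import numpy as np

def entropy_bits(probs):
    p = np.asarray([q for q in probs if q > 0], dtype=float)
    return float(-(p * np.log2(p)).sum())

def quantized_entropy(rho, l):
    # X = 0 with probability 1-rho, X ~ Uniform[0,1) with probability rho,
    # quantized to m = 2**l cells; cell 0 also receives the atom at 0.
    m = 2 ** l
    return entropy_bits([(1 - rho) + rho / m] + [rho / m] * (m - 1))

rho = 0.3
h8, h16 = quantized_entropy(rho, 8), quantized_entropy(rho, 16)
print(h16 / 16)        # normalized entropy, approaches rho = d(X) slowly
print((h16 - h8) / 8)  # entropy growth per extra bit of resolution, approx rho
```

The ratio converges like $\rho + H_b(\rho)/l$, so the per-bit entropy increment isolates $d(X)$ much faster than the raw ratio.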
Thus, its information dimension is the entropy rate of its dyadic expansion, or the entropy rate of any $M$-ary expansion of the variable divided by $\log M$.

This interpretation of information dimension enables us to gain more intuition about the result in Theorem 1. When $X$ has a discrete distribution, its dyadic expansion has zero entropy rate. When $X$ is uniform on $[0,1]$, its dyadic expansion is independent identically distributed (i.i.d.) equiprobable, and therefore it has unit entropy rate in bits. If $X$ is continuous but nonuniform, its dyadic expansion still has unit entropy rate. Moreover, from (36), (37), and Theorem 1, we have

(38)

where $D(\cdot\|\cdot)$ denotes the relative entropy and the differential entropy is finite since the variable is bounded a.s. The information dimension of a discrete-continuous mixture is also easily understood from this point of view, because the entropy rate of a mixed process is the weighted sum of the entropy rates of each component. Moreover, random variables whose lower and upper information dimensions differ can be easily constructed from processes with different lower and upper entropy rates.

(42)

and for each . The usual Cantor distribution [15] can be defined through the IFS in (40) and .

The next result gives the information dimension of a self-similar measure with an IFS satisfying the open set condition^2 [18, p. 129], that is, there exists a nonempty bounded open set such that the images of it under the IFS maps are disjoint subsets of it.

Theorem 3 [17], [12]: Let the distribution of $X$ be a self-similar measure generated from the stationary ergodic measure $W$ and the IFS with similarity ratios $\{c_i\}$ and invariant set . Then

(43)

When $W$ is the distribution of a memoryless process with common distribution, (43) reduces to

$$d(X) = \frac{H(W)}{\sum_i W(i) \log \frac{1}{c_i}}. \qquad (44)$$

^2 The open set condition is weaker than the strong separation condition.
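For i.i.d. digits, the entropy-rate interpretation and (44) give a closed form that is easy to evaluate. The following sketch (assuming the ternary Cantor IFS with equal similarity ratios 1/3, digits 0 and 2 with probabilities $1-p$ and $p$) computes the information dimension per Theorem 3.

```python
import math

def cantor_information_dimension(p):
    # X has ternary expansion with i.i.d. digits: 0 w.p. 1-p, 2 w.p. p.
    # Entropy rate per ternary digit, normalized by log 3, as in (44).
    hb = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return hb / math.log2(3)

print(cantor_information_dimension(0.5))  # log 2 / log 3, approx 0.6309
```

For $p = 1/2$ this is the standard Cantor distribution, whose information dimension coincides with the Minkowski dimension of the Cantor set; biased digits strictly lower it.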
Note that the open set condition implies that . Since , it follows that

(45)

In view of (44), we have

(46)

where is a subprobability measure. Since , we have , which agrees with Proposition 1.

F. Connections With Rate-Distortion Theory

The asymptotic behavior of the rate-distortion function, in particular the asymptotic tightness of the Shannon lower bound in the high-rate regime, has been addressed in [19] and [20] for continuous sources. In [12], Kawabata and Dembo generalized it to real-valued sources that do not necessarily possess a density, and showed that the information dimension plays a central role. For completeness, we summarize the main results from [12] in Appendix III.

As a consequence of the monotonicity of Rényi entropy, information dimensions of different orders satisfy the following result.

Lemma 1: For , the lower and upper information dimensions of order $\alpha$ both decrease with $\alpha$. Define

(51)

Then

(52)

Proof: All inequalities follow from the fact that, for a fixed random variable, the Rényi entropy of order $\alpha$ decreases with $\alpha$.

For dimension of order , we highlight the following result from [21].

Theorem 4 [21, Th. 3]: Let $X$ be a random variable whose distribution has the Lebesgue decomposition in (25). Then, we have the following.
1) If the distribution has a discrete component, we have .
2) If the distribution has a continuous component, and , we have .

The differential Rényi entropy is defined using its density as

(54)
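Lemma 1 rests on the fact that the Rényi entropy $H_\alpha$ is nonincreasing in its order $\alpha$. A quick numerical check (the distribution is an arbitrary illustrative choice):

```python
import numpy as np

def renyi_entropy_bits(probs, alpha):
    # Renyi entropy of order alpha in bits; alpha = 1 is the Shannon limit.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    if abs(alpha - 1.0) < 1e-12:
        return float(-(p * np.log2(p)).sum())
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))

p = [0.5, 0.25, 0.125, 0.125]
vals = [renyi_entropy_bits(p, a) for a in (0.25, 0.5, 1.0, 2.0, 4.0)]
print(vals)  # nonincreasing in alpha
```

For a uniform distribution all orders coincide at $\log_2$ of the alphabet size, which is the equality case of the monotonicity.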
Using codes over an infinite alphabet, any discrete source can be compressed with zero rate and zero block error probability. In other words, if both alphabets are countably infinite, then for all

(56)

for any random process.

B. Lossless Analog Compression With Regularity Conditions

In this section, we consider the problem of encoding analog sources with analog symbols, that is, (or if bounded encoders are required), where denotes the Borel $\sigma$-algebra. As in the countably infinite case, zero rate is achievable even for zero block error probability, because the cardinality of $\mathbb{R}^n$ is the same for any $n$ [22]. This conclusion holds even if we require the encoder/decoder to be Borel measurable, because according to Kuratowski's theorem [23, Remark (i), p. 451] every uncountable standard Borel space is isomorphic^3 to . Therefore, a single real number has the capability of encoding a real vector, or even a real sequence, with a coding scheme that is both universal and deterministic.

However, the rich structure of $\mathbb{R}$ equipped with a metric topology (e.g., that induced by the Euclidean distance) enables us to probe the problem further. If we seek the fundamental limits of not only lossless coding but graceful lossless coding, the result is no longer trivial. In this spirit, our various definitions share the basic information-theoretic setup where a random vector is encoded with a function and decoded with another, such that the encoder and decoder satisfy certain regularity conditions and the probability of incorrect reproduction vanishes as the blocklength grows.

Regularity of the encoder and decoder is imposed for the sake of both lower complexity and greater robustness. For example, although a surjection from to is capable of lossless encoding, its irregularity requires specifying uncountably many real numbers to determine the mapping. Moreover, regularity of the encoder/decoder is crucial to guarantee the noise resilience of the coding scheme.

^3 Two measurable spaces are isomorphic if there exists a measurable bijection whose inverse is also measurable.

for all sufficiently large , and the encoder and decoder are constrained according to Table I. Except for linear encoding, where , it is assumed that . In Definition 5, we have used the following definitions.

Definition 6 (Hölder and Lipschitz Continuity): Let $(\mathcal{X}, d_{\mathcal{X}})$ and $(\mathcal{Y}, d_{\mathcal{Y}})$ be metric spaces. A function $f$ is called $(L, \gamma)$-Hölder continuous if there exists $L > 0$ such that for any $x, x'$

$$d_{\mathcal{Y}}(f(x), f(x')) \le L\, d_{\mathcal{X}}(x, x')^{\gamma}. \qquad (58)$$

$f$ is called $L$-Lipschitz if $f$ is $(L, 1)$-Hölder continuous. $f$ is simply called Lipschitz (resp. $\gamma$-Hölder continuous) if $f$ is $L$-Lipschitz (resp. $(L, \gamma)$-Hölder continuous) for some $L$.

We proceed to give results for each of the minimum $\epsilon$-achievable rates introduced in Definition 5. Motivated by compressed sensing theory, it is interesting to consider the case where the encoder is restricted to be linear.

Theorem 5 (Linear Encoding: General Achievability): Suppose that the source is memoryless. Then

(59)

for all , where is defined in (51). Moreover, we have the following.
1) For all linear encoders (except possibly those in a set of zero Lebesgue measure on the space of real matrices), block error probability is achievable.
2) The decoder can be chosen to be -Hölder continuous for all , where is the compression rate.

Proof: See Section VI-C.

Theorem 6 (Linear Encoding: Discrete-Continuous Mixture): Suppose that the source is memoryless with a discrete-continuous mixed distribution. Then

(60)

for all .

Proof: See Section VI-C.
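A standard way to decode from linear measurements of a source with an atom at zero is basis pursuit ($\ell_1$ minimization), posed as a linear program. This is a hedged illustration of the compressed sensing connection, not the decoder used in the paper's achievability proofs; the sizes and the use of `scipy.optimize.linprog` are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 60, 40, 5  # blocklength, measurements, nonzero entries (illustrative)

x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
b = A @ x0

# Basis pursuit: min ||x||_1 s.t. Ax = b, as an LP in (u, v) with x = u - v, u, v >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

print(np.max(np.abs(x_hat - x0)))  # small when l1 recovery succeeds
```

The LP always returns a feasible point of minimal $\ell_1$ norm; exact recovery of `x0` additionally requires the measurement rate to be high enough relative to the sparsity, in line with the phase-transition discussion in Section IV-C.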
Theorem 7 (Linear Encoding: Achievability for Self-Similar Sources): Suppose that the source is memoryless with a self-similar distribution that satisfies the open set condition. Then

(61)

for all .

Proof: See Section VI-C.

In Theorems 5 and 6, it has been shown that block error probability is achievable for Lebesgue-a.e. linear encoder. Therefore, choosing any random matrix with i.i.d. entries distributed according to some absolutely continuous distribution (e.g., a Gaussian random matrix) satisfies block error probability almost surely.

Now, we drop the restriction that the encoder is linear, allowing very general encoding rules. Let us first consider the case where both the encoder and the decoder are constrained to be continuous. It turns out that zero rate is achievable in this case.

Theorem 8 (Continuous Encoder and Decoder): For general sources

(62)

for all .

Proof: Since $\mathbb{R}^n$ and $\mathbb{R}$ are Borel isomorphic, there exist Borel measurable mappings whose composition is the identity. By Lusin's theorem [24, Th. 7.10], there exists a compact set on which the restricted encoder is continuous and whose complement has probability at most $\epsilon$. Since the set is compact and the encoder is injective on it, the encoder is a homeomorphism from the set to its image. Hence, the inverse is continuous. Since both sets are closed, by the Tietze extension theorem [9], the encoder and decoder can be extended to continuous mappings, respectively. Using these as the new encoder and decoder, the error probability satisfies the required bound.

Employing similar arguments as in the proof of Theorem 8, we see that imposing additional continuity constraints on the encoder (resp. decoder) has almost no impact on the fundamental limit. This is because a continuous encoder (resp. decoder) can be obtained at the price of an arbitrarily small increase of error probability, which can be chosen to vanish as the blocklength grows.

Theorems 9-11 deal with Lipschitz decoding in Euclidean spaces.

Theorem 9 (Lipschitz Decoding: General Converse): Suppose that the source is memoryless. If , then

(63)

for all .

Proof: See Section VII-B.

For sources with a singular distribution, in general there is no simple answer due to their fractal nature. For an important class of singular measures, namely self-similar measures generated from i.i.d. digits (e.g., the generalized Cantor distribution), the information dimension turns out to be the fundamental limit for lossless compression with a Lipschitz decoder.

Theorem 11 (Lipschitz Decoding: Achievability for Self-Similar Measures): Suppose that the source is memoryless and bounded, and its $M$-ary expansion consists of independent identically distributed digits. Then

(65)

for all . Moreover, if the distribution of each digit is equiprobable on its support, then (65) holds for

Proof: See Section VII-D.

Example 1: As an example, we consider the setup in Theorem 11 with $M = 3$. The associated invariant set is the middle-third Cantor set [18] and the source is supported on it. The distribution, denoted by , is called the generalized Cantor distribution [25]. In the ternary expansion, each digit is independent and takes the values $0$ and $2$ with probability $1 - p$ and $p$, respectively. Then, by Theorem 11, for any . Furthermore, when $p = 1/2$, coincides with the uniform distribution on the Cantor set, i.e., the standard Cantor distribution. Hence, we have the stronger result that , i.e., exact lossless compression can be achieved with a Lipschitz continuous decompressor at the rate of the information dimension.

Let . Then, the encoder is the inverse of the decoder on the support. Due to the $L$-Lipschitz continuity of the decoder, the encoder is an expansive mapping, that is

(66)

Note that (66) implies the injectivity of the encoder, a necessary condition for decodability. Moreover, not only does it assign different codewords to different source symbols, but it also keeps them separated in proportion to their distance. Therefore, the encoder respects the metric structure of the source alphabet.

We conclude this section by introducing stable decoding, a weaker condition than Lipschitz continuity.

Definition 7 ( -Stable): Let and be metric spaces and . is called -stable on if for all

(67)
Theorem 12 ( -Stable Decoding): Let the underlying metric be the distance. Suppose that the source is memoryless. Then, for all

(68)

that is, the minimum $\epsilon$-achievable rate such that for all sufficiently small there exists a -stable coding strategy is given by .

C. Connections With Compressed Sensing

As an application of Theorem 6, we consider the following source distribution:

$$(1 - \gamma)\,\delta_0 + \gamma\,\nu_c \qquad (69)$$

where $\delta_0$ is the Dirac measure with atom at $0$ and $\nu_c$ is an absolutely continuous distribution. This is the model for linearly sparse signals used in [5] and [6], where a universal^4 iterative thresholding decoding algorithm is proposed. Under certain assumptions on $\nu_c$, the asymptotic error probability turns out to exhibit a phase transition [5], [6]: there is a sparsity-dependent threshold on the measurement rate above which the error probability vanishes and below which the error probability goes to one. This behavior is predicted by Theorem 6, which shows that the optimal threshold is $\gamma$, irrespective of the prior. Moreover, the decoding algorithm in the achievability proof of Section VI-C is universal and robust (Hölder continuous), although it has exponential complexity. The threshold is not given in closed form (see [5, eq. (5) and Fig. 1]), but its numerical evaluation shows that it lies far from the optimal threshold except in the nonsparse regime ($\gamma$ close to 1). Moreover, it can be shown that as . The performance of several other suboptimal factor-graph-based reconstruction algorithms is analyzed in [7]. Practical robust algorithms that approach the fundamental limit of compressed sensing given by Theorem 6 are not yet known.

Robust reconstruction is of great importance in the theory of compressed sensing [26]-[28], since noise resilience is an indispensable property for decompressing sparse signals from real-valued measurements. For example, consider the following robustness result.

Theorem 13 [26]: Suppose we wish to recover a vector $x$ from noisy compressed linear measurements $y = Ax + e$, where $A$ is an $m \times n$ matrix and $\|e\|_2 \le \epsilon$. Let $x^{\star}$ be a solution of the following $\ell_1$-regularization problem:

$$\min \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \epsilon. \qquad (70)$$

Let $A$ satisfy , where is the -restricted isometry constant of the matrix $A$, defined as the smallest positive number $\delta_k$ such that

$$(1 - \delta_k)\,\|x\|_2^2 \le \|A_S\, x\|_2^2 \le (1 + \delta_k)\,\|x\|_2^2 \qquad (71)$$

^4 The decoding algorithm is universal if it requires no knowledge of the prior distribution of the nonzero entries.

for all with and for all supported on . Then

(72)

By Theorem 13, using (70) as the decoder, the norm of the decoding error is upper bounded proportionally to the norm of the noise.

In our framework, a stable or Lipschitz continuous coding scheme also implies robustness with respect to noise added at the input of the decompressor, which could result from quantization, finite wordlength, or other inaccuracies. For example, suppose that the encoder output is quantized by a -bit uniform quantizer. With a -stable coding strategy, we can use the following decoder. Denote the following nonempty set:

(73)

where . Pick any element in it and output the corresponding reproduction. Then, by the stability of the code, we have

(74)

i.e., each component of the decoder output suffers at most twice the inaccuracy of the decoder input. Similarly, an $L$-Lipschitz coding scheme incurs an error no more than proportional to the input inaccuracy.

V. LOSSLESS MINKOWSKI-DIMENSION COMPRESSION

As a counterpart to lossless data compression, in this section we investigate the problem of lossless Minkowski-dimension^5 compression for general sources, where the minimum -achievable rate is defined as . This is an important intermediate tool for studying the fundamental limits of lossless linear encoding and Lipschitz decoding. Bridging the three compression frameworks, in Sections VI-C and VII-B we prove the following inequality:

(75)

Hence, studying it provides an achievability bound for lossless linear encoding and a converse bound for Lipschitz decoding. We present bounds for general sources, as well as tight results for discrete-continuous mixed and self-similar sources.

A. Minkowski Dimension of Sets in Metric Spaces

In fractal geometry, the Minkowski dimension is a way of determining the fractality of a subset of a metric space.

Definition 8 (Covering Number): Let be a nonempty bounded subset of a metric space. For , define

^5 Also known as Minkowski-Bouligand dimension, fractal dimension, or box-counting dimension.
, the $\epsilon$-covering number of $A$, to be the smallest number of $\epsilon$-balls needed to cover $A$, that is

$$N_A(\epsilon) = \min\Big\{ k : A \subseteq \bigcup_{i=1}^{k} B(x_i, \epsilon) \Big\}. \qquad (76)$$

Definition 9 (Minkowski Dimensions): Let $A$ be a nonempty bounded subset of a metric space. Define the lower and upper Minkowski dimensions of $A$ as

$$\underline{\dim}_B A = \liminf_{\epsilon \to 0} \frac{\log N_A(\epsilon)}{\log \frac{1}{\epsilon}} \qquad (77)$$

$$\overline{\dim}_B A = \limsup_{\epsilon \to 0} \frac{\log N_A(\epsilon)}{\log \frac{1}{\epsilon}} \qquad (78)$$

(79)
(80)

The following lemma shows that in Euclidean spaces, without loss of generality, we can restrict attention to coverings with the mesh cubes defined in (5). Since the mesh cubes partition the whole space, to calculate the lower or upper Minkowski dimension of a set, it is sufficient to count the number of mesh cubes it intersects, hence justifying the name of box-counting dimension. Denote by the smallest number of mesh cubes of size that cover the set, that is

(81)
(82)
(83)

Lemma 3: Let be a bounded subset in . The Minkowski dimensions satisfy

(84)
(85)
(88)

for .

Proof: See Section V-C.

For the special cases of discrete-continuous-mixed and self-similar sources, we have the following tight results.

Theorem 15: Suppose that the source is memoryless with a discrete-continuous mixed distribution. Then

(89)
for all , where is the weight of the continuous part of the distribution. If it is finite, then

(90)

Theorem 16: Suppose that the source is memoryless with a self-similar distribution that satisfies the strong separation condition. Then

(91)

for all .

Theorem 17: Suppose that the source is memoryless and bounded, and its $M$-ary expansion consists of independent digits. Then

(92)

for all .

When , the quantity can take any value in for all . Such a source can be constructed using Theorem 15 as follows: let the distribution be a mixture of a continuous and a discrete distribution with weights and respectively, where the discrete part is supported on and has infinite entropy. Then, by Proposition 1 and Theorem 1, , but by Theorem 15.

C. Proofs

Before showing the converse part of Theorem 14, we state two lemmas which are of independent interest in conventional lossless source coding theory.

Lemma 4: Assume that is a discrete memoryless source with common distribution on the alphabet . Let . Denote by the block error probability of the optimal -code. Then, for any

(93)
(94)

where the exponents are given by

(95)
(96)
(97)
(98)

Lemma 5: For

(99)

Proof: See Appendix VII.

Lemma 4 shows that the error exponents for lossless source coding are not only asymptotically tight, but also apply at every block length. This has been shown for rates above entropy in [29, Exercise 1.2.7, pp. 41-42] via a combinatorial method. A unified proof can be given through the method of Rényi entropy, which we omit for conciseness. The idea of using Rényi entropy to study lossless source coding error exponents was previously introduced in [30]-[32].

Lemma 5 deals with universal lower bounds on the source coding error exponents, in the sense that these bounds are independent of the source distribution. A better bound has been shown in [33]: for

(100)

However, the proof of (100) was based on the dual expression (96) and a similar lower bound on the random channel coding error exponent due to Gallager [34, Exercise 5.23], which cannot be applied to the other exponent. Here we give a common lower bound on both exponents, which is a consequence of Pinsker's inequality [35] combined with the lower bound on entropy difference by variational distance [29].

Proof of Theorem 14: (Converse) Let and abbreviate as . Suppose for some . Then, for sufficiently large there exists , such that but and .

First we assume that the source has bounded support, that is, a.s. for some . By Proposition 2, choose such that for all

(101)

By Lemma 3, for all , there exists , such that for all

(102)
(103)

Then, by (101), we have

(104)

Note that this is a memoryless source with alphabet size at most . By Lemmas 4 and 5, for all , for all such that , we have

(105)
(106)

Choose and so large that the right-hand side of (106) is less than . In the special case of , (106) contradicts (103) in view of (104).

Next we drop the restriction that the source is almost surely bounded. Denote by the distribution of . Let be so large that

(107)

Denote the normalized restriction of on by , that is

(108)
and the normalized restriction of  on  by . Then
(109)
By Theorem 2
(110)
(111)
(117)
where (120), (117), and (118) follow from (114), (79), and (80), respectively. But now (120) and (116) contradict the converse part of Theorem 14 for the bounded source , which we have already proved.
(Achievability) Recall that
(121)
We show that for all , for all , there exists  such that for all , there exists  with  and . Therefore,  readily follows.
Suppose for now that we can construct a sequence of subsets , such that for any  the following holds:
(122)
(123)
Denote
(129)
(130)
(131)
(132)
(133)
(134)
6Countable subadditivity fails for upper Minkowski dimension. Had it been satisfied, we could have picked  to achieve .
WU AND VERDÚ: RÉNYI INFORMATION DIMENSION: FUNDAMENTAL LIMITS OF ALMOST LOSSLESS ANALOG COMPRESSION 3733
where (134) follows from the fact that  are i.i.d. and the joint Rényi entropy is the sum of individual Rényi entropies. Hence
(135)
Choose  such that , which is guaranteed to exist in view of (121) and the fact that  is nonincreasing in  according to Lemma 1. Then
(136)
(137)
(138)
Hence,  decays at least exponentially with . Accordingly, (123) holds, and the proof of  is complete.
Next we prove  for the special case of discrete-continuous mixed sources. The following lemma is needed in the converse proof.

Lemma 6 [36, Th. 4.16]: Any Borel set whose upper Minkowski dimension is strictly less than  has zero Lebesgue measure.

Proof of Theorem 15: (Achievability) Let the distribution of  be
(140)
(142)
where the generalized support of vector  is defined as
(144)
By (142),  for sufficiently large . Decompose  as
(145)
where
(146)
Note that the collection of all  is countable, and thus we may relabel them as . Then, . Hence, there exists  and , such that , where
(147)
Then
(148)
(149)
where (148) is by Lemma 2, and (149) follows from  for each . This proves the -achievability of . By the arbitrariness of .
(Converse) Since , we can assume . Let  be such that . Define
(150)
(152)
(153)
(154)
If  implies that  has positive Lebesgue measure. Thus, . By the arbitrariness of , the proof of  is complete.
Next we prove  for self-similar sources under the same assumption of Theorem 3. This result is due to the stationarity and ergodicity of the underlying discrete process that generates the analog source distribution.

Proof of Theorem 16: By Theorem 3,  is finite. Therefore, the converse follows from Theorem 14. To show achievability, we invoke the following definition. Define the local dimension of a Borel measure  on  as the function (if the limit exists) [17, p. 169]
(155)
Denote the distribution of  by the product measure , which is also self-similar and satisfies the strong separation theorem. By [17, Lemma 6.4(b) and Prop. 10.6]
(156)
holds for -almost every . Define the sequence of random variables
(157)
By the same arguments that lead to (127), the sets  defined in (125) satisfy
(164)
(165)
(166)
Since , this shows the -achievability of .
Next we proceed to construct the required . Applying Lemma 4 to the DMS  and blocklength  yields
(167)
By the assumption that  is bounded, without loss of generality, we shall assume  a.s. Therefore, the alphabet size of  is at most . Simply applying (100) to  yields
(168)
which does not grow with  and cannot suffice for our purpose of constructing . Exploiting the structure that  consists of independent bits, in Appendix VIII, we show a much better bound
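The local dimension display (155) above has been lost in this scan. For reference, the standard definition from [17, p. 169], written with generic symbols (a reconstruction, not necessarily the paper's exact notation), is:

```latex
% local dimension of a Borel measure \mu at a point x (when the limit exists);
% B(x,r) denotes the closed ball of radius r centered at x
\dim_{\mathrm{loc}}\mu(x) \;=\; \lim_{r \to 0^{+}} \frac{\log \mu\bigl(B(x,r)\bigr)}{\log r}
```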
1) For all linear encoders (except possibly those in a set of zero Lebesgue measure on the space of  real matrices), block error probability  is achievable.
2) For all  and
Consequently, in view of Theorem 18, the results on  for memoryless sources in Theorems 14–16 yield the achievability results in Theorems 5–7, respectively. Hölder exponents of the decoder can be found by replacing  in (172) by its respective upper bound.
For discrete-continuous sources, the achievability in Theorem 6 can be shown directly without invoking the general result in Theorem 18. See Remark 3. From the converse proof of Theorem 6, we see that effective compression can be achieved with linear encoders, i.e., , only if the source distribution is not absolutely continuous with respect to Lebesgue measure.

Remark 1: Linear embedding of low-dimensional subsets in Banach spaces was previously studied in [37]–[39], etc., in a nonprobabilistic setting. For example, [38, Th. 1.1] showed that for a subset  of a Banach space with , there exists a bounded linear function that embeds  into . Here in a probabilistic setup, the embedding dimension can be improved by a factor of two.
Following the idea in the proof of Theorem 18, we obtain a nonasymptotic result of lossless linear compression for -sparse vectors, which is relevant to compressed sensing.

Corollary 1: Denote the collection of all -sparse vectors in  by
(173)
Let  be a σ-finite Borel measure on . Then, given any , for Lebesgue-a.e. real matrix , there exists a Borel function , such that  for -a.e. . Moreover, when  is finite, for any  and , there exists a matrix  and , such that  and  is -Hölder continuous.

Remark 2: The assumption that the measure  is σ-finite is essential, because the validity of Corollary 1 hinges upon Fubini's theorem, where σ-finiteness is an indispensable requirement. Consequently, if  is the distribution of a -sparse random vector with uniformly chosen support and Gaussian distributed nonzero entries, we conclude that all -sparse vectors can be linearly compressed except for a subset of zero measure under . On the other hand, if  is the counting measure on , Corollary 1 no longer applies because  is not σ-finite. In fact, if , no linear encoder from  to  works for every -sparse vector, even if no regularity constraint is imposed on the decoder. This is because no  matrix acts injectively on . To see this, introduce the notation
(174)
On the other hand,  is sufficient to linearly compress all -sparse vectors, because (175) holds for Lebesgue-a.e. matrix . To see this, choose  to be a random matrix with i.i.d. entries according to some continuous distribution (e.g., Gaussian). Then, (190) holds if and only if all submatrices formed by  columns of  are invertible. This is an almost sure event, because the determinant of each of the submatrices is an absolutely continuous random variable. The sufficiency of  is a bit stronger than the result in Remark 1, which gives . For an explicit construction of such a matrix, we can choose  to be the matrix  (see Appendix IV).

B. Auxiliary Results

Let . Denote by  the Grassmannian manifold [25] consisting of all -dimensional subspaces of . For , the orthogonal projection from  to  defines a linear mapping  of rank . The technique we use in the achievability proof of linear analog compression is to use the random orthogonal projection  as the encoder, where  is distributed according to the invariant probability measure on , denoted by  [25]. The relationship between  and the Lebesgue measure on  is shown in the following lemma.

Lemma 7 [25, Exercise 3.6]: Denote the rows of a  matrix  by , the row span of  by , and the volume of the unit -ball in  by . Set
(176)
Then, for  measurable, i.e., a collection of -dimensional subspaces of
(177)

The following result states that a random projection of a given vector is not too small with high probability. It plays a central role in estimating the probability of bad linear encoders.

Lemma 8 [25, Lemma 3.11]: For any
(178)

To show the converse part of Theorem 6, we will invoke the Steinhaus theorem as an auxiliary result.
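The almost-sure invertibility argument above can be illustrated numerically. The following sketch (an illustration with arbitrarily chosen dimensions, not the paper's construction) draws an i.i.d. Gaussian matrix with twice as many rows as the sparsity level and checks that every submatrix formed by 2k columns is invertible, which is exactly the condition under which the matrix separates k-sparse vectors:

```python
import itertools
import random

random.seed(7)
n, k = 12, 2          # ambient dimension and sparsity level (illustrative)
m = 2 * k             # 2k measurements suffice for injectivity on k-sparse vectors

# i.i.d. Gaussian sensing matrix (m x n)
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

def det(M):
    # Laplace expansion along the first row; fine for tiny 2k x 2k submatrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# every m x 2k submatrix (choice of 2k columns) is invertible almost surely
min_abs_det = min(
    abs(det([[A[i][j] for j in cols] for i in range(m)]))
    for cols in itertools.combinations(range(n), 2 * k)
)
print(min_abs_det > 0)  # True: A is injective on the set of k-sparse vectors
```

Because the determinant of each submatrix is an absolutely continuous random variable, a zero determinant is a probability-zero event, so the check succeeds for essentially every draw.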
Lemma 9 (Steinhaus [40]): For any measurable set  with positive Lebesgue measure, there exists an open ball centered at  contained in .

Lastly, with the notation in (174), we give a characterization of the fundamental limit of lossless linear encoding as follows. The proof is omitted for conciseness.

Lemma 10:  is the infimum of  such that for sufficiently large , there exists a Borel set  and a linear subspace  of dimension at least , such that
(179)

C. Proofs

Proof of Theorem 18: We first show (171). Fix  arbitrarily. Let  and . We show that there exists a matrix  of rank  and Borel measurable  such that
(180)
for sufficiently large .
By definition of , there exists  such that for all , there exists a compact  such that  and . Given an encoding matrix , define the decoder  as
(181)
where the  is taken componentwise7. Since  is closed and  is compact,  is compact. Hence,  is well defined.
Next consider a random orthogonal projection matrix  independent of , where  is a random -dimensional subspace distributed according to the invariant measure  on . We show that for all
(182)
which implies that there exists at least one realization of  that satisfies (180). To that end, we define
(183)
and use the union bound
(184)
(187)
We show that  for all . To this end, let
(188)
Define
(189)
(190)
Then
(191)
Observe that  implies that
(192)
Therefore,  for all but a finite number of 's if and only if
(193)
for some .
Next we show that  for all but a finite number of 's with probability one. Cover  with -balls. The centers of those balls that intersect  are denoted by . Pick . Then, , hence  cover . Suppose , then for any
(194)
(195)
(196)
where (195) follows because  is an orthogonal projection. Thus,  for all  implies that . Therefore, by the union bound
(197)
By Lemma 8
(198)
(199)
(200)
(202)
(203)
7Alternatively, we can use any other tie-breaking strategy as long as Borel measurability is satisfied.
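The qualitative content of Lemma 8, applied in the bound above, is that the projection of a fixed vector onto a random m-dimensional subspace is rarely much shorter than expected. This can be sanity-checked by simulation (a sketch with illustrative dimensions; by rotation invariance, projecting a fixed unit vector onto a uniformly random m-dimensional subspace of R^n has the same law as taking the first m coordinates of a uniform point on the unit sphere):

```python
import random

random.seed(1)
n, m = 8, 3        # ambient dimension and projection rank (illustrative)
trials = 5000
delta = 0.1

total = 0.0
small = 0
for _ in range(trials):
    # uniform point on the sphere via normalized Gaussians
    g = [random.gauss(0, 1) for _ in range(n)]
    r2 = sum(v * v for v in g)
    p2 = sum(v * v for v in g[:m]) / r2   # squared length of the projection
    total += p2
    if p2 <= delta * delta:
        small += 1

print(total / trials)   # ≈ m/n = 0.375: expected squared projection length
print(small / trials)   # the event {|projection| <= 0.1} is rare
```

The empirical mean matches the exact value m/n, and the small-projection events occur with the low probability that Lemma 8 controls.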
(205)
In view of (187)
(206)
(207)
(210)
Recalling  defined in (176), we have
(211)
(212)
(213)
where:
(211): by (209) and since  holds Lebesgue-a.e.;
(212): by Lemma 7;
(213): by (206).
Observe that (213) implies that for any
(214)
Since , in view of (209) and (214), we conclude that (207) holds Lebesgue-a.e.
Finally, we show that for any , there exists a sequence of matrices and -Hölder continuous decoders that achieves compression rate  and block error probability . Since for
(216)
Integrating (216) with respect to  on  and by Fubini's theorem, we have
(220)
Recall from (188) that . By the arbitrariness of , (172) holds.

Remark 3: Without recourse to the general result in Theorem 18, the achievability for discrete-continuous sources in Theorem 6 can be proved directly as follows. In (184), choose
(221)
and consider  whose entries are i.i.d. standard Gaussian (or any other absolutely continuous distribution on ). Using linear algebra, it is straightforward to show that the second term in (184) is zero. Thus, the block error probability vanishes since .
Finally, we complete the proof of Theorem 6 by proving the converse.

Converse Proof of Theorem 6: Let the distribution of  be defined as in (139). We show that  for any . Since , assume . Fix an arbitrary . Suppose  is -achievable. Let  and . By Lemma 10, for sufficiently large , there exist a Borel set  and a linear subspace ,
8 denotes the restriction of  on the subset .
a much stronger condition than the existence of information dimension. Obviously, if a probability measure  is -rectifiable, then .

B. Converse

In view of the lossless Minkowski dimension compression results developed in Section V, the general converse in Theorem 9 is rather straightforward. We need the following lemma to complete the proof.

to the counting measure. For  gives a nontrivial measure for sets of Hausdorff dimension in , because  if ;  if .
As an example, consider  and . Let  be the middle-third Cantor set in the unit interval, which has zero Lebesgue measure. Then,  and  [18, 2.3].

Definition 12 (Rectifiable Sets [42, 3.2.14]):  is called -rectifiable if there exists a Lipschitz mapping from some bounded set in  onto .
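The Cantor-set example above can be verified numerically: at construction level k the middle-third Cantor set is covered by exactly 2^k intervals of side 3^(-k), so its box-counting (Minkowski) dimension is log 2/log 3 ≈ 0.6309. A quick sketch:

```python
import math

def cantor_intervals(level):
    # intervals of the level-th construction step of the middle-third Cantor set
    intervals = [(0.0, 1.0)]
    for _ in range(level):
        nxt = []
        for a, b in intervals:
            t = (b - a) / 3
            nxt.append((a, a + t))        # keep the left third
            nxt.append((b - t, b))        # keep the right third
        intervals = nxt
    return intervals

level = 10
eps = 3.0 ** (-level)
n_boxes = len(cantor_intervals(level))    # boxes of side eps covering the set
est = math.log(n_boxes) / math.log(1 / eps)
print(est)  # = log 2 / log 3 ≈ 0.6309
```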
Definition 13 (Rectifiable Measures [25, Definition 16.6]): Let  be a measure on .  is called -rectifiable if  and there exists a -a.s. set  that is -rectifiable.
Several useful facts about rectifiability are presented as follows.

Lemma 11 [42]:
1) An -rectifiable set is also -rectifiable for .
2) The Cartesian product of an -rectifiable set and an -rectifiable set is -rectifiable.
3) The finite union of -rectifiable sets is -rectifiable.
4) Countable sets are -rectifiable.

Using the notion of rectifiability, we give a sufficient condition for the -achievability of Lipschitz decompression by the following lemma.

Lemma 12:  if there exists a sequence of -rectifiable sets  with
(238)
for all sufficiently large .
Proof: See Appendix V.

Definition 14 (-Dimensional Density [25, Def. 6.8]): Let  be a measure on . The -dimensional upper and lower densities of  at  are defined as
(239)

Lemma 13: Let  be -rectifiable. Then
(241)
Proof: See Appendix IX.

Proof of Theorem 9: Lemma 13 implies the following general inequality:
(242)
If the source is memoryless and , then it follows from Theorem 14 that .

C. Achievability for Finite Mixture

We first prove a general achievability result for finite mixtures, a corollary of which applies to discrete-continuous mixed distributions in Theorem 10.

Theorem 20 (Achievability of Finite Mixtures): Let the distribution of  be a mixture of finitely many Borel probability measures on , i.e.,
(243)
where  is a probability mass function. If  is -achievable with Lipschitz decoders for , then  is -achievable for  with Lipschitz decoders, where
(244)
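Mixtures also illustrate Rényi information dimension concretely: for a discrete-continuous mixture, the information dimension equals the weight of the continuous component (cf. Theorem 1). This can be checked empirically by estimating the growth rate of the entropy of successively finer quantizations; the sketch below uses an illustrative mixture of a uniform density and two atoms (all numerical choices are ours, not the paper's):

```python
import math
import random
from collections import Counter

random.seed(0)
gamma = 0.7                      # weight of the continuous component (assumed)
n_samples = 200000
samples = []
for _ in range(n_samples):
    if random.random() < gamma:
        samples.append(random.random())            # continuous: Uniform[0, 1)
    else:
        samples.append(random.choice([0.0, 0.5]))  # discrete: two atoms

def quantized_entropy_bits(xs, m):
    # empirical entropy of floor(m X), in bits
    counts = Counter(math.floor(x * m) for x in xs)
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# information dimension = lim H(floor(mX)) / log2(m); estimate via the slope
# of the quantized entropy between two scales, which cancels the O(1) term
h1 = quantized_entropy_bits(samples, 2 ** 8)
h2 = quantized_entropy_bits(samples, 2 ** 12)
slope = (h2 - h1) / (12 - 8)
print(slope)  # ≈ gamma = 0.7
```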
(240)
(258)
(259)
(260)
(261)
(264)
then
(265)
Denoting  for brevity
(266)
(296)
Combining (294) and (296) yields
(297)
By Proposition 2, sending  and  yields (290).

Proof of Proposition 2: Fix any  and , such that . By Lemma 16, we have
(313)
(314)
(315)
(317)
(318)
(319)
The same bound works for rounding.

(325)
(326)
(327)
where  is the single-letter rate-distortion function of  with distortion function . Then, under the equivalence of the metric and the -norm as in (328), the rate-distortion dimension coincides with the information dimension of .
APPENDIX III
INFORMATION DIMENSION AND RATE-DISTORTION THEORY

The asymptotic tightness of the Shannon lower bound in the high-rate regime is shown by the following result.

Theorem 22 [20]: Let  be a random variable on the normed space  with a density such that . Let the distortion function be  with , and the single-letter rate-distortion function  is given by
(320)
Suppose that there exists an  such that . Then,  and
(321)
where the Shannon lower bound  takes the following form:
(323)

Theorem 23 [12, Prop. 3.3]: Consider the metric space . If there exists , such that for all
(328)
then
(329)
(330)
Moreover, (329) and (330) hold even if the -entropy  instead of the rate-distortion function is used in the definition of  and .
In particular, consider the special case of scalar source and MSE distortion . Then, whenever  exists and is finite
(331)
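The displays of Theorem 23 are lost in this scan. For reference, the standard shape of the statement for a scalar source under MSE distortion, written with generic symbols in the spirit of [12] (a hedged reconstruction, not the paper's exact formulas), is:

```latex
% rate-distortion dimension under MSE distortion (generic symbols)
\dim_R(X) \;\triangleq\; \lim_{D \to 0^{+}} \frac{R(D)}{\tfrac{1}{2}\log\tfrac{1}{D}},
\qquad
R(D) \;=\; \inf_{P_{\hat X \mid X}\,:\,\mathbb{E}[(X-\hat X)^2] \le D} I(X;\hat X)
% When the (Rényi) information dimension d(X) exists and is finite,
% the limit exists and \dim_R(X) = d(X).
```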
Absolute distortion and scalar source:  and . Then
(324)

APPENDIX IV
INJECTIVITY OF THE COSINE MATRIX

We show that the cosine matrix defined in Remark 2 is injective on . We consider a more general case. Let  and . Let  be an  matrix where
APPENDIX V
PROOF OF LEMMA 12

Lemma 18: Let  be compact, and let  be continuous. Then, there exists a Borel measurable function  such that  for all .
Proof: For all ,  is nonempty and compact since  is continuous. For each , let the th component of  be
(333)
where  is the th coordinate of . This defines , which satisfies  for all . Now we claim that each  is lower semicontinuous, which implies that  is Borel measurable. To this end, we show that for any ,  is open. Assume the opposite; then there exists a sequence  in  that converges to , such that  and . Due to the compactness of , there exists a subsequence that converges to some point in . Therefore, . But by the continuity of , we have
(334)
for all , which proves the -achievability of .

APPENDIX VII
PROOF OF LEMMA 5

Proof: By Pinsker's inequality
(336)
where  is the variational distance between  and , and . In this case,  where  is countable
(337)
By [29, Lemma 2.7, p. 33], when
(338)
(339)
When , by (336), ; when , by (339)
(340)
Using (336) again
(343)
which is a nonnegative nondecreasing convex function.
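Pinsker's inequality, the starting point of the proof above, can be sanity-checked numerically. The sketch below uses the natural-log convention D(P||Q) ≥ V(P,Q)²/2, where V is the L1 distance between the distributions (all numerical choices are illustrative):

```python
import math
import random

def kl_nats(p, q):
    # D(P||Q) in nats for finite distributions with q_i > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational_distance(p, q):
    # V(P,Q) = sum_i |p_i - q_i|
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

random.seed(2)
for _ in range(100):
    p = normalize([random.random() for _ in range(4)])
    q = normalize([random.random() + 0.01 for _ in range(4)])
    v = variational_distance(p, q)
    # Pinsker's inequality: D(P||Q) >= V(P,Q)^2 / 2 (natural log)
    assert kl_nats(p, q) >= v * v / 2
print("Pinsker's inequality holds on all sampled pairs")
```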
By (95)
(346)
and
(350)
(352)
(353)
(354)
(358)
(359)
(363)

APPENDIX X
PROOF OF LEMMA 14

Proof: Suppose we can construct a mapping  such that
(364)
[35] M. S. Pinsker, Information and Information Stability of Random Variables and Processes. San Francisco, CA: Holden-Day, 1964.
[36] K. Alligood, T. Sauer, and J. A. Yorke, Chaos: An Introduction to Dynamical Systems. New York: Springer-Verlag, 1996.
[37] R. Mañé, "On the dimension of the compact invariant sets of certain non-linear maps," in Dynamical Systems and Turbulence, Warwick 1980, ser. Lecture Notes in Mathematics. Berlin, Germany: Springer-Verlag, 1981, vol. 898, pp. 230–242.
[38] B. R. Hunt and V. Y. Kaloshin, "Regularity of embeddings of infinite-dimensional fractal sets into finite-dimensional spaces," Nonlinearity, vol. 12, no. 5, pp. 1263–1275, 1999.
[39] A. Ben-Artzi, A. Eden, C. Foias, and B. Nicolaenko, "Hölder continuity for the inverse of Mañé's projection," J. Math. Anal. Appl., vol. 178, pp. 22–29, 1993.
[40] H. Steinhaus, "Sur les distances des points des ensembles de mesure positive," Fundamenta Mathematicae, vol. 1, pp. 93–104, 1920.
[41] G. J. Minty, "On the extension of Lipschitz, Lipschitz-Hölder continuous, and monotone functions," Bull. Amer. Math. Soc., vol. 76, no. 2, pp. 334–339, 1970.
[42] H. Federer, Geometric Measure Theory. New York: Springer-Verlag, 1969.
[43] D. Preiss, "Geometry of measures in ℝⁿ: Distribution, rectifiability, and densities," Ann. Math., vol. 125, no. 3, pp. 537–643, 1987.
[44] G. A. Edgar, Integral, Probability, and Fractal Measures. New York: Springer-Verlag, 1997.
[45] M. Keane and M. Smorodinsky, "Bernoulli schemes of the same entropy are finitarily isomorphic," Ann. Math., vol. 109, no. 2, pp. 397–406, 1979.
[46] A. Del Junco, "Finitary codes between one-sided Bernoulli shifts," Ergodic Theory Dyn. Syst., vol. 1, pp. 285–301, 1981.
[47] J. G. Sinai, "A weak isomorphism of transformations with an invariant measure," Dokl. Akad. Nauk SSSR, vol. 147, pp. 797–800, 1962.
[48] K. E. Petersen, Ergodic Theory. Cambridge, U.K.: Cambridge Univ. Press, 1990.
[49] W. H. Schikhof, Ultrametric Calculus: An Introduction to p-Adic Analysis. New York: Cambridge Univ. Press, 2006.
[50] M. A. Martin and P. Mattila, "-dimensional regularity classifications for -fractals," Trans. Amer. Math. Soc., vol. 305, no. 1, pp. 293–315, 1988.
[51] V. Milman, "Dvoretzky's theorem: Thirty years later," Geom. Funct. Anal., vol. 2, no. 4, pp. 455–479, Dec. 1992.
[52] W. Johnson and J. Lindenstrauss, "Extensions of Lipschitz maps into a Hilbert space," Contemp. Math., vol. 26, pp. 189–206, 1984.
[53] E. C. Posner and E. R. Rodemich, "Epsilon entropy and data compression," Ann. Math. Stat., vol. 42, no. 6, pp. 2079–2125, Dec. 1971.
[54] G. Szegő, Orthogonal Polynomials. Providence, RI: AMS, 1975.
[55] Y. Wu and S. Verdú, "Fundamental limits of almost lossless analog compression," in Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, Jun. 2009.

Yihong Wu (S'10) received the B.E. degree in electrical engineering from Tsinghua University, Beijing, China, in 2006 and the M.A. degree in electrical engineering from Princeton University, Princeton, NJ, in 2008, where he is currently working towards the Ph.D. degree at the Department of Electrical Engineering.
He is a recipient of the Princeton University Wallace Memorial honorific fellowship in 2010. His research interests are in information theory, signal processing, mathematical statistics, optimization, and distributed algorithms.

Sergio Verdú (S'80–M'84–SM'88–F'93) received the Telecommunications Engineering degree from the Universitat Politècnica de Barcelona, Barcelona, Spain, in 1980 and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Urbana, in 1984.
Since 1984, he has been a member of the faculty of Princeton University, Princeton, NJ, where he is the Eugene Higgins Professor of Electrical Engineering.
Dr. Verdú is the recipient of the 2007 Claude E. Shannon Award and the 2008 IEEE Richard W. Hamming Medal. He is a member of the National Academy of Engineering and was awarded a Doctorate Honoris Causa from the Universitat Politècnica de Catalunya in 2005. He is a recipient of several paper awards from the IEEE: the 1992 Donald Fink Paper Award, the 1998 Information Theory Outstanding Paper Award, an Information Theory Golden Jubilee Paper Award, the 2002 Leonard Abraham Prize Award, the 2006 Joint Communications/Information Theory Paper Award, and the 2009 Stephen O. Rice Prize from the IEEE Communications Society. He has also received paper awards from the Japanese Telecommunications Advancement Foundation and from Eurasip. He received the 2000 Frederick E. Terman Award from the American Society for Engineering Education for his book Multiuser Detection (Cambridge, U.K.: Cambridge Univ. Press, 1998). He served as President of the IEEE Information Theory Society in 1997. He is currently Editor-in-Chief of Foundations and Trends in Communications and Information Theory.