
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 8, AUGUST 2010

Rényi Information Dimension: Fundamental Limits
of Almost Lossless Analog Compression

Yihong Wu, Student Member, IEEE, and Sergio Verdú, Fellow, IEEE

Abstract: In Shannon theory, lossless source coding deals with the optimal compression of discrete sources. Compressed sensing is a lossless coding strategy for analog sources by means of multiplication by real-valued matrices. In this paper we study almost lossless analog compression for analog memoryless sources in an information-theoretic framework, in which the compressor or decompressor is constrained by various regularity conditions, in particular linearity of the compressor and Lipschitz continuity of the decompressor. The fundamental limit is shown to be the information dimension proposed by Rényi in 1959.

Index Terms: Analog compression, compressed sensing, information measures, Rényi information dimension, Shannon theory, source coding.

Manuscript received March 02, 2009; revised April 30, 2010. Date of current version July 14, 2010. This work was supported in part by the National Science Foundation under Grants CCF-0635154 and CCF-0728445. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Seoul, Korea, July 2009 [55].
The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: yihongwu@princeton.edu; verdu@princeton.edu).
Communicated by H. Yamamoto, Associate Editor for Shannon Theory.
Digital Object Identifier 10.1109/TIT.2010.2050803

I. INTRODUCTION

A. Motivations From Compressed Sensing

THE bit is the universal currency in lossless source coding theory [1], where Shannon entropy is the fundamental limit of compression rate for discrete memoryless sources (DMS). Sources are modeled by stochastic processes and redundancy is exploited as probability is concentrated on a set of exponentially small cardinality as blocklength grows. Therefore, by encoding this subset, data compression is achieved if we tolerate a positive, though arbitrarily small, block error probability.

Compressed sensing [2], [3] has recently emerged as an approach to lossless encoding of analog sources by real numbers rather than bits. It deals with efficient recovery of an unknown real vector from the information provided by linear measurements. The formulation of the problem is reminiscent of traditional lossless data compression in the following sense.
• Sources are sparse in the sense that each vector is supported on a set much smaller than the blocklength. This kind of redundancy in terms of sparsity is exploited to achieve effective compression by taking a smaller number of linear measurements.
• In contrast to lossy data compression, block error probability, instead of distortion, is the performance benchmark.
• The central problem is to determine how many compressed measurements are sufficient/necessary for recovery with vanishing block error probability as blocklength tends to infinity [2]–[4].
• Random coding is employed to show the existence of good linear encoders. In particular, when the random projection matrices follow certain distributions (e.g., standard Gaussian), the restricted isometry property (RIP) is satisfied with overwhelming probability and guarantees exact recovery.

On the other hand, there are also significantly different ingredients in compressed sensing in comparison with information-theoretic setups.
• Sources are not modeled probabilistically, and the fundamental limits are on a worst-case basis rather than on average. Moreover, block error probability is with respect to the distribution of the encoding random matrices.
• Real-valued sparse vectors are encoded by real numbers instead of bits.
• The encoder is confined to be linear, while generally in information-theoretic problems such as lossless source coding we have the freedom to choose the best possible coding scheme.

Departing from the compressed sensing literature, we study fundamental limits of lossless source coding for real-valued memoryless sources within an information-theoretic setup.
• Sources are modeled by random processes. This method is more flexible to describe source redundancy which encompasses, but is not limited to, sparsity. For example, a mixed discrete-continuous distribution is suitable for characterizing linearly sparse vectors [5], [6], i.e., those with a number of nonzero components proportional to the blocklength with high probability and whose nonzero components are drawn from a given continuous distribution.
• Block error probability is evaluated by averaging with respect to the source.
• While linear compression plays an important role in our development, our treatment encompasses weaker regularity conditions.

Methodologically, the relationship between our approach and compressed sensing is analogous to the relationship between modern coding theory and classical coding theory: classical coding theory adopts a worst-case (Hamming) approach whose goal is to obtain codes with a certain minimum distance, while modern coding theory adopts a statistical (Shannon) approach whose goal is to obtain codes with small probability of failure. Likewise, compressed sensing adopts a worst-case model in which compressors work provided that the number of nonzero
components in the source does not exceed a certain threshold, while we adopt a statistical model in which compressors work for most source realizations. In this sense, almost lossless analog compression can be viewed as an information-theoretic framework for compressed sensing. Probabilistic modeling provides elegant results in terms of fundamental limits, and also sheds light on constructive schemes for individual sequences. For example, random coding is not only a proof technique in Shannon theory, but also a guiding principle in modern coding theory as well as in compressed sensing.

Recently there have been considerable new developments in using statistical signal models (e.g., mixed distributions) in compressed sensing (e.g., [5]–[8]), where reconstruction performance is evaluated by computing the asymptotic error probability in the large-blocklength limit. As discussed in Section IV-B, the performance of those practical algorithms still lies far from the fundamental limit.

B. Lossless Source Coding for Analog Sources

Discrete sources have been the sole object in lossless data compression theory. The reason is at least twofold. First, nondiscrete sources have infinite entropy, which implies that representation with arbitrarily small block error probability requires arbitrarily large rate. On the other hand, even if we consider encoding analog sources by real numbers, the result is still trivial, as the real line and the space of real vectors have the same cardinality. Therefore, a single real number is capable of representing a real vector losslessly, yielding a universal compression scheme for any analog source with zero rate and zero error probability.

However, it is worth pointing out that the compression method proposed above is not robust, because the bijection between the real line and the space of real vectors is highly irregular. In fact, neither the encoder nor the decoder can be continuous [9, Exercise 6(c), p. 385]. Therefore, such a compression scheme is useless in the presence of any observation noise regardless of the signal-to-noise ratio (SNR). This disadvantage motivates us to study how to compress not only losslessly but also gracefully in the real field. In fact, some authors have also noticed the importance of regularity in data compression. In [10] Montanari and Mossel observed that the optimal data compression scheme often exhibits the following inconvenience: codewords tend to depend chaotically on the data; hence, changing a single source symbol leads to a radical change in the codeword. In [10], a source code is said to be smooth (resp., robust) if the encoder (resp., decoder) is Lipschitz (see Definition 6) with respect to the Hamming distance. The fundamental limits of smooth lossless compression are analyzed in [10] for binary sources via sparse graph codes. In this paper, we focus on sources in the real field with general distributions. Introducing a topological structure makes the nature of the problem quite different from traditional formulations in the discrete world, and calls for machinery from dimension theory and geometric measure theory.

C. Operational Characterization of Rényi Information Dimension

In 1959, Alfréd Rényi proposed an information measure for random vectors in Euclidean space named information dimension [11], through the normalized entropy of a finely quantized version of the random vector. It characterizes the rate of growth of the information given by successively finer discretizations of the space. Although a fundamental information measure, the Rényi dimension is far less well known than either the Shannon entropy or the Rényi entropy. Rényi showed that under certain conditions the information dimension of an absolutely continuous random vector equals the dimension of the ambient Euclidean space. Hence, he remarked in [11] that the geometrical (or topological) and information-theoretical concepts of dimension coincide for absolutely continuous probability distributions. However, the operational role of Rényi information dimension has not been addressed before except in the work of Kawabata and Dembo [12], which relates it to the rate-distortion function. It is shown in [12] that when the single-letter distortion function satisfies certain conditions, the rate-distortion function of a real-valued source scales proportionally to the logarithm of the reciprocal of the distortion as the distortion vanishes, with the proportionality constant being the information dimension of the source. This result serves to drop the assumption of continuity in the asymptotic tightness of Shannon's lower bound in the low-distortion regime.

In this paper we give an operational characterization of Rényi information dimension as the fundamental limit of almost lossless data compression for analog sources under various regularity constraints on the encoder/decoder. Moreover, we consider the problem of lossless Minkowski-dimension compression, where the Minkowski dimension of a set measures its degree of fractality. In this setup we study the minimum upper Minkowski dimension of high-probability events of source realizations. This can be seen as a counterpart of lossless source coding, which seeks the smallest cardinality of high-probability events. Rényi information dimension turns out to be the fundamental limit for lossless Minkowski-dimension compression.

D. Organization of the Paper

Notations frequently used throughout the paper are summarized in Section II. Section III gives an overview of Rényi information dimension, offers a new interpretation in terms of entropy rate, and discusses connections with rate-distortion theory. Section IV states the main definitions and results, as well as their connections with compressed sensing. Section V contains definitions and coding theorems of lossless Minkowski-dimension compression, which are important intermediate results for Sections VI and VII. New concentration-of-measure-type results are proved for memoryless sources, where it is shown that overwhelmingly large probability is concentrated on subsets of low (Minkowski) dimension. Section VI tackles the case of lossless linear compression, where achievability results are given as well as a converse for mixed discrete-continuous sources. Section VII is devoted to lossless Lipschitz decompression, where we establish a general converse in terms of upper information dimension, and its tightness for mixed discrete-continuous and self-similar sources. Some technical lemmas are proved in the Appendixes.

II. NOTATIONS

The major notations adopted in this paper are summarized as follows.
• , for .

denotes a random vector. A. Definitions


denotes a realization of .
Definition 1 (Information Dimension [11]): Let be an arbi-
denotes the quantization operator, which can be ap-
trary real-valued random variable. Denote for a positive integer
plied to real numbers, vectors or subsets of as follows:

(1) (11)
(2) Define
(3)
(12)
. and
For

(4) (13)

where and are called lower and upper information


(5) dimensions of , respectively. If , the common
value is called the information dimension of , denoted by
Then, is a partition of with called , i.e.,
mesh cubes of size .
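As a brief illustrative aside (not part of the original text; function names are hypothetical), the quantization operator and the mesh-cube index can be sketched numerically, taking the operator to be the componentwise map x -> floor(m*x)/m consistent with the definition above:

    import numpy as np

    def quantize(x, m):
        # Componentwise quantization <x>_m = floor(m*x)/m.
        return np.floor(m * np.asarray(x, dtype=float)) / m

    def mesh_index(x, m):
        # Integer index of the mesh cube of size 1/m containing x (one index per coordinate).
        return tuple(np.floor(m * np.asarray(x, dtype=float)).astype(int))

    x = [0.731, -1.204, 0.005]
    print(quantize(x, 8))    # [ 0.625 -1.25   0.   ]
    print(mesh_index(x, 8))  # (5, -10, 0)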
denotes the th bit in the binary expansion of (14)
, that is
Rnyi also defined the entropy of dimension as
(6)
(15)
Then
provided the limit exists.
(7)
Definition 1 can be readily extended to random vectors, where
the floor function is taken componentwise. Since only
(8) depends on the distribution of , we also denote
. Similar convention also applies to entropy and other in-
formation measures.
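To build intuition for Definition 1, the ratio of the entropy of the quantized variable to log m can be estimated from samples. The sketch below (illustrative only; the plug-in entropy estimator and helper names are not from the paper) approaches 1 for an absolutely continuous random variable such as a uniform one, up to finite-sample bias:

    import numpy as np
    from collections import Counter

    def entropy_bits(samples):
        # Plug-in (empirical) Shannon entropy in bits of a discrete sample.
        counts = np.array(list(Counter(samples).values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def info_dim_estimate(x, m):
        # Empirical H(<X>_m) / log2(m); note that H(<X>_m) equals H(floor(m*X)).
        return entropy_bits(np.floor(m * x).tolist()) / np.log2(m)

    rng = np.random.default_rng(0)
    x = rng.uniform(size=200000)          # absolutely continuous, so dimension 1
    for m in (2**6, 2**10, 2**14):
        print(m, round(info_dim_estimate(x, m), 3))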
Similarly, is defined componentwise.
Apart from discretization, information dimension can be de-
Let be a metric space. Denote the closed ball of
fined from a more general viewpoint: the mesh cubes of size
radius centered at by
in are the sets for
. In particular, in , denote the
. For any , the collection par-
ball of radius centered at by
titions . Hence, for any probability measure on , this
(9) partition generates a discrete probability measure on by
assigning . Then, the information dimension
where the -norm on is defined as of can be expressed as

(10) (16)

Define , which is not a norm It should be noted that there exist alternative definitions of
since for or . However, information dimension in the literature. For example, in [13],
is a valid metric on . the lower and upper information dimensions are defined by
Let . For a matrix , denote by replacing with the -entropy with respect to the
the submatrix formed by those columns of whose distance. This definition essentially allows unequal partition
indices are in . of the whole space and lowers the value of information dimen-
All logarithms in this paper are with respect to base . sion, since . However, the resulting definition
is equivalent (see Theorem 23). As an another example, the
III. RNYI INFORMATION DIMENSION following definition is adopted in [14, Def. 4.2]:
In this section, we give an overview of Rnyi information di-
mension and its properties. Moreover, we give a novel interpre- (17)
tation in terms of the entropy rate of the dyadic expansion of the
random variable. We also discuss the connection between infor- where denotes the distribution of and is the -ball
mation dimension and rate-distortion theory established in [12]. of radius centered at . This definition is equivalent to Defini-

tion 1, as shown in Appendix I. Note that (17) can be generalized probability measure singular with respect to Lebesgue measure
to random variables on an arbitrary metric space. but with no atoms (singular part).
As shown in [11], the information dimension for the mixture
B. Characterizations and Properties of discrete and absolutely continuous distribution can be deter-
The lower and upper information dimension of a random vari- mined as follows.
able might not always be finite, because can be in-
Theorem 1 [11]: Let be a random variable such that
finity for all . However, as pointed out in [11], if the mild con-
is finite. Assume the distribution of can be repre-
dition is satisfied, we have
sented as
(18)
(26)
The necessity of this condition is shown in Proposition 1.
One sufficient condition for finite information dimension is where is a discrete measure, is an absolutely continuous
. Consequently, if for some measure, and . Then
, then . (27)
Proposition 1:
Furthermore, given the finiteness of and
(19) admits a simple formula
(20)
(28)
(21)
where is the Shannon entropy of is the differ-
If , then ential entropy of , and is
(22) the binary entropy function.
Proof: See [11, Th. 1 and 3] or [16, Th. 1, pp. 588592].
Proof: See Appendix II.
For -valued , (20) can be generalized to Some consequences of Theorem 1 are as follows. As long as
. :
To calculate the information dimension in (12) and (13), it is 1) is discrete: , and coincides with the
sufficient to restrict to the exponential subsequence , as Shannon entropy of .
a result of the following proposition. 2) is continuous: , and is equal to the
differential entropy of .
Proposition 2: 3) is discrete-continuous-mixed: , and is
the weighted sum of the entropy of discrete and continuous
(23)
parts plus a term of .
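A numerical check of Theorem 1 (an illustrative sketch under assumed parameters, not part of the original): for a source that draws from an absolutely continuous distribution with probability rho and from a finite-entropy discrete distribution otherwise, the growth rate of the quantized entropy in log m settles near rho. Estimating the growth rate as a difference quotient between two quantization levels cancels the additive constant term:

    import numpy as np
    from collections import Counter

    def H_bits(x, m):
        # Empirical entropy (bits) of the quantized sample floor(m*x).
        counts = np.array(list(Counter(np.floor(m * x).tolist()).values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(1)
    rho, n = 0.3, 1_000_000
    x = np.where(rng.uniform(size=n) < rho,
                 rng.standard_normal(n),               # continuous part, weight rho
                 rng.choice([0.0, 1.0, 2.0], size=n))  # discrete part, weight 1 - rho
    # Slope of the quantized entropy versus log2(m); approximately rho = 0.3.
    print((H_bits(x, 2**10) - H_bits(x, 2**6)) / (10 - 6))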
For mixtures of countably many distributions, we have the
(24) following theorem.

Proof: See Appendix II. Theorem 2: Let be a discrete random variable with
. If exists for all , then exists and
Similarly to the approach in Proposition 1, we have the is given by . More generally
following.
Proposition 3: and are unchanged if rounding or (29)
ceiling functions are used in Definition 1.
Proof: See Appendix II.
(30)
C. Evaluation of Information Dimension
By the Lebesgue decomposition theorem [15], a probability Proof: For any , the conditional distribution of
distribution can be uniquely represented as the mixture given is the same as . Then
(25) (31)
where ; is a purely atomic proba- where
bility measure (discrete part); is a probability measure abso-
lutely continuous with respect to Lebesgue measure, i.e., having
(32)
a probability density function (continuous part1); and is a
1In measure theory, sometimes a measure is called continuous if it does not
have any atoms, and a singular measure is called singularly continuous. Here Since , dividing both sides of (31) by and
we say a measure is continuous if and only if it is absolutely continuous. sending yields (29) and (30).

To summarize, when has a discrete-continuous-mixed dis- E. Self-Similar Distribution


tribution, the information dimension of is given by the weight An iterated function system (IFS) is a family of contractions
of the continuous part. When the distribution of has a singular on , where , and
component, its information dimension does not admit a simple satisfies for all
formula in general. For instance, it is possible that with . By [17, Theorem 2.6], given an IFS, there is a
[11]. However, for the important class of self-similar sin- unique nonempty compact set , called the invariant set of the
gular distributions, the information dimension can be explicitly IFS, such that . We say that the IFS satisfies
determined. See Section III-E. the strong separation condition, if are
disjoint. The corresponding invariant set is called self-similar, if
D. Interpretation of Rnyi Dimension as Entropy Rate the IFS consists of similarity transformations, that is,
Let a.s. Observe that , since the with an orthogonal matrix and , in which
range of contains at most values. Then, case
. The dyadic expansion of can be written as
(39)

(33) is called the similarity ratio of . Self-similar sets are usually


fractal. For example, consider the IFS on with
(40)
where each is a binary random variable. Therefore, there is
a one-to-one correspondence between and the binary random The resulting invariant set is the middle-third Cantor set.
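As a concrete illustration of this construction (a sketch with hypothetical names, not from the original), a self-similar measure on the middle-third Cantor set can be sampled by drawing i.i.d. ternary digits from {0, 2} and mapping the digit sequence to the point whose base-3 expansion they form:

    import numpy as np

    def sample_cantor(n_samples, depth=40, p=0.5, rng=None):
        # Digits W_k in {0, 2} with P[W_k = 2] = p; return X = sum_k W_k * 3**(-k).
        # p = 0.5 gives the standard Cantor distribution, other p a generalized one.
        rng = rng or np.random.default_rng()
        digits = 2 * (rng.uniform(size=(n_samples, depth)) < p)
        scales = 3.0 ** -np.arange(1, depth + 1)
        return digits @ scales

    print(sample_cantor(5, rng=np.random.default_rng(0)))  # points on the Cantor set (up to truncation)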
process . Note that the partial sum in (33) is Now we define measures supported on a self-similar set
associated with the IFS . A continuous mapping
(34) from the space (equipped with the product
topology) onto is defined as follows:

and and are in one-to-one correspon- (41)


dence, therefore

(35) where the right-hand side is a singleton [12]. Therefore, every


measure on induces a measure on as the
By Proposition 2, we have image measure of under , that is, .
If is stationary and ergodic, is called a self-similar mea-
(36) sure. In the special case when corresponds to a memoryless
process with common distribution satis-
(37) fies [17, Th. 2.8]

(42)
Thus, its information dimension is the entropy rate of its dyadic
expansion, or the entropy rate of any -ary expansion of , and for each . The usual Cantor distribution
divided by . [15] can be defined through the IFS in (40) and .
This interpretation of information dimension enables us to The next result gives the information dimension of a self-
gain more intuition about the result in Theorem 1. When has a similar measure with IFS satisfying the open set condition2
discrete distribution, its dyadic expansion has zero entropy rate. [18, p. 129], that is, there exists a nonempty bounded open set
When is uniform on , its dyadic expansion is indepen- , such that and for
dent identically distributed (i.i.d.) equiprobable, and therefore it .
has unit entropy rate in bits. If is continuous, but nonuniform,
its dyadic expansion still has unit entropy rate. Moreover, from Theorem 3 [17], [12]: Let the distribution of be a self-
(36), (37), and Theorem 1, we have similar measure generated from the stationary ergodic measure
on and the IFS with similarity
(38) ratios and invariant set . Then
(43)
where denotes the relative entropy and the differential
entropy is since a.s. The information When is the distribution of a memoryless process with
dimension of a discrete-continuous mixture is also easily un- common distribution , (43) is reduced to
derstood from this point of view, because the entropy rate of a
mixed process is the weighted sum of entropy rates of each com- (44)
ponent. Moreover, random variables whose lower and upper in-
formation dimensions differ can be easily constructed from pro- 2The open set condition is weaker than the previous strong separation
cesses with different lower and upper entropy rates. condition.
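For a memoryless digit process with distribution (p_1, ..., p_r) and similarity ratios (c_1, ..., c_r), the information dimension of the resulting self-similar measure is the ratio of the digit entropy to the expected log contraction, which for the Cantor distribution gives log 2 / log 3. The small numerical check below is offered only as an illustration consistent with Theorem 3 and the surrounding discussion:

    import numpy as np

    def self_similar_dimension(p, c):
        # d = sum_i p_i log p_i / sum_i p_i log c_i  (both sums are negative).
        p, c = np.asarray(p, float), np.asarray(c, float)
        return float(np.sum(p * np.log(p)) / np.sum(p * np.log(c)))

    print(self_similar_dimension([0.5, 0.5], [1/3, 1/3]))  # log 2 / log 3 ~ 0.6309
    print(self_similar_dimension([0.9, 0.1], [1/3, 1/3]))  # generalized Cantor case, ~ 0.2959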

Note that the open set condition implies that As a consequence of the monotonicity of Rnyi entropy, in-
. formation dimensions of different orders satisfy the following
Since , it follows that result.
Lemma 1: For and both decrease with .
(45) Define
(51)
In view of (44), we have
Then
(46) (52)

where is a subprobability measure. Since Proof: All inequalities follow from the fact that for a fixed
, we have , which agrees with random variable decreases with in .
Proposition 1. For dimension of order , we highlight the following result
from [21].
F. Connections With Rate-Distortion Theory
Theorem 4 [21, Th. 3]: Let be a random variable whose
The asymptotic behavior of the rate-distortion function, in distribution has Lebesgue decomposition as in (25). Then, we
particular, the asymptotic tightness of the Shannon lower bound have the following.
in the high-rate regime, has been addressed in [19] and [20] for 1) : if , that is, has a discrete component, we
continuous sources. In [12], Kawabata and Dembo generalized have .
it to real-valued sources that do not necessarily possess a den- 2) : if , that is, has a continuous component,
sity, and showed that the information dimension plays a cen- and , we have .
tral role. For completeness, we summarize the main results from The differential Rnyi entropy is defined using its
[12] in Appendix III. density as

G. Rnyi Dimension of Order (53)


With Shannon entropy replaced by Rnyi entropy in
(12)(13), the generalized notion of dimension of order is In general, is discontinuous in . For discrete-contin-
defined similarly. uous-mixed distributions, for all ,
Definition 2 (Information Dimension of Order ): Let while equals to the weight of the continuous part. How-
. Define ever, for Cantor distribution,
for all .
(47)
IV. DEFINITIONS AND MAIN RESULTS
and
This section presents a unified framework for lossless data
compression and our main results in the form of coding theo-
(48) rems under various regularity conditions. Proofs are relegated
to Sections VVII.
where denotes the Rnyi entropy of order of a discrete
random variable with probability mass function A. Lossless Data Compression
, defined as
Let the source be a stochastic process on
, with denoting the source alphabet and a -al-
gebra over . Let be a measurable space, where is
(49)
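For reference, the Rényi entropy of order alpha of a probability mass function, which replaces the Shannon entropy in the definition of the dimension of order alpha, is (1/(1 - alpha)) log sum_i p_i^alpha for alpha != 1, with the Shannon entropy recovered in the limit alpha -> 1. A short sketch (illustrative helper only, not from the paper):

    import numpy as np

    def renyi_entropy(p, alpha):
        # Renyi entropy of order alpha, in bits; alpha = 1 returns the Shannon entropy.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        if np.isclose(alpha, 1.0):
            return float(-(p * np.log2(p)).sum())
        return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))

    pmf = [0.5, 0.25, 0.125, 0.125]
    for a in (0.0, 0.5, 1.0, 2.0):
        print(a, round(renyi_entropy(pmf, a), 3))
    # Output is nonincreasing in alpha, the monotonicity invoked in Lemma 1.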
called the code alphabet. The main objective of lossless data
compression is to find efficient representations for source real-
izations by .
and are called lower and upper dimensions of
Definition 3: A -code for over the code
of order , respectively. If , the common value
space is a pair of mappings:
is called the information dimension of of order , denoted by
1) encoder: that is measurable relative to
. Rnyi also defined in [16] the entropy of of order
and ;
and dimension as
2) decoder: that is measurable relative to
(50) and .
The block error probability is .
provided the limit exists. The fundamental limit in lossless source coding is as follows.

Definition 4 (Lossless Data Compression): Let TABLE I


be a stochastic process on . Define to be REGULARITY CONDITIONS OF ENCODER/DECODERS AND CORRESPONDING
MINIMUM -ACHIEVABLE RATES
the infimum of such that there exists a sequence of
-codes over the code space , such that

(54)

for all sufficiently large .


According to the classical discrete almost-lossless source
coding theorem, if is countable and is finite, the minimum Definition 5: Let be a stochastic process
achievable rate for any i.i.d. process with distribution is on . Define the minimum -achievable rate to be
the infimum of such that there exists a sequence of
-codes , such that
(55)
(57)

Using codes over an infinite alphabet, any discrete source can for all sufficiently large , and the encoder and decoder
be compressed with zero rate and zero block error probability. are constrained according to Table I. Except for linear encoding
In other words, if both and are countably infinite, then for where , it is assumed that .
all In Definition 5, we have used the following definitions.
(56) Definition 6 (Hlder and Lipschitz Continuity): Let
and be metric spaces. A function is called
for any random process. -Hlder continuous if there exists such that for
any
B. Lossless Analog Compression With Regularity Conditions
In this section, we consider the problem of encoding analog (58)
sources with analog symbols, that is, and
or if bounded encoders are is called -Lipschitz if is -Hlder continuous. is
required, where denotes the Borel -algebra. As in the simply called Lipschitz (resp., -Hlder continuous) if is
countably infinite case, zero rate is achievable even for zero -Lipschitz (resp., -Hlder continuous) for some .
block error probability, because the cardinality of is the same We proceed to give results for each of the minimum -achiev-
for any [22]. This conclusion holds even if we require the en- able rates introduced in Definition 5. Motivated by compressed
coder/decoder to be Borel measurable, because according to Ku- sensing theory, it is interesting to consider the case where the
ratowskis theorem [23, Remark (i), p. 451] every uncountable encoder is restricted to be linear.
standard Borel space is isomorphic3 to . Therefore,
a single real number has the capability of encoding a real vector, Theorem 5 (Linear Encoding: General Achievability): Sup-
or even a real sequence, with a coding scheme that is both uni- pose that the source is memoryless. Then
versal and deterministic.
However, the rich structure of equipped with a metric (59)
topology (e.g., that induced by Euclidean distance) enables us
to probe the problem further. If we seek the fundamental limits for all , where is defined in (51). Moreover, we
of not only lossless coding but graceful lossless coding, have the following.
the result is not trivial anymore. In this spirit, our various 1) For all linear encoders (except possibly those in a set of
definitions share the basic information-theoretic setup where a zero Lebesgue measure on the space of real matrices),
random vector is encoded with a function block error probability is achievable.
and decoded with with such that 2) The decoder can be chosen to be -Hlder continuous for
and satisfy certain regularity conditions and the probability all , where is the compression
of incorrect reproduction vanishes as . rate.
Regularity in encoder and decoder is imposed for the sake Proof: See Section VI-C.
of both less complexity and more robustness. For example, al- Theorem 6 (Linear Encoding: Discrete-Continuous Mixture):
though a surjection from to is capable of lossless Suppose that the source is memoryless with a discrete-contin-
encoding, its irregularity requires specifying uncountably many uous mixed distribution. Then
real numbers to determine this mapping. Moreover, regularity
in encoder/decoder is crucial to guarantee noise resilience of the (60)
coding scheme.
3Two measurable spaces are isomorphic if there exists a measurable bijection for all .
whose inverse is also measurable. Proof: See Section VI-C.

Theorem 7 (Linear Encoding: Achievability for Self-Similar For sources with a singular distribution, in general there is no
Sources): Suppose that the source is memoryless with a self- simple answer due to their fractal nature. For an important class
similar distribution that satisfies the open set condition. Then of singular measures, namely self-similar measures generated
from i.i.d. digits (e.g., generalized Cantor distribution), the in-
(61)
formation dimension turns out to be the fundamental limit for
for all . lossless compression with Lipschitz decoder.
Proof: See Section VI-C. Theorem 11 (Lipschitz Decoding: Achievability for Self-Sim-
In Theorems 5 and 6, it has been shown that block error prob- ilar Measures): Suppose that the source is memoryless and
ability is achievable for Lebesgue-a.e. linear encoder. There- bounded, and its -ary expansion consists of independent iden-
fore, choosing any random matrix with i.i.d. entries distributed tically distributed digits. Then
according to some absolutely continuous distribution on (e.g.,
a Gaussian random matrix) satisfies block error probability al- (65)
most surely.
Now, we drop the restriction that the encoder is linear, al- for all . Moreover, if the distribution of each bit is
lowing very general encoding rules. Let us first consider the case equiprobable on its support, then (65) holds for
where both the encoder and decoder are constrained to be con- Proof: See Section VII-D.
tinuous. It turns out that zero rate is achievable in this case.
Example 1: As an example, we consider the setup in Theorem
Theorem 8 (Continuous Encoder and Decoder): For general 11 with and , where . The
sources associated invariant set is the middle third Cantor set [18]
and is supported on . The distribution of , denoted by ,
(62)
is called the generalized Cantor distribution [25]. In the ternary
for all . expansion of , each digit is independent and takes value and
Proof: Since and are Borel iso- with probability and respectively. Then, by Theorem 11,
morphic, there exist Borel measurable and for any . Furthermore, when
, such that . By Lusins theorem [24, coincides with the uniform distribution on , i.e., the
Th. 7.10], there exists a compact set such that re- standard Cantor distribution. Hence, we have a stronger result
stricted on is continuous and . Since is that , i.e., exact lossless compression can
compact and is injective on is a homeomorphism from be achieved with a Lipschitz continuous decompressor at the
to . Hence, is continuous. Since both rate of the information dimension.
and are closed, by Tietze extension theorem [9], and can
be extended to continuous and Let . Then, is the inverse
, respectively. Using and as the new encoder and de- of on . Due to the -Lipschitz continuity of is an
coder, the error probability satisfies expansive mapping, that is
.
(66)
Employing similar arguments as in the proof of Theorem 8,
we see that imposing additional continuity constraints on the
encoder (resp., decoder) has almost no impact on the funda- Note that (66) implies the injectivity of , a necessary con-
mental limit (resp., ). This is because a continuous dition for decodability. Moreover, not only does assign
encoder (resp., decoder) can be obtained at the price of an arbi- different codewords to different source symbols, but also it
trarily small increase of error probability, which can be chosen keeps them sufficiently separated proportionally to their dis-
to vanish as grows. tance. Therefore, the encoder respects the metric structure
Theorems 911 deal with Lipschitz decoding in Euclidean of the source alphabet.
spaces. We conclude this section by introducing stable decoding, a
weaker condition than Lipschitz continuity.
Theorem 9 (Lipschitz Decoding: General Converse): Sup-
pose that the source is memoryless. If , then Definition 7 ( -Stable): Let and be
metric spaces and . is called -stable
(63) on if for all
for all .
Proof: See Section VII-B. (67)

Theorem 10 (Lipschitz Decoding: Achievability for Discrete/ We say is -stable if is -stable.


Continuous Mixture): Suppose that the source is memoryless
A function is -Lipschitz if and only if it is -stable
with a discrete-continuous mixed distribution. Then
for every . We denote by the minimum -achiev-
(64) able rate such that there exists a sequence of Borel encoders and
-stable decoders that achieve block error probability . The
for all . fundamental limit of stable decoding is given by the following
Proof: See Section VII-C. tight result, whose proof is omitted for conciseness.

Theorem 12 ( -Stable Decoding): Let the underlying metric for all with and for all in sup-
be the distance. Suppose that the source is memoryless. ported on . Then
Then, for all
(72)
(68)
By Theorem 13, using (70) as the decoder, the -norm of the
that is, the minimum -achievable rate such that for all suffi- decoding error is upper bounded proportionally to the -norm
ciently small there exists a -stable coding strategy is given of the noise.
by . In our framework, a stable or Lipschitz continuous coding
scheme also implies robustness with respect to noise added at
C. Connections With Compressed Sensing the input of the decompressor, which could result from quan-
As an application of Theorem 6, we consider the following tization, finite wordlength or other inaccuracies. For example,
source distribution: suppose that the encoder output is quantized by
a -bit uniform quantizer, resulting in . With a -stable
(69) coding strategy , we can use the following decoder. De-
note the following nonempty set:
where is the Dirac measure with atom at and
is an absolutely continuous distribution. This is the model for (73)
linearly sparse signals used in [5] and [6], where a universal4
iterative thresholding decoding algorithm is proposed. Under where . Pick any
certain assumptions on , the asymptotic error probability turns in and output . Then, by the stability of
out to exhibit a phase transition [5], [6]: there is a sparsity-de- , we have
pendent threshold on the measurement rate above which
the error probability vanishes and below which the error prob- (74)
ability goes to one. This behavior is predicted by Theorem 6,
which shows the optimal threshold is , irrespective of the prior i.e., each component in the decoder output will suffer at most
. Moreover, the decoding algorithm in the achievability proof twice the inaccuracy of the decoder input. Similarly, an -Lips-
of Section VI-C is universal and robust (Hlder continuous), al- chitz coding scheme with respect to distance incurs an error
though it has exponential complexity. The threshold is not no more than .
given in closed form (in [5, eq. (5) and Fig. 1]), but its numerical
evaluation shows that it lies far from the optimal threshold ex-
V. LOSSLESS MINKOWSKI-DIMENSION COMPRESSION
cept in the nonsparse regime ( close to 1). Moreover, it can be
shown that as . The performance of several As a counterpart to lossless data compression, in this section,
other suboptimal factor-graph-based reconstruction algorithms we investigate the problem of lossless Minkowski dimension5
is analyzed in [7]. Practical robust algorithms that approach the compression for general sources, where the minimum -achiev-
fundamental limit of compressed sensing given by Theorem 6 able rate is defined as . This is an important intermediate
are not yet known. tool for studying fundamental limits of lossless linear encoding
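The source model in (69) is straightforward to simulate; the following sketch (hypothetical names, standard Gaussian chosen for the continuous part) draws a linearly sparse vector whose fraction of nonzero entries concentrates around the continuous weight, which by Theorem 6 is also the minimum rate for lossless linear encoding:

    import numpy as np

    def sparse_source(n, gamma, rng=None):
        # Memoryless (1 - gamma) * delta_0 + gamma * (continuous): each entry is 0
        # with probability 1 - gamma, otherwise drawn from N(0, 1).
        rng = rng or np.random.default_rng()
        mask = rng.uniform(size=n) < gamma
        return mask * rng.standard_normal(n)

    x = sparse_source(100000, gamma=0.1, rng=np.random.default_rng(0))
    print(np.mean(x != 0))  # ~0.1: rates above 0.1 suffice, rates below 0.1 do not (Theorem 6)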
Robust reconstruction is of great importance in the theory and Lipschitz decoding. Bridging the three compression frame-
of compressed sensing [26][28], since noise resilience is an works, in Sections VI-C and VII-B, we prove the following
indispensable property for decompressing sparse signals from inequality:
real-valued measurements. For example, consider the following
robustness result. (75)
Theorem 13 [26]: Suppose we wish to recover a vector
Hence, studying provides an achievability bound for loss-
from noisy compressed linear measurements ,
less linear encoding and a converse bound for Lipschitz de-
where and . Let be a solution
coding. We present bounds for for general sources, as
of the following -regularization problem:
well as tight results for discrete-continuous mixed and self-sim-
ilar sources.
(70)
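In the noiseless case, an l1-regularization decoder of the type referenced in (70) reduces to the basis-pursuit linear program min ||x||_1 subject to Ax = y. The sketch below (a generic formulation via scipy's LP solver, not the exact noise-aware estimator of [26]) typically recovers a sufficiently sparse vector exactly from random Gaussian measurements:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, y):
        # min ||x||_1  s.t.  A x = y, via the lifting x = u - v with u, v >= 0.
        k, n = A.shape
        res = linprog(c=np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
                      bounds=[(0, None)] * (2 * n), method="highs")
        return res.x[:n] - res.x[n:]

    rng = np.random.default_rng(0)
    n, k, s = 60, 25, 4
    x0 = np.zeros(n)
    x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    A = rng.standard_normal((k, n)) / np.sqrt(k)            # random Gaussian measurement matrix
    print(np.max(np.abs(basis_pursuit(A, A @ x0) - x0)))    # typically ~1e-9 (exact recovery)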
A. Minkowski Dimension of Sets in Metric Spaces
Let satisfy , where is the In fractal geometry, the Minkowski dimension is a way of
-restricted isometry constant of matrix , defined as the determining the fractality of a subset in metric spaces.
smallest positive number such that
Definition 8 (Covering Number): Let be a nonempty
(71) bounded subset of the metric space . For , define
4 The decoding algorithm is universal if it requires no knowledge of the prior distribution of nonzero entries.
5 Also known as Minkowski–Bouligand dimension, fractal dimension, or box-counting dimension.

, the -covering number of , to be the smallest number it intersects, hence justifying the name of box-counting dimen-
of -balls needed to cover , that is sion. Denote by the smallest number of mesh cubes
of size that covers , that is

(76)
(81)

(82)
Definition 9 (Minkowski Dimensions): Let be a nonempty (83)
bounded subset of metric space . Define the lower and
upper Minkowski dimensions of as
Lemma 3: Let be a bounded subset in
. The Minkowski dimensions satisfy
(77)
(84)
(78)
(85)

respectively. If , the common value is called Proof: See Appendix V.


the Minkowski dimension of , denoted by .
B. Definitions and Coding Theorems
It should be pointed out that the Minkowski dimension
depends on the underlying metric. Nevertheless, equivalent Consider a source in equipped with an -norm. We
metrics result in the same dimension. A few examples are as define the minimum -achievable rate for Minkowski-dimen-
follows. sion compression as follows.
for any finite set .
Definition 10 (Minkowski-Dimension Compression Rate):
for any bounded set of nonempty interior
Let be a stochastic process on . Define
in Euclidean space .
Let be the middle-third Cantor set in the unit interval.
Then, [18, Example 3.3].
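A quick box-counting check for the middle-third Cantor set (an illustrative sketch, not from the original): at scale 3^(-k) exactly 2^k mesh intervals meet the set, so the ratio log N / log(1/side) equals log 2 / log 3 at every scale:

    import numpy as np
    from itertools import product

    # Left endpoints of the 2**j intervals of the level-j Cantor construction.
    j = 10
    pts = np.array([sum(d * 3.0 ** -(i + 1) for i, d in enumerate(digs))
                    for digs in product([0, 2], repeat=j)])

    for k in (2, 4, 6, 8):
        eps = 3.0 ** -k
        n_boxes = np.unique(np.floor(pts / eps)).size   # mesh intervals of size eps that are hit
        print(k, n_boxes, round(np.log(n_boxes) / np.log(1 / eps), 4))  # -> 0.6309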
[18, Example 3.5]. From this
example, we see that Minkowski dimension lacks certain
stability properties one would expect of a dimension, since
it is often desirable that adding a countable set would have
Note that the conventional minimum source coding rate
no effect on dimension. This property fails for Minkowski
in Definition 4 is defined like in (86) replacing by
dimension. On the contrary, we observe that Rnyi infor-
.
mation dimension exhibits stability with respect to adding
In general for any . This is because
a discrete component as long as the entropy is finite. How-
for any , there exists a compact subset , such that
ever, mixing any distribution with a discrete measure with
, and by definition. Several
unbounded support and infinite entropy will necessarily re-
coding theorems for are given as follows.
sult in infinite information dimension.
The upper Minkowski dimension satisfies the following prop- Theorem 14: Suppose that the source is memoryless with dis-
erties (see [18, p. 48 (iii) and p. 102 (7.9)]), which will be used tribution such that . Then
in the proof of Theorem 14.
(87)
Lemma 2: For bounded sets
and

(79) (88)
for .
(80) Proof: See Section V-C.
For the special cases of discrete-continuous-mixed and self-
The following lemma shows that in Euclidean spaces, without similar sources, we have the following tight results.
loss of generality we can restrict attention to covering with
Theorem 15: Suppose that the source is memoryless with a
mesh cubes defined in (5). Since all the mesh cubes partition
discrete-continuous mixed distribution. Then
the whole space, to calculate lower or upper Minkowski dimen-
sion of a set, it is sufficient to count the number of mesh cubes (89)

for all , where is the weight of the continuous part block length. This has been shown for rates above entropy in
of the distribution. If is finite, then [29, Exercise 1.2.7, pp. 4142] via a combinatorial method. A
unified proof can be given through the method of Rnyi entropy,
(90) which we omit for conciseness. The idea of using Rnyi entropy
to study lossless source coding error exponents was previously
Theorem 16: Suppose that the source is memoryless with a
introduced by [30][32].
self-similar distribution that satisfies the strong separation con-
Lemma 5 deals with universal lower bounds on the source
dition. Then
coding error exponents, in the sense that these bounds are inde-
(91) pendent of the source distribution. A better bound on
has been shown in [33]: for
for all .
(100)
Theorem 17: Suppose that the source is memoryless and
bounded, and its -ary expansion consists of independent However, the proof of (100) was based on the dual expression
digits. Then (96) and a similar lower bound of random channel coding error
(92) exponent due to Gallager [34, Exercise 5.23], which cannot be
applied to . Here we give a common lower bound on
for all . both exponents, which is a consequence of Pinskers inequality
[35] combined with the lower bound on entropy difference by
When can take any value in for all
variational distance [29].
. Such a source can be constructed using Theorem 15
as follows: Let the distribution of be a mixture of a contin- Proof of Theorem 14: (Converse) Let and
uous and a discrete distribution with weights and respec- abbreviate as . Suppose for some .
tively, where the discrete part is supported on and has infinite Then, for sufficiently large there exists , such that
entropy. Then, by Proposition 1 and Theorem 1, but and .
by Theorem 15. First we assume that the source has bounded support, that is,
a.s. for some . By Proposition 2, choose
C. Proofs such that for all
Before showing the converse part of Theorem 14, we state
two lemmas which are of independent interest in conventional (101)
lossless source coding theory. By Lemma 3, for all , there exists , such that for all
Lemma 4: Assume that is a discrete memo-
ryless source with common distribution on the alphabet .
. Let . Denote by the block (102)
error probability of the optimal -code. (103)
Then, for any
Then, by (101), we have
(93)
(104)
(94)
Note that is a memoryless source with alphabet size at
where the exponents are given by most . By Lemmas 4 and 5, for all , for all
such that , we have
(95)
(105)
(96)
(106)
(97)
Choose and so large that the right-hand side of (106) is less
(98) than . In the special case of , (106) contradicts
(103) in view of (104).
Lemma 5: For Next we drop the restriction that is almost surely bounded.
Denote by the distribution of . Let be so large that

(107)
(99)
Denote the normalized restriction of on by , that
is
Proof: See Appendix VII.
Lemma 4 shows that the error exponents for lossless source
(108)
coding are not only asymptotically tight, but also apply to every

and the normalized restriction of on by . Then Suppose for now that we can construct a sequence of subsets
, such that for any the following holds:
(109)
(122)
By Theorem 2
(123)

(110)
(111) Denote

where because of the following: since is finite, we (124)


have in view of Proposition 1. Conse-
quently, , hence . (125)
The distribution of is given by
(126)
(112)

where denotes the Dirac measure with atom at . Then


Then, . Now we claim that for each
and . First observe that for each
(113) , therefore covers
(114) . Hence, , by (122). Therefore,
by Lemma 3
where:
(113): by (112) and (31), since Theorem 1 implies
(127)
;
(114): by (111) and (107).
For , let . Define By the BorelCantelli lemma, (123) implies that
. Let where is so large
(115) that . By the finite subad-
ditivity6 of upper Minkowski dimension in Lemma 2,
. By the
Then, for all arbitrariness of , the -achievability of rate is proved.
Now let us proceed to the construction of the required .
(116) To that end, denote by the probability mass function of
. Let
and
(128)

(117) (129)

(118) Then, immediately (122) follows from .


(119) Also, for
(120)

where (120), (117), and (118) follow from (114), (79), and (80), (130)
respectively. But now (120) and (116) contradict the converse
part of Theorem 14 for the bounded source , which we have
already proved. (131)
(Achievability) Recall that
(132)
(121)

(133)
We show that for all , for all , there exists
such that for all , there exists with (134)
and . Therefore, 6Countable subadditivity fails for upper Minkowski dimension. Had it been
readily follows. satisfied, we could have picked to achieve .

where (134) follows from the fact that By (142), for sufficiently large .
are i.i.d. and the joint Rnyi entropy is Decompose as
the sum of individual Rnyi entropies. Hence
(145)

where
(135)
Choose such that , which is guaranteed
(146)
to exist in view of (121) and the fact that is nonincreasing in
according to Lemma 1. Then
Note that the collection of all is countable, and thus we
may relabel them as . Then, . Hence,
there exists and , such that ,
where
(136)
(147)
(137)
(138)
Then
Hence, decays at least exponentially with
. Accordingly, (123) holds, and the proof of is (148)
complete.
Next we prove for the special case of discrete- (149)
continuous mixed sources. The following lemma is needed in
the converse proof. where (148) is by Lemma 2, and (149) follows from
for each . This proves the -achiev-
Lemma 6 [36, Th. 4.16]: Any Borel set whose upper ability of . By the arbitrariness of .
Minkowski dimension is strictly less than has zero Lebesgue (Converse) Since , we can assume . Let
measure. be such that . Define
Proof of Theorem 15: (Achievability) Let the distribution
of be (150)

(139) By (142), for sufficiently large . Let


, then . Write as the
where is a probability measure on abso- disjoint union
lutely continuous with respect to Lebesgue measure and is a
discrete probability measure. Let be the collection of all the (151)
atoms of , which is, by definition, a countable subset of .
Let . Then, is a sequence of i.i.d. binary
random variables with expectation where

(140) (152)

By the weak law of large numbers (WLLN) Also let . Since


, there exists and , such
that . Note that
(141)

(142)
(153)
where the generalized support of vector is defined as

(143) If , which implies has


positive Lebesgue measure. By Lemma 6,
Fix an arbitrary and let , hence

(144) (154)

If implies that has positive Lebesgue By the same arguments that lead to (127), the sets defined in
measure. Thus, . By the arbitrariness of , the (125) satisfy
proof of is complete.
(164)
Next we prove for self-similar sources under
the same assumption of Theorem 3. This result is due to the
stationarity and ergodicity of the underlying discrete process (165)
that generates the analog source distribution.
Proof of Theorem 16: By Theorem 3, is finite. There- (166)
fore, the converse follows from Theorem 14. To show achiev-
Since , this shows the -achiev-
ability, we invoke the following definition. Define the local di-
ability of .
mension of a Borel measure on as the function (if the limit
Next we proceed to construct the required . Applying
exists) [17, p. 169]
Lemma 4 to the DMS and blocklength yields
(155) (167)

Denote the distribution of by the product measure , By the assumption that is bounded, without loss of generality,
which is also self-similar and satisfies the strong separation the- we shall assume a.s. Therefore, the alphabet size of
orem. By [17, Lemma 6.4(b) and Prop. 10.6] is at most . Simply applying (100) to yields

(156) (168)

holds for -almost every . Define the sequence of random which does not grow with and cannot suffice for our purpose
variables of constructing . Exploiting the structure that consists
of independent bits, in Appendix VIII, we show a much better
(157) bound

Then, (156) implies that as . There- (169)


fore, for all and , there exists , such that
Then, by (167) and (169), there exists , such that (163) holds
and
(158)
(170)
Let
which implies (123). This concludes the proof of .
(159)
VI. LOSSLESS LINEAR COMPRESSION
(160) In this section, we analyze lossless compression with linear
encoders, which are the basic elements in compressed sensing.
Then, in view of (158), and Capitalizing on the approach of Minkowski-dimension com-
pression developed in Section V, we obtain achievability re-
sults for linear compression. For memoryless sources with a
(161)
discrete-continuous mixed distribution, we also establish a con-
(162) verse in Theorem 6 which shows that the information dimension
is the fundamental limit of lossless linear encoding.
where (162) follows from (160) and . Hence,
is proved. A. Minkowski-Dimension Compression
and Linear Compression
Finally, we prove for memoryless sources
whose -ary expansion consisting of independent digits. The following theorem establishes the relationship be-
Proof of Theorem 17: Without loss of generality, assume tween Minkowski-dimension compression and linear data
, that is, the binary expansion of consists of inde- compression.
pendent bits. We follow the same steps as in the achievability Theorem 18 (Linear Encoding: General Achievability): For
proof of Theorem 14. Suppose for now that we can construct a general sources
sequence of subsets , such that (123) holds and their car-
dinality does not exceed (171)

(163) Moreover, we have the following.



1) For all linear encoders (except possibly those in a set of the decoder. This is because no matrix acts injectively on
zero Lebesgue measure on the space of real matrices), . To see this, introduce the notation
block error probability is achievable.
2) For all and (174)

where . Then, for any matrix


(172)
(175)
where is the compression rate, there exists
a -Hlder continuous decoder that achieves block error Hence, there exist two -sparse vectors that have the same image
probability . under .

Consequently, in view of Theorem 18, the results on On the other hand, is sufficient to linearly compress all
for memoryless sources in Theorems 1416 yield the achiev- -sparse vectors, because (175) holds for Lebesgue-a.e.
ability results in Theorems 57, respectively. Hlder exponents matrix . To see this, choose to be a random matrix with
of the decoder can be found by replacing in (172) by its i.i.d. entries according to some continuous distribution (e.g.,
respective upper bound. Gaussian). Then, (190) holds if and only if all subma-
For discrete-continuous sources, the achievability in Theorem trices formed by columns of are invertible. This is an al-
6 can be shown directly without invoking the general result in most sure event, because the determinant of each of the
Theorem 18. See Remark 3. From the converse proof of The-
orem 6, we see that effective compression can be achieved with submatrices is an absolutely continuous random variable. The
linear encoders, i.e., , only if the source distribution sufficiency of is a bit stronger than the result in Re-
is not absolutely continuous with respect to Lebesgue measure. mark 1, which gives . For an explicit construction
of such a matrix, we can choose to be the matrix
Remark 1: Linear embedding of low-dimensional subsets in (see Appendix IV).
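The almost-sure statement above (with i.i.d. continuous entries, every set of 2k columns is linearly independent, so 2k rows distinguish all k-sparse vectors) can be spot-checked by enumeration for small sizes; the sketch below uses hypothetical parameter choices:

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    n, k = 12, 2                       # ambient dimension and sparsity, small enough to enumerate
    A = rng.standard_normal((2 * k, n))

    min_abs_det = min(abs(np.linalg.det(A[:, list(cols)]))
                      for cols in combinations(range(n), 2 * k))
    # Positive with probability one: no 2k columns are dependent, hence no two
    # distinct k-sparse vectors share the same image under A.
    print(min_abs_det > 0)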
Banach spaces was previously studied in [37][39], etc., in a
nonprobabilistic setting. For example, [38, Th 1.1] showed that: B. Auxiliary Results
for a subset of a Banach space with , there
Let . Denote by the Grassmannian man-
exists a bounded linear function that embeds into . Here in
ifold [25] consisting of all -dimensional subspaces of . For
a probabilistic setup, the embedding dimension can be improved
, the orthogonal projection from to defines
by a factor of two.
a linear mapping of rank . The technique
Following the idea in the proof of Theorem 18, we obtain a we use in the achievability proof of linear analog compression is
nonasymptotic result of lossless linear compression for -sparse to use the random orthogonal projection as the encoder,
vectors, which is relevant to compressed sensing. where is distributed according to the invariant probability
measure on , denoted by [25]. The relationship be-
Corollary 1: Denote the collection of all -sparse vectors in
tween and the Lebesgue measure on is shown in the
by
following lemma.
Lemma 7 [25, Exercise 3.6]: Denote the rows of a
(173) matrix by , the row span of by , and
the volume of the unit -ball in by . Set
Let be a -finite Borel measure on . Then, given any
, for Lebesgue-a.e. real matrix , there exists a (176)
Borel function , such that for
-a.e. . Moreover, when is finite, for any and Then, for measurable, i.e., a collection of -di-
, there exists a matrix and , mensional subspaces of
such that and is
-Hlder continuous. (177)
Remark 2: The assumption that the measure is -finite is
essential, because the validity of Corollary 1 hinges upon Fu- The following result states that a random projection of a given
binis theorem, where -finiteness is an indispensable require- vector is not too small with high probability. It plays a central
ment. Consequently, if is the distribution of a -sparse random role in estimating the probability of bad linear encoders.
vector with uniformly chosen support and Gaussian distributed Lemma 8 [25, Lemma 3.11]: For any
nonzero entries, we conclude that all -sparse vectors can be
linearly compressed except for a subset of zero measure under
(178)
. On the other hand, if is the counting measure on , Corol-
lary 1 no longer applies because that is not -finite. In fact,
if , no linear encoder from to works for every To show the converse part of Theorem 6, we will invoke the
-sparse vector, even if no regularity constraint is imposed on Steinhaus theorem as an auxiliary result.

Lemma 9 (Steinhaus [40]): For any measurable set We show that for all .
with positive Lebesgue measure, there exists an open ball cen- To this end, let
tered at contained in .
(188)
Lastly, with the notation in (174), we give a characterization
of the fundamental limit of lossless linear encoding as follows. Define
The proof is omitted for conciseness. (189)
Lemma 10: is the infimum of such that for (190)
sufficiently large , there exists a Borel set and a Then
linear subspace of dimension at least , (191)
such that
(179) Observe that implies that

C. Proofs (192)

Proof of Theorem 18: We first show (171). Fix Therefore, for all but a finite number of s if and only
arbitrarily. Let if
and . We show that there exists a matrix (193)
of rank and Borel measur-
able such that for some .
Next we show that for all but a finite number of
(180) s with probability one. Cover with -balls.
The centers of those balls that intersect are denoted by
for sufficiently large . . Pick .
By definition of , there exists such that for all Then, , hence
, there exists a compact such that cover . Suppose ,
and . Given an encoding matrix , then for any
define the decoder as
(194)
(181) (195)
where the is taken componentwise7. Since is (196)
closed and is compact, is compact. Hence,
where (195) follows because is an orthogonal projection.
is well defined.
Thus, for all implies that
Next consider a random orthogonal projection matrix
. Therefore, by the union bound
independent of , where is a random
-dimensional subspace distributed according to the invariant
measure on . We show that for all (197)

(182)
By Lemma 8
which implies that there exists at least one realization of that
satisfies (180). To that end, we define
(198)
(183)
(199)
and use the union bound

(184) (200)

where the first term . Next we show that the


where (200) is due to , because .
second term is zero. Let . Then
Since , there is a constant such that
. Since is a translation of
, it follows that
(185)
(201)
Thus
(186)

(187)
(202)
7Alternatively, we can use any other tie-breaking strategy as long as Borel
measurability is satisfied. (203)
where: all but a finite number of s a.s., there exists a (independent


(202): by substituting (200) and (201) into (197); of ), such that
(203): by (188).
Therefore, by the Borel-Cantelli lemma, for all but a
finite number of s with probability one. Hence
(215)
(204)
Thus, by (192), for any
which implies that for any

(205)
(216)
In view of (187) Integrating (216) with respect to on and by Fu-
binis theorem, we have
(206)

whence (182) follows. This shows the -achievability of . By


the arbitrariness of , (171) is proved.
Now we show that

(207)

holds for all except possibly on a set of zero (217)


Lebesgue measure, where is the corresponding decoder for (218)
defined in (181). Note that
Hence, there exists and an orthogonal projection ma-
trix of rank , such that and for all
(208)
(209) (219)

where: for all . Therefore8 is


(208): by (184); -Hölder continuous. By the extension the-
(209): by (187) and . orem of Hölder continuous mappings [41], can be
Define extended to that is -Hölder continuous. Then

(220)
(210)
Recall from (188) that . By the arbitrari-
Recalling defined in (176), we have ness of , (172) holds.
Remark 3: Without recourse to the general result in Theorem
18, the achievability for discrete-continuous sources in Theorem
(211) 6 can be proved directly as follows. In (184), choose
(212)
(221)
(213)
and consider whose entries are i.i.d. standard Gaussian (or
where: any other absolutely continuous distribution on ). Using linear
(211): by (209) and since holds algebra, it is straightforward to show that the second term in
Lebesgue-a.e.; (184) is zero. Thus, the block error probability vanishes since
(212): by Lemma 7; .
(213): by (206).
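A standard linear-algebra fact behind arguments of the kind used in Remark 3 is that, for a matrix with i.i.d. entries drawn from any absolutely continuous distribution, every square submatrix is invertible almost surely. A quick numerical check of this fact, with hypothetical sizes chosen only for illustration:

import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
k, n = 6, 12
A = rng.standard_normal((k, n))        # i.i.d. Gaussian encoding matrix
# every choice of k columns has full rank k almost surely
print(all(np.linalg.matrix_rank(A[:, list(cols)]) == k
          for cols in combinations(range(n), k)))   # True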
Observe that (213) implies that for any Finally, we complete the proof of Theorem 6 by proving the
converse.
(214) Converse Proof of Theorem 6: Let the distribution of be
defined as in (139). We show that for any .
Since , assume . Fix an arbitrary .
Since , in view of (209) and (214), we con- Suppose is -achievable. Let and
clude that (207) holds Lebesgue-a.e. . By Lemma 10, for sufficiently large ,
Finally, we show that for any , there exists a sequence of there exist a Borel set and a linear subspace ,
matrices and -Hölder continuous decoders that achieves com-
pression rate and block error probability . Since for 8 denotes the restriction of on the subset .
such that and where . Hence,


. contains linear independent vectors, denoted by .
If is absolutely continuous with respect to Let be a basis for , where by
Lebesgue measure. Therefore, has positive Lebesgue mea- assumption. Since , we conclude that
sure. By Lemma 9, contains an open ball in . Hence, are linearly dependent. Therefore
cannot hold for any subspace with
positive dimension. This proves . Next we assume
(234)
that .
Let
where and for some and . If we choose
(222) those nonzero coefficients sufficiently small, then
and since is a linear subspace. This
and . By (142), for sufficiently large contradicts . Thus, , and
, hence . follows from the arbitrariness of .
Next we decompose according to the generalized support
of
VII. LOSSLESS LIPSCHITZ DECOMPRESSION
(223) In this section, we study the fundamental limit of lossless
compression with Lipschitz decoders. To facilitate the discus-
sion, we first introduce several important concepts from geo-
where we have denoted the disjoint subsets metric measure theory. Then, we proceed to give proofs of The-
orems 9-11.
(224)
A. Geometric Measure Theory
Then
Geometric measure theory [42], [25] is an area of analysis
(225) studying the geometric properties of sets (typically in Euclidean
spaces) through measure theoretic methods. One of the core
concepts in this theory is rectifiability, a notion of smoothness
(226) or regularity of sets and measures. Basically a set is rectifi-
able if it is the image of a subset of a Euclidean space under
So there exists such that and some Lipschitz function. Rectifiable sets admit a smooth analog
.
coding strategy. Therefore, lossless compression with Lipschitz
Next we decompose each according to which can
decoders boils down to finding a subset of source realizations
only take countably many values. Let . For ,
that is rectifiable and has high probability. In contrast, the goal
let
of conventional almost-lossless data compression is to show
(227) concentration of probability on sets of small cardinality. This
characterization enables us to use results from geometric mea-
(228)
sure theory to study Lipschitz coding schemes.
Then, can be written as a disjoint union of Definition 11 (Hausdorff Measure and Dimension): Let
and . Define
(229)

Since , there exists such that (235)


.
where . Define the
Note that
-dimensional Hausdorff measure on by
(230)
(236)
(231)
The Hausdorff dimension of is defined by
(232)
(237)
Therefore, . Since is absolutely continuous with
respect to Lebesgue measure on has positive Lebesgue Hausdorff measure generalizes both the counting measure
measure. By Lemma 9, contains an open ball and Lebesgue measure and provides a nontrivial way to mea-
for some . Therefore, we have sure low-dimensional sets in a high-dimensional space. When
is just a rescaled version of the usual -dimen-
(233) sional Lebesgue measure [25, 4.3]; when reduces
to the counting measure. For gives a nontrivial a much stronger condition than the existence of information di-
measure for sets of Hausdorff dimension in , because if mension. Obviously, if a probability measure is -rectifiable,
; if . then .
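In the standard notation of [18] and [25] (up to a dimensional normalizing constant, which is why H^n is only a rescaled version of Lebesgue measure), the quantities in Definition 11 are

\mathcal{H}^s_\delta(A) = \inf\Big\{ \sum_i (\operatorname{diam} U_i)^s : A \subseteq \bigcup_i U_i,\ \operatorname{diam} U_i \le \delta \Big\}, \qquad \mathcal{H}^s(A) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(A),

\dim_H(A) = \inf\{ s \ge 0 : \mathcal{H}^s(A) = 0 \} = \sup\{ s \ge 0 : \mathcal{H}^s(A) = \infty \}.

For the middle-third Cantor set C discussed next, these give \dim_H C = \log 2 / \log 3 \approx 0.63 with 0 < \mathcal{H}^{\log 2/\log 3}(C) < \infty.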
As an example, consider and . Let be
the middle-third Cantor set in the unit interval, which has zero B. Converse
Lebesgue measure. Then, and [18,
2.3]. In view of the lossless Minkowski dimension compression
results developed in Section V, the general converse in Theorem
Definition 12 (Rectifiable Sets [42, 3.2.14]): is 9 is rather straightforward. We need the following lemma to
called -rectifiable if there exists a Lipschitz mapping from complete the proof.
some bounded set in onto .
Lemma 13: Let be -rectifiable. Then
Definition 13 (Rectifiable Measures [25, Definition 16.6]):
Let be a measure on . is called -rectifiable if (241)
and there exists a -a.s. set that is -rectifiable.
Several useful facts about rectifiability are presented as Proof: See Appendix IX.
follows. Proof of Theorem 9: Lemma 13 implies the following gen-
Lemma 11 [42]: eral inequality:
1) An -rectifiable set is also -rectifiable for .
2) The Cartesian product of an -rectifiable set and an -rec- (242)
tifiable set is -rectifiable.
3) The finite union of -rectifiable sets is -rectifiable. If the source is memoryless and , then it follows
4) Countable sets are -rectifiable. from Theorem 14 that .
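As a simple illustration of Definition 12 and Lemma 11: if f : \mathbb{R}^m \to \mathbb{R}^{n-m} is Lipschitz and B \subset \mathbb{R}^m is bounded, then the graph \{ (x, f(x)) : x \in B \} is an m-rectifiable subset of \mathbb{R}^n, since it is the image of B under the Lipschitz map x \mapsto (x, f(x)); by item 3), a finite union of such graphs is again m-rectifiable.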
Using the notion of rectifiability, we give a sufficient condi-
tion for the -achievability of Lipschitz decompression by the C. Achievability for Finite Mixture
following lemma. We first prove a general achievability result for finite mix-
Lemma 12: if there exists a sequence of -rec- tures, a corollary of which applies to discrete-continuous mixed
tifiable sets with distributions in Theorem 10.
Theorem 20 (Achievability of Finite Mixtures): Let the dis-
(238) tribution of be a mixture of finitely many Borel probability
measures on , i.e.,
for all sufficiently large .
Proof: See Appendix V.
(243)
Definition 14 ( -Dimensional Density [25, Def. 6.8]): Let
be a measure on . The -dimensional upper and lower den-
sities of at are defined as where is a probability mass function. If is
-achievable with Lipschitz decoders for ,
then is -achievable for with Lipschitz decoders, where
(239)

(244)
(240)

If , the common value is called the -di- (245)


mensional density of at , denoted by .
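Written out with the usual (2r)^m normalization (the constant is immaterial for the rectifiability criterion below), the upper and lower m-dimensional densities in (239)-(240) read

\Theta^{*m}(\mu, x) = \limsup_{r \to 0} \frac{\mu(B(x, r))}{(2r)^m}, \qquad \Theta_{*}^{m}(\mu, x) = \liminf_{r \to 0} \frac{\mu(B(x, r))}{(2r)^m}.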
The following important result in geometric measure theory Proof: By induction, it is sufficient to show the result for
gives a density characterization of rectifiability for Borel . Denote . Let be a sequence of
measures. i.i.d. binary random variables with . Let
be an i.i.d. sequence of real-valued random variables, such that
Theorem 19 (Preiss Theorem [43, Th. 5.6]): A -finite Borel
the distribution of each conditioned on the events
measure on is -rectifiable if and only if
and are and respectively. Then,
for -a.e. .
is a memoryless process with common distribution . Since the
Recalling the expression for information dimension in claim of the theorem depends only on the probability law of the
(17), we see that for the information dimension of a measure to source, we base our calculation of block error probability on this
be equal to it requires that the exponent of the average mea- specific construction.
sure of -balls equals , whereas -rectifiability of a measure Fix . Since and are achievable for and
requires that the measure of almost every -ball scales as , respectively, by Lemma 12, there exists such that for all
, there exists , with where:


and is -rectifiable and is (257): by construction of in (249);
-rectifiable. Let (259): by (250) and (251);
(260): according to (244);
(246) (261): by (247).
By WLLN, . Hence, for any , there In view of Lemma 12, is -achievable for . By the
exists , such that for all arbitrariness of and is -achievable for .
Proof of Theorem 10: Let the distribution of be
(247)
as defined in (139), where is discrete and
Let is absolutely continuous. By Lemma 11, countable sets are
-rectifiable. For any , there exists such that
(248) . By definition, is -rectifiable.
Therefore, by Lemma 12 and Theorem 20, is an -achievable
Define rate for . The converse follows from (242) and Theorem 15.

(249) D. Achievability for Singular Distributions


Next we show that is -rectifiable. For all In this section, we prove Theorem 11 for memoryless sources,
, it follows from (248) that using isomorphism results in ergodic theory. The proof out-
line is as follows: a classical result in ergodic theory states that
(250) Bernoulli shifts are isomorphic if they have the same entropy.
(251) Moreover, the homomorphism can be chosen to be finitary, that
is, each coordinate only depends on finitely many coordinates.
and
This finitary homomorphism naturally induces a Lipschitz de-
coder in our setup; however, the caveat is that the Lipschitz con-
(252) tinuity is with respect to an ultrametric (Definition 15) that is not
(253) equivalent to the usual Euclidean distance. Nonetheless, by an
arbitrarily small increase in the compression rate, the decoder
(254)
can be modified to be Lipschitz with respect to the Euclidean
where according to (245). Observe that distance. Before proceeding to the proof, we first present some
is a finite union of subsets, each of which is a Cartesian product necessary results of ultrametric spaces and finitary coding in er-
of a -rectifiable set in and a -rectifiable godic theory.
set in . Recalling Lemma 11, is -rectifi- Definition 15: Let be a metric space. is called an
able, in view of (254). ultrametric if
Now we calculate the measure of under
(262)
for all .
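Written out, the strong triangle inequality in (262) is

d(x, z) \le \max\{ d(x, y),\ d(y, z) \} \quad \text{for all } x, y, z.

Two standard consequences used later in this section are that every point of an ultrametric ball is a center of that ball, and that two closed balls are either disjoint or one contains the other.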
(255) A canonical class of ultrametric spaces is the ultrametric
Cantor space [44]: let denote the set of all one-sided
-ary sequences . To endow with an ultra-
metric, define
(256)
(263)
Then, for every is an ultrametric on . In a sim-
ilar fashion, we define an ultrametric on by considering
(257) the -ary expansion of real vectors. Similar to the binary ex-
pansion defined in (6), for and
, define

(264)
(258)
then

(265)

(259)
Denoting for brevity
(260)
(261) (266)
(263) induces an ultrametric on and . If and each have at least three


non-zero components and , then there is a fini-
(267) tary homomorphism .
It is important to note that is not equivalent to the distance We now use Lemmas 14-15 and Theorem 21 to prove The-
(or any distance), since we only have orem 11.
Proof of Theorem 11: Without loss of generality, assume
(268) that the random variable satisfies . Denote by the
common distribution of the -ary digits of . By Proposition
To see the impossibility of the other direction of (268), consider 2
and . As
(269)
but remains . Therefore, a Lipschitz function with
respect to is not necessarily Lipschitz under . However, Fix . Let and . Let be a
the following lemma bridges the gap if the dimension of the probability measure on such that .
domain and the Lipschitz constant are allowed to increase. Such a always exists because .
Let and denote the product measure
Lemma 14: Let be the ultrametric on defined in
on and , respectively. Since and has the same en-
(267). and is Lips-
tropy rate, by Theorem 21, there exists a finitary homomorphism
chitz. Then, there exists and
. By the characterization of fini-
such that and is Lipschitz.
tariness in Lemma 15, for any , there exists
Proof: See Appendix X.
such that is determined only by . Denote the
Next we recall several results on finitary coding of Bernoulli closed ultrametric ball
shifts. Kolmogorov-Ornstein theory studies whether two pro-
cesses with the same entropy rate are isomorphic. Keane and (270)
Smorodinsky [45] showed that two double-sided Bernoulli
shifts of the same entropy are finitarily isomorphic. For the where is defined in (267). Then, for any
single-sided case, Del Junco [46] showed that there is a finitary . Note that forms a countable cover
homomorphism between two single-sided Bernoulli shifts of of . This is because is just a cylinder set in with
the same entropy, which is a finitary improvement of Sinai's base , and the total number of cylinders is countable.
theorem [47], [55]. We will see how a finitary homomorphism Furthermore, since intersecting ultrametric balls are contained
of the digits is related to a real-valued Lipschitz function, in each other [49], there exists a sequence in , such
and how to apply Del Junco's ergodic-theoretic result to our that partitions . Therefore, for all ,
problem. there exists , such that , where

Definition 16 (Finitary Homomorphisms): Let and be (271)


finite sets. Let and denote the left shift operators on the
product spaces and respectively. Let and For , recall the -ary expansion of defined in
be measures on and (with product -algebras). A ho- (266), denoted by . Let
momorphism is a measure preserving
mapping that commutes with the shift operator, i.e., (272)
and -a.e. is said to be finitary if there exist sets (273)
of zero measure and such that
(274)
is continuous (with respect to the product topology).
Informally, finitariness means that for almost every Since is measure preserving, , therefore
is determined by finitely many coordinates in . The fol-
lowing lemma characterizes this intuition in precise terms. (275)
Lemma 15 [48, Conditions 5.1, p. 281]: Let and Next we use to construct a real-valued Lipschitz mapping
. Let be a homomorphism. . Define by
Then, the following statements are equivalent.
1) is finitary. (276)
2) For -a.e. , there exists , such that for
any implies that
. Since commutes with the shift operator, for all
3) For each , the inverse image . Also, for
of each time- cylinder set in is, up to a set of measure . Therefore
, a countable union of cylinder sets in .
Theorem 21 [46, Th. 1]: Let and be probability dis- (277)
tributions on finite sets and . Let
Next we proceed to show that . By [18, Exercise 9.11],


is Lipschitz. In view of (268) and (263), it is sufficient to show there exists a constant , such that for all
that is Lipschitz. Let
(288)
(278)
that is, has positive lower -density everywhere. By [50,
Th. 4.1(1)], for any , there exists such that
First observe that is -Lipschitz on each ultrametric ball
and is -rectifiable. Therefore,
in . To see this, consider distinct points . Let
. By Lemma 12, the rate is -achievable.
. Then, . Since
and coincide on their first VIII. CONCLUDING REMARKS
digits. Therefore
Compressed sensing, as an analog compression paradigm,
(279) imposes two basic requirements: the linearity of the encoder
(280) and the robustness of the decoder; the rationale is that low com-
(281) plexity of encoding operations and noise resilience of decoding
operations are indispensable in dealing with analog sources. To
Since every closed ultrametric ball is also open [49, Prop. 18.4],
better understand the fundamental limits imposed by the re-
are disjoint, therefore is -Lipschitz on
quirements of low complexity and noise resilience, it is peda-
for some . Then, for any gogically sound to study them separately and in a more general
(282) paradigm than compressed sensing. Motivated by this observa-
(283) tion, in this paper we have proposed an information theoretic
(284) framework for lossless analog compression of analog sources
under regularity conditions of the coding schemes. Abstractly,
(285) the approach boils down to probabilistic dimension reduction
where: with smooth embedding. In this framework, obtaining funda-
(282): by (268); mental limits requires tools quite different from those used in
(283) and (285): by (267); traditional information theory, calling for machineries from di-
(284): by (281). mension theory and geometric measure theory in addition to er-
Hence, is Lipschitz. godic theory.
By Lemma 14, there exists a subset and a Within this general framework, we analyzed the fundamental
Lipschitz mapping such that limits under different regularity constraints imposed on com-
. By Kirszbraun's theorem [42, 2.10.43], pressor and decompressor. Perhaps the most surprising result
we extend to a Lipschitz function . is, as shown in (75)
Then, . Since is continuous and is com-
pact, by Lemma 18, there exists a Borel function (289)
, such that on .
which holds for any real-valued source. This conclusion implies
To summarize, we have obtained a Borel function
that a Lipschitz constraint at the decompressor results in less ef-
and a Lipschitz function , where
ficient compression than a linearity constraint at the compressor.
, such that
For memoryless sources, we have also obtained bounds or exact
. Therefore, we conclude that . The converse
expressions for various -achievable rates. As seen in Theorems
follows from Theorem 9.
5-12, Rényi's information dimension plays an important role in
Last, we show that
the associated coding theorems. These results provide new oper-
(286) ational characterizations for Rényi's information dimension in
a lossless compression framework.
for the special case when is equiprobable on the support, In the important case of discrete-continuous mixed sources,
where . Recalling the construction of self-sim- which is a probabilistic generalization of the linearly sparse
ilar measures in Section III-C, we first note that the distribu- source model used in compressed sensing (a fixed fraction of ob-
tion of is a self-similar measure that is generated by the IFS servations are zero), we have shown that the fundamental limit is
, where Rényi information dimension, which coincides with the weight
on the continuous part in the source distribution. In the memo-
(287) ryless case, this corresponds to the fraction of analog symbols
in the source realization. This might suggest that the mixed dis-
This IFS satisfies the open set condition, since crete-continuous nature of the source is of fundamental impor-
and the union is disjoint. Denote by tance in the analog compression framework; sparsity is just one
the invariant set of the reduced IFS . By manifestation of a mixed distribution.
[12, Corollary 4.1], the distribution of , denoted by , It should be remarked that random linear coding is not only an
is in fact the normalized -dimensional Hausdorff mea- important achievability proof technique in Shannon theory, but
sure on , i.e., . Therefore, also an inspiration to obtain efficient schemes in modern coding
theory, compressed sensing as well as our analog compression APPENDIX II


framework. Moreover, it also provides information about how PROOFS OF PROPOSITIONS 1-3
close practical encoders are to the fundamental limit. For in- Lemma 16: For all
stance, in lossless linear compression, the achievability bound
on in Theorem 18 can be achieved with Lebesgue-a.e. (298)
linear encoders. This implies that generating random linear en-
coders using any continuous distribution achieves the desired Proof:
error probability almost surely. (299)
As far as future research directions, there are regularity
(300)
conditions beyond those in Table I that are worth studying. For
example, it is interesting to investigate the fundamental limit of Note that for any
bilipschitz coding schemes, i.e., the encoder and decoder both
being Lipschitz continuous. This is a probabilistic version of (301)
bilipschitz embedding in Euclidean spaces, e.g., Dvoretzky's
theorem [51] and Johnson-Lindenstrauss lemma [52]. As a (302)
more restricted case, the fundamental limit of linear compres-
sion with Lipschitz decompression is the most desirable result. (303)

Given , the range of is upper bounded


APPENDIX I by . Therefore, for all
PROOF OF EQUIVALENCE OF (17) AND (14)
Proposition 4: The information dimension of a random vari- (304)
able on can be calculated as follows:
Hence, admits the same upper bound and (298)
(290) holds.
Lemma 17 [53, p. 2102]: Let be an -valued random
where is the distribution of and is the -ball of
variable. Then, if .
radius centered at . The lower (upper) infor-
Proof of Proposition 1: Using Lemma 16 with
mation dimension can be obtained by replacing by
and , we have
.
Proof: Let be a random vector in and denote its dis- (305)
tribution by . Due to the equivalence of -norms, it is suffi-
Equation (21) (20): When is finite, dividing both
cient to show (290) for . Recall the notation in (5) and
sides of (305) by and letting results in (20).
note that
Equation (20) (21): Suppose . By (305),
(291) for every and (20) fails. This also proves (22).
(19) (21)
For any , there exists , such that
(306)
. Then
(307)
(292)
(308)
(293)
(309)
As a result of (292), we have (310)
(294) where:
(308): by Lemma 17;
On the other hand, note that is a disjoint (310): by
union of mesh cubes. By (293), we have (311)
(295) (312)

(296)

Combining (294) and (296) yields Proof of Proposition 2: Fix any and , such
that . By Lemma 16, we have

(297)
(313)
By Proposition 2, sending and yields (290). (314)
Therefore -norm and vector source:


and . Then

(315)
(325)

and hence (23) and (24) follow.


For general sources, Kawabata and Dembo introduced the
Proof of Proposition 3: Note that concept of rate-distortion dimension in [12]. The rate-distortion
dimension of a measure (or a random variable with the dis-
tribution ) on the metric space is defined as follows:
(316)

(326)
(317)

(327)
(318)
(319) where is the single-letter rate-distortion function of
with distortion function . Then,
The same bound works for rounding. under the equivalence of the metric and the -norm as in (328),
the rate-distortion dimension coincides with the information di-
mension of .
APPENDIX III Theorem 23 [12, Prop. 3.3]: Consider the metric space
INFORMATION DIMENSION AND RATE-DISTORTION THEORY . If there exists , such that for all
The asymptotic tightness of the Shannon lower bound in the
high-rate regime is shown by the following result. (328)
Theorem 22 [20]: Let be a random variable on the normed then
space with a density such that . Let the
distortion function be with , and the (329)
single-letter rate-distortion function is given by
(330)
(320)
Moreover, (329) and (330) hold even if the -entropy
instead of the rate-distortion function is used in the defini-
Suppose that there exists an such that . tion of and .
Then, and
In particular, consider the special case of scalar source and
(321) MSE distortion . Then, whenever exists
and is finite
where the Shannon lower bound takes the following
form: (331)

(322) Therefore, is the scaling factor of with respect


to in the high-rate regime, which gives an operational
where denotes the volume of the unit ball characterization of information dimension in Shannon theory.
. Note that in the most familiar cases we can sharpen (331) to
Special cases of Theorem 22 include the following. show the following.
MSE distortion and scalar source: is discrete and : .
and . Then, the Shannon is continuous and :
lower bound takes the familiar form .

(323)
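The familiar form referred to in (323), for a scalar source with MSE distortion and differential entropy h(X), is

R(D) \ge h(X) - \tfrac{1}{2}\log(2\pi e D).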
APPENDIX IV
Absolute distortion and scalar source: INJECTIVITY OF THE COSINE MATRIX
and . Then We show that the cosine matrix defined in Remark 2 is injec-
tive on . We consider a more general case. Let
(324) and . Let be an matrix where
. We show that each subma- APPENDIX VI


trix formed by columns of is non-singular if and only if PROOF OF LEMMA 3
are distinct. Proof: Since -norms are equivalent, it is sufficient to
Let . Then, where only consider . Observe that is nonin-
and denotes the th order Cheby- creasing. Hence, for any , we have
shev polynomial of the first kind [54]. Note that
is a polynomial in of degree . Also (335)
if for some . Therefore,
. The constant is given by Therefore, it is sufficient to restrict to and
the coefficient of the highest order term in the contribution from in (77) and (78). To see the equivalence of covering by mesh
the main diagonal . Since the leading coefficient cubes, first note that . On the other hand,
of is , we have . Therefore any -ball of radius is contained in the union of mesh
cubes of size (by choosing a cube containing some point in
the set together with its neighboring cubes). Thus,
. Hence, the limits in (77) and (78) coincide with
(332) those in (84) and (85).
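As a hypothetical numerical illustration of this determinant argument (the indexing below is chosen for the example and need not match the exact convention of Remark 2): a k-by-k matrix with entries cos(j x_i), j = 0, ..., k-1, is nonsingular precisely when the angles x_i in (0, pi) are distinct, because cos(j x) = T_j(cos x) (the Chebyshev polynomials of [54]) turns its determinant into a nonzero multiple of a Vandermonde determinant in cos(x_i).

import numpy as np

def cosine_submatrix(x):
    # row i, column j holds cos(j * x_i); distinct angles give independent rows
    x = np.asarray(x, dtype=float)
    j = np.arange(len(x))
    return np.cos(np.outer(x, j))

rng = np.random.default_rng(0)
x_distinct = np.sort(rng.uniform(0.0, np.pi, size=5))
x_repeated = x_distinct.copy()
x_repeated[1] = x_repeated[0]                       # force two equal angles
print(np.linalg.det(cosine_submatrix(x_distinct)))  # nonzero
print(np.linalg.det(cosine_submatrix(x_repeated)))  # zero up to round-off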

APPENDIX VII
APPENDIX V PROOF OF LEMMA 5
PROOF OF LEMMA 12
Proof: By Pinsker's inequality
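For reference, with the divergence in nats and the variational distance taken as the l1 distance between the probability mass functions, Pinsker's inequality reads

D(P \| Q) \ge \tfrac{1}{2} \| P - Q \|_1^2, \qquad \text{equivalently} \qquad \| P - Q \|_1 \le \sqrt{2 D(P \| Q)};

the constant changes accordingly if a different normalization of the variational distance or a different logarithm base is used.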
Lemma 18: Let be compact, and let
be continuous. Then, there exists a Borel measurable function (336)
such that for all .
Proof: For all is nonempty and com- where is the variational distance between and and
pact since is continuous. For each , let the th . In this case, where is countable
component of be
(337)
(333)

where is the th coordinate of . This defines By [29, Lemma 2.7, p. 33], when
which satisfies for all . Now we claim (338)
that each is lower semicontinuous, which implies that is
Borel measurable. To this end, we show that for any (339)
is open. Assume the opposite, then there exists a When , by (336), ; when
sequence in that converges to , such that , by (339)
and . Due to the compactness of , there
exists a subsequence that converges to some point in (340)
. Therefore, . But by the continuity of , we have
Using (336) again

Hence, by definition of and , we have , which (341)


is a contradiction. Therefore, is lower semicontinuous.
Since holds in the minimization of (95)
Proof of Lemma 12: Let be -rectifiable and and (97), (99) is proved.
assume that (238) holds for all . Then, by definition there
exists a bounded subset and a Lipschitz function
APPENDIX VIII
, such that . By continuity,
PROOF OF (169)
can be extended to the closure , and . Since
is compact, by Lemma 18, there exists a Borel function Proof: By (100), for all
, such that for all .
By Kirszbraun's theorem [42, 2.10.43], can be extended to (342)
a Lipschitz function with the same Lipschitz
constant. Then where

(334) (343)

for all , which proves the -achievability of . which is a nonnegative nondecreasing convex function.
Since a.s., is in one-to-one correspon- APPENDIX IX


dence with . Denote the distribution of PROOF OF LEMMA 13
and by and , respectively. By Proof: By the -rectifiability of , there exists a bounded
assumption, are independent, hence subset and an -Lipschitz mapping such
that . Note that
(344)

By (95) (358)

By definition of , there exists ,


(345) such that is covered by the union of
. Then
where is a distribution on . Denote the marginals of
by . Combining (344) with properties of entropy
and relative entropy, we have

(346)
(359)
and

which implies that


(347)
(360)
(348)
Therefore
Therefore
(361)
(349)
(362)

(350) (363)

where the last inequality follows from (358).


(351)

(352) APPENDIX X
PROOF OF LEMMA 14
(353) Proof: Suppose we can construct a mapping
such that

(354) (364)

(355) holds for all . By (364), is injective. Let


(356) and . Then, by (364) and the -Lipschitz
continuity of
where:
(350): by (346); (365)
(351): by (348), we have
(366)
(367)

(357) holds for all . Hence, is Lipschitz


with respect to the distance, and it satisfies .
(353): by (342); To complete the proof of the lemma, we proceed to construct
(354): let ; then, , by (347); the required . The essential idea is to puncture the -ary ex-
(355): due to the convexity of . pansion of such that any component has at most consecu-
[5] D. L. Donoho, A. Maleki, and A. Montanari, Message-passing algo-


rithms for compressed sensing, Proc. Nat. Acad. Sci., vol. 106, no. 45,
pp. 1891418919, Nov. 2009.
[6] D. L. Donoho, A. Maleki, and A. Montanari, Construction of message
passing algorithms for compressed sensing, to be submitted to IEEE
Trans Inf. Theory.
[7] Y. Eftekhari, A. H. Banihashemi, and I. Lambadaris, An efficient ap-
proach toward the asymptotic analysis of node-based recovery algo-
rithms in compressed sensing, 2010 [Online]. Available: http://arxiv.
org/abs/1001.2284
[8] P. Schniter, Turbo reconstruction of structured sparse signals, in
Proc. Conf. Inf. Sci. Syst., Princeton, NJ, Mar. 2010.
Fig. 1. Schematic illustration of in terms of -ary expansions.
[9] J. R. Munkres, Topology, 2nd ed. Englewood Cliffs, NJ: Prentice-
Hall, 2000.
[10] A. Montanari and E. Mossel, Smooth compression, Gallager bound
tive nonzero digits. For notational convenience, define and nonlinear sparse graph codes, in Proc. IEEE Int. Symp. Inf.
and for as follows: Theory, Toronto, ON, Canada, Jul. 2008.
[11] A. Rényi, On the dimension and entropy of probability distributions,
Acta Mathematica Hungarica, vol. 10, no. 1-2, Mar. 1959.
[12] T. Kawabata and A. Dembo, The rate-distortion dimension of sets and
(368) measures, IEEE Trans. Inf. Theory, vol. 40, no. 5, pp. 15641572, Sep.
1994.
[13] Y. B. Pesin, Dimension Theory in Dynamical Systems: Contemporary
(369)
Views and Applications. Chicago, IL: Univ. Chicago Press, 1997.
[14] B. R. Hunt and V. Y. Kaloshin, How projections affect the dimension
Define spectrum of fractal measures, Nonlinearity, vol. 10, pp. 10311046,
1997.
[15] E. Çinlar, Probability and Stochastics. New York: Springer-Verlag,
(370) 2010.
[16] A. Rényi, Probability Theory. Amsterdam, The Netherlands: North-
Holland, 1970.
[17] K. Falconer, Techniques in Fractal Geometry. New York: Wiley,
1997.
A schematic illustration of in terms of the -ary expansion [18] K. Falconer, Fractal Geometry: Mathematical Foundations and Appli-
is given in Fig. 1. cations, 2nd ed. New York: Wiley, 2003.
Next we show that satisfies the expansiveness condition in [19] A. Gyrgy, T. Linder, and K. Zeger, On the rate-distortion function of
random vectors and stationary sources with mixed distributions, IEEE
(364). For any , let . Then, Trans. Inf. Theory, vol. 45, pp. 21102115, 1999.
by definition, for some and [20] T. Linder and R. Zamir, On the asymptotic tightness of the Shannon
. Without loss of generality, lower bound, IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 20262031,
Nov. 1994.
assume that and . Then, by construction of [21] I. Csiszr, On the dimension and entropy of order of the mixture of
, there are no more than consecutive nonzero digits in or probability distributions, Acta Mathematica Hungarica, vol. 13, no.
. Since the worst case is that and are followed 34, pp. 245255, Sep. 1962.
by s and s respectively, we have [22] P. Halmos, Naive Set Theory. Princeton, NJ: D. Van Nostrand, 1960.
[23] K. Kuratowski, Topology. New York: Academic, 1966, vol. I.
[24] G. Folland, Real Analysis: Modern Techniques and Their Applications,
2nd ed. New York: Wiley-Interscience, 1999.
(371) [25] P. Mattila, Geometry of Sets and Measures in Euclidean Spaces:
Fractals and Rectifiability. Cambridge, U.K.: Cambridge Univ.
(372) Press, 1999.
[26] E. Candès, J. Romberg, and T. Tao, Stable signal recovery from in-
complete and inaccurate measurements, Commun. Pure Appl. Math.,
which completes the proof of (364).
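A hypothetical sketch of this puncturing idea (not the authors' exact map, whose block structure is given in (368)-(370)): force a zero digit after every block of L digits of the M-ary expansion, so the output never contains more than L consecutive nonzero digits; the map is injective and lengthens the expansion only by a factor (L+1)/L, which is what permits the arbitrarily small increase in dimension and rate.

def puncture(digits, L):
    # insert a forced 0 after every block of L original digits
    out = []
    for i, d in enumerate(digits, start=1):
        out.append(d)
        if i % L == 0:
            out.append(0)
    return out

print(puncture([2, 1, 2, 2, 1, 1], L=2))  # [2, 1, 0, 2, 2, 0, 1, 1, 0]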
vol. 59, no. 8, pp. 12071223, Aug. 2006.
[27] R. Calderbank, S. Howard, and S. Jafarpour, Construction of a large
ACKNOWLEDGMENT class of deterministic matrices that satisfy a statistical isometry prop-
The authors would like to thank M. Chiang for stimulating erty, IEEE J. Sel. Topics Signal Process., vol. 29, no. 4, 2009.
[28] W. Xu and B. Hassibi, Compressed sensing over the Grassmann man-
discussions and J. Luukkainen of the University of Helsinki for ifold: A unified analytical framework, in Proc. 46th Annu. Allerton
suggesting [50]. They are also grateful for suggestions by an Conf. Commun. Control Comput., 2008, pp. 562567.
anonymous reviewer. [29] I. Csiszr and J. G. Krner, Information Theory: Coding Theorems for
Discrete Memoryless Systems. New York: Academic, 1982.
[30] I. Csiszr, Generalized cutoff rates and Rnyis information mea-
REFERENCES sures, IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 2634, Jan.
[1] C. E. Shannon, A mathematical theory of communication, in Bell 1995.
Syst. Tech. J., 1948, vol. 27, pp. 379423, 62356. [31] P. N. Chen and F. Alajaji, Csiszrs cutoff rates for arbitrary discrete
[2] D. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, vol. 52, sources, IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 330338, Jan.
no. 4, pp. 12891306, Apr. 2006. 2001.
[3] E. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: [32] H. Shimokawa, Rnyis entropy and error exponent of source coding
Exact signal reconstruction from highly incomplete frequency infor- with countably infinite alphabet, in Proc. IEEE Int. Symp. Inf. Theory,
mation, IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489509, Feb. Seattle, WA, Jul. 2006.
2006. [33] C. Chang and A. Sahai, Universal quadratic lower bounds on source
[4] M. Wainwright, Information-theoretic bounds on sparsity recovery in coding error exponents, in Proc. Conf. Inf. Sci. Syst., 2007.
the high-dimensional and noisy setting, in Proc. IEEE Int. Symp. Inf. [34] R. G. Gallager, Information Theory and Reliable Communication.
Theory, Nice, France, Jun. 2007. New York: Wiley, 1968.
[35] M. S. Pinsker, Information and Information Stability of Random Vari- [53] E. C. Posner and E. R. Rodemich, Epsilon entropy and data compres-
ables and Processes. San Francisco, CA: Holden-Day, 1964. sion, Ann. Math. Stat., vol. 42, no. 6, pp. 20792125, Dec. 1971.
[36] K. Alligood, T. Sauer, and J. A. Yorke, Chaos: An Introduction to Dy- [54] G. Szegő, Orthogonal Polynomials. Providence, RI: AMS, 1975.
namical Systems. New York: Springer-Verlag, 1996. [55] Y. Wu and S. Verdú, Fundamental limits of almost lossless analog
[37] R. Mañé, On the dimension of the compact invariant sets of certain compression, in Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, Jun.
non-linear maps, in Dynamical Systems and Turbulence, War- 2009.
wick 1980, ser. Lecture Notes in Mathematics. Berlin, Germany:
Springer-Verlag, 1981, vol. 898, pp. 230242.
[38] B. R. Hunt and V. Y. Kaloshin, Regularity of of embeddings of in-
finite-dimensional fractal sets into finite-dimensional spaces, Nonlin-
earity, vol. 12, no. 5, pp. 12631275, 1999. Yihong Wu (S10) received the B.E. degree in electrical engineering from
[39] A. Ben-Artzi, A. Eden, C. Foias, and B. Nicolaenko, Hölder continuity Tsinghua University, Beijing, China, in 2006 and the M.A. degree in electrical
for the inverse of Mañé's projection, J. Math. Anal. Appl., vol. 178, engineering from Princeton University, Princeton, NJ in 2008, where he is
pp. 2229, 1993. currently working towards the Ph.D. degree at the Department of Electrical
[40] H. Steinhaus, Sur les distances des points des ensembles de mesure Engineering.
positive, Fundamenta Mathematicae, vol. 1, pp. 93104, 1920. He is a recipient of the Princeton University Wallace Memorial honorific fel-
[41] G. J. Minty, On the extension of Lipschitz, Lipschitz-Hölder contin- lowship in 2010. His research interests are in information theory, signal pro-
uous, and monotone functions, Bull. Amer. Math. Soc., vol. 76, no. 2, cessing, mathematical statistics, optimization, and distributed algorithms.
pp. 334339, 1970.
[42] H. Federer, Geometric Measure Theory. New York: Springer-Verlag,
1969.
[43] D. Preiss, Geometry of measures in : Distribution, rectifiability,
Sergio Verdú (S'80-M'84-SM'88-F'93) received the Telecommunications
and densities, Ann. Math., vol. 125, no. 3, pp. 537-643, 1987. Engineering degree from the Universitat Politècnica de Barcelona, Barcelona,
[44] G. A. Edgar, Integral, Probability, and Fractal Measures. New York: Spain, in 1980 and the Ph.D. degree in electrical engineering from the Univer-
Springer-Verlag, 1997. sity of Illinois at Urbana-Champaign, Urbana, in 1984.
[45] M. Keane and M. Smorodinsky, Bernoulli schemes of the same Since 1984, he has been a member of the faculty of Princeton Univer-
entropy are finitarily isomorphic, Ann. Math., vol. 109, no. 2, pp. sity, Princeton, NJ, where he is the Eugene Higgins Professor of Electrical
397406, 1979. Engineering.
[46] A. Del Junco, Finitary codes between one-sided Bernoulli shifts, Er- Dr. Verdú is the recipient of the 2007 Claude E. Shannon Award and the 2008
godic Theory Dyn. Syst., vol. 1, pp. 285301, 1981. IEEE Richard W. Hamming Medal. He is a member of the National Academy of
[47] J. G. Sinai, A weak isomorphism of transformations with an invariant Engineering and was awarded a Doctorate Honoris Causa from the Universitat
measure, Dokl. Akad. Nauk SSSR., vol. 147, pp. 797-800, 1962. Politècnica de Catalunya in 2005. He is a recipient of several paper awards from
[48] K. E. Petersen, Ergodic Theory. Cambridge, U.K.: Cambridge Univ. the IEEE: the 1992 Donald Fink Paper Award, the 1998 Information Theory
Press, 1990. Outstanding Paper Award, an Information Theory Golden Jubilee Paper Award,
[49] W. H. Schikhof, Ultrametric Calculus: An Introduction to p-Adic Anal- the 2002 Leonard Abraham Prize Award, the 2006 Joint Communications/In-
ysis. New York: Cambridge Univ. Press, 2006. formation Theory Paper Award, and the 2009 Stephen O. Rice Prize from the
[50] M. A. Martin and P. Mattila, -dimensional regularity classifications IEEE Communications Society. He has also received paper awards from the
for -fractals, Trans. Amer. Math. Soc., vol. 305, no. 1, pp. 293315, Japanese Telecommunications Advancement Foundation and from Eurasip. He
1988. received the 2000 Frederick E. Terman Award from the American Society for
[51] V. Milman, Dvoretzky's theorem: Thirty years later, Geom. Funct. Engineering Education for his book Multiuser Detection (Cambridge, U.K.:
Anal., vol. 2, no. 4, pp. 455479, Dec. 1992. Cambridge Univ. Press, 1998). He served as President of the IEEE Informa-
[52] W. Johnson and J. Lindenstrauss, Extensions of Lipschitz maps into tion Theory Society in 1997. He is currently Editor-in-Chief of Foundations
a Hilbert space, Contemp. Math., vol. 26, pp. 189206, 1984. and Trends in Communications and Information Theory.
