
Lecture 40 - Rate Distortion

Vedika Garg
210102096
October 17, 2023

Introduction
In this lecture, we will touch upon important concepts such as the Rate-Distortion function and
Quantization. When we talk about analogue-to-digital conversion, there are mainly three steps:
Sampling, Quantization and Encoding. In the last few lectures, we saw Sampling as well as
Encoding. So what is the need for Quantization?
After sampling converts continuous-time signals into discrete-time signals, the samples may still
be continuous-valued, i.e. they can take any value on the real line. This is a problem for digital
signal processing, where the word length, the "memory" you deal with, is finite. As you might have
seen in your Embedded Systems course, the 8085 microprocessor works with an 8-bit (or in some
cases 16-bit) word length, so there are only a finite number of ways to represent your data; in
other words, you have limited resources. Representing continuous-valued samples exactly would
require infinite precision, but digital systems have finite precision, i.e. a limited number of bits
available to represent numerical values. It therefore becomes necessary to quantize these
continuous-valued signals, and some information loss becomes inevitable: a lossy compression step
has to be employed and some distortion is introduced. What we focus on instead is how we can
"limit" this inevitable information loss, or distortion.

1 Quantization
To understand Quantization better, consider an example of going to buy a phone with your
dad. If the price tag says 9499, you would probably read it as 9000, but your dad will read
it as 10k (to be honest, mine would have read it as 9.5k, but it doesn't matter here). Either way,
both of you lose some information about the price of the same phone after approximating it
as per your convenience. Quantization is, similarly, rounding off. Since you cannot
represent an infinite-precision source, you round off its values to some extent and then try
to minimize the rounding error.

To summarize Sampling and Quantization:


Sampling converts a continuous-time, continuous-value signal into a discrete-time, continuous-
value signal. This involves capturing specific samples of the analogue signal at discrete points
in time.
Quantization then takes the discrete-time, continuous-value signal obtained from sampling
and converts it into a discrete-time, discrete-value signal. It involves mapping the continuous
range of sampled values onto a finite set of discrete values or levels.

Let us consider a random variable X that comes from some source with alphabet A, and it needs
to be represented as Y over some alphabet B. Naturally,

B ⊂ A

Here X takes values of the form

X ∈ {a.b : a, b ∈ ℕ ∪ {0}}

where "." represents the decimal point, and

Y ∈ {c : c ∈ ℕ ∪ {0}}

One way this can be achieved is by the simple rounding-off method that you studied in school.
For instance, take a = 5, b = 6; then c will be the rounded-off value of 5.6, i.e. 6.
The main motive of this lecture is to quantify the information lost during quantization and
then minimize it.
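As a small illustration of this rounding-off map, here is a minimal Python sketch (an assumed
example, not part of the original notes) that rounds one-decimal values a.b to the nearest
integer c and records the error this introduces:

```python
# Assumed example, not from the lecture: quantize a value of the form a.b
# (one decimal digit) to the nearest non-negative integer c by rounding,
# and look at the error that this introduces.

def quantize_round(x: float) -> int:
    """Map a non-negative value x = a.b to the nearest integer c (round half up)."""
    return int(x + 0.5)

for x in [5.6, 9.4, 2.5, 0.3]:
    c = quantize_round(x)
    print(f"x = {x:4.1f}  ->  c = {c}  (error = {c - x:+.1f})")
```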

2 Distortion
A measure of the information that we lose while expressing one random variable in terms of
another random variable is known as Distortion. Now let's look at what makes a good distortion
measure and its properties:
• A good distortion measure has to be a good approximation of the perception process.
• A good distortion measure has to be simple enough to be mathematically tractable.

Suppose we have a signal x(t) and its reconstruction x̂(t); the distortion between them can be
expressed in multiple forms:

1. Maximum distortion

   D = max_t |x(t) − x̂(t)|

   Also known as peak distortion.

2. Absolute average distortion

   The absolute error averaged over time:

   D = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} |x(t) − x̂(t)| dt

3. Mean squared distortion

   D = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} (x(t) − x̂(t))² dt

The absolute average distortion, although a useful measure of signal distortion, is not
differentiable at zero error (its derivative is discontinuous there), which makes it harder to
handle mathematically. Therefore, the mean squared distortion is more commonly used; it can be
viewed as the power of the generated error signal.
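To make these measures concrete, here is a small numerical sketch (an assumed illustration, not
from the notes) that approximates the time averages above from samples of x(t) and x̂(t):

```python
import numpy as np

# Assumed illustration: approximate the peak, absolute-average and mean-squared
# distortion from samples of x(t) and a coarsely quantized version x_hat(t).
t = np.linspace(0.0, 1.0, 1000)        # sampling instants
x = np.sin(2 * np.pi * 5 * t)          # continuous-valued samples of x(t)
x_hat = np.round(x * 4) / 4            # quantize to steps of 0.25

err = x - x_hat
D_peak = np.max(np.abs(err))           # maximum (peak) distortion
D_abs = np.mean(np.abs(err))           # absolute average distortion
D_mse = np.mean(err ** 2)              # mean squared distortion = power of the error

print(f"peak = {D_peak:.4f}, abs-avg = {D_abs:.4f}, mse = {D_mse:.4f}")
```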

In the case of discrete-valued signals, Hamming distortion can be used to quantify the error:

   dH(x, x̂) = 1 if x ≠ x̂, and 0 if x = x̂

When X is a random variable, the distortion of the source output becomes

D = E[d(Xn , X̂n )]
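This expectation can be estimated empirically for a discrete source; the sketch below (an
assumed example, not from the notes) draws i.i.d. symbols, quantizes them crudely, and averages
the Hamming distortion:

```python
import numpy as np

# Assumed example: estimate D = E[d_H(X, X_hat)] for a discrete-valued source.
rng = np.random.default_rng(0)
x = rng.integers(0, 8, size=100_000)   # i.i.d. source symbols from {0, ..., 7}
x_hat = (x // 2) * 2                   # crude quantizer: drop the least significant bit

d_hamming = (x != x_hat).astype(float) # 1 where x != x_hat, 0 where they agree
print("estimated average Hamming distortion:", d_hamming.mean())
```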

Recall the Source Coding Theorem: we can represent the output of a source with zero error
probability as long as the average code length (also known as the code rate) is greater than
or equal to the entropy of the source.
But now we have a source and not enough resources to match its entropy. So, to accommodate
the distortion, we change the question: we represent this source (having a larger entropy) by
another source (having a smaller entropy), so that a code rate greater than or equal to the
entropy of the new source exists. Communication then proceeds with the new source generated
from the old one!

Ques: We are given a memoryless information source X over an alphabet X, distributed with a
PMF p(x), that has to be represented by an approximation X̂ over a reproduction alphabet X̂,
such that the average distortion between the source and the reproduced sequences is no more
than a constant D.

What is the minimum code rate for the new source so that the average information lost, i.e. the
mean distortion, is no more than D?

2.1 Rate Distortion Theorem


Theorem: The minimum number of bits per source symbol required to reproduce a memory-less
source with a distortion ≤ D is called the Rate Distortion function. It is denoted by R(D) and
is given by:

R(D) = min_{p(x̂|x) : E[d(X, X̂)] ≤ D} I(X; X̂)

This is a decreasing function of D: the more resources we provide to accommodate larger code
rates, the smaller the distortion will be. What the rate-distortion function actually represents
is the minimum mutual information over all possible conditional distributions p(x̂|x) of X̂ given
X such that the expected distortion is upper bounded by D.
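In simple cases this definition can be evaluated numerically. The sketch below (an assumed
illustration, not part of the notes) brute-forces the minimization for a binary source with
Hamming distortion by searching over the two parameters of p(x̂|x); for a fair binary source and
D = 0.1 it comes out to roughly 0.53 bits per symbol:

```python
import numpy as np

# Assumed illustration: approximate R(D) = min I(X; X_hat) over p(x_hat | x)
# subject to E[d_H(X, X_hat)] <= D, for a Bernoulli(p) source with Hamming distortion.
def rate_distortion_bruteforce(p: float, D: float, grid: int = 201) -> float:
    px = np.array([1.0 - p, p])                 # source PMF: P(X=0), P(X=1)
    qs = np.linspace(0.0, 1.0, grid)
    best = np.inf
    for q0 in qs:                               # q0 = P(X_hat=1 | X=0)
        for q1 in qs:                           # q1 = P(X_hat=0 | X=1)
            if px[0] * q0 + px[1] * q1 > D:     # E[d_H] = P(X != X_hat) must be <= D
                continue
            joint = np.array([[px[0] * (1 - q0), px[0] * q0],
                              [px[1] * q1,       px[1] * (1 - q1)]])
            pxh = joint.sum(axis=0)             # marginal of X_hat
            mask = joint > 0
            mi = np.sum(joint[mask] *
                        np.log2(joint[mask] / np.outer(px, pxh)[mask]))
            best = min(best, mi)
    return best

print(rate_distortion_bruteforce(p=0.5, D=0.1))  # about 0.53 bits per source symbol
```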

In our problem, we need to convert continuous-value signals into discrete-value signals. How-
ever, we have only defined entropy and mutual information for discrete sources so far. When
we have a discrete-time, continuous-alphabet source that outputs real numbers, we cannot use
entropy, which only has an intuitive meaning for discrete sources.
To address this issue, we define another quantity called differential entropy, which is
similar to entropy but can be used for continuous sources.

3
3 Differential Entropy
The differential entropy of a continuous random variable X with pdf fX(x) is defined by

   h(X) = −∫_{−∞}^{∞} fX(x) log fX(x) dx

where 0 log(0) = 0.

3.1 Uniform Random Variable


The differential entropy of a random variable X uniformly distributed on [0, a] is given by

   h(X) = −∫_0^a (1/a) log(1/a) dx = log(a)

The randomness increases as "a" increases, and the differential entropy can be negative for
a < 1, which is in contrast to the non-negativity of discrete entropy.
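As a quick sanity check (an assumed numerical illustration, using base-2 logarithms so the
result is in bits), the defining integral can be evaluated numerically and compared with log(a):

```python
import numpy as np
from scipy.integrate import quad

# Assumed check of h(X) = log2(a) (in bits) for X uniform on [0, a].
a = 0.5                        # a < 1, so the differential entropy comes out negative
f = 1.0 / a                    # the uniform density is constant on [0, a]
h_numeric, _ = quad(lambda x: -f * np.log2(f), 0.0, a)
print(h_numeric, np.log2(a))   # both equal -1.0
```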

3.2 Gaussian Random Variable


Consider the differential entropy of a Gaussian random variable X with zero mean and variance σ²:

X ∼ N(0, σ²)

Given that the probability density function (PDF) of a zero-mean Gaussian distribution is

   f(x) = (1/√(2πσ²)) e^(−x²/2σ²)

substituting this PDF into the differential entropy formula gives

   h(X) = −∫_{−∞}^{∞} (1/√(2πσ²)) e^(−x²/2σ²) log[ (1/√(2πσ²)) e^(−x²/2σ²) ] dx

Simplifying the logarithm term and separating the integrals:

   h(X) = −∫_{−∞}^{∞} f(x) log(1/√(2πσ²)) dx − ∫_{−∞}^{∞} f(x) log(e^(−x²/2σ²)) dx

The first integral evaluates to a constant:

   h(X) = (1/2) log(2πσ²) + ∫_{−∞}^{∞} (x²/2σ²) log(e) f(x) dx

Taking log(e) common and using ∫ x² f(x) dx = E[X²] = σ²:

   h(X) = log(e) [ (1/2) ln(2πσ²) + (1/2) ln(e) ]
        = log(e) · (1/2) ln(2πσ²e)

Therefore, the differential entropy of a Gaussian random variable with mean zero and variance
σ² is given by

   h(X) = (1/2) log(2πσ²e)
We can observe that as the variance increases, the randomness (differential entropy) increases,
while zero variance corresponds to a non-random, deterministic variable.
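The closed form can likewise be checked numerically (an assumed illustration, again using base-2
logarithms); here h(X) = E[−log f(X)] is estimated by Monte Carlo averaging over samples of X:

```python
import numpy as np
from scipy.stats import norm

# Assumed Monte Carlo check of h(X) = 0.5 * log2(2*pi*e*sigma^2), in bits,
# using h(X) = E[-log2 f(X)] estimated from samples of X ~ N(0, sigma^2).
rng = np.random.default_rng(0)
sigma = 2.0
samples = rng.normal(0.0, sigma, size=500_000)
h_mc = np.mean(-np.log2(norm.pdf(samples, scale=sigma)))
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)
print(h_mc, h_closed)          # both are about 3.05 bits
```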

Extensions of the definition of differential entropy to joint random variables and conditional
differential entropy are straightforward. For two random variables X and Y, we have:
   h(X, Y) = −∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) log f(x, y) dx dy

and

   h(X|Y) = −∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) log f(x|y) dx dy

and

h(X|Y ) = h(X, Y ) − h(Y )


The mutual information between two continuous random variables X and Y is defined simi-
larly to the discrete case as:

I(X; Y ) = h(Y ) − h(Y |X) = h(X) − h(X|Y )
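As a hedged illustration of this definition (an assumed additive-noise example that is not part
of the notes; it uses the fact that for independent additive noise h(Y|X) = h(N)), the Gaussian
differential entropy derived above gives I(X; Y) in closed form:

```python
import numpy as np

# Assumed example (not from the notes): Y = X + N with independent Gaussians
# X ~ N(0, sx2) and N ~ N(0, sn2). Then Y ~ N(0, sx2 + sn2) and, because the
# noise is independent and additive, h(Y | X) = h(N), so
#     I(X; Y) = h(Y) - h(Y | X) = 0.5 * log2(1 + sx2 / sn2)   bits.
def gaussian_h(var: float) -> float:
    """Differential entropy (in bits) of a zero-mean Gaussian with variance var."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

sx2, sn2 = 4.0, 1.0
I_xy = gaussian_h(sx2 + sn2) - gaussian_h(sn2)
print(I_xy, 0.5 * np.log2(1 + sx2 / sn2))   # both are about 1.16 bits
```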

4 Summary
Now we have two pieces of the puzzle. First, we have the Rate-Distortion function in its
mathematical form, and we still need to interpret it. Second, we have the idea of differential
entropy and mutual information for continuous-valued signals. In the next lecture, we'll put
these two pieces together and try to see the logic behind Quantization.
