
Information Theory

Professor Himanshu Tyagi


Indian Institute of Science, Bangalore
Lecture 12
Unit 2: Review

(Refer Slide Time: 00:21)

Hello, welcome to the third week of this course on Information Theory. Let us begin with a review of what we saw last week. Last week, we introduced the first main concept of this course, namely that of Shannon entropy. For a pmf p on an alphabet X, that is, a probability mass function on this calligraphic X, which is an alphabet of finite size, the Shannon entropy H(p) was defined as the summation over x of p(x) log(1/p(x)), where the log is to base 2.
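As a quick illustration (my own sketch, not part of the lecture), here is a minimal Python computation of this definition; the pmf is just a made-up example.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) = sum over x of p(x) * log2(1/p(x)), in bits.
    Terms with p(x) = 0 are skipped (they contribute 0 by convention)."""
    return sum(px * math.log2(1.0 / px) for px in p.values() if px > 0)

# Illustrative pmf on a four-letter alphabet (made-up values)
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(shannon_entropy(p))  # 1.75 bits
```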

So, we introduced Shannon entropy as an answer to the question that we raised: we were seeking a measure of uncertainty. And we postulated that one way to capture the uncertainty in a random variable is to characterize the number of bits required to represent it. In that context, we saw various measures of how compressible a random variable is. We saw the quantity L̄(p), the minimum average length of a one-to-one map that we can use to represent a random variable.

So, think of an encoder e which looks at a letter from this alphabet and maps it to a binary sequence of finite length, that is, an element of the set of all finite-length binary sequences. So e takes x and outputs e(x), where e(x) could be any finite-length binary sequence, say 0, 0, 1, and so forth; in the example on the slide the length of the sequence is 4, and different letters will in general be mapped to sequences of different lengths. So, L̄(p) was defined as the minimum of the average length.

Now, what is the average length? We take the average, with respect to p, of the length of the output sequence e(X), where the length of a sequence is simply the number of bits in it. We then minimize this average length over all one-to-one maps e (e for encoder).

So, if e is a one-to-one map, then by looking at the output sequence you can exactly recover the original letter, and the minimum of the average length over all such maps was denoted by L̄(p).
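To make this concrete, here is a minimal Python sketch (my own illustration, not the lecture's construction) of how L̄(p) can be computed for a small pmf: the optimal one-to-one code sorts letters by decreasing probability and assigns them the shortest available binary strings. I am assuming here that the empty string is allowed as a codeword; the lecture's exact convention may differ, which is why that choice is left as a parameter.

```python
import math

def min_average_length(p, allow_empty_string=True):
    """L-bar(p): sort probabilities in decreasing order and assign the shortest
    available binary strings (lengths 0,1,1,2,2,2,2,3,... if the empty string
    is allowed, else 1,1,2,2,2,2,3,...) to the most likely letters; return the
    resulting average codeword length."""
    probs = sorted(p.values(), reverse=True)
    offset = 1 if allow_empty_string else 2
    return sum(px * math.floor(math.log2(i + offset))
               for i, px in enumerate(probs))

# Same made-up pmf as before
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(min_average_length(p))  # 0*0.5 + 1*0.25 + 1*0.125 + 2*0.125 = 0.625
```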

(Refer Slide Time: 02:56)

Another quantity we defined was L_ε(p), which was a related notion, but slightly different. This was defined as the minimum number of bits required to store almost all the sequences. It can be written in terms of the minimum cardinality of a set A, a subset of the alphabet, such that A has large probability under p. To get the number of bits, we take the log of this minimum cardinality, and then a ceiling, because we want an integer number of bits.
So, this is the minimum number of bits that you need to store all the sequences in the set A. And which set A are we looking for? We are looking for the smallest set A such that its probability is at least 1 − ε. This gives another notion of how compressible a pmf is.

What we observed last time, without a formal proof, just a heuristic argument, but a convincing one I hope, was that both these quantities are roughly similar, modulo the dependence on ε, and that they are roughly equal to H(p). That is how H(p) came in. In fact, we also gave a formal lemma which gives a bound on L_ε(p); it gives a method for finding the set A. It simply says: collect all the sequences x for which log(1/p(x)) is not too large, say at most some threshold λ. These sequences are at least sufficiently likely.

So, if this set, the set of all such sequences, has large probability, then you can just retain these sequences; and the number of sequences in this set cannot be more than 2^λ. We used this to relate L_ε(p) to H(p). This is something that we will see again, more formally, later in the course.
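As a small illustration of this construction (my own sketch, not from the lecture), the following snippet builds the set of outcomes with log(1/p(x)) at most a threshold λ and checks the 2^λ bound on its size; the threshold and pmf are made-up values.

```python
import math

def high_probability_set(p, lam):
    """The set A_lam = {x : log2(1/p(x)) <= lam}, i.e. p(x) >= 2**(-lam).
    Since every element has probability at least 2**(-lam), |A_lam| <= 2**lam."""
    return {x for x, px in p.items() if px > 0 and math.log2(1.0 / px) <= lam}

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lam = 2.0
A = high_probability_set(p, lam)
# Prints the set (here {'a', 'b'}), the size bound check (True),
# and the probability retained (0.75).
print(A, len(A) <= 2 ** lam, sum(p[x] for x in A))
```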

The upshot here is that the number of bits to which a random variable can be compressed is related to its entropy, and this is one operational meaning of entropy. That is how we introduced entropy. Towards the end, we also talked about random hashing, and that is another concept that we will be able to use later in the course.

And we gave one particular use of a random hash: we showed that a random hash is what is sometimes called a collision-resistant hash. What we showed was that, for a random hash of appropriate length, it is very unlikely that two different sequences get mapped to the same value. Therefore, it is almost a one-to-one map, and it can be used for compression. So, we introduced compression and related entropy to it; that is what we did last week.

This week we will move forward and relate entropy to another, slightly different notion of uncertainty. Last time, the notion of uncertainty was how much you can compress a random variable. This week we will explore another natural notion of uncertainty: how many random bits can you extract from a random variable?

It looks like a very different notion of uncertainty, and it could well have had an answer different from entropy. But what we will see this week is that even this different notion of uncertainty is fundamentally related to the entropy of a random variable, thereby giving another operational meaning of entropy.

And since this quantity, entropy, appears in so many different contexts related to uncertainty in a random variable, it is the fundamental concept which captures uncertainty in a random variable. So, see you in the next lecture.
