
Information Theory

Professor Himanshu Tyagi


Indian Institute of Science, Bangalore
Lecture No. 09
A Compression Problem

(Refer Slide Time: 0:20)

Now that we have seen enough examples of compression, it is time to come up with some
mathematically sound definitions of the minimum limits of compression. We will present two
different definitions; the first one tries to capture the minimum number of bits required to
represent almost all symbols from the set calligraphic X. Remember, the set calligraphic X
represents all possible symbols, all possible words that we are trying to store.

We are looking at all subsets of this set which have a large probability and, among those, we
are looking for the subset with the smallest possible cardinality. That is what the min here
says. So this definition here is a mathematical definition; it says: let L epsilon P — this
quantity will come up again and again in this course, so please remember it — be the minimum
number of bits required to represent a subset A of significant probability, probability at
least 1 minus epsilon.

This epsilon is the same as the epsilon here; you can think of it as a number like 0.01, so we
are allowing you to ignore symbols with total probability less than 0.01. That is just an
example; the definition can be for any arbitrary epsilon. The probability is with respect to
this P, the model that you have formed — we saw one example of such a P earlier — and you want
to find the subset which has large probability with respect to P and has the smallest
cardinality: what is the minimum number of bits required to represent that subset?
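The min-over-subsets definition just described can be spelled out directly in code. Here is a minimal Python sketch (the language and the function name `L_eps_bruteforce` are my own choices, purely for illustration) that computes L epsilon P for a tiny alphabet by checking every subset:

```python
from itertools import combinations
from math import ceil, log2

def L_eps_bruteforce(p, eps):
    """L_eps(P): minimum of ceil(log2 |A|) over all subsets A of the
    alphabet with P(A) >= 1 - eps.  Exponential in the alphabet size,
    so only usable for tiny alphabets."""
    k = len(p)
    for r in range(1, k + 1):            # smallest cardinality first
        if any(sum(p[i] for i in A) >= 1 - eps
               for A in combinations(range(k), r)):
            return ceil(log2(r))         # ceil(log2 1) = 0 for a singleton

# Example: with eps = 0.1 we must cover probability 0.9, which needs
# the three heaviest symbols, hence ceil(log2 3) = 2 bits.
p = [0.5, 0.3, 0.15, 0.05]
print(L_eps_bruteforce(p, 0.1))  # → 2
```

Since the ceiling of log 2 of r is increasing in r, the first cardinality r that admits a subset of probability at least 1 minus epsilon already gives the minimum.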

That is a natural definition of the minimum number of bits required to represent a random
variable generated from this distribution P. In fact, if you remember the table-1 example,
this is exactly the minimum number of bits required to represent table 1. Recall that in that
example we formed two tables: we kept all the more frequent symbols in table 1 and the less
frequent symbols in table 2. Table 2 was a set of symbols with small probability which can be
ignored, and this quantity is the minimum number of bits required to represent table 1. That
is what this definition captures.

(Refer Slide Time: 2:45)

Now this is one possible definition. The second definition here is related but slightly
different: it is the minimum length of a variable-length code. What do I mean by that? In the
previous definition, I represent all the symbols in the subset A with binary vectors of the
same length, namely log 2 of the cardinality; that is, I use same-length binary sequences for
all symbols and then ask for the minimum length of vectors required to store all elements of
A, for a subset A with significant probability.
Alternatively, I can allow a mapping e — e for encoder, because I encode my symbols from X
into binary sequences. This is a one-to-one map, so that I can look at a binary sequence and
come back to the symbol, and look at the symbol and go to the sequence. By the way, if you
have not seen this notation here: it is the set of all binary sequences of finite length.

So 00, 01, 10 — all finite-length binary sequences. This encoder takes a symbol and represents
it as a binary sequence, and it is a one-to-one map, so that I can look at the sequence and
come back to the symbol. This is a notion of a compression scheme. The quantity mod of e x
here denotes the length of the binary sequence e x.

So what is this quantity? It is the average length of the binary sequence. What this L bar —
bar for average — L bar P denotes is the minimum expected length of a one-to-one
representation of the symbols in X. So we do not ignore any symbols in this new notion of the
minimum length of a code that can be used to compress the source; instead, we only consider
the average length, as opposed to the previous definition, which considered the worst-case
length but allowed you to ignore symbols with probability less than epsilon.
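To make this definition concrete, here is a minimal Python sketch (the names and the example encoder are mine, purely for illustration) computing the expected length of one particular one-to-one encoder; L bar P is the minimum of this quantity over all one-to-one maps into finite binary strings:

```python
def average_length(p, code):
    """Expected codeword length sum_x P(x) * |e(x)| for an encoder given
    as a list of binary strings, one per symbol."""
    assert len(set(code)) == len(code), "encoder must be one-to-one"
    return sum(px * len(cw) for px, cw in zip(p, code))

# A toy one-to-one encoder for four symbols (not claimed optimal):
p = [0.5, 0.25, 0.15, 0.10]
e = ["0", "1", "00", "01"]      # distinct strings of lengths 1, 1, 2, 2
print(average_length(p, e))     # 0.5*1 + 0.25*1 + 0.15*2 + 0.10*2 = 1.25
```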

So these two definitions are very similar, but they are slightly different too, and we will
see in what sense they are similar and in what sense they differ. This L bar P is the second
definition I want you to remember: it is the minimum average length of a code which represents
X as bits. These quantities have been defined mathematically, and they are motivated by our
search for a measure of uncertainty; we are looking for a measure of uncertainty, and that is
why we have defined these two quantities.

And we do not put any practical constraints: we do not say that this encoder e has to be
computationally light, nothing like that, because right now our goal is not to come up with a
compression scheme, our goal is to come up with a measure of uncertainty. Later, when we
revisit compression in detail, we will look at schemes which are also practical and whose
average length matches these fundamental quantities.
(Refer Slide Time: 6:01)

So we have defined these two fundamental quantities, L bar P and L epsilon P. L epsilon P is
the minimum worst-case length of a binary sequence that can represent almost all — a fraction
1 minus epsilon — of symbols in X, and L bar P is the minimum average length of a mapping
which represents all the symbols in X. So what do you think are the schemes which achieve
these quantities? Which schemes have performance matching these fundamental limits?

I urge you to take a pause and think about it; of course, I will describe it right away. So
now that you have thought about it: we already saw such a scheme. Basically, what we do is
arrange these probabilities P1 to Pk — let us say there are k symbols — in decreasing order.
The bracketed index here denotes the rank: P bracket 1 is the highest probability, P bracket 2
the second highest, down to P bracket k. Note that P bracket 1 need not be P1; it can be P100
or P20 or P30. It is the same collection P1 to Pk, just reordered.

So you have these probabilities P1 to Pk for the k symbols — I am assuming here that X has k
symbols. You arrange these probabilities in decreasing order, and now what do you do? This i
denotes the symbol with the ith highest probability; that is my notation here. Once you
arrange them, for a scheme which attains this, you take the symbol with the highest
probability and assign to it the shortest available sequence, namely a sequence of length 1,
say 0.
Then you have one more sequence of length 1, which you assign to the next symbol, and you keep
going, each time assigning the shortest available sequence. We will see shortly that this
scheme is optimal and attains L bar P. In fact, this scheme is also optimal for attaining L
epsilon P — it attains both of them — except that when you are striving to attain L epsilon P,
you do not keep going: you stop once you have collected enough symbols to have probability at
least 1 minus epsilon.

But the principle is the same. Arrange the symbols in decreasing order of probabilities and start
assigning shorter sequences to symbols with larger probabilities. That is the idea, so we just
described this informally and now we will see formal statements claiming the optimality of the
scheme.
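The scheme just described — sort by probability, then hand out the shortest still-unused binary string — can be sketched in a few lines of Python (illustrative only; the helper names are mine):

```python
from itertools import count, product

def shortest_first_strings():
    """Yield all finite binary strings in order of increasing length:
    0, 1, 00, 01, 10, 11, 000, ..."""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def greedy_code(p):
    """Sort symbols by decreasing probability and assign to each the
    shortest binary string not used yet."""
    order = sorted(range(len(p)), key=lambda i: -p[i])
    gen = shortest_first_strings()
    codewords = [next(gen) for _ in order]
    return [p[i] for i in order], codewords

probs, cws = greedy_code([0.1, 0.4, 0.2, 0.3])
print(list(zip(probs, cws)))  # → [(0.4, '0'), (0.3, '1'), (0.2, '00'), (0.1, '01')]
```

To attain L epsilon P instead, one would stop handing out codewords as soon as the collected probability reaches 1 minus epsilon.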

(Refer Slide Time: 8:39)

In fact, the formal statements will be about these quantities, L epsilon P and L bar P. So
here is the first theorem I am showing you in this course; I will also use this opportunity to
write a formal proof. So put on your mathematical hat and get ready to write a proof. This
statement characterizes L epsilon P.

Here epsilon is some quantity between 0 and 1, but fixed. You can take examples like epsilon
equal to some small quantity, like 0.1 or 0.01. It is customary in math to denote small
quantities by Greek letters like epsilon or delta, so do not be surprised that I am using
these symbols; it is customary to represent small quantities by epsilon and delta.

So this theorem claims that L epsilon P is exactly equal to the ceiling — the next integer at
or above — of log of N epsilon P. Well, what is N epsilon P? N epsilon P collects symbols in
decreasing order of probability — remember this notation — and takes the first l of them, for
the smallest l at which the total probability crosses 1 minus epsilon.

Pictorially, the statement is saying that it is a bucket: you add symbols to this bucket one
by one. First you add the symbol with probability P bracket 1, namely the symbol with the
highest probability. You are trying to fill this bucket up to here — the probability you want
to reach is 1 minus epsilon — so you add P bracket 1 to it.

You check whether it has filled the bucket; if not, you add the next symbol. You keep doing
this, adding symbols one by one, and you stop at the smallest l at which the total exceeds
this volume 1 minus epsilon. That is what N epsilon P is: the minimum number of
highest-probability symbols that you require to cross probability 1 minus epsilon.

So you take all these symbols, look at the set of these symbols, and just count the number of
bits required to represent just these symbols. That is what the ceiling of log of N epsilon P
is. You ignore all the other symbols, and the claim is that this is optimal: L epsilon P is
equal to this. We will now try to prove this claim.
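The bucket-filling description of N epsilon P is exactly a greedy loop. Here is a minimal Python sketch (illustrative; the function name is mine):

```python
from math import ceil, log2

def N_eps(p, eps):
    """Smallest number of highest-probability symbols whose total
    probability reaches 1 - eps (the 'bucket' from the lecture)."""
    total = 0.0
    for n, q in enumerate(sorted(p, reverse=True), start=1):
        total += q
        if total >= 1 - eps:
            return n

p = [0.05, 0.5, 0.15, 0.3]
n = N_eps(p, 0.1)        # 0.5 + 0.3 = 0.8 < 0.9, then 0.5 + 0.3 + 0.15 = 0.95
print(n, ceil(log2(n)))  # → 3 2
```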

Let me make some more space. This is actually not a difficult proof; nonetheless, I would like
to give you some details for it. When you see an equality like this and you are asked to prove
it, keep in mind that it is hiding two statements. The quantity on the left is the quantity
that we defined operationally — so what do I mean by defining operationally?

It is a quantity we gave some physical meaning to: the minimum number of bits required to
represent a source with probability distribution P. That is an operational quantity — it can
be described in words and it has meaning in practice. And we are saying that this operational
quantity has a formula like this. So we will have to establish two parts.
So, proof. I will only give a proof sketch; you can go back and read the notes, where all the
details are given. The first part of the proof claims that L epsilon P is at most the thing on
the right. Of course, to show that it is exactly equal, we will also have to show — we come to
this later — that L epsilon P cannot be less than this.

Both are needed to claim the equality. This is something you have to keep in mind: whenever I
make a claim of equality like this, you have to understand that there are two sides to it.
There are two statements hidden there: one that the left side is at most this much, and one
that it cannot be less than this. Both statements are needed, and this is how we will prove
it.

So do not be surprised that a proof of an equality statement has two parts; this will happen
again and again in this course. One more thing: what does the statement "L epsilon P is less
than or equal to this" mean? Remember, L epsilon P was defined as a minimum over all sets. Let
me write the definition again, just to refresh your memory: L epsilon P was defined as the
minimum of the number of bits required to represent a set — this notation here is the
cardinality of a set.

We look at all possible subsets of our alphabet such that P of A is at least 1 minus epsilon,
and ask for the minimum value of this over all such subsets; that was our definition. Now, the
upper bound says that there is a subset A which has cardinality this much, and therefore the
minimum can only be smaller. So you just have to find such a subset — the upper bound asks us
to give a scheme, a subset which attains this. So what is that subset?

Well, we already wrote that subset down: it is exactly the set of the first N epsilon P
symbols, the minimum number which fills the bucket. Let us quickly check that it is such a
subset — yes, indeed, by construction it crosses the probability 1 minus epsilon. So let me
write: take A equal to — remember this notation — 1 is the symbol with the highest
probability, 2 is the symbol with the second highest probability,

and you keep going until you reach this number N epsilon P; say it is 100, the symbol with the
hundredth highest probability. The claim is that this set crosses probability 1 minus
epsilon — that is actually the definition of N epsilon P. So the cardinality of this subset A
is clearly N epsilon P, and the probability of this subset A, by definition, is at least 1
minus epsilon.

Putting these observations together, this implies that L epsilon P can only be smaller than
the ceiling of log of N epsilon P, which is exactly the quantity on the right. So this is the
first part of the proof; we have shown this direction, and it is actually quite obvious once
you define the subset.
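For a small alphabet, both sides of the theorem can be computed and compared directly. The Python sketch below (illustrative; the function names and the test distribution are my own choices) checks the brute-force definition against the formula, the ceiling of log of N epsilon P:

```python
from itertools import combinations
from math import ceil, log2

def L_eps_bruteforce(p, eps):
    """min over subsets A with P(A) >= 1 - eps of ceil(log2 |A|)."""
    k = len(p)
    for r in range(1, k + 1):            # smallest cardinality first
        if any(sum(p[i] for i in A) >= 1 - eps
               for A in combinations(range(k), r)):
            return ceil(log2(r))

def ceil_log_N_eps(p, eps):
    """The formula from the theorem: ceil(log2 N_eps(P))."""
    total, n = 0.0, 0
    for q in sorted(p, reverse=True):
        total += q
        n += 1
        if total >= 1 - eps:
            break
    return ceil(log2(n))

p = [0.4, 0.3, 0.2, 0.06, 0.04]
for eps in (0.05, 0.1, 0.3):
    assert L_eps_bruteforce(p, eps) == ceil_log_N_eps(p, eps)
print("theorem verified on this example")
```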

(Refer Slide Time: 15:54)

Now for the other part of the proof: to claim that L epsilon P is at least this, how do we
show that? This is the interesting part of the proof. It asks you to show that no matter which
subset you choose, it must have cardinality at least N epsilon P. So suppose A is any subset
which has probability at least 1 minus epsilon. What do we do?

Let us give some notation for the symbols in A: say A has the symbols i1 to iL, so A has L
elements denoted by i1 to iL. Then what is P of A? P of A equals the summation over j from 1
to L of P of ij — that is what this probability is — and it is at least 1 minus epsilon.
Now we would like to make a simple but interesting observation: whatever these symbols are,
the sum of their probabilities cannot exceed the sum of the L highest probabilities. Remember,
P bracket j is the jth highest probability. It is easy to see that if you sum the
probabilities of any L symbols, the total cannot exceed the sum of the L highest
probabilities, so this inequality must hold.

What this means is that the sum of the L highest probabilities also exceeds 1 minus epsilon.
But by the definition of N epsilon P — N epsilon P is the smallest l at which this sum exceeds
1 minus epsilon — this L can only be at least N epsilon P. And therefore — from now on I will
not write the base; all logs in this course are to base 2, because we want to think of
representing things in bits —

log of N epsilon P is at most log of L, and this L is exactly the cardinality of A. So any set
A with probability at least 1 minus epsilon must have the ceiling of log of its cardinality at
least the ceiling of log of N epsilon P, and therefore the minimum cardinality of such a set
also exceeds this. That is the proof: since any subset A with probability at least 1 minus
epsilon has the ceiling of log of the cardinality of A exceeding this quantity, the minimum
over all such subsets must exceed it too. This is the second part of the proof, so our proof
is complete.

So what have we shown? We have shown the theorem that L epsilon P equals the ceiling of log of
N epsilon P, where N epsilon P is the smallest l such that the l highest-probability elements
together have probability at least 1 minus epsilon. In fact, this tells you which set A to use
if you only want to store elements with total probability 1 minus epsilon while ensuring that
the number of bits required to store them is the least possible. So we have a formula for L
epsilon P. Now let us go to the second quantity we saw.
(Refer Slide Time: 20:43)

The second part of this theorem characterizes the quantity L bar P. Remember, L bar P is the
minimum average length of a code that represents this random variable X, generated from P,
using variable-length binary sequences. The claim is that we again have a precise formula for
it. Note that P bracket i is the notation from earlier.

Remember, it is the probability of the symbol with the ith largest probability. The claim is
that L bar P equals the summation over i from 1 to k — k is the number of symbols in our
alphabet, which we assume throughout — of P bracket i multiplied by this quantity here. So
what is this quantity? It is a good time to pause and think about it.

All right, now that you have thought about it, let me describe what this quantity is. This
ceiling of log of i by 2 plus 1 is the minimum integer j such that, if you look at all binary
sequences of length at most j, you can collect i of them. How many such sequences are there?
The summation over l from 1 to j of 2 to the power l — the formula is very simple: there are 2
to the power l sequences of length l. You look at all binary sequences of lengths 1 to j, and
you want this count to reach i.
So once again, what is this formula? I am claiming that it equals the minimum length of binary
sequences needed to represent i symbols, each with a distinct sequence of at most that length.
When i equals 1, just as an example, you can represent it by a single binary sequence of
length 1, say 0, and so the minimum j becomes 1. For i equal to 2,

you can again represent both symbols using binary sequences of length 1 — you have the two
sequences 0 and 1 — so for i equal to 2 this is again 1. Let us check with the formula: for i
equal to 1, it is the ceiling of log of half plus 1, that is, the ceiling of log of 1.5, which
is 1. For i equal to 2, it becomes log of 1 plus 1, that is, log of 2, whose ceiling is 1. For
i equal to 3, it goes to 2. For i equal to 4: 4 by 2 is 2, plus 1 is 3, and the ceiling of log
of 3 is again 2. For i equal to 5: 5 by 2 plus 1 is 3.5, and the ceiling of its log is again
2.

Now for i equal to 6: what is the minimum length of binary sequences you need to represent 6
different symbols? Well, 6 by 2 is 3, plus 1 is 4, and log of 4 to base 2 is 2. So this is 2.
That is what this formula is: the minimum length of binary sequences required to represent i
distinct symbols. It is a bit of a mouthful, but I hope you understand it; try to work out
this quantity on some examples and you will get it.
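The arithmetic just worked out can also be checked mechanically. The Python sketch below (illustrative; the names are mine) compares the claimed formula, the ceiling of log of i by 2 plus 1, against direct enumeration of binary strings in shortest-first order:

```python
from itertools import count
from math import ceil, log2

def li_formula(i):
    """Claimed length of the i-th shortest binary string (1-indexed)."""
    return ceil(log2(i / 2 + 1))

def li_enumerated(i):
    """Actual length of the i-th string when all binary strings are
    listed in increasing order of length (2**n strings of length n)."""
    seen = 0
    for n in count(1):
        if seen + 2 ** n >= i:
            return n
        seen += 2 ** n

assert all(li_formula(i) == li_enumerated(i) for i in range(1, 200))
print([li_formula(i) for i in range(1, 8)])  # → [1, 1, 2, 2, 2, 2, 3]
```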

So the claim is that L bar P is exactly equal to this, and once again we will try to prove it.
Let us make some more space. All right, here is the proof. In fact, I will not give you all
the details of this proof — it is a little involved, but in my notes I provide all the
details. One side of the proof is very easy. As I said, the first part is to claim that there
is a scheme which attains this, in the sense that L bar P is at most the summation over i from
1 to k of P bracket i times the ceiling of log of i by 2 plus 1.

So how do we attain this — which scheme attains this? It is a very simple scheme. First, you
arrange the symbols in decreasing order of probability; it is the scheme we discussed earlier.
Then you move in this direction and assign the shortest available binary sequence: for symbol
1, you assign the shortest available binary sequence, which is 0.
So you go from the symbol with the largest probability and assign, to each symbol in turn, the
shortest available sequence. The first symbol gets the sequence 0, the next symbol gets the
sequence 1, and you keep going until the last symbol gets some sequence assigned to it. Now,
what is the length of the sequence that gets assigned to the symbol with the ith largest
probability? It is exactly that formula.

The length of the sequence that, in this order, gets assigned to the symbol with the ith
largest probability is exactly the ceiling of log of i by 2 plus 1 — that is where this
formula came from; there is no magic to it. You can convince yourself that this is the exact
expression. So this shows that there is a scheme, an assignment, which achieves the quantity
on the right, and hence the best assignment can only be smaller. This part is simple; now we
have to claim that no other assignment can do better.
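Since the ith most likely symbol receives the ith shortest string, the formula for L bar P can be cross-checked against the greedy assignment itself. Here is an illustrative Python sketch (the function names are mine):

```python
from math import ceil, log2

def Lbar_formula(p):
    """sum over i of P_(i) * ceil(log2(i/2 + 1)), with probabilities
    sorted in decreasing order and i counted from 1."""
    q = sorted(p, reverse=True)
    return sum(px * ceil(log2(i / 2 + 1)) for i, px in enumerate(q, start=1))

def Lbar_greedy(p):
    """Average length when the i-th most likely symbol gets the i-th
    shortest binary string, with lengths found by direct enumeration."""
    q = sorted(p, reverse=True)
    lengths, n, capacity = [], 1, 2      # 2**n strings of each length n
    for _ in q:
        if capacity == 0:
            n += 1
            capacity = 2 ** n
        lengths.append(n)
        capacity -= 1
    return sum(px * l for px, l in zip(q, lengths))

p = [0.05, 0.45, 0.25, 0.15, 0.10]
assert abs(Lbar_formula(p) - Lbar_greedy(p)) < 1e-12
print(Lbar_formula(p))  # lengths 1,1,2,2,2 give 0.45 + 0.25 + 2*0.30 = 1.3
```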

(Refer Slide Time: 27:04)

Now this part of the proof is a little involved, so I will only give a high-level idea. What
we want to claim is that the best you can do can never beat the scheme we discussed; that is
what we have to show, so let me just write it here. How do we show this? Well, we use
induction to prove it — you can prove it in various ways, but I will use induction. So I will
consider any scheme, your favorite scheme.

What is a scheme? A scheme is a mapping e which takes this alphabet and assigns to it binary
sequences. And this e is one-to-one, in the sense that it assigns a unique sequence to each
symbol: you can look at the symbol and go to the sequence, and you should also be able to go
the other way around, from the sequence back to the symbol — you should be able to invert it.
That is what we assume about this e.

Now let li denote the length of the sequence that is assigned to the symbol with the ith
highest probability, the symbol with probability P bracket i. That is what this li is. Any
scheme will assign some sequence to this symbol; of course, the scheme we saw earlier assigned
the shortest available sequence to it.

Any other scheme will also have some sequence assigned to this symbol, and li is the length of
that sequence; that is what the notation means. Now, the claim — claim meaning something we
will show — is that if you look at the first t symbols with the highest probabilities and
compute the partial average length, the summation over i from 1 to t of P bracket i times li,
it is at least the summation over i from 1 to t of P bracket i times the ceiling of log of i
by 2 plus 1, for every t.

Why am I writing it in this seemingly strange way? Of course, I only need the claim for t
equal to k, but I want to prove it by induction. If you have not seen proof by induction, you
probably need to brush up a little on proof writing; I will assume you are familiar with these
kinds of things.

So how do we show it? The claim clearly holds for t equal to 1: for t equal to 1, the term on
the right is just P bracket 1 times 1, and P bracket 1 times l1 must be at least P bracket 1
times 1, because l1 is at least 1. So it holds for t equal to 1.

Now suppose the claim holds for some t; from there, you show that if it holds for t, it must
also hold for t plus 1. That is how we will do the proof. I will not give the details in
class, but you can take a look at the notes later; I will provide complete notes on this.

So what this proof says is that no matter what scheme you use, you can never do better than
this, and therefore this quantity must equal this. In effect, it also says something about the
scheme we saw, which just arranges the symbols in decreasing order of probability and assigns
sequences one by one, starting from the left, always assigning the shortest available
sequence.

This scheme is optimal. Also, remember that a similar scheme was optimal for L epsilon P: you
again arrange all the symbols in decreasing order of probability, take a symbol, put it in the
bucket, then the next symbol, and keep doing this until your bucket has enough symbols to
reach probability 1 minus epsilon.

And then you just store those symbols, looking at the minimum length of binary sequences
required to represent all of them. So that is a very similar scheme, and it attains, we claim,
L epsilon P. So what have we seen? Here is a quick summary of where we are so far.

(Refer Slide Time: 32:18)

So we saw these two quantities, L epsilon P and L bar P, and schemes that attain them — which
means schemes that are optimal. The template of both schemes is similar, and this is the
principle they follow: assign shorter sequences to symbols with larger probabilities. That is
the template of our schemes.

Both schemes follow this principle. It is a very natural principle, and now we have given a
mathematical theorem which shows that it indeed yields the optimal scheme. All right, just one
more thing before we close this video.
(Refer Slide Time: 33:52)

So we saw these two different quantities, L epsilon P and L bar P; how are they related? Here
is an exercise you can think about. It claims that these quantities are closely related: L
epsilon P is at most roughly L bar P divided by epsilon. So, if you do not worry about the
dependence on epsilon, these quantities are similar. It is bounded above by this plus 1, and
this plus 1 is just a small additive constant.

For example, suppose this thing is 10,000 and epsilon is 0.1; then this is 10 times more than
that, but no more. So there is a relation between the two. Also, L epsilon P cannot be too
much less than L bar P either: it is at least roughly L bar P divided by 1 minus epsilon, up
to some correction. So, if you ignore the dependence on epsilon, these two quantities are very
similar, which is nice, because otherwise you would always worry about which definition to
keep in mind.

Remember, these definitions were made to mathematically capture the minimum number of bits
required to represent symbols from a source with probability distribution P, and this exercise
is very pleasing because it says that the two quantities we defined end up having very similar
behavior.

Now, maybe you can take a pause and try to solve this exercise. I will give a hint of how to
solve it right away — it is a tough exercise, so now that you have tried solving it, let me
just give you a simple hint. I will also give a part of it as homework.
So what is this inequality saying? It is telling you that L epsilon P is bounded by L bar P
divided by epsilon. An upper bound for L epsilon P corresponds to a scheme: it is saying that
you can come up with a scheme for L epsilon P whose length is at most this. So which symbols
will you keep? That is the question.

So consider a scheme which attains L bar P — suppose, since I am showing this part of the
inequality, that L bar P equals the summation over i from 1 to k of Pi times li, attained by
this particular assignment li. Then you can think of the length as a random variable, and this
L bar P is the expected value of that random variable; I am writing it in that form. What is
this random variable L? It takes the value li with probability Pi. That is how we are viewing
L bar P.

Now recall Markov's inequality. Markov's inequality tells us that the probability that this
random variable L — the length — exceeds any fixed threshold is at most its expected value
divided by that threshold. There may be too many L's around, so let us call the threshold
lambda. Lengths are always non-negative, so L is a non-negative random variable, and the
probability that L exceeds lambda is at most the expected value of L divided by lambda.

Equivalently, the probability that L exceeds its expected value divided by epsilon is at most
epsilon. So what is this set here? It is exactly the set we were looking for: if you look at
all the i for which li is at most E L divided by epsilon, that set has probability at least 1
minus epsilon, because its complement has probability at most epsilon.

Since this set has probability at least 1 minus epsilon, all you need to do is keep the
symbols in this set. All the sequences in this set have length at most E L by epsilon, and
therefore these symbols can be represented with length at most this plus 1 — the plus 1
because E L by epsilon need not be an integer, so you take a ceiling, but in any case it is at
most this plus 1. Therefore L epsilon P is at most L bar P by epsilon plus 1. A very simple
idea.

How about the other side of the bound? It is a similar bound: you have to take a scheme which
attains L epsilon P and convert it into a variable-length scheme, and then claim that that
scheme has expected length at most this plus a correction, multiplied by 1 minus epsilon. When
you do that, you get an upper bound for L bar P. That is the other side of the proof.
It is a tricky exercise — do not feel too disheartened if you cannot solve it — but it is an
important exercise for the following reason: what we are claiming here is that L epsilon P is
approximately equal to L bar P, if we ignore the dependence on epsilon and the small constants
added here and there.
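The first half of the exercise — L epsilon P is at most L bar P over epsilon plus 1 — can be checked numerically with the formulas established in this lecture. Below is an illustrative Python sketch (the function names and the test distribution are my own choices):

```python
from math import ceil, log2

def L_eps(p, eps):
    """ceil(log2 N_eps(P)), the formula proved for L_eps(P)."""
    total, n = 0.0, 0
    for q in sorted(p, reverse=True):
        total += q
        n += 1
        if total >= 1 - eps:
            break
    return ceil(log2(n))

def L_bar(p):
    """The formula proved for L_bar(P)."""
    q = sorted(p, reverse=True)
    return sum(px * ceil(log2(i / 2 + 1)) for i, px in enumerate(q, start=1))

# A truncated geometric distribution over 16 symbols, normalised:
raw = [2.0 ** -i for i in range(1, 17)]
p = [x / sum(raw) for x in raw]
for eps in (0.05, 0.1, 0.2, 0.5):
    assert L_eps(p, eps) <= L_bar(p) / eps + 1
print("Markov bound holds on this example")
```

The other direction of the exercise, converting a fixed-length scheme into a variable-length one, is left as the homework described above.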

So the upshot of all this is that we defined these two notions of the minimum length to which
we can compress a source with a given distribution P, we gave a very precise formula for both
of them, and we claimed that these two formulas essentially coincide. Is this a pleasing
answer as a measure of uncertainty? Well, in one sense it is: we have a closed-form formula.

But in another sense, it is not a very nice answer, because this formula does not reveal much:
it looks very heavy, it is difficult to compute, and we cannot really see what the actual
dependence on the probability distribution P is.

(Refer Slide Time: 40:28)

But a good principle came out of all of this, which is perhaps more important than the
formula: assign shorter sequences to symbols with larger probabilities. What we will do next
is, instead of going for such an exact formula for L bar P, which is difficult to understand,
we will settle for a less accurate estimate of these quantities, but one which is readily
interpretable.
So the next lecture will introduce that quantity, which approximates both L bar P and L
epsilon P very well, and it will be one of the most fundamental quantities for this course.
Stay tuned for the next video lecture.
