
EECS 121: Intro to Digital Communication

Spring 2010

Lecture 1 January 22
Lecturer: Anant Sahai Scribe: Myo Nyi Nyi

1.1 Introduction

Today we are going to discuss source coding and, if we have time, move on to compression; otherwise we will cover compression in the next lecture.

1.2 Source-Coding

In the world of digital communication, sending information means sending bits from one place to another. So the first thing we have to do is think about how to transform all kinds of content (audio, video, data, text, etc.) into bits.

Back in the day, around 1948, people were using analog systems for communication. In analog systems, each kind of content is different from the others: each content carries its own kind of information, and communication means moving that particular information from one place to another. With the revolution of digital communication, however, the idea of communication was no longer tied to the particular kind of content being moved. It is obvious now, but in those days it wasn't obvious, and people were thinking about how to separate the content from the communication. Say we have audio, video, and text; the question was how to manipulate these so that the communication system sees all the different kinds of content as the same, i.e., how to model them. Essentially, it was all about breaking things into bits: digital communication treats all kinds of content the same, seeing them as a bunch of bits, regardless of what kind of content they are or what meaning they carry.

Let's start with a simpler model to understand how we turn things into just a bunch of bits.

Example: English text on a page in a monospace font, N characters long. How do we model this English text? Think of each character as a symbol from an alphabet:

X = {a, b, c, ..., z, A, B, ..., Z, 0, 1, ..., 9, +, /, ..., !}   (1.1)

We raise it to the N-th power because the text is N characters long, giving X^N. Now we want to turn this into bits; essentially, for communication purposes, we want to work with bit strings {0, 1}^M, where M > N. So we need a function transforming X^N into {0, 1}^M, and we call that function the source code:

f : X^N → {0, 1}^M, where there exists g such that for all English strings s, g(f(s)) = s.   (1.2)
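
As a concrete illustration (a minimal sketch in Python, not from the original notes; the toy alphabet and the fixed-width scheme are assumptions for the example), here is one possible invertible pair f and g:

```python
import math

ALPHABET = "ABC"  # a toy alphabet X; an assumption for this example
BITS = math.ceil(math.log2(len(ALPHABET)))  # fixed width per symbol; here 2

def f(s: str) -> str:
    """A source code f : X^N -> {0,1}^M: each symbol becomes a fixed-width bit string."""
    return "".join(format(ALPHABET.index(ch), "0{}b".format(BITS)) for ch in s)

def g(bits: str) -> str:
    """The inverse map g, so that g(f(s)) = s for every string s over X."""
    return "".join(ALPHABET[int(bits[i:i + BITS], 2)]
                   for i in range(0, len(bits), BITS))

s = "ABCCBA"
assert g(f(s)) == s   # invertibility: the receiver recovers the original text
print(f(s))           # -> 000110100100
```

Because f pads every symbol to the same width, decoding is just a matter of cutting the bit string into fixed-size pieces, which is what makes g easy to write.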

The source code f has to be an invertible function so that the person on the other end can convert the bits back into the original information; otherwise, there is no way to recover the original information. Now the question is how to determine the limit on the number of bits we need, since we want the system to be as efficient as possible.

Question: What is the limit on the number of bits we need?

Again, let's start with simple cases and build up step by step.

1. The first case is the empty alphabet, X = {}. This one is easy: since there is nothing, we need 0 bits to represent the whole text.

2. How about when there is exactly one symbol in the set, for example X = {A}? This one is pretty straightforward too: we need 0 bits to represent the whole text, since there is only one possibility. Whenever we receive a piece of information, we know it has to be that particular symbol, since there is only one in the set.

3. How about when there are exactly two symbols in the set, for example X = {A, B}? We need one bit per character, since there are two possibilities, and since there are N characters in total, we need N bits to represent the whole text.

4. How about when there are exactly three symbols in the set, for example X = {A, B, C}? This one is a bit tricky. The best simple strategy is 2N bits. However, how do we prove that we can't do any better? What is the lower bound? Let's prove it by counting.

   How many strings are there? 3^N, since there are three possible characters per position. How many bit strings are there? 2^M. So the code cannot be invertible if

   2^M < 3^N,   (1.3)

   that is, if

   M/N < log2 3, i.e., M < N log2 3.   (1.4)
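
A quick sanity check of this counting argument (my own sketch, with N = 10 chosen arbitrarily): the smallest M with 2^M >= 3^N matches the ceiling of N log2 3.

```python
import math

N = 10  # text length over the 3-symbol alphabet {A, B, C}

# Counting argument: decoding is possible only if 2**M >= 3**N.
M = 0
while 2**M < 3**N:
    M += 1

print(M)                             # smallest workable M -> 16
print(math.ceil(N * math.log2(3)))   # ceil(N * log2 3)    -> 16
```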

Now the question becomes how we bridge this gap so that the number of bits we need is as close to optimal as possible. Let's try grouping different numbers of characters at a time and see how it goes.

Start with 2 letters at a time: 3^2 = 9 possibilities, so we need 4 bits per group. The rate is still 4/2 = 2 bits per character. How about 3 letters at a time? 3^3 = 27 possibilities, so we need 5 bits per group. This time it is better, since 5/3 < 2.

So the best strategy so far uses 5/3 bits per character, and the number of bits M we actually need has the following lower and upper bounds (the upper bound comes from blocking all N characters at once, which uses the smallest integer number of bits above N log2 3):

(log2 3) N ≤ M ≤ (N log2 3) + 1   (1.5)
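
Here is a minimal sketch of the blocking idea for this case (my own illustration, assuming the 3-symbol alphabet {A, B, C} from case 4): interpret each block of 3 characters as a base-3 number between 0 and 26, and write it with 5 bits.

```python
ALPHABET = "ABC"  # 3-symbol alphabet, assumed for the example

def encode_block(block: str) -> str:
    """Map a block of 3 symbols (27 possibilities) into 5 bits (32 patterns)."""
    value = 0
    for ch in block:                 # read the block as a base-3 number
        value = value * 3 + ALPHABET.index(ch)
    return format(value, "05b")

def decode_block(bits: str) -> str:
    """Invert encode_block: 5 bits back to a block of 3 symbols."""
    value = int(bits, 2)
    block = ""
    for _ in range(3):
        block = ALPHABET[value % 3] + block
        value //= 3
    return block

assert decode_block(encode_block("CAB")) == "CAB"
print(encode_block("CAB"))  # -> 10011: 5 bits for 3 characters
```

Each block of 3 characters costs 5 bits, so the rate is 5/3 bits per character, matching the strategy above.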

In general, when the alphabet X has |X| different symbols and the text is N characters long,

Best Strategy = ⌈N log2 |X|⌉ bits.   (1.6)

This kind of grouping, looking at many symbols at a time, is called blocking, and it is a powerful tool in digital communication.

5. How about when there are exactly four symbols in the set? For example, X = {A, B, C, D}. Using Eq. (1.6) above, we can see that the number of bits required is 2N. Alternatively, we can use the pigeonhole principle to prove that 2N bits is optimal for this set. The pigeonhole principle states that if N items are put into M pigeonholes where N > M, then at least one pigeonhole contains more than one item. Since we don't want any collision in our system (so that the data can be translated back correctly after transmission), we need 2N bits to represent the N-character text.

From the cases above, we can see that the number of bits required is exactly optimal whenever the number of characters in the alphabet is a power of two.
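
To see why powers of two are the clean case, here is a quick check (an illustrative sketch, not from the notes): when |X| = 4, blocking buys nothing, and the rate is exactly 2 bits per character for every block size.

```python
import math

# With |X| = 4 (a power of two), every block size needs exactly
# 2 bits per character, so blocking gives no improvement.
for block_size in range(1, 6):
    bits = math.ceil(block_size * math.log2(4))  # bits to index 4**block_size blocks
    print(block_size, bits / block_size)         # always 2.0
```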

1.3 Conclusion

Try the following exercise on your own; the GSI will cover it during the discussion section. In the next lecture, we will discuss compression.

Exercise: What is the difference between English text and monkey-typing text? In general, we know that English text has some bonding (dependence) between characters, whereas monkey-typing text doesn't. How do we use that idea to get better compression?
