
Coding & Information Theory

By: Shiva Navabi


January 29th, 2011
Overview
Information theory and its importance
Coding theory and its subsections
Statements of Shannon
Conclusion
References
Questions & ambiguities !
What does information theory mainly discuss?
Information theory provides the possibility to measure the amount of information regardless of its content.
Information theory deals with the amount of information, not its content.
Claude Shannon

The first person to bring up the idea of measuring the amount of information was Claude Shannon.
Shannon published his landmark paper "A Mathematical Theory of Communication" in 1948, creating a turning point in the field of digital communications; with it, information theory was first established on a firm basis.
Significant aspects of Information Theory
It sets bounds on what can be done, but does little to aid in the design of a particular system.
Unlike most bodies of knowledge, information theory gives a central role to errors (noise), treating noise as something that already exists and has to be dealt with.
It presents an ideal model of a communication system while taking into account the limits we will face in practice, and it identifies the maximum efficiency of the channel in the presence of noise.
It clarifies the concept of information and, in particular, distinguishes two different concepts: the amount of information and the meaning of information.
It provides mathematical tools for measuring the amount of information.
It represents information in terms of bits, which established the basis of digital communications and, more generally, digitized all electronic devices.
The conventional signaling system is modeled as below:

Source (1) -> Encode (2) -> Channel (3) -> Decode (5) -> Sink (6)
                               ^
                           Noise (4)

1. An information source
2. An encoding of this source
3. A channel (the medium through which the signal is transmitted)
4. A noise (error) source that is added to the signal in the channel
5. A decoding of the original information from the contaminated received signal
6. A sink for the information
Elaboration of the blocks of the typical signaling system:
1. Information source
Any arbitrary information source is represented as a sequence of symbols in a source alphabet s1, s2, . . . , sq having q symbols.
The power of both coding and information theory is that they do not define what information is; instead they try to measure the amount of it.
Continuous forms of information (analog signals) are usually sampled and digitized, for two main reasons:
1. Storage, manipulation, transmission and, in general, processing of digital signals is much more reliable than of analog signals.
2. Integrated circuits, and especially computers, are all digital devices; therefore working with digital signals is much easier.
2&5. Encoding & Decoding
Both the encoding and the decoding block should each be divided into two parts.
The encoding block consists of two main stages:
1. Source encoding
2. Channel encoding
1. Source encoding
This stage of encoding deals with how to encode the source symbols more efficiently; the more elaborate the structure of the encoding method, the more efficiently the channel capacity is utilized.
Therefore "data compression" is strongly influenced and controlled by the method we use to encode our information source.
2. Channel encoding
Strategies and methods of channel encoding are mainly devised to compensate for the alterations and damage that the inevitable noise of the channel may inflict on the original signal as it is transmitted.
Channel encoding methods, on the other hand, expand the length of the transmitted message with redundant symbols that carry no information of their own.
In conclusion, the "Encode" block contains two stages, as follows:
1. Source encoding
2. Channel encoding
At the "Decode" block, of course, these two stages are reversed, as follows:
1. Channel decoding
2. Source decoding
Coding Theory entry
From the moment we start dealing with strategies and ideas for encoding the information source, "coding theory" is being used and implemented.
Information theory and coding theory are closely related to each other; hence, the boundary between the two is not very clear most of the time. Coding theory tries to bring information theory into application using efficient methods of encoding.
Coding theory leads to information theory, and information theory provides bounds on what can be done by suitable encoding of the information.
Channel Coding
Methods of channel coding are generally
divided into two parts:

1. Error-Detecting codes
2. Error-Correcting codes
Also: escape characters
Why Error-Detecting codes?
We require highly reliable transmission, whether across distance (transmission) or across time (storage); reliable transmission therefore definitely requires error detection.
Note/ It is obviously not possible to detect an error if every possible symbol, or set of symbols, that can be received is a legitimate message.
It is possible to catch errors only if there are some restrictions on what constitutes a proper message.
An example of Error-Detecting codes
Simple Parity Checks

The simplest way of encoding a binary message to make it error-detecting is to count the number of 1's in the message, and then append a final binary digit chosen so that the entire message has an even number of 1's in it. The entire message is therefore of even parity. Thus to (n-1) message positions we append an nth parity-check position.
At the receiving end a count of the number of 1's is made, and an odd number of 1's in the entire n positions indicates that at least one error has occurred.
Note/ A parity check can only detect an odd number of errors; therefore in this code a double error cannot be detected, nor can any even number of errors.
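A minimal sketch of this scheme in Python (an illustrative example, not part of the original slides; the message bits are made up):

def add_even_parity(bits):
    # Append one bit so that the total number of 1's is even.
    return bits + [sum(bits) % 2]

def parity_check_ok(word):
    # An odd count of 1's means at least one error occurred.
    return sum(word) % 2 == 0

msg = [1, 0, 1, 1, 0, 1]
word = add_even_parity(msg)   # [1, 0, 1, 1, 0, 1, 0]
print(parity_check_ok(word))  # True  (no error)
word[2] ^= 1                  # flip one bit: a single error
print(parity_check_ok(word))  # False (error detected)
word[4] ^= 1                  # flip a second bit: a double error
print(parity_check_ok(word))  # True  (an even number of errors goes unnoticed)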
Need for Error Correction
A first question is: if a computer can find out that there is an error, why can it not find out where it is?
A very first strategy: retransmission of the message.
If any error is detected at the receiving end, we can request retransmission of the message.
The first and simplest error-correcting code: the "triplication" code.
In this strategy every message is repeated three times and at the receiving end a majority vote is taken. It is obvious that this system will correct isolated single errors.
However, this strategy is very inefficient and requires expensive equipment.
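The triplication idea can be sketched as follows (an illustrative Python snippet with a made-up message, not from the slides):

def triplicate(bits):
    # Send every bit three times.
    return [b for b in bits for _ in range(3)]

def majority_decode(received):
    # Take a majority vote over each group of three received bits.
    return [int(sum(received[i:i + 3]) >= 2) for i in range(0, len(received), 3)]

sent = triplicate([1, 0, 1])   # [1, 1, 1, 0, 0, 0, 1, 1, 1]
sent[4] ^= 1                   # a single isolated error
print(majority_decode(sent))   # [1, 0, 1]  -> corrected, at the cost of tripling the length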
Rectangular Codes
The next simplest error-correcting codes are the rectangular codes, where the information is arranged in the form of an (m-1) by (n-1) rectangle. A parity-check bit is then added to each row of m-1 bits to make a total of m bits. Similarly, a check bit is added to each column.
The redundancy is smaller the more the rectangle approaches a square. For square codes of side n, we have (n-1)^2 bits of information and 2n-1 bits of checking along the sides.
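A sketch of the row-and-column parity idea (an illustrative Python snippet with a made-up 3-by-3 data block; a single error flips exactly one row check and one column check, which locates it):

def parity_checks(rows):
    # Even-parity bit for every row and every column of the data rectangle.
    row_par = [sum(r) % 2 for r in rows]
    col_par = [sum(c) % 2 for c in zip(*rows)]
    return row_par, col_par

data = [[1, 0, 1],
        [0, 1, 1],
        [1, 1, 0]]
sent_rows, sent_cols = parity_checks(data)

data[1][2] ^= 1                       # a single error at row 1, column 2
rcv_rows, rcv_cols = parity_checks(data)
bad_row = [i for i, (a, b) in enumerate(zip(sent_rows, rcv_rows)) if a != b]
bad_col = [j for j, (a, b) in enumerate(zip(sent_cols, rcv_cols)) if a != b]
print(bad_row, bad_col)               # [1] [2]  -> the error is located and can be corrected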
Source Encoding
A better idea!
Variable-Length Codes
The advantage of a code in which the message symbols are of variable length is that the code can be more efficient, in the sense that to represent the same information we can use fewer digits on the average.
If every symbol is as likely as every other one, then "block codes" are about as efficient as any code can be.
But if some symbols are more probable than others, then we can take advantage of this by making the most frequent symbols correspond to the shorter encodings and the less frequent symbols correspond to the longer encodings.
Example: Morse code
Some fundamental problems with variable-length codes
At the receiving end, how do you recognize each symbol of the code?
How do you recognize the end of one code word and the beginning of the next?
Note/ The message may have other, larger structures (beyond its source symbols having different probabilities) which could be used to make the code more efficient.
Example: in English the letter Q is usually followed by the letter U.
Unique Decoding
The received message must have a single, unique possible interpretation
Consider a code in which the source alphabet S has four symbols and they are to be
encoded in binary as follows:
s1 = 0
s2 = 01
s3 = 11
s4 = 00
The particular received message 0011 could be one of these two:
0011 = s4 , s3 or 0011 = s1 , s1 , s3
Thus the code is not uniquely decodable.
Condition/ Only if every distinct sequence of source symbols has a corresponding
encoded sequence that is unique, can we have a uniquely decodable signaling
system.
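The ambiguity can be checked by brute force (an illustrative Python snippet, not part of the slides), enumerating every way the received string splits into code words:

code = {'s1': '0', 's2': '01', 's3': '11', 's4': '00'}

def decodings(bits, prefix=()):
    # Enumerate every parse of the bit string into code words.
    if not bits:
        yield prefix
    for name, word in code.items():
        if bits.startswith(word):
            yield from decodings(bits[len(word):], prefix + (name,))

print(list(decodings('0011')))
# [('s1', 's1', 's3'), ('s4', 's3')]  -> two parses, so the code is not uniquely decodable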
Instantaneous Codes
Decision Tree:
Consider the following code:
s1 = 0
s2 = 10
s3 = 110
s4 = 111

In order to decode, in a logical way, any received message which is encoded with the above code words, we had better draw a decision tree.
Here is the decoding tree: (figure: a binary tree that branches on each received bit and ends in the leaves s1, s2, s3 and s4)
In this example the decoding is instantaneous, since no encoded symbol is a prefix of any other symbol.
Therefore:
A code can be instantaneously decoded only if none of the code words is a prefix of any of the others.
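A prefix-free code can be decoded greedily, one bit at a time, with no look-ahead. The sketch below (an illustrative Python snippet, with a made-up received string) checks the prefix condition for the code above and decodes a stream:

code = {'s1': '0', 's2': '10', 's3': '110', 's4': '111'}

def is_prefix_free(words):
    # Instantaneous condition: no code word is a prefix of another.
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits):
    # Emit a symbol as soon as a complete code word has been read.
    lookup = {w: s for s, w in code.items()}
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in lookup:
            out.append(lookup[buf])
            buf = ''
    return out

print(is_prefix_free(list(code.values())))   # True
print(decode('0101100'))                     # ['s1', 's2', 's3', 's1']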
Example: the following code is uniquely decodable but is not instantaneous, because you do not know when one symbol is over without looking further:
s1 = 0
s2 = 01
s3 = 011
s4 = 111
In this code, some code words are prefixes of other ones; therefore the code cannot be decoded instantaneously.
Consider this string: 0111 . . . 1111
It can only be decoded by first going to the end and then identifying each code word (the tail breaks up into repeated s4's). Thus, you cannot decide whether you already have the word corresponding to the prefix, or whether you must wait until more is sent to complete the word. => This causes a time delay!
Huffman Codes
Consider an information source with q symbols whose probabilities are taken in decreasing order:
p1 ≥ p2 ≥ p3 ≥ . . . ≥ pq
For the lengths of the corresponding code words, the relationship runs the opposite way, because of efficiency; therefore:
l1 ≤ l2 ≤ l3 ≤ . . . ≤ lq
The two least frequent source symbols of an efficient code have the same encoded length (why?).
Huffman encoding includes two stages (sketched below):
1. Reduction process
2. Splitting process
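A minimal sketch of the algorithm using Python's heapq (illustrative; the example probabilities are assumed, not from the slides). The repeated merging of the two least probable nodes is the reduction process, and prefixing a 0 or a 1 as the merges are unwound is the splitting process:

import heapq

def huffman(probs):
    # probs: {symbol: probability}. Returns {symbol: code word}.
    heap = [[p, [s, '']] for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # the two least probable nodes ...
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = '0' + pair[1]       # ... get distinguished by one extra leading bit
        for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])   # merged node
    return {s: bits for s, bits in heap[0][1:]}

print(huffman({'s1': 0.4, 's2': 0.3, 's3': 0.2, 's4': 0.1}))
# {'s1': '0', 's2': '10', 's4': '110', 's3': '111'}
# The two least frequent symbols (s3 and s4) indeed share the same length.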
Limitation on the efficiency of different codes
An important question/ How efficient could the best encoding method be?
Obviously the most efficient coding method is the one that compresses the information as much as possible.
Therefore the problem is to find a limit on the achievable efficiency and data compression.

Information
The concept of information in the fields of communication and information theory is different from the common, everyday meaning of information between people, and these two separate meanings should not be confused, since they are not equivalent.
Definition of information in mathematics
A signaling system is set up to transmit a message to a receiver who does not already know what the content of the message is. Therefore, if you already know what you are going to receive at the receiving end, the whole process is a waste and you get no information. (Example: p1 = 1 and all other pi = 0)
On the other hand, if the probabilities are all very different, then when a symbol with a low probability arrives you feel more surprised, and get more information, than when a symbol with a higher probability arrives.
Thus information is somewhat inversely related to the probability of occurrence.
I(p) is the mathematical function which measures the amount of information, surprise and uncertainty in the occurrence of an event of probability p.
I(p) is represented as below:
I(p) = log(1/p) = -log p     (the base of the logarithm fixes the unit)
With base 2 for the logarithm, the unit of information is called the "bit".
If we use base e, the unit is called the "nat".
For base 10 the unit is called the "hartley".
There are three things assumed about I(p):
1. I(p) ≥ 0 (a real nonnegative measure)
2. I(p1 p2) = I(p1) + I(p2) for independent events
3. I(p) is a continuous function of p
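A quick numerical check of these properties (an illustrative Python snippet, assuming base-2 logarithms so that information is measured in bits):

from math import log2

def I(p):
    # Information, in bits, carried by an event of probability p.
    return log2(1 / p)

print(I(1/2))        # 1.0 bit  (a fair coin flip)
print(I(1/8))        # 3.0 bits (a rarer event carries more surprise)
print(I(1/2 * 1/8))  # 4.0 bits = I(1/2) + I(1/8): additive for independent events
print(I(1))          # 0.0 bits (a certain event carries no information)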

Now, the question is: if we get I(si) units of information when we receive the symbol si, how much do we get on the average?
Entropy
The concept of entropy in mathematics is similar to the one we have in
other fields.
Common concept of entropy
Entropy represents a quantitative measure of the disorder of a system and is inversely related to the amount of energy available to do work in an isolated system. The more the energy has become dispersed, the less work it can perform and the greater the entropy.
Concept of entropy in Information theory
It puts a lower bound on the achievable efficiency and data compression for a given code, and it tells us, for a specific distribution, how efficient the best coding method can be.
Entropy represents the minimum amount of memory that must be occupied by a code to convey a given amount of information.
Mathematical relationship of entropy
The mathematical relationship of entropy is represented as below:
H(S) = Σ pi log2(1/pi) = -Σ pi log2 pi     (summed over i = 1, . . . , q)
This tells us how much information we get on the average over the whole alphabet of symbols si (information per symbol of the alphabet S).
How to maximize entropy?
If you have no idea which one of the source symbols is going to arrive, then you will be most surprised and you will get the greatest amount of information; hence the entropy will be maximum. Therefore:
Whenever all source symbols are equally probable, the entropy of the distribution is the maximum over all distributions on that alphabet.
There also seems to be another factor affecting entropy.
Example:
For a fair die you have the distribution { 1/6, 1/6, 1/6, 1/6, 1/6, 1/6 }
For a fair coin you have the distribution { 1/2, 1/2 }
So, which distribution has the greater entropy?
Generally, the greater the uncertainty is, the greater the entropy will be.
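The comparison can be checked numerically (an illustrative Python snippet): the fair die, having more equally likely outcomes, has the greater entropy.

from math import log2

def entropy(probs):
    # H(S) = sum of p * log2(1/p) over the symbols of the alphabet.
    return sum(p * log2(1 / p) for p in probs if p > 0)

die = [1/6] * 6
coin = [1/2] * 2
print(entropy(die))     # ~2.585 bits (log2 6)
print(entropy(coin))    # 1.0 bit
print(entropy([1, 0]))  # 0.0 bits: a certain outcome carries no information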
Back to Information theory
Information theory, as presented by Shannon, includes two important concluding statements:
Source-coding theorem
According to this theorem, the entropy of the information source represents the minimum number of bits (amount of memory) that you need to encode the source symbols, and no coding method can compress the data further than that.
However, with a suitable encoding strategy it is possible to approach the value given by the entropy of the information source.
Noisy-channel coding theorem
Reliable transmission of a message through a noisy channel is possible. However, there is a limit on the rate at which the signal can travel through the channel. The "channel capacity" sets a threshold on the transmission rate of the signal, and this threshold can be approached via suitable coding methods.
Channel Capacity
The channel-capacity inequality given by Shannon is:
C = W log2(1 + S/N), and the transmission rate cannot exceed C,
where
C is the capacity of the channel (the maximum transmission rate), expressed in bits per second. It gives the amount of discrete information, in bits, that the channel can carry per second.
W is the bandwidth of the channel in hertz.
S/N is the signal-to-noise ratio of the communication channel.
Note/ The SNR is expressed as a straight power ratio (not in decibels).
Some examples
Example 1:
Assume that:
SNR = 20 dB ; W = 4 kHz
Hence
C = 4000 × log2(1 + 100) ≈ 26.63 kbit/s
Note/ The power ratio of 100 corresponds to an SNR of 20 dB.
This means that the transmission rate cannot exceed about 26.63 kbit/s.
Example 2:
Assume that:
C = 50 kbit/s ; W = 1 MHz
Hence
S/N = 2^(C/W) - 1 = 2^0.05 - 1 ≈ 0.035
This means that under these circumstances the minimum S/N is about 0.035, corresponding to an SNR of -14.5 dB.
This example shows that it is possible to transmit a signal even when the original signal (carrying the information) is much weaker than the background noise. However, information theory does not tell us how; it only tells us that it is possible.
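Both examples can be reproduced directly from C = W log2(1 + S/N) (an illustrative Python snippet):

from math import log2

def capacity(bandwidth_hz, snr_ratio):
    # Channel capacity in bit/s; the SNR is a power ratio, not in dB.
    return bandwidth_hz * log2(1 + snr_ratio)

# Example 1: W = 4 kHz, SNR = 20 dB -> power ratio 10**(20/10) = 100
print(capacity(4_000, 100))              # ~26630 bit/s, i.e. about 26.63 kbit/s

# Example 2: W = 1 MHz, required C = 50 kbit/s -> minimum S/N = 2**(C/W) - 1
print(2 ** (50_000 / 1_000_000) - 1)     # ~0.0353, i.e. about -14.5 dB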
Conclusion
Information theory recognizes the practical limitations of communication systems and sets bounds on ideal transmission in the presence of noise; however, it does not tell us how to achieve them.
Coding theory, broadly speaking, controls the reliability of transmission (channel encoding) and affects the efficiency and data compression of the transmitted signal (source encoding).
In fact, "coding theory" helps us approach, in practice, the bounds determined by "information theory".
It should be emphasized that it is hard to draw clear and distinct boundaries between these two theories.
Your idea?
References
Hamming, R. W. (1986). Coding and Information Theory. Prentice-Hall.
Justesen, J., & Forchhammer, S. (2010). Two-Dimensional Information Theory and Coding. Cambridge University Press.
van Tilborg, H. C. A. (1993). Coding Theory: A First Course. Eindhoven.
Wikipedia: http://www.wikipedia.org

 
Questions & ambiguities ?!

Challenging points & discussion !


Thanks for your attention & participation
