
Data and Digital Communication

Lesson 4: Information Theory

Engr. RHANDEL O. ZAMORA, EcE, MSME, LPT


In this lesson, we will discuss how information such as
alphanumeric symbols may be source encoded into data
representations. The purpose of a communications system is not
fundamentally to send data but primarily to convey information
between two, or more, parties. This raises two questions: what is
information, and how may it be represented in a communications
system?

Topic includes:
Information source
Information measure
Entropy
Source codes
Data compression
Information source
One standard method of defining an information source is based upon
indicating the number, and identity, of each possible symbol as well as their
associated frequency of occurrence, each of which is expressed as a probability.
Consider the following received message:
THA CIT SAT ON THA MIT

Despite the errors, the message is readily interpreted as THE CAT SAT ON THE MAT.
The reason that such a correct interpretation may be made is that the English language
contains a degree of redundancy. That is, the symbols used to encode the information carry
more information than is strictly required, as seen in the above trivial example.
This example illustrates that the information content of the original message is not
necessarily the same as that of the original set of symbols used to convey the message.
The information content of a message may be quantified. An information source may be
modelled by a repertoire of messages from which the desired message may be selected. Suppose a
source contains a set of symbols denoted by:
(x1, x2, x3, . . . , xn)

The probability of any one symbol xn appearing at any moment in the sequence may be known
and denoted by P(xn), so that a source may be expressed by the following parameters:
- the number of symbols in the repertoire, or alphabet, denoted by n;
- the symbols themselves are denoted by x1, . . . , xn;
- the probabilities of occurrence of each of the symbols, denoted by P(x1), . .
. , P(xn).
Such a source is called a discrete memoryless source. By memoryless we mean that the
probability of a symbol occurring is independent of any other symbol.
As an example let us model a source based upon binary signals:
The number of symbols n is two. That is, one of two possible voltage
levels may be transmitted.
Symbols are denoted as x0 representing binary 0 and x1 representing
binary 1.
Finally, the probability of occurrence of these symbols is denoted:
P(x0) = P(x1) = 0.5, that is, they are equiprobable.

Similarly a simple telegraphic code comprising 26 letters and a space character may be modelled:
n = 27
x1 = A, x2 = B, . . . , x26 = Z, x27 = space
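As a minimal sketch (illustrative only, and assuming equiprobable letters for the telegraphic code, which real English text does not satisfy), such sources can be modelled directly as symbol-to-probability mappings:

# A discrete memoryless source modelled as a symbol -> probability mapping.
# The equiprobable probabilities for the telegraphic source are an assumption
# made purely for illustration; real letter frequencies in English differ.

binary_source = {"0": 0.5, "1": 0.5}                               # n = 2

telegraph_symbols = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + [" "]
telegraph_source = {s: 1 / 27 for s in telegraph_symbols}          # n = 27

assert abs(sum(binary_source.values()) - 1.0) < 1e-9               # probabilities sum to 1
assert abs(sum(telegraph_source.values()) - 1.0) < 1e-9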
Information measure
Having established a formal method of defining, or modelling, sources of information, it is now
possible to establish a technique to measure information. A measure of the information content
of a message can be based on the amount of uncertainty it removes. This may be illustrated by
the following three examples:

1. An information source consists of the outcomes of tossing a fair coin. The source can be
modelled as follows:
The number of symbols n = 2. The symbols represent the outcome of tossing a fair coin so
that x1 represents heads and x2 tails.
The probabilities of occurrence of the symbols are, assuming that it is a fair coin: P(x1) = P(x2)
= 0.5. Hence there is equal uncertainty with regard to the outcome of a single toss, so that
either outcome removes the same amount of uncertainty and therefore contains the same
amount of information.
2. However, when the symbols represent the answers to the question ‘Did you
watch television last night?’, then the source may be modelled as follows:
The number of symbols n = 2.
The symbols are then x1 = Yes, x2 = No.
On the assumption that 80% of the population watched television last night,
the probabilities of occurrence of the symbols are:
P(x1) = 0.8 and P(x2) = 0.2

3. Consider the source where the number of symbols is n = 2: The symbols are a
binary source with x1 = 1, x2 = 0.
The probabilities of occurrence are: P(x1) = 1 and P(x2) = 0
The receipt of x1 is certain and therefore there is no uncertainty and hence no
information; x2 of course will never be received.
Let the information content conveyed by xi be denoted by I(xi). Then from the
relationships established above we can say that:
1. If P(xi) = P(xj) then I(xi) = I(xj)
2. If P(xi) < P(xj) then I(xi) > I(xj)
3. If P(xi) = 1 then I(xi) = 0
A mathematical function that will satisfy the above constraints is given by:
I(xi) = logb(1/P(xi)) = -logb(P(xi))

The standard convention in information theory is for b to be numerically equal
to two, in which case the unit of information is called a bit. The choice of two
assumes two equiprobable messages, as in the case of binary 1 and binary 0,
and is known as an unbiased binary choice.
The information unit is therefore normalized to this lowest order situation and 1
bit of information is the amount required or conveyed by the choice between
two equally likely possibilities. That is:
If P(xi) = P(xj) = 0.5
then I(xi) = I(xj) = log2(2) = 1 bit
Hence in general:

I(xi) = log2(1/P(xi)) = -log2(P(xi))


In order to work out information content it is often necessary to change the
base of the logarithm from 2 in order to make use of logarithms which are readily to
hand. This may be achieved as follows:

I(xi) = log2(1/P(xi)) = logx(1/P(xi)) / logx(2)

Base x may conveniently be either e or 10, the logarithms for which are widely
available.
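A minimal sketch of this calculation (the function name is illustrative), using natural logarithms and the change of base shown above:

import math

def information_content_bits(p):
    # I(x) = log2(1/p) = -log2(p), computed here via a change of base from ln.
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return math.log(1 / p) / math.log(2)

print(information_content_bits(0.5))   # 1.0 bit   - the unbiased binary choice
print(information_content_bits(0.8))   # ~0.32 bits - 'Yes' in the television example
print(information_content_bits(0.2))   # ~2.32 bits - 'No' in the television example
print(information_content_bits(1.0))   # 0.0 bits  - a certain outcome carries no information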
Entropy – (H) expresses the average amount of information conveyed by a
single symbol within a set of codewords and may be determined as follows:

H = Σ P(xi) log2(1/P(xi))   (the sum being taken over i = 1 to n)

where log2(1/P(xi)) is, as we have seen, simply the information content of the ith symbol.
The formula for entropy shown above therefore simply weights the information content, in bits,
of each symbol by its probability of occurrence and sums the result over all of the symbols or
codewords in the codeword set.
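A minimal sketch of the entropy calculation, applied to the sources modelled in the earlier examples (function name illustrative):

import math

def entropy_bits(probabilities):
    # H = sum over all symbols of P(xi) * log2(1/P(xi)); symbols with P = 0 contribute nothing.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy_bits([0.5, 0.5]))   # 1.0 bit per symbol    - fair coin / equiprobable binary source
print(entropy_bits([0.8, 0.2]))   # ~0.72 bits per symbol - the television example
print(entropy_bits([1.0, 0.0]))   # 0.0 bits per symbol   - no uncertainty, no information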

In designing source codes it is the average information content of the code that is of interest
rather than the information content of particular symbols. Entropy effectively provides the
average content and this may be used to estimate the bandwidth of a channel required to
transmit a particular code, or the size of memory to store a certain number of symbols within
a code.
Source codes
As indicated earlier, the aim of source coding is to produce a code which, on
average, requires the transmission of the maximum amount of information for
the fewest binary digits. This can be quantified by calculating the efficiency η of
the code. However before calculating efficiency we need to establish the length
of the code. The length of a code is the average length of its codewords and is
obtained by:

L = Σ P(xi) li   (the sum being taken over i = 1 to n)

where li is the number of digits in the codeword for the ith symbol and n is the number of
symbols the code contains.
The efficiency of a code is obtained by dividing the entropy by the average code
length:

η = H/L
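A minimal sketch of both calculations (the four-symbol code used here is illustrative; its codeword lengths happen to match the symbol information contents exactly, so the efficiency comes out at 100%):

import math

def average_length(probabilities, lengths):
    # L = sum over all symbols of P(xi) * li.
    return sum(p * l for p, l in zip(probabilities, lengths))

def efficiency(probabilities, lengths):
    # eta = H / L, where H is the source entropy in bits per symbol.
    H = sum(p * math.log2(1 / p) for p in probabilities if p > 0)
    return H / average_length(probabilities, lengths)

probs = [0.5, 0.25, 0.125, 0.125]       # illustrative symbol probabilities
lengths = [1, 2, 3, 3]                  # lengths of the codewords assigned to them
print(average_length(probs, lengths))   # 1.75 binary digits per symbol
print(efficiency(probs, lengths))       # 1.0 - this particular code is 100% efficient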
Analysis and design of source codes
Where messages consisting of sequences of symbols from an n-symbol source have to be
transmitted to their destination via a binary data transmission channel, each symbol must be
coded into a sequence of binary digits at the transmitter to produce a suitable input to the
channel.
In order to design suitable source codes some descriptors for classifying source codes have
been produced, as follows:

- A codeword is a sequence of binary digits.


- A code is a mapping of source symbols into codewords.
- A code is distinct if all its codewords are different, or unique.
- A distinct code is uniquely decodable if all its codewords are identifiable
when embedded in any sequence of codewords from the code. A code must have this
property if its codewords are to be uniquely decoded to its original message symbols.
- A uniquely decodable code is instantaneously decodable if all its
codewords are identifiable as soon as their final binary digit is received.
- A prefix of a codeword is a binary sequence obtained by truncating the
codeword.
For example, the codeword 1010 has the following prefixes:
1, 10, 101 and 1010
Instantaneous codewords may be generated and decoded by using a code tree
or decision tree, which is rather like a family tree. The root is at the top of the
tree and the branches are at the bottom. The tree is read from root to branch.
As the binary code is received, the digits are followed from the root until the end of a
branch is reached, at which point the codeword can be read. A code tree is shown in
Figure 3.1 for a three-digit binary code. When the codeword 101 is received,
take the right-hand branch to junction (a), then from there the left-hand branch
to junction (b), then the right-hand branch to the end of the branch to read off
the codeword x6. In the case of an instantaneous code, the codewords used to
represent the possible states of the source must all correspond to the ends of a
branch.
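Equivalently, a code is instantaneously decodable when no codeword is a prefix of any other codeword, which is exactly the condition that every codeword ends at the end of a branch of the code tree. A minimal sketch of this prefix check (function name illustrative):

def is_instantaneous(codewords):
    # True if no codeword is a prefix of another, i.e. every codeword
    # corresponds to the end of a branch of the code tree.
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_instantaneous(["0", "10", "110", "111"]))   # True  - decodable digit by digit
print(is_instantaneous(["1", "10", "101"]))          # False - '1' is a prefix of '10'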
Huffman code
Source symbols are listed vertically, to the left of the diagram, in descending order of probability. The lowest
two probabilities are then combined to give a new, intermediate, probability which is simply
the sum of their probabilities. The procedure of combining the lowest pair of probabilities is
iteratively repeated until a final sum of probability 1 is reached. Where there are three, or
more, equal probabilities available, any one of them may be selected. At each combination
point the binary symbols 0 and 1 are allocated. Which way round symbols are assigned does
not affect the outcome of the Huffman coding. However, for ease and consistency, binary 0
may be assigned to the upper division and binary 1 to the lower one, or vice versa. (Another
approach is to assign one binary symbol to the highest probability, and vice versa.) Finally
codewords are deduced from the diagram by reading the binary symbols assigned from right
to left.
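A minimal sketch of the procedure using a priority queue (the symbols and probabilities are illustrative, and ties may be broken differently from a hand construction, so individual codewords can differ while the average length remains the same):

import heapq

def huffman_code(probabilities):
    # Build a binary Huffman code from a {symbol: probability} mapping.
    # Each heap entry is (probability, tie-break counter, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)                      # lowest probability
        p1, _, code1 = heapq.heappop(heap)                      # next lowest probability
        merged = {s: "0" + c for s, c in code0.items()}         # assign 0 to one group...
        merged.update({s: "1" + c for s, c in code1.items()})   # ...and 1 to the other
        heapq.heappush(heap, (p0 + p1, counter, merged))        # combined probability re-enters the list
        counter += 1
    return heap[0][2]

print(huffman_code({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}))
# e.g. {'A': '0', 'B': '10', 'C': '111', 'D': '110'} - average length 1.9 digits per symbol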
Data compression seeks to reduce the data transmitted by reducing, or
eliminating, any redundancy in the data. One of the prime reasons for this is that
some messages which contain large amounts of data often require high-
bandwidth communication links in order to transmit a message within a
reasonable amount of time. This is particularly true of many Internet
connections to an ISP over dial-up modems.

For example, the alphabet contains 26 characters and, if binary encoded with a fixed-length
code, requires 5 bits per character. This example highlights a problem in encoding
which is that of redundancy. Strictly mathematically, 26 codewords only
require about 4.7 bits (log2 26 ≈ 4.7), but a fractional number of bits cannot be used since the
number of bits per codeword must be an integer, and so in this example we round up to 5.
Different compression techniques

Difference compression, also known as relative encoding, is where, instead of transmitting
consecutive absolute values, only the difference between the current data and the last data is
sent. It is well suited to digitized speech and sensor measurements but unsuitable for
documents and for moving and still images.
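A minimal sketch of the idea (sample values illustrative):

def difference_encode(samples):
    # Send the first value absolutely, then only the change from the previous value.
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def difference_decode(encoded):
    # Rebuild the original sequence by accumulating the differences.
    values = [encoded[0]]
    for d in encoded[1:]:
        values.append(values[-1] + d)
    return values

readings = [100, 101, 103, 103, 102, 104]       # e.g. successive sensor measurements
deltas = difference_encode(readings)            # [100, 1, 2, 0, -1, 2]
assert difference_decode(deltas) == readings    # the process is lossless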

Run-length encoding is where, instead of transmitting absolute values, repeated patterns, or
runs, are detected. The repeated value itself is then sent, together with the number of times
that it is repeated.
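A minimal sketch of run-length encoding (the input line of pels is illustrative):

from itertools import groupby

def run_length_encode(data):
    # Replace each run with a (value, run length) pair.
    return [(value, len(list(run))) for value, run in groupby(data)]

def run_length_decode(pairs):
    # Expand each (value, run length) pair back into a run.
    return "".join(value * count for value, count in pairs)

line = "W" * 6 + "B" * 2 + "W" * 9 + "B"        # e.g. white/black pels on one scanned line
runs = run_length_encode(line)                  # [('W', 6), ('B', 2), ('W', 9), ('B', 1)]
assert run_length_decode(runs) == line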
With index compression, repeated patterns are placed in a table and both transmitter and receiver hold a copy of the
table. In order to transmit a run an index is used at the transmitter to point to the entry of
the run in the table. It is this index which is then transmitted and which the receiver uses to
extract the run in question. This is widely used with zip compression techniques and is
suitable for text and images, but not for speech. A commonly used code is Lempel–Ziv (LZ)
code.
Facsimile compression
Fax, as in television, is based upon scanning a document line by line but differs
inasmuch as only monochrome is provided. Operation is by means of a sharply focused
light source scanned across the document in a series of closely spaced lines. An optical
detector detects the reflected light from the scanned area, which is encoded as either binary
0 for a ‘white’ area or binary 1 for ‘dark’. The receiver then interprets the data as a black dot
for 1 or ‘no printing’ for 0. The individual dots described are termed picture elements, or pels.
The ITU has produced a number of fax standards, namely T2 (Group 1), T3
(Group 2), T4 (Group 3) and T6 (Group 4). Only Group 3 and Group 4 are commonly used.
Group 3 is intended to operate over analogue PSTN lines using modulation and operating at
14.4 kbps. Group 4 operates digitally at 64 kbps by means of baseband transmission over ISDN
lines.
Group 3 scans an A4 sheet of information from the top left-hand corner to the bottom right-
hand corner. Each line is subdivided into 1728 picture elements (pels). Each pel is quantized
into either black or white. In the vertical direction the page is scanned to give approximately
1145 lines.
Termination codes are merely white, or black, runs 0 to 63 pels long. Make-up
codes (Table 3.4(b)) are multiples of 64 pels of the same color. Coding is based on
assuming that the first pel of a line is always white. Runs of less than 64 pels are
simply encoded directly and the appropriate termination code selected. Runs of
64 pels or more are encoded using an appropriate make-up code and, if
necessary, one or more termination codes. Runs in excess
of 2623 pels make use of more than one make-up code. This use of Huffman
coding whereby codewords may in some instances be selected from both tables,
rather than directly encoding into a single codeword, is known as modified
Huffman coding.
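As an illustrative sketch of how a run is split between the two tables (the actual codeword tables are not reproduced here, so the function only returns the lengths that would be looked up):

def split_run(run_length):
    # Split a run into make-up lengths (multiples of 64 pels, largest 2560)
    # plus a single termination length of 0 to 63 pels.
    makeups = []
    remaining = run_length
    while remaining >= 64:
        chunk = min(remaining - remaining % 64, 2560)
        makeups.append(chunk)
        remaining -= chunk
    return makeups, remaining        # each length would then be encoded from the tables

print(split_run(8))      # ([], 8)           - a single termination code suffices
print(split_run(200))    # ([192], 8)        - one make-up code plus one termination code
print(split_run(2700))   # ([2560, 128], 12) - runs over 2623 pels need two make-up codes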
Group 4 specifies a different coding strategy to that of modified Huffman in
order to cope with images. Although photographs and images may contain large
degrees of shades in one line, there is often only a very small change between
adjacent lines.
Much better compression may therefore be achieved if difference encoding
across adjacent lines is used. Group 4 uses a code known as modified relative
element address designate, or modified READ code, which is also optionally
specified for Group 3.
(The term modified is to indicate that it is a variant of an earlier READ code.)
Modified Huffman coding only encodes a single line at a time and is referred to
as a one-dimensional code. Modified READ coding is known as a two-
dimensional code because the encoding is based upon a pair of lines.
a0 is the starting, or reference, changing element on the coding line; at the start of a line it is
taken to be an imaginary white pel immediately to the left of the first pel.
a1 is simply the first pel to the right of a0 which has the opposite color, that is the
first pel of the next run.
a2 is similar to a1 and denotes the start of the next run after that defined by a1.
b1 is the first pel on the reference line to the right of a0 whose color differs from
the pel immediately to its left and also that of a0.
b2 is the start of the next run after b1.
A run to be encoded may be in one of three different modes:
1. Pass mode: This occurs when a run in the reference line is no longer present in the
succeeding coding line, Figure 3.5(a).
The run that has ‘disappeared’, b1b2, is represented by the codeword 0001. The next
codeword to be produced will be based upon the run starting at a1, and which is in this case
a run of only 1 pel. The significance of a1′, which corresponds to b2 in the reference line
above, is that this position in the coding line will be regarded as a0 of the next run to be
encoded. Note that this run, which is the next to be encoded after the run that disappeared,
has the same color.
2. Vertical mode: This in fact applies to most runs and is where a black run in the coding line
is within ±3 pels of the start of a corresponding black run in the reference line. The two
extreme cases are shown in Figure 3.5(b).
There are five other possibilities, namely a difference of ±1 pel or ±2 pels, or the case where
the commencement of the two runs coincides.
3. Horizontal mode: This mode is similar to the vertical mode but where the degree of
overlap is in excess of ±3 pels. Two examples are shown in Figure 3.5(c).
Encoding uses the codeword 001 to indicate that it is horizontal mode, followed by codewords
for the run lengths a0a1 and a1a2. In the case of the upper example, the coding line commences with
the disappearance of a run of two black pels and would be encoded as pass mode. This is
followed by two white pels, a0a1, which do not fall within the category of pass or vertical
mode and must therefore be in horizontal mode.
To complete horizontal coding, the next black run, a1a2, is also encoded. Hence the horizontal
mode is encoded 001 0111 00011. Similarly it is left to readers to satisfy themselves that the
lower example is encoded 001 00111 11.
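As a rough sketch of the mode decision implied by the three cases above (illustrative only; pel positions are treated as simple integers and the special treatment of line ends in the full standard is omitted):

def read_mode(a1, b1, b2):
    # Choose the coding mode from the positions of the changing elements a1, b1 and b2.
    if b2 < a1:
        return "pass"         # the run b1b2 on the reference line has disappeared
    if abs(a1 - b1) <= 3:
        return "vertical"     # the run starts within +/-3 pels of the reference run
    return "horizontal"       # otherwise the run lengths a0a1 and a1a2 are sent explicitly

print(read_mode(a1=10, b1=4, b2=7))     # 'pass'
print(read_mode(a1=10, b1=12, b2=20))   # 'vertical'
print(read_mode(a1=10, b1=20, b2=30))   # 'horizontal'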
Video compression
Current video compression predominantly uses MPEG
where MPEG is the acronym for Moving Picture
Experts Group set up by ISO in 1990 to develop
standards for moving pictures. MPEG in turn is partly
based upon the use of a parallel standard originally
devised for digital coding of still pictures and now used
in digital photographic equipment. This standard is that of the
Joint Photographic Experts Group (JPEG), which drew
upon experts from industry, the universities,
broadcasters and so on. The group worked with the then
CCITT and ISO and commenced work in the mid-
1980s. JPEG compresses single still images by means of
spatial compression.
Video signals are based upon a series of still pictures, or frames, which are
obtained at a constant rate using a scanning technique. Very often interlaced (interleaved)
scanning is used, as in public broadcast transmissions, where on one cycle of
scanning the odd lines of the picture are produced, and on the next the even lines. These
‘half frames’ are known as fields.

MPEG-1 produces four types of frame:


1. I-frames (intrapicture) which are free-standing JPEG encoded images.
2. P-frames (predictive) which, as with B-frames next, are not free standing. A
P-frame specifies the differences compared with the last I-frame or P-frame.
3. B-frames (bidirectional) which specify the difference between the immediately
preceding/succeeding I- and/or P-frames.
4. D-frames (dc-coded) which use low-level compression and are optionally
inserted at regular intervals in order to facilitate display of low-resolution images
during fast forward, or rewind, of a VCR or Video-on-Demand video source.
P-frames use a combination of motion estimation and motion
compensation leading to approximately double the compression
ratio achieved within an I-frame. Motion estimation is based upon
the principle of dividing the image to be compressed into a number
of smaller rectangular areas, or macroblocks. A macroblock in one
frame, and regarded as a reference, Figure 3.9(a), is compared with
macroblocks in the same region in a subsequent frame on a pixel-by-
pixel basis. This process is called block matching.
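A minimal sketch of block matching by exhaustive search, using the sum of absolute differences as the matching criterion (the frame sizes, block size, search range and criterion are illustrative choices, not those mandated by MPEG):

import numpy as np

def match_block(reference_block, next_frame, top, left, search_range=7):
    # Exhaustively compare the reference macroblock against candidate blocks in
    # next_frame within +/-search_range pels and return the best-matching offset.
    h, w = reference_block.shape
    best_cost, best_offset = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > next_frame.shape[0] or x + w > next_frame.shape[1]:
                continue
            candidate = next_frame[y:y + h, x:x + w]
            cost = int(np.abs(candidate.astype(int) - reference_block.astype(int)).sum())
            if cost < best_cost:
                best_cost, best_offset = cost, (dy, dx)
    return best_offset, best_cost      # motion vector and residual matching cost

# Toy example: an 8 x 8 bright block that has moved 2 pels to the right between frames.
frame0 = np.zeros((64, 64), dtype=np.uint8)
frame0[20:28, 20:28] = 200
frame1 = np.roll(frame0, 2, axis=1)
print(match_block(frame0[20:28, 20:28], frame1, top=20, left=20))   # ((0, 2), 0)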
Audio compression
MPEG also includes compression of audio to accompany moving images or for standalone
purposes such as audio compact discs (CD). MPEG-1 supports two audio channels and is
therefore suitable for normal stereophonic use. The main difference with MPEG-2 is that
multiple channels are supported making it suitable for applications such as surround sound.
Standard CD audio recording is based upon a sampling rate of 44.1 kHz and 16 bits per sample.
Additional bits are also required to provide synchronization and error correction, which results
in 49 bits being produced per 16-bit sample.
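As a rough check on these figures (a worked calculation, not taken from the standard itself):
raw audio data rate = 44 100 samples/s × 16 bits × 2 channels ≈ 1.41 Mbps
rate including synchronization and error correction ≈ 44 100 × 49 bits × 2 channels ≈ 4.32 Mbps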
