
Department of Electronics and Communication

AY 2022-23 Sem-V
Information Theory and Coding

Unit-2
Source Coding

Need for source coding:

The aim of source coding is to represent information as accurately as possible using as few bits as possible; to do so, the redundancy of the source needs to be removed. Source coding therefore reduces redundancy and improves the efficiency of the system. It is the process of arranging an efficient representation of the data generated by the source.
Two approaches:
 Fixed-length coding: all symbols are assigned codewords of equal length.
 Variable-length coding: the codeword length varies in accordance with the probability of occurrence of each symbol.
The coding is performed by the source encoder. The output of a discrete memoryless source has to be represented efficiently, which is an important problem in communications; for this purpose, code words are assigned to the source symbols.
For example, in telegraphy we use the Morse code, in which the letters are denoted by marks and spaces (dots and dashes). The letter E, which is used most often, is denoted by ".", whereas the letter Q, which is used rarely, is denoted by "--.-".

[Figure: block diagram of the source encoder.]

Here s_k is the output of the discrete memoryless source and b_k is the output of the source encoder, represented by 0s and 1s. The encoded sequence must be such that it can be conveniently (uniquely) decoded at the receiver.

Let us assume that the source alphabet has M different symbols, and that the kth symbol s_k occurs with probability p_k, where k = 1, 2, …, M. Let the binary codeword assigned to symbol s_k by the encoder have length l_k, measured in bits. We then define the average codeword length L̄ of the source encoder as

L̄ = Σ_{k=1}^{M} p_k l_k

Example: consider four symbols, each with probability 1/4.

Symbol   bits    Source encoder output   Length l_k
S1       000     0                       1
S2       0100    01                      2
S3       0010    10                      2
S4       0011    11                      2

L̄ = (1/4)(1 + 2 + 2 + 2) = 7/4 bits
L̄ represents the average number of bits per source symbol. If L̄_min denotes the minimum possible value of L̄, then the coding efficiency of the source encoder is defined as

η = L̄_min / L̄

With L̄ ≥ L̄_min we have η ≤ 1. Source coding aims at a compact representation (for example, a symbol that a fixed-length code would spend 7 bits on may be assigned a 1- or 2-bit codeword), so L̄ should be kept as small as possible, subject to L̄ ≥ L̄_min; this is the condition for a good (optimal) code. The value of L̄_min is given by Shannon's first theorem. The source encoder is said to be efficient when η = 1, and for this the value of L̄_min has to be determined.
Let us refer to the definition: "Given a discrete memoryless source of entropy H(X), the average code-word length L̄ for any source encoding is bounded as L̄ ≥ H(X)" (Shannon's first theorem). In simpler words, the encoded representation of a message, for example the Morse code --.- ..- . ..- . for the word QUEUE, must on average contain at least as many code symbols as the information content of the message demands; no lossless code can be shorter, on average, than the source entropy allows. Hence, with L̄_min = H(X), the efficiency of the source encoder in terms of the entropy H(X) may be written as

η = H(X) / L̄
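As a quick numerical illustration of these definitions, here is a minimal Python sketch (using an assumed prefix code whose probabilities and lengths are not taken from the table above) that computes L̄, H(X) and η:

import math

# Illustrative example (assumed): a prefix code with codewords 0, 10, 110, 111.
probs   = [0.5, 0.25, 0.125, 0.125]        # p_k
lengths = [1, 2, 3, 3]                     # l_k

L_bar = sum(p * l for p, l in zip(probs, lengths))      # average codeword length L̄
H     = sum(p * math.log2(1 / p) for p in probs)        # source entropy H(X)
eta   = H / L_bar                                       # efficiency with L̄_min = H(X)
print(L_bar, H, eta)                                    # 1.75 1.75 1.0

Here L̄ = H(X) = 1.75 bits/symbol, so the efficiency is 100%.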
Source Coding theorem: noiseless coding theorem / Shannon’s first theorem.


L̄ ≥ H(X)

Lower and upper limits for L̄:

H(X) ≤ L̄ < H(X) + 1

This source coding theorem is called the noiseless coding theorem because it establishes error-free (lossless) encoding. It is also called Shannon's first theorem.

State and Prove Shannon’s first theorem.


Proof:
We prove the two bounds separately: (i) L̄ ≥ H(X) and (ii) L̄ < H(X) + 1.

Proof that L̄ − H(X) ≥ 0:

First write the expressions for L̄ and H(X):

L̄ − H(X) = Σ_{k=1}^{M} p_k l_k − Σ_{k=1}^{M} p_k log2(1/p_k)

Since log2 2 = 1, we can rewrite l_k = l_k log2 2 = log2 2^l_k. Taking p_k as the common factor:

L̄ − H(X) = Σ_{k=1}^{M} p_k [ log2 2^l_k − log2(1/p_k) ]
          = Σ_{k=1}^{M} p_k [ log2 2^l_k + log2 p_k ]
          = Σ_{k=1}^{M} p_k log2( p_k 2^l_k )

Changing to natural logarithms for convenience (log2 y = ln y / ln 2):

L̄ − H(X) = (1/ln 2) Σ_{k=1}^{M} p_k ln( p_k 2^l_k )

Recall the identity ln x ≤ x − 1 for x > 0; equivalently, ln(1/x) ≥ 1 − x, i.e. ln x ≥ 1 − 1/x.

In our case let x = p_k 2^l_k, so that ln( p_k 2^l_k ) ≥ 1 − 1/( p_k 2^l_k ) = 1 − 2^(−l_k)/p_k. Substituting:

L̄ − H(X) ≥ (1/ln 2) [ Σ_{k=1}^{M} p_k − Σ_{k=1}^{M} 2^(−l_k) ]
          = (1/ln 2) [ 1 − Σ_{k=1}^{M} 2^(−l_k) ]

since Σ p_k = 1 always. By the Kraft inequality, Σ_{k=1}^{M} 2^(−l_k) ≤ 1 for any uniquely decodable (prefix) code, so the bracketed term is non-negative. Hence

L̄ − H(X) ≥ 0, i.e. L̄ ≥ H(X).
Proof that L̄ < H(X) + 1:

Choose the codeword length l_k of symbol s_k as the smallest integer satisfying

2^(−l_k) ≤ p_k        ...(A)

Such lengths always admit a prefix code, because Σ_{k=1}^{M} 2^(−l_k) ≤ Σ_{k=1}^{M} p_k = 1, so the Kraft inequality is satisfied.

Because l_k is the smallest integer satisfying (A), the length l_k − 1 violates it, i.e. p_k < 2^(−l_k + 1). Combining the two statements:

2^(−l_k) ≤ p_k < 2^(−l_k + 1)        ...(B)

Taking log2 of the right-hand inequality in (B):

log2 p_k < (−l_k + 1) log2 2 = −l_k + 1

l_k < 1 − log2 p_k = 1 + log2(1/p_k)

Multiplying by p_k and summing over all k, and using Σ_{k=1}^{M} p_k = 1:

Σ_{k=1}^{M} p_k l_k < Σ_{k=1}^{M} p_k + Σ_{k=1}^{M} p_k log2(1/p_k)

L̄ < 1 + H(X)        Hence proved.

Combining the two parts, H(X) ≤ L̄ < H(X) + 1.
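The second part of the proof suggests a concrete choice, l_k = ⌈log2(1/p_k)⌉. The following sketch (with an arbitrarily assumed probability set) checks that these lengths satisfy the Kraft inequality and that H(X) ≤ L̄ < H(X) + 1:

import math

probs = [0.4, 0.3, 0.2, 0.1]                              # assumed example probabilities

# Smallest integer lengths with 2**-l_k <= p_k, as in the proof.
lengths = [math.ceil(math.log2(1 / p)) for p in probs]

kraft = sum(2.0 ** -l for l in lengths)                   # Kraft sum, must be <= 1
H     = sum(p * math.log2(1 / p) for p in probs)          # entropy H(X)
L_bar = sum(p * l for p, l in zip(probs, lengths))        # average codeword length

print(lengths, kraft)                                     # [2, 2, 3, 4] 0.6875
print(H, L_bar)                                           # ~1.846  2.4
assert kraft <= 1 and H <= L_bar < H + 1                  # the bounds proved above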

Prefix Code :
A prefix code is a type of code system distinguished by its possession of the "prefix property",
which requires that there is no whole code word in the system that is a prefix (initial segment) of
any other code word in the system.
For example, a code with code words {9, 55} has the prefix property; a code consisting of
{9, 5, 59, 55} does not, because "5" is a prefix of "59" and also of "55". A prefix code is
a uniquely decodable code: given a complete and accurate sequence, a receiver can identify each
word without requiring a special marker between words. However, there are uniquely decodable
codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely
decodable (it is a suffix code), but it is not necessarily a prefix code.
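As a small illustration (hypothetical helper functions, not part of the notes), the prefix property of the two code sets above can be checked, and a prefix code can be decoded greedily by stripping off complete codewords:

def is_prefix_code(codewords):
    # True if no codeword is a prefix (initial segment) of another codeword.
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

print(is_prefix_code({"9", "55"}))             # True
print(is_prefix_code({"9", "5", "59", "55"}))  # False: "5" is a prefix of "59" and "55"

def decode(bits, code):
    # Greedy decoding of a prefix code: code maps codeword -> symbol.
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in code:                     # a complete codeword has been read
            symbols.append(code[buffer])
            buffer = ""
    return symbols

# Hypothetical binary prefix code used only for this illustration.
print(decode("0101100111", {"0": "A", "10": "B", "110": "C", "111": "D"}))
# ['A', 'B', 'C', 'A', 'D']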


Prefix codes are also known as


 prefix-free codes / comma-free codes,
 prefix condition codes, and
 instantaneous codes.
 Although Huffman coding is just one of many algorithms for deriving prefix codes,
prefix codes are also widely referred to as "Huffman codes", even when the code was not
produced by a Huffman algorithm.
 Using prefix codes, a message can be transmitted as a sequence of concatenated code
words, without any out-of-band markers or (alternatively) special markers between words
to frame the words in the message. The recipient can decode the message unambiguously,
by repeatedly finding and removing sequences that form valid code words. Prefix codes
are not error-correcting codes. In practice, a message might first be compressed with a
prefix code, and then encoded again with channel coding (including error correction)
before transmission.
 For any uniquely decodable code there is a prefix code that has the same code word lengths. Kraft's inequality characterizes the sets of code word lengths that are possible in a uniquely decodable code.
Examples of prefix codes include:

 variable-length Huffman codes


 country calling codes
 Chen–Ho encoding
 the country and publisher parts of ISBNs
 the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless
Standard


 VCR Plus+ codes


 Unicode Transformation Format, in particular the UTF-8 system for
encoding Unicode characters, which is both a prefix-free code and a self-
synchronizing code
 variable-length quantity

Kraft McMillan Inequality property – KMI


In coding theory, the Kraft–McMillan inequality gives a necessary and sufficient condition for the
existence of a prefix code[1] (in Leon G. Kraft's version) or a uniquely decodable code (in Brockway
McMillan's version) for a given set of codeword lengths.
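In other words, binary codeword lengths l_1, …, l_M can be realised by a prefix (or uniquely decodable) code if and only if Σ 2^(−l_k) ≤ 1. A one-line check, with illustrative length sets assumed for the example:

def kraft_sum(lengths, q=2):
    # Kraft-McMillan sum for codeword lengths over a q-ary code alphabet.
    return sum(q ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> realisable, e.g. by the prefix code 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))      # 1.25 -> no uniquely decodable binary code has these lengths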

Shannon Fano Algorithm:

The Shannon-Fano algorithm is an entropy-encoding technique for lossless data compression of multimedia. Named after Claude Shannon and Robert Fano, it assigns a code to each symbol based on its probability of occurrence. It is a variable-length encoding scheme, that is, the codes assigned to the symbols are of varying length.
HOW DOES IT WORK?
The steps of the algorithm are as follows:
1. Create a list of probabilities or frequency counts for the given set of symbols so that the
relative frequency of occurrence of each symbol is known.
2. Sort the list of symbols in decreasing order of probability, the most probable ones to
the left and least probable to the right.
3. Split the list into two parts, with the total probability of both the parts being as close to
each other as possible.
4. Assign the value 0 to the left part and 1 to the right part.
5. Repeat the steps 3 and 4 for each part, until all the symbols are split into individual
subgroups.
The Shannon-Fano code obtained in this way is valid as long as the codeword of each symbol is unique.
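A compact recursive sketch of steps 1-5 is given below (an illustrative implementation with a hypothetical function name; the tie-breaking used when choosing the split point may differ from the hand construction in the worked example):

def shannon_fano(prob):
    # prob: dict mapping symbol -> probability; returns dict symbol -> codeword.
    symbols = sorted(prob, key=prob.get, reverse=True)    # step 2: decreasing probability
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) <= 1:                               # step 5: stop at single symbols
            return
        total, running, cut, best = sum(prob[s] for s in group), 0.0, 1, float("inf")
        for i in range(1, len(group)):                    # step 3: most balanced split
            running += prob[group[i - 1]]
            if abs(total - 2 * running) < best:
                best, cut = abs(total - 2 * running), i
        left, right = group[:cut], group[cut:]
        for s in left:                                    # step 4: 0 to the left part,
            codes[s] += "0"
        for s in right:                                   #         1 to the right part
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

print(shannon_fano({"A": 0.22, "B": 0.28, "C": 0.15, "D": 0.30, "E": 0.05}))
# {'D': '00', 'B': '01', 'A': '10', 'C': '110', 'E': '111'}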
EXAMPLE:
1. Using the Shannon-Fano lossless compression technique, find the codewords, average codeword length, code efficiency, entropy and information rate for the symbols A, B, C, D, E with probabilities 0.22, 0.28, 0.15, 0.30 and 0.05, if the Nyquist rate is 1000 samples/sec.


[Step table and coding tree: see the figures in the worked solution below.]

Solution: use the following relations.

Average codeword length: L̄ = Σ_{k=1}^{M} p_k l_k

Entropy: H(S) = Σ_{i=1}^{M} p_i log2(1/p_i)

r = Nyquist rate = 1000 samples/sec

Information rate: R = H(S) · r

Code efficiency: η = H(S) / L̄


Let P(x) be the probability of occurrence of symbol x.

1. Arrange the symbols in decreasing order of probability: D (0.30), B (0.28), A (0.22), C (0.15), E (0.05).

2. Split the list and compare the group probabilities:

P(D) + P(B) = 0.30 + 0.28 = 0.58
P(A) + P(C) + P(E) = 0.22 + 0.15 + 0.05 = 0.42

Since this split divides the total probability almost equally, the table is divided into {D, B} and {A, C, E}, which are assigned the values 0 and 1 respectively.

3. In the {D, B} group, P(D) = 0.30 and P(B) = 0.28, i.e. P(D) ≈ P(B), so {D, B} is divided into {D} and {B}, with 0 assigned to D and 1 to B.


[Step table and coding tree at this stage: see figure.]

4. In the {A, C, E} group, P(A) = 0.22 and P(C) + P(E) = 0.20, so the group is divided into {A} and {C, E}, which are assigned the values 0 and 1 respectively.


5. In the {C, E} group, P(C) = 0.15 and P(E) = 0.05, so it is divided into {C} and {E}, with 0 assigned to C and 1 to E.

[Final step table and coding tree: see figure.]

The Shannon-Fano codes for the set of symbols are:

Symbol   Probability   Codeword   Length l_k
D        0.30          00         2
B        0.28          01         2
A        0.22          10         2
C        0.15          110        3
E        0.05          111        3


The average codeword length:

L̄ = Σ_{k=1}^{5} p_k l_k = 0.22(2) + 0.28(2) + 0.15(3) + 0.30(2) + 0.05(3) = 2.2 bits/symbol

r = Nyquist rate = 1000 samples/sec

Entropy:

H(S) = Σ_{i=1}^{5} p_i log2(1/p_i)
     = 0.22 log2(1/0.22) + 0.28 log2(1/0.28) + 0.15 log2(1/0.15) + 0.30 log2(1/0.30) + 0.05 log2(1/0.05)
     = 2.142 bits/symbol

Information rate:

R = H(S) · r = 2.142 × 1000 = 2142 bits/sec


Code efficiency:

With L̄_min = H(S) = 2.142 bits/symbol,

η = H(S) / L̄ = 2.142 / 2.2 ≈ 0.974 = 97.4%

Redundancy = 1 − η ≈ 1 − 0.974 = 0.026
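The numbers above can be re-checked with a short sketch (it simply recomputes L̄, H(S), the information rate and the efficiency from the probabilities and codeword lengths of this example):

import math

probs   = {"A": 0.22, "B": 0.28, "C": 0.15, "D": 0.30, "E": 0.05}
lengths = {"A": 2, "B": 2, "C": 3, "D": 2, "E": 3}          # from the code table above
r = 1000                                                     # Nyquist rate, samples/sec

L_bar = sum(probs[s] * lengths[s] for s in probs)            # 2.2 bits/symbol
H     = sum(p * math.log2(1 / p) for p in probs.values())    # ~2.142 bits/symbol
print(L_bar, H, H * r)                                       # 2.2  ~2.142  ~2142
print(H / L_bar, 1 - H / L_bar)                              # ~0.974  ~0.026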


Huffman coding algorithm:


1. The source symbols are listed in order of decreasing probability.
2. The two source symbols of lowest probability are assigned a 0 and a 1. This part of the step is referred to as the splitting stage.
3. These two source symbols are then regarded as combined into a new source symbol with probability equal to the sum of the two original probabilities. The probability of the new symbol is placed in the list in accordance with its value.
4. The procedure is repeated until we are left with a final list of source statistics of only two, to which a 0 and a 1 are assigned.
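These steps can be sketched with a priority queue as below (an illustrative implementation; the heap's tie-breaking plays the role of deciding where an equal combined probability is placed, so the individual codewords, though not the average length, may differ from the hand construction):

import heapq

def huffman(prob):
    # prob: dict symbol -> probability; returns dict symbol -> codeword.
    # Heap entries: (probability, unique tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)        # lowest probability: prepend bit 0
        p1, i, c1 = heapq.heappop(heap)        # second lowest:      prepend bit 1
        for s in c0:
            c0[s] = "0" + c0[s]
        for s in c1:
            c1[s] = "1" + c1[s]
        heapq.heappush(heap, (p0 + p1, i, {**c0, **c1}))   # combined symbol
    return heap[0][2]

codes = huffman({"S0": 0.4, "S1": 0.2, "S2": 0.2, "S3": 0.1, "S4": 0.1})
print(codes)            # one valid Huffman code; the average length is 2.2 bits either way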

Example: A discrete memoryless source emits symbols with probabilities P(S0) = 0.4, P(S1) = 0.2, P(S2) = 0.2, P(S3) = 0.1 and P(S4) = 0.1. Construct the Huffman code and calculate the efficiency of the source code.

[Figure: Huffman coding tree for this source, with each combined probability placed as high as possible in the re-ordered list; the codeword bits are read from the MSB side of the tree, and the average codeword length is then computed from the resulting codeword lengths.]

With L̄_min = H(S), the code efficiency is η = H(S)/L̄ and the redundancy is 1 − η.

[Figure: alternative Huffman coding tree, with each combined probability placed as low as possible in the re-ordered list; the resulting code is tabulated below.]

Symbol   Probability   Codeword
S0       0.4           1
S1       0.2           01
S2       0.2           000
S3       0.1           0010
S4       0.1           0011
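A quick check of this code (a sketch that recomputes the average length, entropy and efficiency from the table):

import math

probs = {"S0": 0.4, "S1": 0.2, "S2": 0.2, "S3": 0.1, "S4": 0.1}
code  = {"S0": "1", "S1": "01", "S2": "000", "S3": "0010", "S4": "0011"}   # table above

L_bar = sum(probs[s] * len(code[s]) for s in probs)            # 2.2 bits/symbol
H     = sum(p * math.log2(1 / p) for p in probs.values())      # ~2.122 bits/symbol
print(L_bar, H, H / L_bar, 1 - H / L_bar)                      # 2.2  ~2.122  ~0.965  ~0.035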


The two constructions assign different codewords (and different codeword-length variances) to the same source while giving the same average length, which clearly reflects that the Huffman encoding process is not unique.


[A further worked Huffman example whose figures were not recovered; the stated results are H(S) = 2.422 bits/symbol, information rate 2422 bits/sec, code efficiency 96.8%, and redundancy 0.032.]


Extended Huffman Coding:
In applications where the alphabet size is large, p_max (the largest symbol probability) is generally quite small, and the deviation of the code rate from the entropy, especially as a percentage of the rate, is quite small.
However, in cases where the alphabet is small and the probabilities of the different letters are skewed, p_max can be quite large and the Huffman code can become rather inefficient compared to the entropy.
To overcome this inefficiency we use extended Huffman coding, i.e. we code blocks of source symbols instead of single symbols, as illustrated by the following example:
Consider a source that puts out i.i.d. letters from the alphabet A = {a1, a2, a3} with the probability model P(a1) = 0.8, P(a2) = 0.02 and P(a3) = 0.18. The entropy of this source is 0.816 bits/symbol. A Huffman code for this source is shown in Table 1 below.
TABLE 1: The Huffman code.

Letter   Codeword
a1       0
a2       11
a3       10

The average length for this code is 1.2 bits/symbol. The difference between the average
code length and the entropy, or the redundancy, for this code is 0.384 bits/symbol,
which is 47% of the entropy. This means that to code this sequence we would need
47% more bits than the minimum required.


Now, for the source described in the example above, instead of generating a codeword for every symbol we generate a codeword for every two symbols. If we look at the source sequence two at a time, the number of possible symbol pairs, i.e. the size of the extended alphabet, is 3² = 9. The extended alphabet, its probability model and the corresponding Huffman code are shown in Table 2 below.
TABLE 2: The extended alphabet and corresponding Huffman code.

Letter   Probability   Code
a1a1     0.64          0
a1a2     0.016         10101
a1a3     0.144         11
a2a1     0.016         101000
a2a2     0.0004        10100101
a2a3     0.0036        1010011
a3a1     0.1440        100
a3a2     0.0036        10100100
a3a3     0.0324        1011

The average codeword length for this extended code is 1.7228 bits per extended symbol. However, each symbol in the extended alphabet corresponds to two symbols of the original alphabet. Therefore, in terms of the original alphabet, the average codeword length is 1.7228/2 = 0.8614 bits/symbol.
The redundancy is now about 0.045 bits/symbol, which is only about 5.5% of the entropy.
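The gain from extending the alphabet can be reproduced with a short sketch (it reuses a heap-based Huffman routine analogous to the one given earlier; only the average codeword length, which is the same for every Huffman code of a given source, matters here):

import heapq, itertools

def huffman_lengths(prob):
    # Returns dict symbol -> codeword length of a binary Huffman code.
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, i, c1 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**c0, **c1}.items()}   # every member gains one bit
        heapq.heappush(heap, (p0 + p1, i, merged))
    return heap[0][2]

single = {"a1": 0.8, "a2": 0.02, "a3": 0.18}
pairs  = {x + y: single[x] * single[y] for x, y in itertools.product(single, repeat=2)}

len1, len2 = huffman_lengths(single), huffman_lengths(pairs)
L1 = sum(single[s] * len1[s] for s in single)      # ~1.2    bits/symbol
L2 = sum(pairs[s]  * len2[s] for s in pairs)       # ~1.7228 bits per pair
print(L1, L2, L2 / 2)                              # 1.2  ~1.7228  ~0.8614 bits/symbol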
Advantage of extended Huffman coding
We see that by coding blocks of symbols together we can reduce the redundancy of
Huffman codes.

Dr. Tanuja S. Dhope (Bharati Vidyapeeth (DU), COE, Pune)