Second Year
Information Theory
Information theory provides a quantitative measure of the information contained in message signals and
allows us to determine the capacity of a communication system to transfer this information from source
to destination. Information theory was originally known as the 'Mathematical Theory of
Communication', and it deals with the mathematical modeling and analysis of a communication system
rather than with the physical channel.
Information theory tells us the most efficient way to communicate data. In particular, it provides
limits on:
1. The minimum number of bits per symbol required to fully represent the source.
2. The maximum rate at which reliable communication can take place over the channel.
1. Concept of Information
An information source is an object that produces an event, the outcome of which is selected at random
according to a probability distribution.
A discrete information source is a source that has only a finite set of symbols as outputs. The set of source
symbols is called the source alphabet, and the elements of the set are called symbols or letters. Information
sources can be classified as having memory or being memoryless.
A source with memory is one for which the current symbol depends on the previous symbols. A
memoryless source is one for which each symbol produced is independent of the previous symbols.
A communication system can never be described in a deterministic sense; it must be considered
statistical in nature. That is, to describe a communication system completely we have to account for its
unpredictable, or uncertain, behavior.
This is easily understood by example: a transmitter transmits messages randomly, and we cannot predict
which message it is going to transmit at the next moment, but we do know the probability of transmitting
each particular message.
So to define the system completely we need a statistical study of it, and that statistical study
is performed with the help of the concept of probability.
Consider, for example, two messages: a very likely one conveys little information, while a very unlikely
one, when it occurs, conveys a great deal. This leads to the definition of the information content
(self-information) of a message x with probability P(x):
I(x) = log2(1/P(x)) = -log2 P(x) bits
Example.1:
How many bits per symbol are required to encode 32 different (equally likely) symbols?
We have M = 32 symbols, so P(x) = 1/32 and I(x) = log2 32 = 5 bits/symbol.
The advantage of the logarithmic measure is that information adds for independent events: if the joint
event (x, y) has probability P(x, y) and x and y are statistically independent, then
P(x, y) = P(x) P(y), so I(x, y) = I(x) + I(y)
Example.2:
The symbols A, B, C, and D occur with probabilities 1/2, 1/4, 1/8, and 1/8 respectively. Find the
information content of the message 'BDA', where the symbols are independent.
I(BDA) = I(B) + I(D) + I(A) = log2 4 + log2 8 + log2 2 = 2 + 3 + 1 = 6 bits
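These values can be checked numerically. The following Python sketch (the function name
self_information is only an illustrative choice, not something defined in these notes) reproduces the
results of Examples 1 and 2:

from math import log2

def self_information(p):
    """Information content, in bits, of an outcome with probability p."""
    return -log2(p)

# Example 1: one of 32 equiprobable symbols
print(self_information(1 / 32))                            # 5.0 bits

# Example 2: message 'BDA' with P(B) = 1/4, P(D) = 1/8, P(A) = 1/2
print(sum(self_information(p) for p in (1/4, 1/8, 1/2)))   # 6.0 bits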
Since the base of the logarithm can be chosen differently, we may have different units of
information:
• Bits (Base 2)
• Nats (Base e)
• Decits (Base 10)
e.g.
1 bit = 1/log2 e = 0.6932 nat
1 bit = 1/log2 10 = 0.3010 decit
1 decit = 1/log10 2 = 3.3219 bits
2. Entropy
In a communication system we do not have only a single message, but a number of possible
messages. So, instead of calculating the information due to the individual messages and adding them,
we calculate the average information per message of the system, known as the entropy of the source.
Let there be M different messages m1, m2, ..., mM with respective probabilities P1, P2, ..., PM.
Assume that in a long time interval, L messages have been generated. Let L be very large so
that L >> M; then message m1 occurs approximately P1 L times and contributes P1 L log2(1/P1) bits.
Adding the contributions of all messages and dividing by the total number L of messages gives the
average information per message, the entropy of the source:
H = Σi Pi log2(1/Pi)
Thus the unit of entropy is information/message. I(x) is called the self-information of a symbol, and
H(X) is the average self-information, i.e. the entropy, of the source.
Example.3:
A discrete source has 4 symbols x = [x1, x2, x3, x4] with probabilities P = [1/2, 1/4, 1/8, 1/8]. Find the
information content of each symbol, then calculate the entropy. Also calculate the average number of bits
per symbol for the message 'x1 x2 x1 x4 x3 x1 x1 x2'.
I(xi) = log2(1/P(xi))
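A sketch of how this example can be worked out numerically, using the entropy formula above (the
helper function entropy is an illustrative name, not part of these notes):

from math import log2

def entropy(probs):
    """H = sum of p*log2(1/p), in bits per symbol (terms with p = 0 contribute nothing)."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

P = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}

for sym, p in P.items():
    print(sym, -log2(p), "bits")                 # 1, 2, 3, 3 bits

print("H =", entropy(P.values()), "bits/symbol") # 1.75

# Average number of bits per symbol for the message 'x1 x2 x1 x4 x3 x1 x1 x2'
msg = ["x1", "x2", "x1", "x4", "x3", "x1", "x1", "x2"]
avg = sum(-log2(P[s]) for s in msg) / len(msg)
print("average =", avg, "bits/symbol")           # 1.75, the same as the entropy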
3. Rate of Information
If a message source generates messages (or symbols) at a rate of r messages (or symbols) per
second, then the rate of information is
R= r H bits/second
Example.4:
An event has six possible outcomes with probabilities P1 = 1/2, P2 = 1/4, P3 = 1/8, P4 = 1/16, P5 = 1/32, and
P6 = 1/32. Find the entropy of the system. Also find the rate of information if there are 18 outcomes
per second.
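A quick numerical check of this example (an illustrative sketch, not the official worked solution):

from math import log2

P = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]
H = sum(p * log2(1 / p) for p in P)      # entropy in bits per outcome
r = 18                                   # outcomes per second
print("H =", H, "bits")                  # 1.9375
print("R = r*H =", r * H, "bits/sec")    # 34.875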
Exercise.4: A telegraph source has two symbols (dot and dash). The duration of a dot is 0.2 sec, the
duration of a dash is 3 times that of a dot, and the time between symbols is 0.2 sec. The probability of
the dot occurring is twice that of the dash. Find the average rate of information for this telegraph source.
(Check answer: R = 1.725 bits/sec)
Consider a discrete channel with m inputs and n outputs:
[Figure: discrete channel with inputs x1, ..., xm and outputs y1, ..., yn]
Let
P(X) = [p(x1) p(x2) ... p(xm)]
P(Y) = [p(y1) p(y2) ... p(yn)]
Then
P(Y) = P(X) P(Y/X)
where P(Y/X) is the channel (transition) matrix whose element in row i and column j is p(yj/xi).
p(xi, yj) = p(xi) p(yj/xi) = p(yj) p(xi/yj)
In matrix form, P(X, Y) = [P(X)]d P(Y/X), where [P(X)]d is the diagonal matrix with p(x1), ..., p(xm) on
its diagonal.
Note:
p(yj) = Σi p(xi, yj)  and  p(xi) = Σj p(xi, yj)
Example.5:
For the following binary channel (with p(x1) = p(x2) = 1/2):
[Figure: binary channel with p(y1/x1) = 0.9, p(y2/x1) = 0.1, p(y1/x2) = 0.2, p(y2/x2) = 0.8]
a. Construct the channel matrix for this channel.
b. Find p(y1) and p(y2).
c. Find the joint probabilities p(x1, y2) and p(x2, y1).
Solution:
(a) The channel matrix is
P(Y/X) = [p(y1/x1)  p(y2/x1); p(y1/x2)  p(y2/x2)] = [0.9  0.1; 0.2  0.8]
(b) P(X) = [p(x1) p(x2)] = [0.5  0.5]
P(Y) = P(X) P(Y/X) = [0.5  0.5] [0.9  0.1; 0.2  0.8] = [0.55  0.45] = [p(y1) p(y2)]
(c) P(X, Y) = [P(X)]d P(Y/X) = [0.5  0; 0  0.5] [0.9  0.1; 0.2  0.8] = [0.45  0.05; 0.10  0.40]
Hence p(x1, y2) = 0.05 and p(x2, y1) = 0.1
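The matrix relations above can be checked with a few lines of Python and NumPy (np.diag builds the
diagonal matrix [P(X)]d; variable names are illustrative):

import numpy as np

PYgX = np.array([[0.9, 0.1],
                 [0.2, 0.8]])        # P(Y/X), rows sum to 1
PX = np.array([0.5, 0.5])            # P(X)

PY  = PX @ PYgX                      # P(Y)   = P(X) P(Y/X)      -> [0.55 0.45]
PXY = np.diag(PX) @ PYgX             # P(X,Y) = [P(X)]d P(Y/X)   -> [[0.45 0.05]
                                     #                               [0.10 0.40]]
print(PY)
print(PXY)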
Exercise.5:
The following channel matrix has P(X) = [0.5  0.5]:
P(Y/X) = [1-p  p  0; 0  p  1-p]
a. Draw the channel.
b. Find P(Y) if p = 0.2.
(Note: 0 ≤ p(x, y) ≤ 1)
P(x, y) is the joint probability of events x and y. Note that P(x/y) and P(y/x) are conditional
probabilities; their role will be clear from the definitions of the conditional entropies that follow.
H(X/Y) = -Σj Σi p(xi, yj) log2 p(xi/yj)
Similarly
H(Y/X) = -Σi Σj p(xi, yj) log2 p(yj/xi)
H(X, Y) = H(X/Y) + H(Y) = H(Y/X) + H(X)
Exercise.6:
Find H(X), P(X, Y), and H(Y) for the channel shown in the figure, given that P(x1) = 0.2, P(x2) = 0.5,
and P(x3) = 0.3.
[Figure: channel with p(y1/x1) = 0.8, p(y2/x1) = 0.2, p(y2/x2) = 1, p(y2/x3) = 0.3, p(y3/x3) = 0.7]
Exercise.7:
A transmitter has an alphabet of four letters [x1, x2, x3, x4] and the receiver has an alphabet of
three letters [y1, y2, y3]. Calculate all the entropies if the joint probability matrix is:
P(X, Y) = [0.3  0.05  0; 0  0.25  0; 0  0.15  0.05; 0  0.05  0.15]
(Check answer: H(X) = 1.96, H(Y/X) = 0.53, H(X, Y) = 2.49, H(Y) = 1.49, H(X/Y) = 1.0)
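The check answers can be reproduced from the joint probability matrix with the following sketch, which
uses the chain rule H(X, Y) = H(X/Y) + H(Y) from above (the function name entropies is only an
illustrative choice):

import numpy as np

def entropies(PXY):
    """Return H(X), H(Y), H(X,Y), H(X/Y), H(Y/X) in bits from a joint probability matrix."""
    PX, PY = PXY.sum(axis=1), PXY.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    HX, HY, HXY = h(PX), h(PY), h(PXY.flatten())
    return HX, HY, HXY, HXY - HY, HXY - HX    # H(X/Y) = H(X,Y)-H(Y), H(Y/X) = H(X,Y)-H(X)

PXY = np.array([[0.30, 0.05, 0.00],
                [0.00, 0.25, 0.00],
                [0.00, 0.15, 0.05],
                [0.00, 0.05, 0.15]])
print(entropies(PXY))   # approximately (1.96, 1.49, 2.49, 1.00, 0.53)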
7. Mutual Information
We have
p(xi) = probability that xi is transmitted.
p(xi/yj) = probability that xi was transmitted, given that yj has been received.
Thus p(xi) represents the probability, or uncertainty, about x before anything is received (the prior
uncertainty), and p(xi/yj) gives the final uncertainty about x after yj has been received. The difference
between these uncertainties is called the mutual information: mutual information represents the
uncertainty about the input that is resolved by observing the output.
I(X; Y) = H(X) - H(X/Y)
which can be written as
I(X; Y) = Σi Σj p(xi, yj) log2 [p(xi/yj) / p(xi)]
Properties of ( ; )
• I(X; Y) = I(Y; X)
• I(X; Y) ≥ 0
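As an illustrative sketch, I(X; Y) can be computed from a joint probability matrix using
I(X; Y) = H(X) - H(X/Y); applied to the binary channel of Example 5 it gives about 0.397 bits per symbol
(the function name I_XY is an assumption of this sketch, not a name used in the notes):

import numpy as np

def I_XY(PXY):
    """I(X;Y) = H(X) - H(X/Y), computed from the joint probability matrix (bits)."""
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    HX = h(PXY.sum(axis=1))
    HY = h(PXY.sum(axis=0))
    HXY = h(PXY.flatten())
    H_X_given_Y = HXY - HY            # chain rule: H(X,Y) = H(X/Y) + H(Y)
    return HX - H_X_given_Y

# Binary channel of Example 5: P(X,Y) = [[0.45, 0.05], [0.10, 0.40]]
print(I_XY(np.array([[0.45, 0.05], [0.10, 0.40]])))   # about 0.397 bits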
8. Channel Types
8.1 Lossless Channel
A channel whose matrix has only one nonzero element in each column. No source information is lost in
transmission.
e.g.
P(Y/X) = [3/4  1/4  0  0  0; 0  0  1/3  2/3  0; 0  0  0  0  1]
For a lossless channel each p(xi/yj) is either 0 or 1, so the transmitted symbol can always be determined
from the received one.
8.3 Noiseless Channel
A channel with the same number of inputs and outputs (m = n), in which each input is connected to
exactly one output:
p(yj/xi) = 1 if i = j
p(yj/xi) = 0 if i ≠ j
8.4 Binary Symmetric Channel (BSC)
P(Y/X) = [1-p  p; p  1-p]
Example.6:
Consider a BSC with P(x1) = α:
[Figure: BSC with crossover probability p, i.e. p(y2/x1) = p(y1/x2) = p]
a. Show that:
I(X; Y) = H(Y) + p log2 p + (1 - p) log2(1 - p)
b. Calculate I(X; Y) for p = 0.5 (and α = 0.5).
For p = 0.5: p log2 p + (1 - p) log2(1 - p) = -1, and H(Y) = 1.
Hence I(X; Y) = 1 - 1 = 0.
When I(X; Y) = 0, the channel is useless, i.e. when p = 0.5 no information is being transmitted at
all.
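A short sketch illustrating the same behaviour: I(X; Y) of the BSC falls to zero as p approaches 0.5.
The binary entropy helper Hb and the function name are illustrative choices for this sketch:

from math import log2

def Hb(q):
    """Binary entropy function, in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def bsc_mutual_information(p, alpha=0.5):
    """I(X;Y) = H(Y) - H(Y/X) for a BSC with crossover probability p and P(x1) = alpha."""
    py1 = alpha * (1 - p) + (1 - alpha) * p     # probability of receiving y1
    return Hb(py1) - Hb(p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(p, round(bsc_mutual_information(p), 4))
# 1.0, 0.531, 0.1887, 0.0 -- at p = 0.5 the channel carries no information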
Exercise.8:
For a lossless channel show that: H(X/Y) = 0
Exercise.9:
For a noiseless channel with the number of inputs equal to the number of outputs (= m), show that
H(X) = H(Y) and H(Y/X) = 0.
Exercise.10:
Show that: H(X, Y) = H(X/Y) + H(Y)
Exercise.11:
Show that: I(X; Y) = Σi Σj p(xi, yj) log2 [p(xi/yj) / p(xi)]
Exercise.12:
Show that: I(X; Y) = I(Y; X)
Exercise.13:
Show that: I(X; Y) ≥ 0
(Hint: log2 a = -log2(1/a), and ln α ≤ α - 1)
9. Channel Capacity
The mutual information represents the average information per symbol transmitted over the system.
Shannon showed that its maximum is the highest rate at which information can be transmitted reliably
over the channel. The capacity per symbol Cs of a channel is therefore given by:
Cs = max I(X; Y)   bits/symbol
where the maximum is taken over all possible input probability distributions P(xi).
Example.7:
Find the capacity of the following channel (where P(x1) = α):
[Figure: channel with x1 → y1 and x2 → y3 each with probability 1-p, and x1, x2 → y2 (erasure) each
with probability p]
Solution:
P(Y/X) = [1-p  p  0; 0  p  1-p]
P(Y) = P(X) P(Y/X) = [α  1-α] [1-p  p  0; 0  p  1-p] = [α(1-p)  p  (1-α)(1-p)]
P(X, Y) = [P(X)]d P(Y/X) = [α  0; 0  1-α] [1-p  p  0; 0  p  1-p]
        = [α(1-p)  αp  0; 0  (1-α)p  (1-α)(1-p)]
H(Y) = -Σj p(yj) log2 p(yj) = (1 - p) H(X) - p log2 p - (1 - p) log2(1 - p)
H(Y/X) = -Σi Σj p(xi, yj) log2 p(yj/xi) = -p log2 p - (1 - p) log2(1 - p)
I(X; Y) = H(Y) - H(Y/X) = (1 - p) H(X)
Cs = max I(X; Y) = max (1 - p) H(X) = (1 - p) max H(X) = 1 - p bits/symbol
(the maxima are taken over P(x); max H(X) = 1 for a binary source, achieved at α = 1/2)
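The maximization over the input distribution can also be carried out numerically. The sketch below
searches over α for the channel of Example 7 with p = 0.2 and finds Cs ≈ 0.8 = 1 - p at α = 0.5
(function and variable names are illustrative assumptions of this sketch):

import numpy as np

def I_XY(PX, PYgX):
    """I(X;Y) = sum of p(x,y) log2[p(x,y)/(p(x)p(y))] for input distribution PX and channel PYgX."""
    PXY = np.diag(PX) @ PYgX
    outer = PXY.sum(axis=1, keepdims=True) @ PXY.sum(axis=0, keepdims=True)
    m = PXY > 0
    return np.sum(PXY[m] * np.log2(PXY[m] / outer[m]))

p = 0.2
PYgX = np.array([[1 - p, p, 0.0],
                 [0.0, p, 1 - p]])          # channel of Example 7
best = max((I_XY(np.array([a, 1 - a]), PYgX), a) for a in np.linspace(0.01, 0.99, 197))
print(best)   # approximately (0.8, 0.5): Cs = 1 - p, reached at alpha = 0.5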
[Figure: additive noise channel, y = x + n]
The noise observed in practical channels is usually assumed to be additive white Gaussian noise. The
channel capacity for this channel is:
C = max{R}
or
C = B log2(1 + S/N)   bits/sec
Where:
B: bandwidth of the channel (Hz)
S/N: signal-to-noise ratio (SNR)
S: signal power in watts
N: noise power in watts (N = B N0), where N0 is the power spectral density (PSD) of the noise (W/Hz)
Note: Transmission can be made error-free if and only if R ≤ C, i.e. the information rate does not exceed
the channel capacity.
Example.8:
Consider an AWGN channel with 4 kHz bandwidth and noise PSD 2x10^-12 W/Hz. The signal power
at the receiver is 0.1 mW. Calculate the capacity of this channel.
Solution:
We have: B = 4000 Hz, S = 0.1x10^-3 W, N0 = 2x10^-12 W/Hz
N = N0 B = 2x10^-12 x 4000 = 8x10^-9 W
SNR = S/N = (0.1x10^-3)/(8x10^-9) = 1.25x10^4
C = B log2(1 + S/N) = 4000 log2(1 + 1.25x10^4) ≈ 54.4x10^3 bits/sec
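A sketch that evaluates the Shannon capacity formula for Example 8 and for Example 9 below (the
function name awgn_capacity is an illustrative assumption):

from math import log2

def awgn_capacity(B, S, N0):
    """C = B log2(1 + S/N) with N = N0*B (bandwidth in Hz, powers in watts)."""
    return B * log2(1 + S / (N0 * B))

# Example 8
print(awgn_capacity(4000, 0.1e-3, 2e-12))   # about 54.4e3 bits/sec

# Example 9: B = 3 kHz, SNR = 10; each of 128 characters needs log2(128) = 7 bits
C = 3000 * log2(1 + 10)
print(C, C / 7)   # about 10378 bits/sec and about 1483 characters/sec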
Example.9:
The terminal of a computer used to enter alphabetic data is connected to the computer through a
voice-grade telephone line having a usable bandwidth of 3 kHz and SNR = 10. Assuming that the
terminal has 128 characters, determine:
a. The capacity of the channel
b. The maximum rate of transmission without error
Solution:
a. C = B log2(1 + S/N) = 3000 log2(1 + 10) ≈ 10,378 bits/sec
b. Assuming the 128 characters are equally likely, each character carries log2 128 = 7 bits, so the
maximum error-free rate of transmission is C/7 ≈ 1483 characters/sec.
10. Source Coding
The information rate of the source is
R = r H   bits/sec
where r is the symbol rate (symbols/sec). If all symbols convey the same amount of information, then
H = log2 M, so R = r log2 M.
Now consider an encoder that converts the incoming symbols to code words consisting of binary digits
produced at a fixed rate rb. Since each binary digit can carry at most log2 2 = 1 bit of information, the
encoder output can carry at most
R = rb log2 2 = rb   bits/sec
Here rb/r, the number of binary digits per source symbol, represents a very important parameter called
the average code length, denoted L. For a uniquely decodable code with code-word lengths ni, the Kraft
inequality must hold:
K = Σi 2^(-ni) ≤ 1
The simplest coding is a fixed-length code, in which all code words have the same length ni = n, so
K = M 2^(-n)
This means that for decipherable (uniquely identifiable) code words in the case of fixed-length coding,
we need
n ≥ log2 M
So, the resulting efficiency can be calculated as:
η = H(X)/L x 100%
The result of this discussion is that if H(X) < log2 M and we need higher efficiency, we have to
reduce the average code length L. That is why we use variable-length codes.
Example.10:
For the following codes:
xi    Code A   Code B   Code C   Code D
x1    00       0        0        0
x2    01       10       11       100
x3    10       11       100      110
x4    11       110      110      111
a. Apply the Kraft inequality to each code.
For Code A: n1 = n2 = n3 = n4 = 2, K = Σ 2^(-ni) = 4(1/4) = 1
For Code B: n1 = 1, n2 = n3 = 2, n4 = 3, K = Σ 2^(-ni) = 1/2 + 1/4 + 1/4 + 1/8 = 9/8 > 1
For Code C: n1 = 1, n2 = 2, n3 = n4 = 3, K = Σ 2^(-ni) = 1/2 + 1/4 + 1/8 + 1/8 = 1
For Code D: n1 = 1, n2 = n3 = n4 = 3, K = Σ 2^(-ni) = 1/2 + 1/8 + 1/8 + 1/8 = 7/8 < 1
b. Codes A and D are prefix-free codes, so they are uniquely decodable. Code B is not
uniquely decodable because it does not satisfy the Kraft inequality. Code C satisfies the Kraft
inequality but is not uniquely decodable (the sequence 0110110 corresponds to 'x1 x2 x1 x4' or to
'x1 x4 x4').
An objective of source encoding is to minimize the average bit rate required for representation of the
source by reducing the redundancy of the information source.
A prefix-free code (one in which no code word is a prefix of another) can easily be decoded by the
decoder, and it satisfies the Kraft inequality. It is also found that there are some codes that satisfy the
Kraft inequality but are not prefix codes, and yet can still be decoded without any error. For example:
Code III
x1: 0
x2: 01
x3: 011
x4: 0111
Shannon-Fano Coding
In this method, first arrange the messages in descending order of probability. Then draw a line that
divides the symbols into two groups such that the group probabilities are as nearly equal as possible.
Assign the digit 0 to each symbol in the group above the line and the digit 1 to each symbol in the
group below the line. In all subsequent steps, subdivide each group into subgroups and again
assign digits following the same rule. Whenever a group contains only one symbol, no further
subdivision is possible and the code word for that symbol is complete. When all the groups have been
reduced to one symbol, the code words for all symbols have been assigned.
Example.11: For the given message sequence with their probabilities, apply Shannon-Fano
coding and calculate the code efficiency.
[x] = [x1 x2 x3 x4 x5 x6 x7 x8]
[P] = [1/4 1/8 1/16 1/16 1/16 1/4 1/16 1/8]
Solution. Arrange the probabilities in descending order.
Message   Prob.   Code    Length
x1        1/4     00      2
x6        1/4     01      2
x2        1/8     100     3
x8        1/8     101     3
x3        1/16    1100    4
x4        1/16    1101    4
x5        1/16    1110    4
x7        1/16    1111    4
Average code length: L = Σ P(xi) ni = 2(1/4)(2) + 2(1/8)(3) + 4(1/16)(4) = 2.75 binary digits/symbol
Entropy: H(X) = 2.75 bits/symbol, so the efficiency is η = H(X)/L = 100%.
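An illustrative Python sketch of the Shannon-Fano procedure described above; applied to Example 11 it
reproduces the code words in the table. The splitting rule shown is one reasonable reading of "as nearly
equal as possible", and the function name shannon_fano is an assumption of this sketch:

def shannon_fano(symbols):
    """symbols: list of (name, probability). Returns {name: codeword}."""
    symbols = sorted(symbols, key=lambda s: s[1], reverse=True)
    code = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total, run, cut, best = sum(p for _, p in group), 0.0, 1, None
        # choose the split point that makes the two group probabilities most nearly equal
        for i in range(1, len(group)):
            run += group[i - 1][1]
            diff = abs(2 * run - total)
            if best is None or diff < best:
                best, cut = diff, i
        for name, _ in group[:cut]:
            code[name] += "0"
        for name, _ in group[cut:]:
            code[name] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return code

P = [("x1", 1/4), ("x6", 1/4), ("x2", 1/8), ("x8", 1/8),
     ("x3", 1/16), ("x4", 1/16), ("x5", 1/16), ("x7", 1/16)]
print(shannon_fano(P))
# x1 -> 00, x6 -> 01, x2 -> 100, x8 -> 101, x3 -> 1100, x4 -> 1101, x5 -> 1110, x7 -> 1111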
Exercise.16:
Apply the Shannon Fano coding and find the code efficiency
[x] = [x1 x2 x3 x4 x5 x6 x7]
[P] = [.4 .2 .12 .08 .08 .08 .04]
(Check answer: efficiency = 96.03 % )
Huffman Coding
1. The source symbols are listed in order of decreasing probability. The two symbols of lowest
probability are assigned a '0' and a '1'.
2. These two source symbols are then regarded as being combined into a new source symbol with
probability equal to the sum of the two original probabilities. The probability of the new symbol is
placed in the list in accordance with its value.
3. The procedure is repeated until we are left with a final list of only two source symbols, to which a '0'
and a '1' are assigned. The code word for each symbol is found by working backward and tracing the
sequence of 0s and 1s assigned to that symbol as well as its successors.
Example.12: Construct the Huffman code for a source with five symbols s0, s1, s2, s3, s4 having
probabilities 0.4, 0.2, 0.2, 0.1, and 0.1.
Symbol   Stage I       Stage II      Stage III     Stage IV
s0       0.4  (00)     0.4  (00)     0.4  (1)      0.6  (0)
s1       0.2  (10)     0.2  (01)     0.4  (00)     0.4  (1)
s2       0.2  (11)     0.2  (10)     0.2  (01)
s3       0.1  (010)    0.2  (11)
s4       0.1  (011)
Final code: s0 = 00, s1 = 10, s2 = 11, s3 = 010, s4 = 011.
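An illustrative sketch of the Huffman procedure using a priority queue. Huffman codes are not unique:
the code words produced may differ from those in the table above, but the code-word lengths
(2, 2, 2, 3, 3) and the average length of 2.2 binary digits/symbol are the same (the function name
huffman is an assumption of this sketch):

import heapq
from itertools import count

def huffman(probs):
    """probs: {symbol: probability}. Returns {symbol: codeword} for a binary Huffman code."""
    tie = count()                            # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)      # two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(code)
print("average code length =", avg)          # 2.2 binary digits per symbol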
11. Channel Coding
[Figure: model of a digital communication system (source, channel encoder, channel, channel decoder, receiver Rx)]
The goal of the channel encoding and decoding process is to detect (or correct) errors so that the data
digits are recovered with minimum probability of error. This is an effective way of improving the
reliability of transmission over a noisy channel.
The basic idea of coding is to add a group of check digits to the message digits. The check digits
may then provide the receiver with sufficient information to either detect or correct channel errors.
n = k + r
k: number of message digits
r: number of check digits
n: length of the code word
Single parity check
A simple one-error-detecting code with r = 1 check digit.
[Figure: code word consisting of k message digits followed by one check digit, equal to the XOR
(modulo-2 sum) of the message digits]
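A minimal sketch of single-parity-check encoding and error detection (function names are illustrative):

def add_parity(bits):
    """Append one even-parity check digit (the XOR of the k message digits)."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]

def check_parity(word):
    """True if the received word has even parity (no error, or an even number of errors)."""
    s = 0
    for b in word:
        s ^= b
    return s == 0

msg = [1, 0, 1, 1]
tx = add_parity(msg)          # [1, 0, 1, 1, 1]
print(check_parity(tx))       # True
tx[2] ^= 1                    # single channel error
print(check_parity(tx))       # False -> error detected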
Hamming code
A class of linear codes which can correct all patterns of a single error in the received word, with code
length n = 2^r - 1.
Block coding
Let the encoded code word be
C = [c1 c2 ... ck ck+1 ... cn]
where the first k digits are the message digits and the last r = n - k digits are the check digits.
Decoding process
Let the error vector be
E = [e1 e2 ... en], where ei = 1 if an error occurred in the i-th position and ei = 0 otherwise.
The received word is
R = [r1 r2 ... rn]
and R = C ⊕ E.
The decoder begins by computing the syndrome S:
S = H R^T
where H is the parity check matrix. If S = 0, the received word is accepted as error-free; otherwise the
syndrome is used to locate (and correct) the error.
Example.13:
For k = 3, n = 6 the parity check matrix is:
H = [0 1 1 1 0 0; 1 0 1 0 1 0; 1 1 0 0 0 1]
If the received word is R = [0 1 0 0 1 1], check whether an error occurred in R, then find the correct
transmitted word C.
Solution:
S = H R^T = [0 1 1 1 0 0; 1 0 1 0 1 0; 1 1 0 0 0 1] [0 1 0 0 1 1]^T = [1 1 0]^T
Since S ≠ 0, an error has occurred. The syndrome [1 1 0]^T equals the third column of H, so the error
is in the third digit: E = [0 0 1 0 0 0].
The corrected word is C = R ⊕ E = [0 1 1 0 1 1].
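The syndrome computation of Example 13 can be verified with a few lines of NumPy. The single-error
correction step below assumes, as in the example, that at most one error occurred:

import numpy as np

H = np.array([[0, 1, 1, 1, 0, 0],
              [1, 0, 1, 0, 1, 0],
              [1, 1, 0, 0, 0, 1]])      # parity check matrix of Example 13
R = np.array([0, 1, 0, 0, 1, 1])        # received word

S = H @ R % 2                           # syndrome
print(S)                                # [1 1 0] -> nonzero, so an error occurred

# For a single error, the syndrome equals the column of H at the error position.
err = next(i for i in range(H.shape[1]) if np.array_equal(H[:, i], S))
C = R.copy()
C[err] ^= 1
print(err, C)                           # position 2 (the third digit), C = [0 1 1 0 1 1]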