Second Year
Information Theory
Information theory provides a quantitative measure of the information contained in message signals and
allows us to determine the capacity of a communication system to transfer this information from source
to destination. Information theory was originally known as the 'Mathematical Theory of
Communication', and it deals with the mathematical modeling and analysis of a communication system
rather than with the physical channel.
Information theory tells us the most efficient way to communicate data. In particular, it provides
limits on:
1. The minimum number of bits per symbol required to fully represent the source.
2. The maximum rate at which reliable communication can take place over the channel.
1. Concept of Information
An information source is an object that produces an event, the outcome of which is selected at random
according to a probability distribution.
A discrete information source is a source that has only a finite set of symbols as outputs. The set of source
symbols is called the source alphabet, and the elements of the set are called symbols or letters. Information
sources can be classified as having memory or being memoryless.
A source with memory is one for which the current symbol depends on the previous symbols. A
memoryless source is one for which each symbol produced is independent of the previous symbols.
A communication system can never be described in a deterministic sense; it must be considered
statistical in nature. That is, to describe a communication system completely we have to account for its
unpredictable, or uncertain, behavior.
This is easily understood by example: a transmitter transmits messages randomly, and we cannot predict
which message it is going to transmit at the next moment, but we do know the probability of transmitting
each particular message.
So to define the system completely we need a statistical study of it, and that statistical study
is performed with the help of the concept of probability.
Consider, for example, two messages: a very likely one conveys little information, while a very unlikely
one, when it occurs, conveys a great deal. This leads to the definition of the information content
(self-information) of a message x with probability P(x):
I(x) = log2(1/P(x)) = -log2 P(x) bits
Example.1:
How many bits per symbol are required to encode 32 different (equally likely) symbols?
We have M = 32 symbols, so P(x) = 1/32 and I(x) = log2 32 = 5 bits/symbol.
The advantage of the logarithmic measure is that information adds for independent events: if the joint
event (x, y) has probability P(x, y) and x and y are statistically independent, then
P(x, y) = P(x) P(y), so I(x, y) = I(x) + I(y)
Example.2:
The symbols A, B, C, and D occur with probabilities 1/2, 1/4, 1/8, and 1/8 respectively. Find the
information content of the message 'BDA', where the symbols are independent.
I(BDA) = I(B) + I(D) + I(A) = log2 4 + log2 8 + log2 2 = 2 + 3 + 1 = 6 bits
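These values can be checked numerically. The following Python sketch (the function name
self_information is only an illustrative choice, not something defined in these notes) reproduces the
results of Examples 1 and 2:

from math import log2

def self_information(p):
    """Information content, in bits, of an outcome with probability p."""
    return -log2(p)

# Example 1: one of 32 equiprobable symbols
print(self_information(1 / 32))                            # 5.0 bits

# Example 2: message 'BDA' with P(B) = 1/4, P(D) = 1/8, P(A) = 1/2
print(sum(self_information(p) for p in (1/4, 1/8, 1/2)))   # 6.0 bits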
Since the base of the logarithm can be chosen differently, we may have different units of
information:
• Bits (Base 2)
• Nats (Base e)
• Decits (Base 10)
e.g.
1 bit = 1/log2 e = 0.6932 nat
1 bit = 1/log2 10 = 0.3010 decit
1 decit = 1/log10 2 = 3.3219 bits
2. Entropy
In a communication system we do not have only a single message, but a number of possible
messages. So, instead of calculating the information due to the individual messages and adding them,
we calculate the average information per message of the system, known as the entropy of the source.
Let there be M different messages m1, m2, ..., mM with respective probabilities P1, P2, ..., PM.
Assume that in a long time interval, L messages have been generated. Let L be very large so
that L >> M; then message m1 occurs approximately P1 L times and contributes P1 L log2(1/P1) bits.
Adding the contributions of all messages and dividing by the total number L of messages gives the
average information per message, the entropy of the source:
H = Σi Pi log2(1/Pi)
Thus the unit of entropy is information/message. I(x) is called the self-information of a symbol, and
H(X) is the average self-information, i.e. the entropy, of the source.
Example.3:
A discrete source has 4 symbols x = [x1, x2, x3, x4] with probabilities P = [1/2, 1/4, 1/8, 1/8]. Find the
information content of each symbol, then calculate the entropy. Also calculate the average number of bits
per symbol for the message 'x1 x2 x1 x4 x3 x1 x1 x2'.
I(xi) = log2(1/P(xi))
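A sketch of how this example can be worked out numerically, using the entropy formula above (the
helper function entropy is an illustrative name, not part of these notes):

from math import log2

def entropy(probs):
    """H = sum of p*log2(1/p), in bits per symbol (terms with p = 0 contribute nothing)."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

P = {"x1": 1/2, "x2": 1/4, "x3": 1/8, "x4": 1/8}

for sym, p in P.items():
    print(sym, -log2(p), "bits")                 # 1, 2, 3, 3 bits

print("H =", entropy(P.values()), "bits/symbol") # 1.75

# Average number of bits per symbol for the message 'x1 x2 x1 x4 x3 x1 x1 x2'
msg = ["x1", "x2", "x1", "x4", "x3", "x1", "x1", "x2"]
avg = sum(-log2(P[s]) for s in msg) / len(msg)
print("average =", avg, "bits/symbol")           # 1.75, the same as the entropy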
3. Rate of Information
If a message source generates messages (or symbols) at a rate of r messages (or symbols) per
second, then the rate of information is
R= r H bits/second
Example.4:
An event has six possible outcomes with probabilities P1 = 1/2, P2 = 1/4, P3 = 1/8, P4 = 1/16, P5 = 1/32, and
P6 = 1/32. Find the entropy of the system. Also find the rate of information if there are 18 outcomes
per second.
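A quick numerical check of this example (an illustrative sketch, not the official worked solution):

from math import log2

P = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]
H = sum(p * log2(1 / p) for p in P)      # entropy in bits per outcome
r = 18                                   # outcomes per second
print("H =", H, "bits")                  # 1.9375
print("R = r*H =", r * H, "bits/sec")    # 34.875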
Exercise.4: A telegraph source has two symbols (dot and dash). The duration of a dot is 0.2 sec, the
duration of a dash is 3 times that of a dot, and the time between symbols is 0.2 sec. The probability of
the dot occurring is twice that of the dash. Find the average rate of information for this telegraph source.
(Check answer: R = 1.725 bits/sec)
Consider a discrete channel with m inputs and n outputs:
[Figure: discrete channel with inputs x1, ..., xm and outputs y1, ..., yn]
Let
P(X) = [p(x1) p(x2) ... p(xm)]
P(Y) = [p(y1) p(y2) ... p(yn)]
Then
P(Y) = P(X) P(Y/X)
where P(Y/X) is the channel (transition) matrix whose element in row i and column j is p(yj/xi).
p(xi, yj) = p(xi) p(yj/xi) = p(yj) p(xi/yj)
In matrix form, P(X, Y) = [P(X)]d P(Y/X), where [P(X)]d is the diagonal matrix with p(x1), ..., p(xm) on
its diagonal.
Note:
p(yj) = Σi p(xi, yj)  and  p(xi) = Σj p(xi, yj)
Example.5:
For the following binary channel (with p(x1) = p(x2) = 1/2):
[Figure: binary channel with p(y1/x1) = 0.9, p(y2/x1) = 0.1, p(y1/x2) = 0.2, p(y2/x2) = 0.8]
a. Construct the channel matrix for this channel.
b. Find p(y1) and p(y2).
c. Find the joint probabilities p(x1, y2) and p(x2, y1).
Solution:
(a) The channel matrix is
P(Y/X) = [p(y1/x1)  p(y2/x1); p(y1/x2)  p(y2/x2)] = [0.9  0.1; 0.2  0.8]
(b) P(X) = [p(x1) p(x2)] = [0.5  0.5]
P(Y) = P(X) P(Y/X) = [0.5  0.5] [0.9  0.1; 0.2  0.8] = [0.55  0.45] = [p(y1) p(y2)]
(c) P(X, Y) = [P(X)]d P(Y/X) = [0.5  0; 0  0.5] [0.9  0.1; 0.2  0.8] = [0.45  0.05; 0.10  0.40]
Hence p(x1, y2) = 0.05 and p(x2, y1) = 0.1
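The matrix relations above can be checked with a few lines of Python and NumPy (np.diag builds the
diagonal matrix [P(X)]d; variable names are illustrative):

import numpy as np

PYgX = np.array([[0.9, 0.1],
                 [0.2, 0.8]])        # P(Y/X), rows sum to 1
PX = np.array([0.5, 0.5])            # P(X)

PY  = PX @ PYgX                      # P(Y)   = P(X) P(Y/X)      -> [0.55 0.45]
PXY = np.diag(PX) @ PYgX             # P(X,Y) = [P(X)]d P(Y/X)   -> [[0.45 0.05]
                                     #                               [0.10 0.40]]
print(PY)
print(PXY)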
Exercise.5:
The following channel matrix has P(X) = [0.5  0.5]:
P(Y/X) = [1-p  p  0; 0  p  1-p]
a. Draw the channel.
b. Find P(Y) if p = 0.2.
(Note: 0 ≤ p(x, y) ≤ 1)
P(x, y) is the joint probability of events x and y. Note that P(x/y) and P(y/x) are conditional
probabilities; their role will be clear from the definitions of the conditional entropies that follow.
H(X/Y) = -Σj Σi p(xi, yj) log2 p(xi/yj)
Similarly
H(Y/X) = -Σi Σj p(xi, yj) log2 p(yj/xi)
H(X, Y) = H(X/Y) + H(Y) = H(Y/X) + H(X)
Exercise.6:
Find H(X), P(X, Y), and H(Y) for the channel shown in the figure, given that P(x1) = 0.2, P(x2) = 0.5,
and P(x3) = 0.3.
[Figure: channel with p(y1/x1) = 0.8, p(y2/x1) = 0.2, p(y2/x2) = 1, p(y2/x3) = 0.3, p(y3/x3) = 0.7]
Exercise.7:
A transmitter has an alphabet of four letters [x1, x2, x3, x4] and the receiver has an alphabet of
three letters [y1, y2, y3]. Calculate all the entropies if the joint probability matrix is:
P(X, Y) = [0.3  0.05  0; 0  0.25  0; 0  0.15  0.05; 0  0.05  0.15]
(Check answer: H(X) = 1.96, H(Y/X) = 0.53, H(X, Y) = 2.49, H(Y) = 1.49, H(X/Y) = 1.0)
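The check answers can be reproduced from the joint probability matrix with the following sketch, which
uses the chain rule H(X, Y) = H(X/Y) + H(Y) from above (the function name entropies is only an
illustrative choice):

import numpy as np

def entropies(PXY):
    """Return H(X), H(Y), H(X,Y), H(X/Y), H(Y/X) in bits from a joint probability matrix."""
    PX, PY = PXY.sum(axis=1), PXY.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    HX, HY, HXY = h(PX), h(PY), h(PXY.flatten())
    return HX, HY, HXY, HXY - HY, HXY - HX    # H(X/Y) = H(X,Y)-H(Y), H(Y/X) = H(X,Y)-H(X)

PXY = np.array([[0.30, 0.05, 0.00],
                [0.00, 0.25, 0.00],
                [0.00, 0.15, 0.05],
                [0.00, 0.05, 0.15]])
print(entropies(PXY))   # approximately (1.96, 1.49, 2.49, 1.00, 0.53)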
7. Mutual Information
We have
p(xi) = probability that xi is transmitted.
p(xi/yj) = probability that xi was transmitted, given that yj has been received.
Thus p(xi) represents the probability, or uncertainty, about x before anything is received (the prior
uncertainty), and p(xi/yj) gives the final uncertainty about x after yj has been received. The difference
between these uncertainties is called the mutual information: mutual information represents the
uncertainty about the input that is resolved by observing the output.
I(X; Y) = H(X) - H(X/Y)
which can be written as
I(X; Y) = Σi Σj p(xi, yj) log2 [p(xi/yj) / p(xi)]
Properties of ( ; )
• I(X; Y) = I(Y; X)
• I(X; Y) ≥ 0
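As an illustrative sketch, I(X; Y) can be computed from a joint probability matrix using
I(X; Y) = H(X) - H(X/Y); applied to the binary channel of Example 5 it gives about 0.397 bits per symbol
(the function name I_XY is an assumption of this sketch, not a name used in the notes):

import numpy as np

def I_XY(PXY):
    """I(X;Y) = H(X) - H(X/Y), computed from the joint probability matrix (bits)."""
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    HX = h(PXY.sum(axis=1))
    HY = h(PXY.sum(axis=0))
    HXY = h(PXY.flatten())
    H_X_given_Y = HXY - HY            # chain rule: H(X,Y) = H(X/Y) + H(Y)
    return HX - H_X_given_Y

# Binary channel of Example 5: P(X,Y) = [[0.45, 0.05], [0.10, 0.40]]
print(I_XY(np.array([[0.45, 0.05], [0.10, 0.40]])))   # about 0.397 bits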
8. Channel Types
8.1 Lossless Channel
A channel whose matrix has only one nonzero element in each column. No source information is lost in
transmission.
e.g.
P(Y/X) = [3/4  1/4  0  0  0; 0  0  1/3  2/3  0; 0  0  0  0  1]
For a lossless channel each p(xi/yj) is either 0 or 1, so the transmitted symbol can always be determined
from the received one.
8.3 Noiseless Channel
A channel with the same number of inputs and outputs (m = n), in which each input is connected to
exactly one output:
p(yj/xi) = 1 if i = j
p(yj/xi) = 0 if i ≠ j
8.4 Binary Symmetric Channel (BSC)
P(Y/X) = [1-p  p; p  1-p]
Example.6:
Consider a BSC with P(x1) = α:
[Figure: BSC with crossover probability p, i.e. p(y2/x1) = p(y1/x2) = p]
a. Show that:
I(X; Y) = H(Y) + p log2 p + (1 - p) log2(1 - p)
b. Calculate I(X; Y) for p = 0.5 (and α = 0.5).
For p = 0.5: p log2 p + (1 - p) log2(1 - p) = -1, and H(Y) = 1.
Hence I(X; Y) = 1 - 1 = 0.
When I(X; Y) = 0, the channel is useless, i.e. when p = 0.5 no information is being transmitted at
all.
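A short sketch illustrating the same behaviour: I(X; Y) of the BSC falls to zero as p approaches 0.5.
The binary entropy helper Hb and the function name are illustrative choices for this sketch:

from math import log2

def Hb(q):
    """Binary entropy function, in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def bsc_mutual_information(p, alpha=0.5):
    """I(X;Y) = H(Y) - H(Y/X) for a BSC with crossover probability p and P(x1) = alpha."""
    py1 = alpha * (1 - p) + (1 - alpha) * p     # probability of receiving y1
    return Hb(py1) - Hb(p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(p, round(bsc_mutual_information(p), 4))
# 1.0, 0.531, 0.1887, 0.0 -- at p = 0.5 the channel carries no information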
Exercise.8:
For a lossless channel show that: H(X/Y) = 0
Exercise.9:
For a noiseless channel with the number of inputs equal to the number of outputs (= m), show that
H(X) = H(Y) and H(Y/X) = 0.
Exercise.10:
Show that: H(X, Y) = H(X/Y) + H(Y)
Exercise.11:
Show that: I(X; Y) = Σi Σj p(xi, yj) log2 [p(xi/yj) / p(xi)]
Exercise.12:
Show that: I(X; Y) = I(Y; X)
Exercise.13:
Show that: I(X; Y) ≥ 0
(Hint: log2 a = -log2(1/a), and ln α ≤ α - 1)
9. Channel Capacity
The mutual information represents the average information per symbol transmitted over the system.
Shannon showed that its maximum is the highest rate at which information can be transmitted reliably
over the channel. The capacity per symbol Cs of a channel is therefore given by:
Cs = max I(X; Y)   bits/symbol
where the maximum is taken over all possible input probability distributions P(xi).
Example.7:
Find the capacity of the following channel (where P(x1) = α):
[Figure: channel with x1 → y1 and x2 → y3 each with probability 1-p, and x1, x2 → y2 (erasure) each
with probability p]
Solution:
P(Y/X) = [1-p  p  0; 0  p  1-p]
P(Y) = P(X) P(Y/X) = [α  1-α] [1-p  p  0; 0  p  1-p] = [α(1-p)  p  (1-α)(1-p)]
P(X, Y) = [P(X)]d P(Y/X) = [α  0; 0  1-α] [1-p  p  0; 0  p  1-p]
        = [α(1-p)  αp  0; 0  (1-α)p  (1-α)(1-p)]
H(Y) = -Σj p(yj) log2 p(yj) = (1 - p) H(X) - p log2 p - (1 - p) log2(1 - p)
H(Y/X) = -Σi Σj p(xi, yj) log2 p(yj/xi) = -p log2 p - (1 - p) log2(1 - p)
I(X; Y) = H(Y) - H(Y/X) = (1 - p) H(X)
Cs = max I(X; Y) = max (1 - p) H(X) = (1 - p) max H(X) = 1 - p bits/symbol
(the maxima are taken over P(x); max H(X) = 1 for a binary source, achieved at α = 1/2)
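The maximization over the input distribution can also be carried out numerically. The sketch below
searches over α for the channel of Example 7 with p = 0.2 and finds Cs ≈ 0.8 = 1 - p at α = 0.5
(function and variable names are illustrative assumptions of this sketch):

import numpy as np

def I_XY(PX, PYgX):
    """I(X;Y) = sum of p(x,y) log2[p(x,y)/(p(x)p(y))] for input distribution PX and channel PYgX."""
    PXY = np.diag(PX) @ PYgX
    outer = PXY.sum(axis=1, keepdims=True) @ PXY.sum(axis=0, keepdims=True)
    m = PXY > 0
    return np.sum(PXY[m] * np.log2(PXY[m] / outer[m]))

p = 0.2
PYgX = np.array([[1 - p, p, 0.0],
                 [0.0, p, 1 - p]])          # channel of Example 7
best = max((I_XY(np.array([a, 1 - a]), PYgX), a) for a in np.linspace(0.01, 0.99, 197))
print(best)   # approximately (0.8, 0.5): Cs = 1 - p, reached at alpha = 0.5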
[Figure: additive noise channel, y = x + n]
The noise observed in practical channels is usually assumed to be additive white Gaussian noise. The
channel capacity for this channel is:
C = max{R}
or
C = B log2(1 + S/N)   bits/sec
Where:
B: bandwidth of the channel (Hz)
S/N: signal-to-noise ratio (SNR)
S: signal power in watts
N: noise power in watts (N = B N0), where N0 is the power spectral density (PSD) of the noise (W/Hz)
Note: Transmission can be made error-free if and only if R ≤ C, i.e. the information rate does not exceed
the channel capacity.
Example.8:
Consider an AWGN channel with 4 kHz bandwidth and noise PSD 2x10^-12 W/Hz. The signal power
at the receiver is 0.1 mW. Calculate the capacity of this channel.
Solution:
We have: B = 4000 Hz, S = 0.1x10^-3 W, N0 = 2x10^-12 W/Hz
N = N0 B = 2x10^-12 x 4000 = 8x10^-9 W
SNR = S/N = (0.1x10^-3)/(8x10^-9) = 1.25x10^4
C = B log2(1 + S/N) = 4000 log2(1 + 1.25x10^4) ≈ 54.4x10^3 bits/sec
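A sketch that evaluates the Shannon capacity formula for Example 8 and for Example 9 below (the
function name awgn_capacity is an illustrative assumption):

from math import log2

def awgn_capacity(B, S, N0):
    """C = B log2(1 + S/N) with N = N0*B (bandwidth in Hz, powers in watts)."""
    return B * log2(1 + S / (N0 * B))

# Example 8
print(awgn_capacity(4000, 0.1e-3, 2e-12))   # about 54.4e3 bits/sec

# Example 9: B = 3 kHz, SNR = 10; each of 128 characters needs log2(128) = 7 bits
C = 3000 * log2(1 + 10)
print(C, C / 7)   # about 10378 bits/sec and about 1483 characters/sec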
Example.9:
The terminal of a computer used to enter alphabetic data is connected to the computer through a
voice-grade telephone line having a usable bandwidth of 3 kHz and SNR = 10. Assuming that the
terminal has 128 characters, determine:
a. The capacity of the channel
b. The maximum rate of transmission without error
Solution:
a. C = B log2(1 + S/N) = 3000 log2(1 + 10) ≈ 10,378 bits/sec
b. Assuming the 128 characters are equally likely, each character carries log2 128 = 7 bits, so the
maximum error-free rate of transmission is C/7 ≈ 1483 characters/sec.
10. Source Coding
The information rate of the source is
R = r H   bits/sec
where r is the symbol rate (symbols/sec). If all symbols convey the same amount of information, then
H = log2 M, so R = r log2 M.
Now consider an encoder that converts the incoming symbols to code words consisting of binary digits
produced at a fixed rate rb. Since each binary digit can carry at most log2 2 = 1 bit of information, the
encoder output can carry at most
R = rb log2 2 = rb   bits/sec
Here rb/r, the number of binary digits per source symbol, represents a very important parameter called
the average code length, denoted L. For a uniquely decodable code with code-word lengths ni, the Kraft
inequality must hold:
K = Σi 2^(-ni) ≤ 1
The simplest coding is a fixed-length code, in which all code words have the same length ni = n, so
K = M 2^(-n)
This means that for decipherable (uniquely identifiable) code words in the case of fixed-length coding,
we need
n ≥ log2 M
So, the resulting efficiency can be calculated as:
η = H(X)/L x 100%
The result of this discussion is that if H(X) < log2 M and we need higher efficiency, we have to
reduce the average code length L. That is why we use variable-length codes.
Example.10:
For the following codes:
xi    Code A   Code B   Code C   Code D
x1    00       0        0        0
x2    01       10       11       100
x3    10       11       100      110
x4    11       110      110      111
a. Apply the Kraft inequality to each code.
For Code A: n1 = n2 = n3 = n4 = 2, K = Σ 2^(-ni) = 4(1/4) = 1
For Code B: n1 = 1, n2 = n3 = 2, n4 = 3, K = Σ 2^(-ni) = 1/2 + 1/4 + 1/4 + 1/8 = 9/8 > 1
For Code C: n1 = 1, n2 = 2, n3 = n4 = 3, K = Σ 2^(-ni) = 1/2 + 1/4 + 1/8 + 1/8 = 1
For Code D: n1 = 1, n2 = n3 = n4 = 3, K = Σ 2^(-ni) = 1/2 + 1/8 + 1/8 + 1/8 = 7/8 < 1
b. Codes A and D are prefix-free codes, so they are uniquely decodable. Code B is not
uniquely decodable because it does not satisfy the Kraft inequality. Code C satisfies the Kraft
inequality but is not uniquely decodable (the sequence 0110110 corresponds to 'x1 x2 x1 x4' or to
'x1 x4 x4').
An objective of source encoding is to minimize the average bit rate required for representation of the
source by reducing the redundancy of the information source.
A prefix-free code (one in which no code word is a prefix of another) can easily be decoded by the
decoder, and it satisfies the Kraft inequality. It is also found that there are some codes that satisfy the
Kraft inequality but are not prefix codes, and yet can still be decoded without any error. For example:
Code III
x1: 0
x2: 01
x3: 011
x4: 0111
Shannon-Fano Coding
In this method, first arrange the messages in descending order of probability. Then draw a line that
divides the symbols into two groups such that the group probabilities are as nearly equal as possible.
Assign the digit 0 to each symbol in the group above the line and the digit 1 to each symbol in the
group below the line. In all subsequent steps, subdivide each group into subgroups and again
assign digits following the same rule. Whenever a group contains only one symbol, no further
subdivision is possible and the code word for that symbol is complete. When all the groups have been
reduced to one symbol, the code words for all symbols have been assigned.
Example.11: For the given message sequence with their probabilities, apply Shannon-Fano
coding and calculate the code efficiency.
[x] = [x1 x2 x3 x4 x5 x6 x7 x8]
[P] = [1/4 1/8 1/16 1/16 1/16 1/4 1/16 1/8]
Solution. Arrange the probabilities in descending order.
Message   Prob.   Code    Length
x1        1/4     00      2
x6        1/4     01      2
x2        1/8     100     3
x8        1/8     101     3
x3        1/16    1100    4
x4        1/16    1101    4
x5        1/16    1110    4
x7        1/16    1111    4
Average code length: L = Σ P(xi) ni = 2(1/4)(2) + 2(1/8)(3) + 4(1/16)(4) = 2.75 binary digits/symbol
Entropy: H(X) = 2.75 bits/symbol, so the efficiency is η = H(X)/L = 100%.
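An illustrative Python sketch of the Shannon-Fano procedure described above; applied to Example 11 it
reproduces the code words in the table. The splitting rule shown is one reasonable reading of "as nearly
equal as possible", and the function name shannon_fano is an assumption of this sketch:

def shannon_fano(symbols):
    """symbols: list of (name, probability). Returns {name: codeword}."""
    symbols = sorted(symbols, key=lambda s: s[1], reverse=True)
    code = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total, run, cut, best = sum(p for _, p in group), 0.0, 1, None
        # choose the split point that makes the two group probabilities most nearly equal
        for i in range(1, len(group)):
            run += group[i - 1][1]
            diff = abs(2 * run - total)
            if best is None or diff < best:
                best, cut = diff, i
        for name, _ in group[:cut]:
            code[name] += "0"
        for name, _ in group[cut:]:
            code[name] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return code

P = [("x1", 1/4), ("x6", 1/4), ("x2", 1/8), ("x8", 1/8),
     ("x3", 1/16), ("x4", 1/16), ("x5", 1/16), ("x7", 1/16)]
print(shannon_fano(P))
# x1 -> 00, x6 -> 01, x2 -> 100, x8 -> 101, x3 -> 1100, x4 -> 1101, x5 -> 1110, x7 -> 1111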
Exercise.16:
Apply the Shannon Fano coding and find the code efficiency
[x] = [x1 x2 x3 x4 x5 x6 x7]
[P] = [.4 .2 .12 .08 .08 .08 .04]
(Check answer: efficiency = 96.03 % )
Huffman Coding
1. The source symbols are listed in order of decreasing probability. The two symbols of lowest
probability are assigned a '0' and a '1'.
2. These two source symbols are then regarded as being combined into a new source symbol with
probability equal to the sum of the two original probabilities. The probability of the new symbol is
placed in the list in accordance with its value.
3. The procedure is repeated until we are left with a final list of only two source symbols, to which a '0'
and a '1' are assigned. The code word for each symbol is found by working backward and tracing the
sequence of 0s and 1s assigned to that symbol as well as its successors.
Example.12: Construct the Huffman code for a source with five symbols s0, s1, s2, s3, s4 having
probabilities 0.4, 0.2, 0.2, 0.1, and 0.1.
Symbol   Stage I       Stage II      Stage III     Stage IV
s0       0.4  (00)     0.4  (00)     0.4  (1)      0.6  (0)
s1       0.2  (10)     0.2  (01)     0.4  (00)     0.4  (1)
s2       0.2  (11)     0.2  (10)     0.2  (01)
s3       0.1  (010)    0.2  (11)
s4       0.1  (011)
Final code: s0 = 00, s1 = 10, s2 = 11, s3 = 010, s4 = 011.
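An illustrative sketch of the Huffman procedure using a priority queue. Huffman codes are not unique:
the code words produced may differ from those in the table above, but the code-word lengths
(2, 2, 2, 3, 3) and the average length of 2.2 binary digits/symbol are the same (the function name
huffman is an assumption of this sketch):

import heapq
from itertools import count

def huffman(probs):
    """probs: {symbol: probability}. Returns {symbol: codeword} for a binary Huffman code."""
    tie = count()                            # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)      # two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(code)
print("average code length =", avg)          # 2.2 binary digits per symbol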
11. Channel Coding
[Figure: model of a digital communication system (source, channel encoder, channel, channel decoder, receiver Rx)]
The goal of the channel encoding and decoding process is to detect (or correct) errors so that the data
digits are recovered with minimum probability of error. This is an effective way of improving the
reliability of transmission over a noisy channel.
The basic idea of coding is to add a group of check digits to the message digits. The check digits
may then provide the receiver with sufficient information to either detect or correct channel errors.
n = k + r
k: number of message digits
r: number of check digits
n: length of the code word
Single parity check
A simple one-error-detecting code with r = 1 check digit.
[Figure: code word consisting of k message digits followed by one check digit, equal to the XOR
(modulo-2 sum) of the message digits]
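A minimal sketch of single-parity-check encoding and error detection (function names are illustrative):

def add_parity(bits):
    """Append one even-parity check digit (the XOR of the k message digits)."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]

def check_parity(word):
    """True if the received word has even parity (no error, or an even number of errors)."""
    s = 0
    for b in word:
        s ^= b
    return s == 0

msg = [1, 0, 1, 1]
tx = add_parity(msg)          # [1, 0, 1, 1, 1]
print(check_parity(tx))       # True
tx[2] ^= 1                    # single channel error
print(check_parity(tx))       # False -> error detected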
Hamming code
A class of linear codes which can correct all patterns of a single error in the received word, with code
length n = 2^r - 1.
Block coding
Let the encoded code word be
C = [c1 c2 ... ck ck+1 ... cn]
where the first k digits are the message digits and the last r = n - k digits are the check digits.
Decoding process
Let the error vector be
E = [e1 e2 ... en], where ei = 1 if an error occurred in the i-th position and ei = 0 otherwise.
The received word is
R = [r1 r2 ... rn]
and R = C ⊕ E.
The decoder begins by computing the syndrome S:
S = H R^T
where H is the parity check matrix. If S = 0, the received word is accepted as error-free; otherwise the
syndrome is used to locate (and correct) the error.
Example.13:
For k = 3, n = 6 the parity check matrix is:
H = [0 1 1 1 0 0; 1 0 1 0 1 0; 1 1 0 0 0 1]
If the received word is R = [0 1 0 0 1 1], check whether an error occurred in R, then find the correct
transmitted word C.
Solution:
S = H R^T = [0 1 1 1 0 0; 1 0 1 0 1 0; 1 1 0 0 0 1] [0 1 0 0 1 1]^T = [1 1 0]^T
Since S ≠ 0, an error has occurred. The syndrome [1 1 0]^T equals the third column of H, so the error
is in the third digit: E = [0 0 1 0 0 0].
The corrected word is C = R ⊕ E = [0 1 1 0 1 1].
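The syndrome computation of Example 13 can be verified with a few lines of NumPy. The single-error
correction step below assumes, as in the example, that at most one error occurred:

import numpy as np

H = np.array([[0, 1, 1, 1, 0, 0],
              [1, 0, 1, 0, 1, 0],
              [1, 1, 0, 0, 0, 1]])      # parity check matrix of Example 13
R = np.array([0, 1, 0, 0, 1, 1])        # received word

S = H @ R % 2                           # syndrome
print(S)                                # [1 1 0] -> nonzero, so an error occurred

# For a single error, the syndrome equals the column of H at the error position.
err = next(i for i in range(H.shape[1]) if np.array_equal(H[:, i], S))
C = R.copy()
C[err] ^= 1
print(err, C)                           # position 2 (the third digit), C = [0 1 1 0 1 1]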