# • Transition Probability

It is the probability that one event occurs in connection with another event, i.e., the transition probability $P(y_j \mid x_i)$ gives the probability that $y_j$ is received when $x_i$ is transmitted.

- The transition matrix, or channel matrix (given below), gives all the probabilities that $y_j$ is received when $x_i$ is transmitted, for $j = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, n$.
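$$
\mathbf{P}(Y \mid X) =
\begin{bmatrix}
P(y_1 \mid x_1) & P(y_2 \mid x_1) & \cdots & P(y_m \mid x_1) \\
P(y_1 \mid x_2) & P(y_2 \mid x_2) & \cdots & P(y_m \mid x_2) \\
\vdots & \vdots & \ddots & \vdots \\
P(y_1 \mid x_n) & P(y_2 \mid x_n) & \cdots & P(y_m \mid x_n)
\end{bmatrix}
$$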
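It can be noted that the entries of each row sum to 1, because whichever symbol $x_i$ is sent, some output $y_j$ must be received, i.e.,

$$\sum_{j=1}^{m} P(y_j \mid x_i) = 1 \quad \text{for all } i = 1, 2, \ldots, n$$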
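As a small illustration, a channel matrix can be stored as a list of rows, one per input symbol $x_i$, and the row-sum property checked numerically. The helper `is_channel_matrix`, the 2 × 3 example matrix and the tolerance `tol` (to absorb floating-point rounding) are all hypothetical choices for this sketch.

```python
def is_channel_matrix(matrix, tol=1e-9):
    """True if each row holds probabilities P(y_j | x_i) that sum to 1."""
    for row in matrix:
        if any(p < 0 or p > 1 for p in row):
            return False
        if abs(sum(row) - 1.0) > tol:  # tolerance absorbs floating-point rounding
            return False
    return True

# Hypothetical channel with n = 2 inputs (rows) and m = 3 outputs (columns)
P = [[0.8, 0.1, 0.1],
     [0.2, 0.3, 0.5]]

print(is_channel_matrix(P))             # True
print(is_channel_matrix([[0.5, 0.4]]))  # False: the row sums to 0.9
```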
# • Channel Capacity
Consider a discrete memoryless channel (DMC) with input $X \in \{x_1, x_2, \ldots, x_n\}$ and transition probabilities $P(y_j \mid x_i)$ for $j = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, n$. The mutual information is defined as

$$I(X;Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log_2 \frac{P(y_j \mid x_i)}{P(y_j)} \qquad (1)$$

But

$$P(x_i, y_j) = P(y_j \mid x_i)\, P(x_i) \qquad (2)$$

and, by the total probability theorem,

$$P(y_j) = \sum_{i=1}^{n} P(y_j \mid x_i)\, P(x_i) \qquad (3)$$
It can be observed from (1), (2) and (3) that the mutual information between X and Y depends not only on the channel but also on the way it is used, i.e., on the input probabilities $P(x_i)$. $P(x_i)$ plays an important role in the calculation of $I(X;Y)$, as the distribution of $x_i$ is known at the transmitter, while that of $y_j$ may not be available most of the time.

- Since $I(X;Y) = H(X) - H(X \mid Y)$, a suitable choice of the input probabilities makes each received symbol remove more of the uncertainty about the transmitted one, which means a larger mutual information.
- The channel capacity of a discrete memoryless channel (DMC) is the largest mutual information obtainable in a single use of the channel, the maximization being taken over all input probability distributions $\{P(x_i)\}$ while the transition probabilities stay fixed.
- Thus, the capacity of a DMC is

$$C = \max_{\{P(x_i)\}} I(X;Y) \quad \text{bits/channel use}$$
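A minimal sketch of equations (1) to (3) and of the maximization, assuming the channel matrix is given as rows `channel[i][j]` $= P(y_j \mid x_i)$. The helpers `mutual_information` and `capacity_binary_input` are hypothetical. The plain grid search over $P(x_1)$ only illustrates the definition for two-input channels; general channels are usually handled with an iterative method such as the Blahut-Arimoto algorithm.

```python
import math

def mutual_information(px, channel):
    """I(X;Y) in bits, with px[i] = P(x_i) and channel[i][j] = P(y_j | x_i)."""
    n, m = len(px), len(channel[0])
    # Equation (3): total probability of each output symbol
    py = [sum(channel[i][j] * px[i] for i in range(n)) for j in range(m)]
    total = 0.0
    for i in range(n):
        for j in range(m):
            joint = channel[i][j] * px[i]  # Equation (2): P(x_i, y_j)
            if joint > 0:  # zero-probability terms contribute nothing
                total += joint * math.log2(channel[i][j] / py[j])  # Equation (1)
    return total

def capacity_binary_input(channel, steps=10_000):
    """Approximate C for a 2-input channel by trying P(x_1) = 0, 1/steps, ..., 1."""
    best_i, best_q = -1.0, 0.0
    for k in range(steps + 1):
        q = k / steps
        value = mutual_information([q, 1.0 - q], channel)
        if value > best_i:
            best_i, best_q = value, q
    return best_i, best_q

# Binary symmetric channel with crossover probability 0.1 (illustrative value)
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
c, q = capacity_binary_input(bsc)
print(f"C ~ {c:.4f} bits/channel use at P(x_1) = {q:.2f}")
```

For this symmetric channel the search peaks at equiprobable inputs with about 0.531 bits/channel use, which agrees with $1 - H(p)$ for $p = 0.1$.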

## Capacity of a BSC

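Figure 1. Binary symmetric channel (BSC): input $x_0$ goes to output $y_0$ and $x_1$ to $y_1$ with probability $1 - p$; the crossovers $x_0 \to y_1$ and $x_1 \to y_0$ each occur with probability $p$.

- Consider the binary symmetric channel (BSC) shown in Figure 1, with the following transition probabilities: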
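i) the probabilities of incorrect decisions,

$$P(y_1 \mid x_0) = P(y_0 \mid x_1) = p = P(x_1 \mid y_0) = P(x_0 \mid y_1)$$

ii) and thus, the probabilities of correct decisions,

$$P(y_0 \mid x_0) = P(y_1 \mid x_1) = 1 - p = P(x_1 \mid y_1) = P(x_0 \mid y_0)$$

(The equalities for the backward probabilities $P(x_i \mid y_j)$ follow from Bayes' rule when the inputs are equiprobable, as chosen below.)

- For the BSC, $I(X;Y) = H(Y) - H(Y \mid X)$, where $H(Y \mid X)$ depends only on $p$, and $H(Y) \le 1$ bit with equality when $P(y_0) = P(y_1) = 1/2$. This happens when the two inputs are equiprobable, so the maximizing input distribution is

$$P(x_0) = P(x_1) = 1/2$$

- The capacity is then obtained from (1), using Bayes' rule $\frac{P(y_j \mid x_i)}{P(y_j)} = \frac{P(x_i \mid y_j)}{P(x_i)}$:

$$
\begin{aligned}
C = \max_{\{P(x_i)\}} I(X;Y)
&= P(y_0 \mid x_0)P(x_0)\log_2\frac{P(x_0 \mid y_0)}{P(x_0)} + P(y_1 \mid x_0)P(x_0)\log_2\frac{P(x_0 \mid y_1)}{P(x_0)} \\
&\quad + P(y_0 \mid x_1)P(x_1)\log_2\frac{P(x_1 \mid y_0)}{P(x_1)} + P(y_1 \mid x_1)P(x_1)\log_2\frac{P(x_1 \mid y_1)}{P(x_1)} \\
&= \frac{1-p}{2}\log_2[2(1-p)] + \frac{p}{2}\log_2[2p] + \frac{p}{2}\log_2[2p] + \frac{1-p}{2}\log_2[2(1-p)] \\
&= (1-p)\log_2[2(1-p)] + p\log_2[2p] \\
&= (1-p)\log_2 2 + (1-p)\log_2(1-p) + p\log_2 2 + p\log_2 p \\
&= 1 + (1-p)\log_2(1-p) + p\log_2 p
\end{aligned}
$$

$$C = 1 - H(p)$$

where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary entropy function evaluated at the crossover probability $p$ (not the entropy of the source, which is 1 bit for equiprobable inputs).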
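The closed form $C = 1 - H(p)$ is easy to tabulate. A minimal sketch with hypothetical helper names; it treats $0 \log_2 0$ as 0, so a noiseless channel ($p = 0$) gives 1 bit/channel use.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, in bits/channel use."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"p = {p:<4}  C = {bsc_capacity(p):.4f}")
```

The printed capacities are about 1.0000, 0.9192, 0.5310 and 0.0000. At $p = 0.5$ the output is independent of the input, so the channel carries no information.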

# • Source Coding

- Source coding of information symbols is required for their efficient transmission. In most cases, as in image processing, source coding results in compression of the data.
- Symbols can be encoded based on their probability of appearance, i.e., the more frequent symbols are encoded into fewer bits and vice versa. Thus, fewer bits are needed per symbol on average, so more source symbols, and hence more information, can be sent per second at a given bit rate.
Figure: Block diagram of a communication system: Information source → Source encoder → Channel encoder, Modulator, Transmitter → Channel → Receiver, Demodulator, Channel decoder → Source decoder → Information symbols.
- Consider the example of encoding the 26 letters of the English alphabet. Five binary bits are required for each letter if Fixed Length Coding (FLC) is used, since $2^4 = 16 < 26 \le 32 = 2^5$. But it can easily be observed that letters like ‘x’, ‘q’ and ‘z’ appear much less frequently than ‘e’, ‘i’ and ‘a’. So FLC results in less efficient transmission of symbols than Variable Length Coding (VLC).
- In VLC, symbols are coded based on their frequency (probability) of appearance.
- A prefix code is one in which no code word forms the prefix of any other code word. Such codes are also called instantaneous codes, because each code word can be decoded as soon as its last bit arrives; every prefix code is uniquely decodable.
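| Letter | Code word | Letter | Code word |
|---|---|---|---|
| A | 00 | E | 100 |
| B | 001 | F | 101 |
| C | 010 | G | 110 |
| D | 011 | H | 111 |

Table 1

| Letter | Code word | Letter | Code word |
|---|---|---|---|
| A | 0 | E | 10 |
| B | 1 | F | 11 |
| C | 00 | G | 000 |
| D | 01 | H | 111 |

Table 2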
- Table 1 and Table 2 are examples of different possible VLCs for the letters A to H. But it can be noted that the codes in Table 2 are not prefix codes: the code of A is a prefix of the codes of C, D and G, and similarly the code of B is a prefix of the codes of E, F and H. (As printed, Table 1 does not satisfy the prefix condition either, since the code of A, 00, is a prefix of the code of B, 001.)
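The prefix condition can be checked mechanically. A minimal sketch with a hypothetical helper `is_prefix_code`: after sorting the code words lexicographically, any word that is a prefix of another is also a prefix of the word immediately after it, so only adjacent pairs need to be compared.

```python
def is_prefix_code(codewords):
    """Return True if no code word is a prefix of another (duplicates also fail)."""
    words = sorted(codewords)
    # In sorted order, a prefix of some later word is a prefix of the next word.
    return all(not words[k + 1].startswith(words[k]) for k in range(len(words) - 1))

table1 = {"A": "00", "B": "001", "C": "010", "D": "011",
          "E": "100", "F": "101", "G": "110", "H": "111"}
table2 = {"A": "0", "B": "1", "C": "00", "D": "01",
          "E": "10", "F": "11", "G": "000", "H": "111"}

print(is_prefix_code(table1.values()))  # False: "00" is a prefix of "001"
print(is_prefix_code(table2.values()))  # False: "0" is a prefix of "00"
print(is_prefix_code(["000", "001", "010", "011", "100", "101", "110", "111"]))  # True
```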
- In general, if a source generates L symbols $x_i$ with probabilities $P(x_i)$, where $i = 1, 2, \ldots, L$, then the average number of bits per source symbol is

$$\bar{R} = \sum_{i=1}^{L} n(x_i)\, P(x_i)$$

where $n(x_i)$ is the length of the code word for $x_i$.
## Kraft Inequality

A necessary and sufficient condition for the existence of a binary code with code words of lengths $n_1 \le n_2 \le \cdots \le n_L$ that satisfies the prefix condition is

$$\sum_{i=1}^{L} 2^{-n_i} \le 1$$

This is called the Kraft inequality.
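The Kraft sum can be evaluated exactly with Python's `fractions.Fraction`, which avoids floating-point rounding when the sum is exactly 1. A minimal sketch with hypothetical helper names; note that the inequality concerns the lengths only: it says whether some prefix code with those lengths exists, not whether a particular set of code words is one.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of sum_i 2^(-n_i) for binary code-word lengths n_i."""
    return sum((Fraction(1, 2 ** n) for n in lengths), Fraction(0))

def prefix_code_exists(lengths):
    """Kraft inequality: a binary prefix code with these lengths exists iff the sum <= 1."""
    return kraft_sum(lengths) <= 1

print(kraft_sum([3] * 8))                         # 1    (3-bit fixed-length code, 8 letters)
print(kraft_sum([2, 3, 3, 3, 3, 3, 3, 3]))        # 9/8  (lengths of Table 1)
print(kraft_sum([1, 1, 2, 2, 2, 2, 3, 3]))        # 9/4  (lengths of Table 2)
print(prefix_code_exists([1, 2, 3, 4, 5, 6, 6]))  # True (lengths of Table 3)
```

Since the lengths used in Tables 1 and 2 give sums greater than 1, no prefix code with those lengths can exist.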
## Source Coding Theorem

The concept of the source coding theorem is as follows: for any prefix code used to represent the symbols from a source, the number of bits required to represent the source symbols, on average, must be at least equal to the entropy of the source.

- Let $X \in \{x_i,\ i = 1, 2, \ldots, L\}$ be the output symbols of a discrete memoryless source (DMS) with finite entropy $H(X)$, occurring with probabilities $P(x_i)$. Given these parameters, it is possible to construct a code that satisfies the prefix condition and has an average length $\bar{R}$ satisfying the inequality

$$H(X) \le \bar{R} < H(X) + 1$$

This is termed the source coding theorem.

- Efficiency of a prefix code:

$$\eta = \frac{H(X)}{\bar{R}}$$

- It can be noted that $\eta \le 1$, since $\bar{R} \ge H(X)$.
## Huffman Coding

- It is a variable length coding technique based on the symbol probabilities.
- It was suggested by Huffman in 1952.
- Algorithm steps:

i) Arrange the source symbols in decreasing order of their probabilities.

ii) Take the bottom two symbols, tie them together and add their two probability values. Here, $P_1$ and $P_2$ are the two lowest probabilities after arranging in decreasing order, with $P_1 \le P_2$. Also, label the two branches ‘0’ and ‘1’; they join into a new node of probability $P_1 + P_2$.

iii) Treat the sum $P_1 + P_2$ as the probability of a new symbol, to be considered along with the remaining probability values (replacing $P_1$ and $P_2$).

iv) Again group the two lowest probabilities and repeat the process of step ii, until the sum of the last two probabilities becomes 1.

v) The Huffman tree is now formed.

vi) The prefix code for any symbol is formed by reading the ‘0’ and ‘1’ branch labels along the path from the final node (the root) to the symbol.
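The steps above map naturally onto a priority queue: repeatedly pop the two least probable nodes, merge them, and push the merged node back. A minimal sketch using Python's standard `heapq` module; the function name `huffman_code` and the branch-labelling convention (the larger of the two merged probabilities gets ‘0’) are assumptions, so individual code words can differ from Table 3 while the code-word lengths, which determine the average length, agree.

```python
import heapq
import itertools

def huffman_code(probabilities):
    """Binary Huffman code for a non-empty {symbol: probability} mapping.

    Symbols are assumed to be non-tuple labels (e.g. strings); internal tree
    nodes are represented as (child_0, child_1) tuples.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(p, next(tie), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, node1 = heapq.heappop(heap)  # lowest probability, P1
        p2, _, node2 = heapq.heappop(heap)  # next lowest, P2 (P1 <= P2)
        # Steps ii-iii: tie them together; branch '0' -> P2, branch '1' -> P1
        heapq.heappush(heap, (p1 + p2, next(tie), (node2, node1)))

    codes = {}
    def walk(node, prefix):
        # Step vi: read branch labels on the path from the root to each leaf
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # a one-symbol source still gets one bit
    walk(heap[0][2], "")
    return codes

probs = {"x1": 0.37, "x2": 0.33, "x3": 0.16, "x4": 0.07,
         "x5": 0.04, "x6": 0.02, "x7": 0.01}
for sym, word in sorted(huffman_code(probs).items()):
    print(sym, probs[sym], word)
```

With this labelling, the code words for $x_1, \ldots, x_7$ come out with lengths 1, 2, 3, 4, 5, 6 and 6 bits, the same lengths as in Table 3.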
- An example is given below:

| Symbol | Probability |
|---|---|
| $x_1$ | 0.37 |
| $x_2$ | 0.33 |
| $x_3$ | 0.16 |
| $x_4$ | 0.07 |
| $x_5$ | 0.04 |
| $x_6$ | 0.02 |
| $x_7$ | 0.01 |

Figure 2. Huffman tree. (The merged nodes have probabilities 0.03, 0.07, 0.14, 0.3, 0.63 and 1; each merge labels its two branches ‘0’ and ‘1’.)

- Figure 2 shows the Huffman tree and the encoding for seven symbols, with the probabilities marked against them. (The green dashed line shows the path to symbol $x_5$, forming the prefix code 11110.)
- Table 3 (given below) shows the prefix codes formed for all the seven symbols.
| Symbol | Probability | Code word |
|---|---|---|
| $x_1$ | 0.37 | 0 |
| $x_2$ | 0.33 | 10 |
| $x_3$ | 0.16 | 110 |
| $x_4$ | 0.07 | 1110 |
| $x_5$ | 0.04 | 11110 |
| $x_6$ | 0.02 | 111110 |
| $x_7$ | 0.01 | 111111 |

Table 3
- The entropy of the source,

$$H(X) = -\sum_{i=1}^{7} P(x_i)\log_2 P(x_i)$$

$$H(X) = -\left[0.37\log_2(0.37) + 0.33\log_2(0.33) + 0.16\log_2(0.16) + 0.07\log_2(0.07) + 0.04\log_2(0.04) + 0.02\log_2(0.02) + 0.01\log_2(0.01)\right]$$

$$H(X) = 2.1152 \text{ bits/symbol}$$
- Average number of binary digits per symbol,

$$\bar{R} = \sum_{i=1}^{7} n(x_i)\, P(x_i)$$

where $n(x_i)$ is the length of the code word of symbol $x_i$.

$$\therefore \bar{R} = 1(0.37) + 2(0.33) + 3(0.16) + 4(0.07) + 5(0.04) + 6(0.02) + 6(0.01) = 2.17 \text{ bits/symbol}$$
- The efficiency of the code,

$$\eta = \frac{H(X)}{\bar{R}} = \frac{2.1152}{2.17} \approx 0.9747$$
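The arithmetic of this example can be reproduced in a few lines. The sketch below hard-codes the probabilities of the example and the code-word lengths of Table 3, then also checks the source coding theorem bound $H(X) \le \bar{R} < H(X) + 1$.

```python
import math

probs = [0.37, 0.33, 0.16, 0.07, 0.04, 0.02, 0.01]  # P(x_i) from the example
lengths = [1, 2, 3, 4, 5, 6, 6]                     # n(x_i) from Table 3

entropy = -sum(p * math.log2(p) for p in probs)       # H(X)
avg_len = sum(n * p for n, p in zip(lengths, probs))  # R-bar
efficiency = entropy / avg_len                        # eta = H(X) / R-bar

print(f"H(X) = {entropy:.4f} bits/symbol")  # 2.1152
print(f"R    = {avg_len:.2f} bits/symbol")  # 2.17
print(f"eta  = {efficiency:.4f}")           # 0.9747
# Source coding theorem bound: H(X) <= R-bar < H(X) + 1
print(entropy <= avg_len < entropy + 1)     # True
```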

# • Channel Coding
- The channel introduces errors into the transmitted data through noise, interference, etc. Each application has a limit on the errors it can afford for reliable decoding of the data. Table 4 shows some applications and their acceptable error probabilities.
| Application | Acceptable probability of error |
|---|---|
| Speech telephony | $10^{-4}$ |
| Voice band data | $10^{-6}$ |
| e-mail, e-newspaper | $10^{-6}$ |
| Video telephony, high-speed computing | $10^{-7}$ |

Table 4. Acceptable probability of error

- Channel coding is the technique of adding redundancy (extra bits) to the information bits so that errors in the received bits can be detected and corrected.
- Source coding compresses the information bits, but channel coding increases the size of the data and thus requires a higher data rate.
- Channel encoding is sometimes referred to as error control coding.

## Channel Coding Theorem (Noisy Coding Theorem)

- This is Shannon’s second theorem of information theory (the first one is the source coding theorem).
- Let $X \in \{x_i,\ i = 1, 2, \ldots, L\}$ be the output symbols that a DMS produces, one every $T_s$ seconds, with finite entropy $H(X)$ and probabilities $P(x_i)$. Let $C$ be the capacity of a discrete memoryless channel (DMC) that is used once every $T_c$ seconds. If

$$\frac{H(X)}{T_s} \le \frac{C}{T_c}$$

then there exists a coding scheme for which the source output can be transmitted over the noisy channel and reconstructed with an arbitrarily low probability of error.
- The parameter $C/T_c$ is called the critical rate.
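The theorem's condition compares two rates in bits/second. A minimal sketch with a hypothetical helper `rate_condition_holds`; the numbers are illustrative assumptions (the Huffman example's entropy of 2.1152 bits/symbol at 1000 symbols/s, and a binary symmetric channel with $C \approx 0.9192$ bits/channel use, the value of $1 - H(p)$ for $p = 0.01$). The function only tests the rate condition: the theorem guarantees that a suitable code exists but does not construct it.

```python
def rate_condition_holds(source_entropy, t_s, capacity, t_c):
    """Check H(X)/T_s <= C/T_c, with both sides in bits/second."""
    information_rate = source_entropy / t_s  # bits/second produced by the source
    critical_rate = capacity / t_c           # C/T_c, bits/second the channel supports
    return information_rate <= critical_rate

H_X = 2.1152  # bits/symbol (illustrative)
T_s = 1e-3    # one source symbol every 1 ms -> 2115.2 bits/s
C = 0.9192    # bits/channel use (illustrative BSC)

print(rate_condition_holds(H_X, T_s, C, 1 / 3000))  # True:  C/T_c = 2757.6 bits/s
print(rate_condition_holds(H_X, T_s, C, 1 / 2000))  # False: C/T_c = 1838.4 bits/s
```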
# • Shannon’s Information Capacity Theorem

It gives the information capacity of a band-limited, power-limited Gaussian channel.

- Consider a stationary random process $X(t)$ that is band-limited to $W$ Hz. The random variable $X_k$ is obtained by sampling $X(t)$ at the Nyquist rate, i.e., $2W$ samples/second. The power of $X_k$ is

$$P = E[X_k^2], \quad k = 1, 2, 3, \ldots, K$$
- Now $X_k$ is transmitted through a channel with additive white Gaussian noise (AWGN), and the received random variable $Y_k$ is modeled as

$$Y_k = X_k + N_k \qquad (1)$$

where $N_k$ is a Gaussian random variable from noise with two-sided power spectral density $N_0/2$, so that its variance over the bandwidth $W$ is $N_0 W$.
- Capacity of the channel,

$$C = \max_{f_{X_k}(x)} \left\{ I(X_k;Y_k) : E[X_k^2] = P \right\} \qquad (1a)$$

but

$$I(X_k;Y_k) = h(Y_k) - h(Y_k \mid X_k) \qquad (1b)$$

where $h(\cdot)$ denotes differential entropy. Also, the uncertainty associated with the received symbol $Y_k$, given the transmitted symbol $X_k$, is $h(Y_k \mid X_k)$, and it is given by

$$h(Y_k \mid X_k) = h(N_k)$$

as the noise is the only uncertainty in this case. Substituting this in (1b),

$$I(X_k;Y_k) = h(Y_k) - h(N_k) \qquad (1c)$$
Figure: Block diagram for the channel coding theorem: Information source, source encoder → Channel encoder → Modulator, Transmitter → Channel → Receiver, Channel demodulator → Channel decoder → Source-coded symbols/Information symbols.
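- As per (1a), the capacity is the maximization of (1c). Since $h(N_k)$ does not depend on the input, $h(Y_k)$ has to be maximized. It has been proved that, for a given variance, a Gaussian random variable has the maximum differential entropy. Therefore, $Y_k$ has to have a Gaussian distribution. Because $Y_k = X_k + N_k$ with Gaussian noise independent of $X_k$, this means that $X_k$ also has to be Gaussian.

i.e., the condition for maximizing the mutual information in a Gaussian channel is to send data that has a Gaussian distribution.

- The differential entropy of a Gaussian random variable with variance $\sigma^2$ can be derived as $\frac{1}{2}\log_2(2\pi e\,\sigma^2)$. Hence

$$h(X_k) = \frac{1}{2}\log_2(2\pi e P)$$

$$h(N_k) = \frac{1}{2}\log_2(2\pi e N_0 W)$$

and, since $X_k$ and $N_k$ are independent, the variance of $Y_k$ is $P + N_0 W$, so

$$\therefore h(Y_k) = \frac{1}{2}\log_2[2\pi e(P + N_0 W)] \qquad (1d)$$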

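Substituting (1d) and $h(N_k)$ in (1c), and then in (1a),

$$C = \frac{1}{2}\log_2[2\pi e(P + N_0 W)] - \frac{1}{2}\log_2[2\pi e N_0 W]$$

$$C = \frac{1}{2}\log_2\left[\frac{2\pi e(P + N_0 W)}{2\pi e N_0 W}\right]$$

$$C = \frac{1}{2}\log_2\left[1 + \frac{P}{N_0 W}\right] \quad \text{bits/channel use}$$

Since the channel is used $2W$ times per second,

$$C = 2W \cdot \frac{1}{2}\log_2\left[1 + \frac{P}{N_0 W}\right]$$

$$C = W\log_2\left[1 + \frac{P}{N_0 W}\right] \quad \text{bits/second}$$

OR, writing $N = N_0 W$ for the noise power in the band,

$$C = W\log_2\left[1 + \frac{P}{N}\right] \quad \text{bits/second}$$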
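As a numerical illustration, the sketch below evaluates $C = W\log_2(1 + P/(N_0 W))$. The bandwidth of 3100 Hz and the signal-to-noise ratio $P/N = 1000$ (30 dB) are assumed values chosen only for the example, and `shannon_capacity` is a hypothetical helper.

```python
import math

def shannon_capacity(bandwidth_hz, signal_power, noise_psd):
    """C = W log2(1 + P / (N0 W)) in bits/second.

    noise_psd is N0 (the two-sided PSD is N0/2), so the noise power in
    the band of W Hz is N = N0 * W.
    """
    noise_power = noise_psd * bandwidth_hz
    return bandwidth_hz * math.log2(1 + signal_power / noise_power)

W = 3100.0    # Hz (assumed)
N0 = 1.0 / W  # chosen so that the noise power N = N0 * W is 1
P = 1000.0    # signal power, giving P/N = 1000, i.e. 30 dB (assumed)

C = shannon_capacity(W, P, N0)
print(f"C = {C:.0f} bits/second")                 # about 30898
print(f"  = {C / (2 * W):.3f} bits/channel use")  # about 4.984
```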
