Information Theory Handout BW
Module IV Part I
INFORMATION THEORY
MODULE III
PART I
Information theory and coding: Discrete messages, amount of information, entropy, information rate. Coding: Shannon's theorem, channel capacity, capacity of a Gaussian channel, bandwidth vs. S/N trade-off, use of orthogonal signals to attain Shannon's limit, efficiency of orthogonal signal transmission.
AMOUNT OF INFORMATION
Consider a communication system in which the allowable messages are m1, m2, ..., with probabilities of occurrence p1, p2, ..., where p1 + p2 + ... = 1. Let the transmitter transmit a message mk with probability pk, and let the receiver correctly identify the message. Then the amount of information conveyed by the system is defined as Ik = logb(1/pk) = -logb pk, where b is the base of the logarithm. The base may be 2, 10 or e. When the base is 2 the unit of Ik is the bit (binary unit); when it is 10 the unit is the Hartley or decit; when the natural logarithmic base is used the unit is the nat. Base 2 is commonly used to represent Ik.
AMOUNT OF INFORMATION
The above units are related as:

log2 a = ln a / ln 2 = log10 a / log10 2
The base 2 is preferred because in binary PCM the possible messages 0 and 1 occur with equal likelihood, and the amount of information conveyed by each bit is log2 2 = 1 bit.
IMPORTANT PROPERTIES OF IK
Ik approaches 0 as pk approaches 1. pk = 1 means the receiver already knows the message and there is no need for transmission, so Ik = 0. E.g., the statement "the sun rises in the east" conveys no information. Ik must be a non-negative quantity, since each message contains some information; in the worst case Ik = 0. The information content of a message having a higher probability of occurrence is less than that of a message having a lower probability. As pk approaches 0, Ik approaches infinity: the information content of a highly improbable event is unbounded.
NOTES
When the symbols 0 and 1 of PCM data occur with equal likelihood (probabilities 1/2), the amount of information conveyed by each bit is Ik(0) = Ik(1) = log2 2 = 1 bit. When the probabilities are different, the less probable symbol conveys more information. Let p(0) = 1/4 and p(1) = 3/4; then Ik(0) = log2 4 = 2 bits and Ik(1) = log2 (4/3) = 0.42 bit. When there are M equally likely and independent messages such that M = 2^N with N an integer, the information in each message is Ik = log2 M = log2 2^N = N bits.
3/23/2009
NOTES
In this case, if we use a binary PCM code to represent the M messages, the number of binary digits required to represent all the 2^N messages is also N; i.e. when there are M (= 2^N) equally likely messages, the amount of information conveyed by each message equals the number of binary digits needed to represent all the messages. When two independent messages mk and ml are correctly identified, the amount of information conveyed is the sum of the information associated with each of the messages individually: Ik = log2 (1/pk) and Il = log2 (1/pl). When the messages are independent, the probability of the composite message is pk pl.
EXAMPLE
EXAMPLE 1: A source produces one of four possible symbols during each interval with probabilities p(x1) = 1/2, p(x2) = 1/4, p(x3) = p(x4) = 1/8. Obtain the information content of each of these symbols.
ANS: I(x1) = log2 2 = 1 bit, I(x2) = log2 4 = 2 bits, I(x3) = I(x4) = log2 8 = 3 bits.
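The definition above can be checked numerically; the following sketch (the helper name `info_content` is ours, not from the handout) reproduces Example 1:

```python
import math

def info_content(p: float) -> float:
    """Amount of information I = log2(1/p), in bits, for a symbol of probability p."""
    return math.log2(1.0 / p)

# Example 1: p(x1) = 1/2, p(x2) = 1/4, p(x3) = p(x4) = 1/8
probs = [1/2, 1/4, 1/8, 1/8]
print([info_content(p) for p in probs])  # [1.0, 2.0, 3.0, 3.0]
```

Note that the less probable symbols carry more bits, as the properties of Ik require.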
I(k,l) = log2 (1/(pk pl)) = log2 (1/pk) + log2 (1/pl) = Ik + Il
AVERAGE INFORMATION,ENTROPY
Suppose we have M different and independent messages m1, m2, ..., with probabilities of occurrence p1, p2, .... Suppose further that during a long period of transmission a sequence of L messages has been generated. If L is very large, we may expect that the L-message sequence contains p1L messages of m1, p2L messages of m2, etc. The total information in such a sequence will be:
AVERAGE INFORMATION,ENTROPY
I_total = p1 L log2 (1/p1) + p2 L log2 (1/p2) + ...

The average information per message interval, represented by the symbol H, is given by

H = I_total / L = Σ_{k=1}^{M} pk log2 (1/pk)

Average information is also referred to as entropy. Its unit is information bits/symbol or bits/message.
AVERAGE INFORMATION,ENTROPY
When pk = 1, there is only a single possible message, and the receipt of that message conveys no information: H = log2 1 = 0. When pk → 0 the amount of information Ik → ∞, but the average information in this case is

lim_{p→0} p log2 (1/p) = 0

AVERAGE INFORMATION,ENTROPY

[Figure: plot of H as a function of p for a two-message source; H = 0 at p = 0 and p = 1, rising to HMAX at p = 1/2]
The average information associated with an extremely unlikely message, as well as with an extremely likely message, is zero. Consider a source that generates two messages with probabilities p and (1 - p). The average information per message is

H = p log2 (1/p) + (1 - p) log2 (1/(1 - p))

When p = 1, H = 0; when p = 0, H = 0.
AVERAGE INFORMATION,ENTROPY

The maximum value of H may be located by setting dH/dp = 0:

H = p log2 (1/p) + (1 - p) log2 (1/(1 - p)) = -p log2 p - (1 - p) log2 (1 - p)

dH/dp = -(log p + 1) + (log (1 - p) + 1) = log (1 - p) - log p = log ((1 - p)/p)

Setting dH/dp = 0 gives log ((1 - p)/p) = 0, so (1 - p)/p = 1, i.e. p = 1/2.

Similarly, when there are 3 messages the average information H becomes maximum when the probability of each of these messages is p = 1/3.

Extending this, when there are M messages H becomes a maximum when all the messages are equally likely with p = 1/M. In this case each message has a probability 1/M and

Hmax = Σ_{k=1}^{M} (1/M) log2 M = log2 M
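The entropy definition and its maximum can be verified with a short script; this is a sketch with our own naming, not part of the handout:

```python
import math

def entropy(probs):
    """H = sum of p*log2(1/p) over all symbols, in bits/symbol (p = 0 terms contribute 0)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

M = 8
uniform = [1.0 / M] * M
print(entropy(uniform))  # 3.0 = log2(8): equally likely messages maximise H

skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(skewed) < math.log2(4))  # True: any non-uniform source has H < log2 M
```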
INFORMATION RATE R
Let a source emit symbols at the rate r symbols/second. Then the information rate of the source is R = rH information bits/second, where R is the information rate, H is the entropy of the source and r is the rate at which symbols are generated:

R = r (symbols/second) x H (information bits/symbol) = rH (information bits/second)
EXAMPLE 1
A discrete source emits one of five symbols once every millisecond with probabilities 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Determine the source entropy and information rate.

H = Σ_{i=1}^{5} Pi log2 (1/Pi) = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/16)(4) + (1/16)(4) = 1.875 bits/symbol

Symbol rate r = 1/(1 ms) = 1000 symbols/sec, so R = rH = 1000 x 1.875 = 1875 bits/sec.
EXAMPLE 2
The probabilities of five possible outcomes of an experiment are given as P(x1) = 1/2, P(x2) = 1/4, P(x3) = 1/8, P(x4) = P(x5) = 1/16. Determine the entropy and information rate if there are 16 outcomes per second.

H = 1.875 bits/outcome, as in Example 1, so R = 16 x 1.875 = 30 bits/sec.
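Both examples use the same distribution, so a single sketch (our own helper names) checks them together:

```python
import math

def entropy(probs):
    """Source entropy H in bits/symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/16, 1/16]
H = entropy(probs)   # 1.875 bits/symbol
R1 = 1000 * H        # Example 1: one symbol per millisecond -> 1875 bits/sec
R2 = 16 * H          # Example 2: 16 outcomes per second -> 30 bits/sec
print(H, R1, R2)
```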
EXAMPLE 3
An analog signal band-limited to 10 kHz is quantized into 8 levels of a PCM system with probabilities 1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20 and 1/20 respectively. Find the entropy and the rate of information.

fm = 10 kHz, so fs = 2 x 10 kHz = 20 kHz, and the rate at which messages are produced is r = fs = 20 x 10^3 messages/sec.

H(X) = Σ_{i=1}^{8} P(xi) log2 (1/P(xi)) = (1/4) log2 4 + 2 x (1/5) log2 5 + 2 x (1/10) log2 10 + 3 x (1/20) log2 20 = 2.74 bits/message

R = rH(X) = 20 x 10^3 x 2.74 ≈ 54,800 bits/sec
EXAMPLE 4
Consider a telegraph source having two symbols dot and dash. The dot duration is 0.2s. The dash duration is 3 times the dot duration. The probability of the dots occurring is twice that of the dash and the time between symbols is 0.2s. Calculate the information rate of the telegraph source.
EXAMPLE 4 (Contd..)
p(dot) = 2 p(dash) and p(dot) + p(dash) = 1, so p(dash) = 1/3 and p(dot) = 2/3.

H(X) = p(dot) log2 (1/p(dot)) + p(dash) log2 (1/p(dash)) = (2/3)(0.585) + (1/3)(1.585) = 0.92 bit/symbol

Average time per symbol: Ts = p(dot) t(dot) + p(dash) t(dash) + t(space) = (2/3)(0.2) + (1/3)(0.6) + 0.2 = 0.533 s

So r = 1/Ts = 1.875 symbols/sec, and the information rate R = rH(X) = 1.875 x 0.92 = 1.72 bits/sec.
SOURCE CODING
Let there be M equally likely messages such that M = 2^N, with N an integer. If the messages are equally likely, the entropy H becomes maximum and is given by Hmax = log2 M = log2 2^N = N bits.
SOURCE CODING
The more likely a message is, the fewer the number of bits that should be used in its code word. Let X be a DMS with finite entropy H(X) and an alphabet {x1, x2, ..., xm} with corresponding probabilities of occurrence p(xi), where i = 1, 2, ..., m. Let the binary code word assigned to symbol xi by the encoder have length ni, measured in bits; the length of a code word is the number of bits in the code word. The average code word length L per source symbol is given by

L = Σ_{i=1}^{m} p(xi) ni
SOURCE CODING

[Figure: a DMS emitting symbols x1, ..., xm with probabilities p(x1), ..., p(xm) feeds a source encoder, which assigns binary code words of lengths n1, n2, ..., nm and outputs a binary sequence]
The parameter L represents the average number of bits per source symbol used in the source coding process. Code efficiency is defined as η = Lmin / L, where Lmin is the minimum possible value of L. When η approaches unity the code is said to be efficient. Code redundancy is defined as γ = 1 - η.
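These definitions translate directly into code; the example source and its prefix code below are hypothetical, chosen so that L = H exactly:

```python
import math

def avg_length(probs, lengths):
    """Average code word length L = sum p(xi) * ni, in bits/symbol."""
    return sum(p * n for p, n in zip(probs, lengths))

def efficiency(probs, lengths):
    """Code efficiency eta = H(X) / L, taking Lmin = H(X) (source coding theorem)."""
    H = sum(p * math.log2(1.0 / p) for p in probs if p > 0)
    return H / avg_length(probs, lengths)

probs = [1/2, 1/4, 1/8, 1/8]
lengths = [1, 2, 3, 3]             # e.g. code words 0, 10, 110, 111
print(avg_length(probs, lengths))  # 1.75 bits/symbol
print(efficiency(probs, lengths))  # 1.0, so redundancy gamma = 1 - eta = 0
```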
SOURCE CODING
The conversion of the output of a DMS into a sequence of binary symbols (binary codes) is called source coding, and the device that performs it is called a source encoder. If some symbols are known to be more probable than others, then we may assign short code words to frequent source symbols and long code words to rare source symbols. Such a code is called a variable length code. As an example, in Morse code the letter E is encoded into a single dot, whereas the letter Q is encoded as "dash dash dot dash". This is because in the English language the letter E occurs more frequently than the letter Q.
L ≥ H(X), and L can be made as close to H(X) as desired for some suitably chosen code. Thus, with Lmin = H(X), the code efficiency is η = H(X)/L. No code can achieve efficiency greater than 1, but for any source there are codes with efficiency as close to 1 as desired. The proof does not give a method to find the best codes; it just sets a limit on how good they can be.
Consider any two probability distributions {p0, p1, ..., p_{M-1}} and {q0, q1, ..., q_{M-1}} on the alphabet {x0, x1, ..., x_{M-1}} of a discrete memoryless channel. Then

Σ_{i=0}^{M-1} pi log2 (qi/pi) = (1/ln 2) Σ_{i=0}^{M-1} pi ln (qi/pi) ..........(1)

By a special property of the natural logarithm (ln), we have

ln x ≤ x - 1,  x ≥ 0

Applying this property to eq (1),

Σ_{i=0}^{M-1} pi log2 (qi/pi) ≤ (1/ln 2) Σ_{i=0}^{M-1} pi (qi/pi - 1) = (1/ln 2) (Σ_{i} qi - Σ_{i} pi) = 0

Σ_{i=0}^{M-1} pi log2 (qi/pi) ≤ 0 ..........(2)

Now suppose the M messages x0, x1, ..., x_{M-1} are assigned the equal probabilities qi = 1/M for all i. Then eq (2) gives

Σ_{i=0}^{M-1} pi log2 (1/(M pi)) ≤ 0

Σ_{i=0}^{M-1} pi log2 (1/pi) ≤ Σ_{i=0}^{M-1} pi log2 M = log2 M, since Σ_{i} pi = 1 ..........(3)

Hence

H(X) ≤ log2 M

H(X) = log2 M if and only if pi = 1/M for all i, i.e. all the symbols in the alphabet are equiprobable. This upper bound on entropy corresponds to maximum uncertainty. Proof of the lower bound: since each probability pi ≤ 1, each term pi log2 (1/pi) is always non-negative, so H(X) ≥ 0; each term pi log2 (1/pi) is zero if and only if pi = 0 or 1, so H(X) = 0 if and only if pi = 1 for some i and all the others are zero. If M = 2^N with N an integer, the upper bound becomes H(X) ≤ N.
CLASSIFICATION OF CODES
Fixed Length Code: A fixed length code is one whose code word length is fixed. Codes 1 and 2 of Table 1 are fixed length codes. Variable Length Code: A variable length code is one whose code word length is not fixed. Shannon-Fano and Huffman codes are examples of variable length codes; codes 3, 4, 5 and 6 in Table 1 are variable length codes. Distinct Code: A code is distinct if each code word is distinguishable from the other code words. Codes 2, 3, 4, 5 and 6 are distinct codes. Prefix Code: A code in which no code word can be formed by adding code symbols to another code word is called a prefix code; no code word should be a prefix of another, e.g. codes 2, 4 and 6.
CLASSIFICATION OF CODES
Uniquely Decodable Code: A code is uniquely decodable if the original source sequence can be reconstructed perfectly from the encoded binary sequence. Code 3 of the table is not uniquely decodable, since the binary sequence 1001 may correspond to the source sequences x2x3x2 or x2x1x1x2. A sufficient condition to ensure that a code is uniquely decodable is that no code word is a prefix of another; thus codes 2, 4 and 6 are uniquely decodable codes. The prefix-free condition is not a necessary condition for unique decodability, e.g. code 5. Instantaneous Codes: A code is called instantaneous if the end of any code word is recognizable without examining subsequent code symbols. Prefix-free codes are instantaneous codes, e.g. code 6.
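The prefix test is mechanical and can be applied to the codes of Table 1; the helper below is our own sketch:

```python
def is_prefix_free(codewords):
    """True if no code word is a prefix of another (sufficient for unique decodability)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

code3 = ["0", "1", "00", "11"]       # not uniquely decodable
code4 = ["0", "10", "110", "111"]    # prefix code
code5 = ["0", "01", "011", "0111"]   # uniquely decodable but NOT prefix-free
print(is_prefix_free(code3), is_prefix_free(code4), is_prefix_free(code5))
# False True False
```

Code 5 failing the test while remaining uniquely decodable illustrates that the prefix condition is sufficient but not necessary.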
CLASSIFICATION OF CODES

Table 1:

xi   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
x1   00       00       0        0        0        1
x2   01       01       1        10       01       01
x3   00       10       00       110      011      001
x4   11       11       11       111      0111     0001

Fixed Length Codes: 1, 2. Variable Length Codes: 3, 4, 5, 6. Distinct Codes: 2, 3, 4, 5, 6. Prefix Codes: 2, 4, 6. Uniquely Decodable Codes: 2, 4, 5, 6. Instantaneous Codes: 2, 4, 6.

PREFIX CODING (INSTANTANEOUS CODING)

Consider a discrete memoryless source with alphabet {x0, x1, ..., x_{m-1}} and statistics {p0, p1, ..., p_{m-1}}. Let the code word assigned to source symbol xk be denoted {mk1, mk2, ..., mkn}, where the individual elements are 0s and 1s and n is the code word length. The initial part of the code word is represented by mk1, ..., mki for some i ≤ n, and any sequence made up of the initial part of the code word is called a prefix of the code word. A prefix code is defined as a code in which no code word is a prefix of any other code word. It has the important property that it is always uniquely decodable; but the converse is not always true, i.e. a code that does not satisfy the prefix condition may still be uniquely decodable.
EXAMPLE
An analog signal is band-limited to fm Hz and sampled at Nyquist rate. The samples are quantized into 4 levels. Each level represents one symbol. The probabilities of occurrence of these 4 levels (symbols) are p(x1) = p(x4) = 1/8 and p(x2) = p(x3) = 3/8. Obtain the information rate of the source. Answer:
p( x 2 ) = p( x3 ) =
H(X ) =
3 8
p( x1) = p( x 4 ) =
1 8
= 1.8 bits/symbol.
A prefix code always satisfies Kraft inequality. But the converse is not always true.
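The Kraft inequality Σ 2^(-ni) ≤ 1 is easy to check numerically; the lengths below are taken from the codes of Table 1:

```python
def kraft_sum(lengths):
    """Kraft sum: sum of 2^(-ni); <= 1 is necessary for a uniquely decodable binary code."""
    return sum(2.0 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))  # code 4: 1.0    (prefix code, meets Kraft with equality)
print(kraft_sum([1, 2, 3, 4]))  # code 5: 0.9375 (satisfies Kraft yet is not prefix-free)
print(kraft_sum([1, 1, 2, 2]))  # code 3: 1.5    (violates Kraft: not uniquely decodable)
```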
EXAMPLE
We are transmitting 3.6fm bits/second. There are four levels; these four levels may be coded using binary PCM as shown below.

Symbol   Probability   Binary digits
Q1       1/8           00
Q2       3/8           01
Q3       3/8           10
Q4       1/8           11

Two binary digits are needed to send each symbol. Since symbols are sent at the rate 2fm symbols/sec, the transmission rate of binary digits will be:
EXAMPLE
Binary digit rate = 2 binary digits/symbol x 2fm symbols/second = 4fm binary digits/second
Since one binary digit is capable of conveying 1 bit of information, the above coding scheme is capable of conveying 4fm information bits/sec. But we have seen earlier that we are transmitting 3.6fm bits of information per second. This means that the information-carrying ability of binary PCM is not completely utilized by this transmission scheme.
EXAMPLE
In the above example, if all the symbols were equally likely, i.e. p(x1) = p(x2) = p(x3) = p(x4) = 1/4, binary PCM coding would achieve the maximum information rate. Often this is difficult to achieve, so we go for alternative coding schemes to increase the average information per bit.
SHANNON-FANO CODING
Find the Shannon-Fano codes corresponding to the eight messages m1, m2, m3, ..., m8 with probabilities 1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32 and 1/32.
Message   Probability   Partition       Code     Bits/message
m1        1/2           0               0        1
m2        1/8           1 0 0           100      3
m3        1/8           1 0 1           101      3
m4        1/16          1 1 0 0         1100     4
m5        1/16          1 1 0 1         1101     4
m6        1/16          1 1 1 0         1110     4
m7        1/32          1 1 1 1 0       11110    5
m8        1/32          1 1 1 1 1       11111    5
SHANNON-FANO CODING

L = Σ_{i} p(xi) ni = (1/2)(1) + 2 x (1/8)(3) + 3 x (1/16)(4) + 2 x (1/32)(5) = 2.3125 bits/message, which equals H for this source (all the probabilities are negative powers of 2), so the code efficiency is 100%.

SHANNON-FANO CODING

There are 6 possible messages m1, m2, ..., m6 with probabilities 0.3, 0.25, 0.2, 0.12, 0.08, 0.05. Obtain the Shannon-Fano codes.

Message   Probability   Partition     Code    Length
m1        0.3           0 0           00      2
m2        0.25          0 1           01      2
m3        0.2           1 0           10      2
m4        0.12          1 1 0         110     3
m5        0.08          1 1 1 0       1110    4
m6        0.05          1 1 1 1       1111    4
SHANNON-FANO CODING

L = Σ_{i} p(xi) ni = 0.3(2) + 0.25(2) + 0.2(2) + 0.12(3) + 0.08(4) + 0.05(4) = 2.38 b/symbol

H = Σ_{i} p(xi) log2 (1/p(xi)) = 0.3 log2 (1/0.3) + 0.25 log2 (1/0.25) + 0.2 log2 (1/0.2) + 0.12 log2 (1/0.12) + 0.08 log2 (1/0.08) + 0.05 log2 (1/0.05) = 2.36 b/symbol

Efficiency η = H/L = 2.36/2.38 = 0.99 = 99%. Redundancy γ = 1 - η = 0.01 = 1%.

SHANNON-FANO CODING

A DMS has five equally likely symbols. Construct the Shannon-Fano code.

xi   P(xi)   Partition   Code   Length
x1   0.2     0 0         00     2
x2   0.2     0 1         01     2
x3   0.2     1 0         10     2
x4   0.2     1 1 0       110    3
x5   0.2     1 1 1       111    3

L = Σ_{i} p(xi) ni = 0.2(2 + 2 + 2 + 3 + 3) = 2.4 b/symbol

H = Σ_{i} p(xi) log2 (1/p(xi)) = log2 5 = 2.32 b/symbol, so η = 2.32/2.4 = 0.97.

SHANNON-FANO CODING

A DMS has five symbols x1, x2, x3, x4, x5 with probabilities 0.4, 0.19, 0.16, 0.15, 0.1. Construct the Shannon-Fano code.

xi   P(xi)   Partition   Code   Length
x1   0.4     0 0         00     2
x2   0.19    0 1         01     2
x3   0.16    1 0         10     2
x4   0.15    1 1 0       110    3
x5   0.1     1 1 1       111    3

L = Σ_{i} p(xi) ni = 0.4(2) + 0.19(2) + 0.16(2) + 0.15(3) + 0.1(3) = 2.25 b/symbol

H = Σ_{i} p(xi) log2 (1/p(xi)) = 2.15 b/symbol, so η = 2.15/2.25 = 0.956.
HUFFMAN CODING

(i) List the source symbols in order of decreasing probability. (ii) Combine (add) the probabilities of the two symbols having the lowest probabilities and reorder the resultant probabilities; this process is called bubbling. (iii) During the bubbling process, if the new weight is equal to an existing probability, the new branch is bubbled to the top of the group having the same probability. (iv) Complete the tree structure and assign a 1 to each branch rising up and a 0 to each branch coming down. (v) From the final point, trace the path to the required symbol and order the 0s and 1s encountered along the path to form the code.
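The steps above can be sketched with a priority queue. Note this uses the standard merge-the-two-smallest algorithm; its tie-breaking differs from the "bubbling to the top" rule, so individual code words may differ from the slides while the code lengths remain optimal:

```python
import heapq

def huffman_code(probs):
    """Binary Huffman code: returns one code string per symbol, in input order."""
    # heap entries: (probability, tie-break counter, list of symbol indices in subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    count = len(probs)
    while len(heap) > 1:
        p0, _, s0 = heapq.heappop(heap)   # least probable subtree gets bit 0
        p1, _, s1 = heapq.heappop(heap)   # next least probable gets bit 1
        for s in s0:
            codes[s] = "0" + codes[s]
        for s in s1:
            codes[s] = "1" + codes[s]
        count += 1
        heapq.heappush(heap, (p0 + p1, count, s0 + s1))
    return codes

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
codes = huffman_code(probs)
avg_len = sum(p * len(c) for p, c in zip(probs, codes))
print(codes, avg_len)  # code lengths 2,2,2,3,3; average length 2.2 bits/symbol
```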
[Figure: Huffman coding tree example; at each stage the two lowest probabilities are combined (0.6/0.4, 0.55/0.45, etc.), branches are labelled 1 (up) and 0 (down), and code words such as (1 1 1), (1 1 0), (1 0 1), (1 0 0) are read off along the paths to the symbols x2, x3, x4, x5]
CHANNEL REPRESENTATION
A communication channel may be defined as the path or medium through which symbols flow to the receiver. A Discrete Memoryless Channel (DMC) is a statistical model with an input X and an output Y, as shown below. During each signalling interval, the channel accepts an input symbol from X, and in response it generates an output symbol from Y. The channel is discrete when the alphabets of X and Y are both finite. It is memoryless when the current output depends only on the current input and not on any of the previous inputs.
CHANNEL REPRESENTATION

[Figure: DMC with input symbols x1, x2, ..., xm, output symbols y1, y2, ..., yn, and a transition probability p(yj|xi) on each input-output path]
CHANNEL REPRESENTATION
A diagram of a DMC with m inputs and n outputs is shown above. The input X consists of input symbols x1, x2, ..., xm; the output Y consists of output symbols y1, y2, ..., yn. Each possible input-to-output path is indicated along with a conditional probability p(yj|xi), which is the conditional probability of obtaining output yj given that the input is xi, called the channel transition probability. A channel is completely specified by the complete set of transition probabilities, so a DMC is often specified by the matrix of transition probabilities [P(Y|X)].
CHANNEL MATRIX
[P(Y|X)] =

P(y1|x1)   P(y2|x1)   ...   P(yn|x1)
P(y1|x2)   P(y2|x2)   ...   P(yn|x2)
...
P(y1|xm)   P(y2|xm)   ...   P(yn|xm)
CHANNEL MATRIX
Matrix [P(Y|X)] is called the channel matrix. Each row of the matrix specifies the probabilities of obtaining y1, y2, ..., yn given the corresponding input xi, so the sum of the elements in any row must be unity:

Σ_{j=1}^{n} p(yj|xi) = 1 for all i

If the input probabilities P(X) are represented by the row matrix [P(X)] = [p(x1) p(x2) ... p(xm)] and the output probabilities P(Y) by the row matrix [P(Y)] = [p(y1) p(y2) ... p(yn)], then the output probabilities may be expressed in terms of the input probabilities as

[P(Y)] = [P(X)] [P(Y|X)]
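The matrix relation can be exercised with plain lists (a minimal sketch, no external libraries; the numbers are those of the binary-channel example below):

```python
def output_probs(px, channel):
    """[P(Y)] = [P(X)][P(Y|X)]: row vector times channel matrix."""
    cols = len(channel[0])
    return [sum(px[i] * channel[i][j] for i in range(len(px))) for j in range(cols)]

P = [[0.9, 0.1],
     [0.2, 0.8]]
py = output_probs([0.5, 0.5], P)
print(py)  # approximately [0.55, 0.45]

# Cascading two identical channels is just applying the relation twice:
pz = output_probs(py, P)
print(pz)  # approximately [0.585, 0.415]
```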
CHANNEL MATRIX
If [P(X)] is represented as a diagonal matrix [P(X)]d, then

[P(X,Y)] = [P(X)]d [P(Y|X)]

where the (i, j) element of [P(X,Y)] is the joint probability p(xi, yj); [P(X,Y)] is known as the joint probability matrix.
LOSSLESS CHANNEL
A channel described by a channel matrix with only one nonzero element in each column is called a lossless channel. In a lossless channel no source information is lost in transmission.
[Figure: lossless channel; x1 → y1 (3/4) and y2 (1/4); x2 → y3 (1/3) and y4 (2/3); x3 → y5 (1)]

[P(Y|X)] =
3/4  1/4  0    0    0
0    0    1/3  2/3  0
0    0    0    0    1
DETERMINISTIC CHANNEL
A channel described by a channel matrix with only one nonzero element in each row is called a deterministic channel
DETERMINISTIC CHANNEL
[P(Y|X)] =
1  0  0
1  0  0
0  1  0
0  1  0
0  0  1

[Figure: deterministic channel; x1, x2 → y1; x3, x4 → y2; x5 → y3, each with probability 1]

Since each row has only one non-zero element, this element must be unity. When a given source symbol is sent over a deterministic channel, it is clear which output symbol will be received.
NOISELESS CHANNEL
A channel is called noiseless if it is both lossless and deterministic. The channel matrix has only one element in each row and each column, and this element is unity. The input and output alphabets are of the same size.
For a noiseless channel, [P(Y|X)] is the m x m identity matrix: xi → yi with probability 1 for i = 1, ..., m.

[Figure: noiseless channel; each input xi connects only to the corresponding output yi, with probability 1]

BINARY SYMMETRIC CHANNEL

A binary symmetric channel (BSC) has two inputs (x1 = 0, x2 = 1) and two outputs (y1 = 0, y2 = 1); each input is received correctly with probability 1 - p and crossed over with probability p. Its channel matrix is

[P(Y|X)] =
1-p  p
p    1-p

[Figure: BSC diagram; x1 = 0 → y1 = 0 and x2 = 1 → y2 = 1 with probability 1 - p; crossover paths with probability p]
EXAMPLE 1

[Figure: binary channel; x1 → y1 with probability 0.9 and y2 with probability 0.1; x2 → y2 with probability 0.8 and y1 with probability 0.2]

(i) Find the channel matrix of the binary channel. (ii) Find p(y1) and p(y2) when p(x1) = p(x2) = 0.5. (iii) Find the joint probabilities p(x1, y2) and p(x2, y1) when p(x1) = p(x2) = 0.5.
SOLUTION

(i) The channel matrix is

[P(Y|X)] = [p(y1|x1) p(y2|x1); p(y1|x2) p(y2|x2)] = [0.9 0.1; 0.2 0.8]

(ii) [P(Y)] = [P(X)][P(Y|X)] = [0.5 0.5][0.9 0.1; 0.2 0.8] = [0.55 0.45], so p(y1) = 0.55 and p(y2) = 0.45.

(iii) [P(X,Y)] = [P(X)]d [P(Y|X)] = [0.5 0; 0 0.5][0.9 0.1; 0.2 0.8] = [0.45 0.05; 0.1 0.4], so p(x1, y2) = 0.05 and p(x2, y1) = 0.1.

EXAMPLE 2

Two binary channels of the above example are connected in cascade. Find the overall channel matrix and draw the equivalent channel diagram. Find p(z1) and p(z2) when p(x1) = p(x2) = 0.5.

[Figure: two identical binary channels in cascade, x → y → z, each stage with transition probabilities 0.9/0.1 and 0.2/0.8]

SOLUTION

[P(Z|X)] = [P(Y|X)][P(Z|Y)] = [0.9 0.1; 0.2 0.8][0.9 0.1; 0.2 0.8] = [0.83 0.17; 0.34 0.66]

[P(Z)] = [P(X)][P(Z|X)] = [0.5 0.5][0.83 0.17; 0.34 0.66] = [0.585 0.415], so p(z1) = 0.585 and p(z2) = 0.415.

[Figure: equivalent single channel x → z with transition probabilities 0.83/0.17 and 0.34/0.66]
EXAMPLE 3

A channel has the channel matrix

[P(Y|X)] = [1-p  p  0; 0  p  1-p]

(i) Draw the channel diagram. (ii) If the source has equally likely outputs, compute the probabilities associated with the channel outputs for p = 0.2.

[Figure: channel diagram; x1 = 0 → y1 with probability 1-p and y2 with probability p; x2 = 1 → y3 with probability 1-p and y2 with probability p]
SOLUTION

This channel is known as the binary erasure channel (BEC). It has two inputs, x1 = 0 and x2 = 1, and three outputs, y1 = 0, y2 = e and y3 = 1, where e denotes an erasure: the output is in doubt, and so it should be erased. For p = 0.2,

[P(Y)] = [P(X)][P(Y|X)] = [0.5 0.5][0.8 0.2 0; 0 0.2 0.8] = [0.4 0.2 0.4]
EXAMPLE 5

[Figure: channel diagram; x1 → y1, y2, y3 each with probability 1/3; x2 → y1 and y3 with probability 1/4 each and y2 with probability 1/2; x3 → y1 and y2 with probability 1/4 each and y3 with probability 1/2]

(i) Find the channel matrix. (ii) Find the output probabilities if p(x1) = 1/2 and p(x2) = p(x3) = 1/4. (iii) Find the output entropy H(Y).
SOLUTION

[P(Y|X)] =
1/3  1/3  1/3
1/4  1/2  1/4
1/4  1/4  1/2

[P(Y)] = [P(X)][P(Y|X)] = [1/2 1/4 1/4][P(Y|X)] = [7/24 17/48 17/48]

H(Y) = Σ_{i=1}^{3} p(yi) log2 (1/p(yi)) = 1.58 b/symbol

[Figure: a general channel with input symbols x1, x2, ..., xn and output symbols y1, y2, ..., yn]
i. H(X) is the entropy of the transmitter. ii. H(Y) is the entropy of the receiver iii. H(X,Y) is the joint entropy of the transmitted and received symbols iv. H(X|Y) is the entropy of the transmitter with a knowledge of the received symbols. v. H(Y|X) is the entropy of the receiver with a knowledge of the transmitted symbols.
H(X) = Σ_{i} p(xi) log2 (1/p(xi))

H(Y) = Σ_{i} p(yi) log2 (1/p(yi))

H(X|Y) = Σ_{j=0}^{n-1} Σ_{i=0}^{m-1} p(xi, yj) log2 (1/p(xi|yj))

H(Y|X) = Σ_{j=0}^{n-1} Σ_{i=0}^{m-1} p(xi, yj) log2 (1/p(yj|xi))

H(X,Y) = Σ_{j=0}^{n-1} Σ_{i=0}^{m-1} p(xi, yj) log2 (1/p(xi, yj))

Since p(xi, yj) = p(xi|yj) p(yj),

H(X,Y) = Σ_{j} Σ_{i} p(xi, yj) [log2 (1/p(xi|yj)) + log2 (1/p(yj))] = H(X|Y) + Σ_{j} Σ_{i} p(xi, yj) log2 (1/p(yj))
MUTUAL INFORMATION
If the channel is noiseless then the reception of some symbol yj uniquely determines the message transmitted. Because of noise there is a certain amount of uncertainty regarding the transmitted symbol when yj is received. p(xi|yj) represents the conditional probability that the transmitted symbol was xi given that yj is received. The average uncertainty about x when yj is received is represented as
H(X | Y = yj) = Σ_{i=0}^{m-1} p(xi|yj) log2 (1/p(xi|yj))
The second term above, Σ_{j} Σ_{i} p(xi, yj) log2 (1/p(yj)) = Σ_{j} p(yj) log2 (1/p(yj)) = H(Y), so

H(X,Y) = H(X|Y) + H(Y)

Similarly,

H(X,Y) = H(Y|X) + H(X)
The quantity H(X|Y=yj) is itself a random variable that takes on values H(X|Y=y0), H(X|Y=y1),, H(X|Y=yn) with probabilities p(y0), p(y1),, p(yn).
MUTUAL INFORMATION
Now the average uncertainty about X when Y is received is

H(X|Y) = Σ_{j=0}^{n-1} Σ_{i=0}^{m-1} p(xi|yj) p(yj) log2 (1/p(xi|yj)) = Σ_{j=0}^{n-1} Σ_{i=0}^{m-1} p(xi, yj) log2 (1/p(xi|yj))
MUTUAL INFORMATION
If the channel were noiseless, the average amount of information received would be H(X) bits per received symbol; H(X) is the average amount of information transmitted per symbol. Because of channel noise we lose an average of H(X|Y) of information per symbol, so the receiver receives on the average H(X) - H(X|Y) bits per symbol. The quantity H(X) - H(X|Y) is denoted by I(X;Y) and is called the mutual information.
I(X;Y) = H(X) - H(X|Y) = Σ_{i=0}^{m-1} p(xi) log2 (1/p(xi)) - Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} p(xi, yj) log2 (1/p(xi|yj))
H(X|Y) represents the average loss of information about a transmitted symbol when a symbol is received. It is called equivocation of X w. r. t. Y.
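Mutual information can be computed directly from a joint probability matrix; this sketch (helper name ours) uses the [P(X,Y)] found for the binary channel example earlier:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum p(x,y) * log2[ p(x,y) / (p(x)p(y)) ] over the joint matrix."""
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(joint[0]))]
    I = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                I += pxy * math.log2(pxy / (px[i] * py[j]))
    return I

joint = [[0.45, 0.05],
         [0.10, 0.40]]
print(mutual_information(joint))  # about 0.397 bits/symbol
```

Since the formula is symmetric in x and y, transposing the joint matrix leaves the result unchanged, matching I(X;Y) = I(Y;X).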
MUTUAL INFORMATION

But Σ_{j=0}^{n-1} p(xi, yj) = p(xi), so

I(X;Y) = Σ_{i} Σ_{j} p(xi, yj) log2 (1/p(xi)) - Σ_{i} Σ_{j} p(xi, yj) log2 (1/p(xi|yj))
= Σ_{i} Σ_{j} p(xi, yj) log2 [p(xi|yj) / p(xi)]
= Σ_{i} Σ_{j} p(xi, yj) log2 [p(xi, yj) / (p(yj) p(xi))] ..........(1)

If we interchange the symbols xi and yj, the value of eq (1) is not altered, so I(X;Y) = I(Y;X), i.e. H(X) - H(X|Y) = H(Y) - H(Y|X).

CHANNEL CAPACITY

A particular communication channel has fixed source and destination alphabets and a fixed channel matrix, so the only variable quantity in the expression for the mutual information I(X;Y) is the source probability p(xi). Consequently, maximum information transfer requires specific source statistics, obtained through source coding. A suitable measure of the efficiency of information transfer through a DMC is obtained by comparing the actual information transfer to the upper bound of such transinformation for a given channel. The information transfer in a channel is characterised by the mutual information, and Shannon named the maximum mutual information the channel capacity.
CHANNEL CAPACITY
Channel capacity C is the maximum possible information transmitted when one symbol is transmitted from the transmitter. Channel capacity depends on the transmission medium, kind of signals, kind of receiver, etc. and it is a property of the system as a whole.
[Figure: binary symmetric channel; inputs x1 = 0 and x2 = 1 pass straight through with probability 1 - α and cross over with probability α]

The source alphabet consists of two symbols x1 and x2 with probabilities p(x1) = p and p(x2) = 1 - p. The destination alphabet is {y1, y2}. The average error probability per symbol is

pe = p(x1) p(y2|x1) + p(x2) p(y1|x2) = pα + (1 - p)α = α
Define the binary entropy function

Ω(x) = x log2 (1/x) + (1 - x) log2 (1/(1 - x))

The output probability of y1 is

p(y1) = p(y1|x1) p(x1) + p(y1|x2) p(x2) = (1 - α)p + α(1 - p) = α + p - 2αp

so that

H(Y) = Ω(α + p - 2αp)

The conditional entropy is

H(Y|X) = Σ_{i} Σ_{j} p(xi) p(yj|xi) log2 (1/p(yj|xi))
= p(x1) p(y1|x1) log2 (1/p(y1|x1)) + p(x1) p(y2|x1) log2 (1/p(y2|x1)) + p(x2) p(y1|x2) log2 (1/p(y1|x2)) + p(x2) p(y2|x2) log2 (1/p(y2|x2))
= (1 - α) log2 (1/(1 - α)) + α log2 (1/α) = Ω(α)

Hence I(X;Y) = H(Y) - H(Y|X) = Ω(α + p - 2αp) - Ω(α).
For a fixed α, Ω(α) is a constant, but the term Ω(α + p - 2αp) varies with the source probability p. It reaches its maximum value of 1 when α + p - 2αp = 1/2, which (for α ≠ 1/2) is satisfied by p = 1/2. Hence the channel capacity of the BSC is

C = 1 - Ω(α)

If the channel is noiseless, α = 0 and I(X;Y) = Ω(p) = H(X). On the other hand, if the channel is very much noisy, α = 1/2 and I(X;Y) = 0.
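The capacity formula is easy to evaluate numerically; `alpha` below stands for the crossover probability (the Greek symbol dropped in the original):

```python
import math

def omega(x):
    """Binary entropy function: x log2(1/x) + (1-x) log2(1/(1-x))."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x * math.log2(1.0 / x) + (1 - x) * math.log2(1.0 / (1 - x))

def bsc_capacity(alpha):
    """C = 1 - omega(alpha) bits/symbol for a BSC with crossover probability alpha."""
    return 1.0 - omega(alpha)

print(bsc_capacity(0.0))  # 1.0 : noiseless channel
print(bsc_capacity(0.5))  # 0.0 : completely noisy channel
print(bsc_capacity(0.1))  # about 0.531 bits/symbol
```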
DIFFERENTIAL ENTROPY

For a continuous random variable X with PDF fX(x), define

h(X) = ∫ fX(x) log2 (1/fX(x)) dx

h(X) is called the differential entropy of X, to distinguish it from the ordinary or absolute entropy H(X). The difference between h(X) and H(X) can be explained as below.
EXAMPLE
A signal amplitude X is a random variable uniformly distributed in the range (-1,1). The signal is passed through an amplifier of gain 2. The output Y is also a random variable uniformly distributed in the range (-2,+2). Determine the differential entropies of X and Y
EXAMPLE

fX(x) = 1/2 for |x| < 1, and 0 otherwise

h(X) = ∫ from -1 to 1 of (1/2) log2 2 dx = (1/2)[x] from -1 to 1 = 1 bit

fY(y) = 1/4 for |y| < 2, and 0 otherwise

h(Y) = ∫ from -2 to 2 of (1/4) log2 4 dy = 2 bits
The entropy of the random variable Y is twice that of X. Here Y=2X and a knowledge of X uniquely determines Y. Hence the average uncertainty about X and Y should be identical. Amplification can neither add nor subtract information. But here h(Y) is twice as large as h(X). This is because h(X) and h(Y) are differential entropies and they will be equal only if their reference entropies are equal.
EXAMPLE

The reference entropy R1 for X is -log Δx and the reference entropy R2 for Y is -log Δy:

R1 = lim_{Δx→0} log (1/Δx),  R2 = lim_{Δy→0} log (1/Δy)

Since Y = 2X, corresponding intervals satisfy Δy = 2Δx, so in the limit as Δx, Δy → 0

R1 - R2 = lim log (Δy/Δx) = log2 2 = 1 bit

R1, the reference entropy of X, is higher than the reference entropy R2 for Y. Hence, if X and Y have equal absolute entropies, their differential entropies must differ by 1 bit.
MUTUAL INFORMATION
Then fX(x|y) Δx is the probability that X will lie in the interval (x, x + Δx) when Y = y, as Δx → 0. There is an uncertainty about the event that X lies in the interval (x, x + Δx); this uncertainty, log2 [1/(fX(x|y) Δx)], arises because of channel noise and therefore represents a loss of information. Because log2 [1/(fX(x) Δx)] is the information transmitted and log2 [1/(fX(x|y) Δx)] is the information lost over the channel, the net information received is the difference between the two:

log2 [1/(fX(x) Δx)] - log2 [1/(fX(x|y) Δx)] = log2 [fX(x|y) / fX(x)]
MUTUAL INFORMATION

Comparing with the discrete case, we can write the mutual information between the random variables X and Y as

I(X;Y) = ∫∫ fXY(x, y) log2 [fX(x|y) / fX(x)] dx dy

Equivalently, since fXY(x, y) = fX(x) fY(y|x),

I(X;Y) = ∫∫ fXY(x, y) log2 [fXY(x, y) / (fX(x) fY(y))] dx dy
MUTUAL INFORMATION

Now ∫ fX(x) log2 (1/fX(x)) dx = h(X) and ∫ fY(y|x) dy = 1, so

I(X;Y) = h(X) - ∫∫ fXY(x, y) log2 (1/fX(x|y)) dx dy

The second term on the RHS is the average over x and y of log2 [1/fX(x|y)]. But log2 [1/fX(x|y)] represents the uncertainty about x when y is received: it is the loss of information over the channel. The average of log2 [1/fX(x|y)] is therefore the average loss of information over the channel when some x is transmitted and y is received. By definition this quantity is represented by h(X|Y) and is called the equivocation of X with respect to Y:

h(X|Y) = ∫∫ fXY(x, y) log2 (1/fX(x|y)) dx dy

I(X;Y) = h(X) - h(X|Y)
CHANNEL CAPACITY
That is when some value of X is transmitted and when some value of Y is received the average information transmitted over the channel is I(X;Y). Channel capacity C is defined as the maximum amount of information that can be transmitted on the average.
C = max [I(X;Y)]

where the maximum is taken over all possible input distributions fX(x).
Let X be a Gaussian random variable with PDF

fX(x) = (1/(σ√(2π))) e^(-(x - μ)²/(2σ²)) ..........(2)

and let Y be any other random variable with the same mean μ and the same variance σ².

From the inequality Σ_{k} pk log2 (qk/pk) ≤ 0, whose continuous counterpart is

∫ fY(x) log2 [fX(x)/fY(x)] dx ≤ 0

we may write

∫ fY(x) log2 (1/fY(x)) dx ≤ ∫ fY(x) log2 (1/fX(x)) dx

h(Y) ≤ -∫ fY(x) log2 fX(x) dx ..........(1)

Substituting (2) in (1),

h(Y) ≤ -∫ fY(x) log2 [(1/(σ√(2π))) e^(-(x - μ)²/(2σ²))] dx
Expanding the logarithm and using ∫ fY(x) dx = 1 and ∫ (x - μ)² fY(x) dx = σ²,

h(Y) ≤ (1/2) log2 (2πeσ²)

with equality when Y itself is Gaussian, so for the Gaussian random variable

h(X) = (1/2) log2 (2πeσ²)

For a finite variance σ², the Gaussian random variable has the largest differential entropy attainable by any random variable, and that entropy is uniquely determined by its variance.
Next consider the conditional entropy for a channel with additive noise n, where Y = X + n:

h(Y|X) = ∫∫ fXY(x, y) log2 (1/fY(y|x)) dx dy = ∫ fX(x) [∫ fY(y|x) log2 (1/fY(y|x)) dy] dx

For a given x, y is equal to a constant x + n. Hence the distribution of Y when X has a given value is identical to that of n, except for a translation by x. If fN represents the PDF of the noise sample n,

fY(y|x) = fN(y - x)

Putting y - x = z,

∫ fY(y|x) log2 (1/fY(y|x)) dy = ∫ fN(z) log2 (1/fN(z)) dz

so h(Y|X) = h(n), the differential entropy of the noise.
h(n) = (1/2) log2 (2πeN)

where N = ηB is the noise power in the bandwidth B. The received signal Y = X + n has variance σ² = S + N, so the maximum output entropy is

hmax(Y) = (1/2) log2 (2πe(S + N))

The capacity per sample is

Cs = hmax(Y) - h(n) = (1/2) log2 ((S + N)/N) = (1/2) log2 (1 + S/N) bits/sample

Since the signal is transmitted as 2B samples/second,

C = 2B x (1/2) log2 (1 + S/N) = B log2 (1 + S/N) bits/second
When the bandwidth B increases, the channel capacity does not become infinite as expected, because with an increase in bandwidth the noise power also increases. Thus, for a fixed signal power and in the presence of white Gaussian noise, the channel capacity approaches an upper limit with increasing bandwidth.
C = B log2 (1 + S/N)

Putting N = ηB,

C = B log2 (1 + S/(ηB))

Putting x = S/(ηB), this may be written

C = (S/η)(1/x) log2 (1 + x) = (S/η) log2 (1 + x)^(1/x)

As B → ∞, x → 0, and since lim(x→0) (1 + x)^(1/x) = e,

C∞ = lim(B→∞) C = (S/η) log2 e = 1.44 S/η

This equation indicates that we may trade off bandwidth for signal-to-noise ratio and vice versa.
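The approach of C to the limit 1.44 S/η can be seen numerically. The signal power and noise density below are arbitrary assumed values chosen only to make the trend visible:

```python
import math

S, eta = 1.0, 1e-3                       # signal power (W) and noise PSD (W/Hz), assumed
limit = (S / eta) * math.log2(math.e)    # the Shannon limit 1.44*S/eta

caps = []
for B in (1e3, 1e4, 1e5, 1e6):           # increasing bandwidth, Hz
    C = B * math.log2(1 + S / (eta * B))
    caps.append(C)
    print(f"B = {B:>9.0f} Hz -> C = {C:7.1f} b/s")
print(f"limit 1.44*S/eta = {limit:.1f} b/s")
```

The capacities increase with B but saturate just below the limit, rather than growing without bound.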
For a given capacity C we can trade off S/N against B: if S/N is reduced we must increase the bandwidth, and if the bandwidth is to be reduced we must increase S/N.
∫[x1,x2] gi(x) gj(x) dx = 0,  i ≠ j

If we multiply and integrate the functions over the interval x1 to x2, the result is zero except when the signals are the same. A set of functions which has this property is described as being orthogonal over the interval from x1 to x2. The functions can be compared to vectors vi and vj, whose dot product is given by

vi · vj = |vi| |vj| cos θ

and is zero when the vectors are orthogonal.

A function f(x) may be expanded in terms of the orthogonal set:

f(x) = c1 g1(x) + c2 g2(x) + ... + cn gn(x) ..........(2)

where the c's are numerical constants. The orthogonality of the g's makes it easy to compute the coefficients cn. To evaluate cn we multiply both sides of eq (2) by gn(x) and integrate over the interval of orthogonality.
Because of orthogonality, all of the terms on the right-hand side become zero with a single exception:

∫[x1,x2] f(x) gn(x) dx = cn ∫[x1,x2] gn²(x) dx

so that

cn = ∫[x1,x2] f(x) gn(x) dx / ∫[x1,x2] gn²(x) dx ..........(3)
When the orthogonal functions are selected such that

∫[x1,x2] gn²(x) dx = 1

they are said to be normalised. The use of normalised functions has the advantage that the cn's can be calculated from eq (3) as

cn = ∫[x1,x2] f(x) gn(x) dx

without having to evaluate the integral ∫[x1,x2] gn²(x) dx.
A set of functions which are both orthogonal and normalised is called an orthonormal set.
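A short numerical sketch of eq (2) and eq (3): the basis gn(x) = sin(nx)/√π, orthonormal on (0, 2π), and the test coefficients are assumed examples, not from the handout.

```python
import numpy as np

x1, x2 = 0.0, 2 * np.pi
x = np.linspace(x1, x2, 200_001)
dx = x[1] - x[0]

def g(n, xx):
    # g_n(x) = sin(nx)/sqrt(pi): an orthonormal set on (0, 2*pi)
    return np.sin(n * xx) / np.sqrt(np.pi)

# Build f(x) with known coefficients c1 = 3, c2 = 0.5 as in eq (2)
f = 3.0 * g(1, x) + 0.5 * g(2, x)

# eq (3) with normalised g_n: c_n = integral of f*g_n over (x1, x2)
c = [float(np.sum(f * g(n, x)) * dx) for n in (1, 2, 3)]
print(c)  # approximately [3.0, 0.5, 0.0]
```

Because the gn are normalised, no division by ∫ gn² dx is needed; the coefficients fall straight out of the single integral.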
[Block diagram: a source of M messages selects one of the orthogonal signals s1(t), s2(t), ..., sM(t); the signal passes through an AWGN channel, and the receiver correlates the received waveform with each si(t) over (0, T) to produce the outputs e1, e2, ..., eM.]
The correlator corresponding to the transmitted message si(t) produces the output

ei = ∫[0,T] (si(t) + n(t)) si(t) dt = ∫[0,T] si²(t) dt + ∫[0,T] n(t) si(t) dt = Es + ni

It is adjusted to produce an output of Es, the symbol energy. In the presence of the AWGN waveform n(t), the output of the l-th correlator (l ≠ i) is

el = ∫[0,T] (si(t) + n(t)) sl(t) dt = ∫[0,T] sl(t) n(t) dt + ∫[0,T] si(t) sl(t) dt = ∫[0,T] n(t) sl(t) dt ≡ nl

since ∫[0,T] si(t) sl(t) dt = 0 for orthogonal signals. The quantity nl is a Gaussian random variable with zero mean and mean square value σ² = ηEs/2.

To determine which message has been transmitted we compare the matched-filter (correlator) outputs e1, e2, ..., eM. We decide that si(t) has been transmitted if the corresponding output ei is larger than the output of every other filter. The probability that some arbitrarily selected output el is less than the output ei is given by
p(el < ei) = (1/√(2πσ²)) ∫[-∞, ei] exp[-el²/(2σ²)] del ..........(1)
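The correlator statistics used above — output Es + ni for the transmitted signal, and zero-mean Gaussian nl with mean square ηEs/2 for the others — can be checked with a small simulation. The two sinusoidal signals, sampling rate and noise density below are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, fs = 1.0, 1000                 # symbol duration (s) and sample rate (Hz), assumed
n_samp = int(T * fs)
dt = 1.0 / fs
eta = 0.01                        # noise PSD: two-sided level eta/2, assumed
Es = 1.0                          # symbol energy

t = np.arange(n_samp) * dt
# Two orthogonal tones with an integer number of cycles in T, each of energy Es
s1 = np.sqrt(2 * Es / T) * np.sin(2 * np.pi * 5 * t / T)
s2 = np.sqrt(2 * Es / T) * np.sin(2 * np.pi * 6 * t / T)

trials = 5000
# White Gaussian noise: sample variance (eta/2)/dt reproduces PSD eta/2
noise = rng.normal(0.0, np.sqrt(eta / (2 * dt)), (trials, n_samp))
r = s1 + noise                    # s1(t) is transmitted every time

e1 = r @ s1 * dt                  # correlator matched to the sent signal
e2 = r @ s2 * dt                  # correlator matched to the other signal

print(e1.mean())   # close to Es = 1.0
print(e2.mean())   # close to 0
print(e2.var())    # close to eta*Es/2 = 0.005
```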
Since the M - 1 noise outputs are independent, the probability pL that ei = Es + ni exceeds all of the other M - 1 outputs is

pL = [p(el < ei)]^(M-1) = { (1/√(2πσ²)) ∫[-∞, Es+ni] exp[-el²/(2σ²)] del }^(M-1)

Let el/(√2 σ) = x. Then

pL = { (1/√π) ∫[-∞, (Es+ni)/(√2σ)] exp(-x²) dx }^(M-1)

Since σ² = ηEs/2, we have √2 σ = √(ηEs), and the upper limit becomes

(Es + ni)/(√2 σ) = √(Es/η) + ni/√(ηEs)

Hence pL is a function of Es/η, M and ni/(√2σ):

pL = pL(Es/η, M, ni/(√2σ))
To find the probability that ei is the largest output without reference to the noise output ni of the i-th correlator, we average pL over all possible values of ni. This average is the probability pC that we are correct in deciding that the transmitted signal corresponds to the correlator yielding the largest output. The probability of an error is then pE = 1 - pC. Since ni is a Gaussian random variable with zero mean and variance σ², the average value of pL over all possible values of ni is given by
pC = ∫[-∞,∞] pL(Es/η, M, ni/(√2σ)) (1/√(2πσ²)) exp[-ni²/(2σ²)] dni ..........(4)
Substituting y = ni/(√2σ) in (4),

pC = (1/√π) ∫[-∞,∞] exp(-y²) { (1/√π) ∫[-∞, √(Es/η)+y] exp(-x²) dx }^(M-1) dy

and the probability of error is

pe = 1 - pC
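Rather than evaluating the double integral for pC directly, pe can be estimated by Monte Carlo from the correlator model itself (ei = Es + ni, el = nl, σ² = ηEs/2). The parameter values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 16
Es = 1.0

def pe_monte_carlo(eta, trials=200_000):
    sigma = np.sqrt(eta * Es / 2)                # correlator noise std
    e = rng.normal(0.0, sigma, (trials, M))     # noise outputs n1..nM
    e[:, 0] += Es                                # message 1 sent: e1 = Es + n1
    # An error occurs when some other correlator beats the correct one
    return float(np.mean(np.argmax(e, axis=1) != 0))

pe_noisy = pe_monte_carlo(eta=0.2)
pe_clean = pe_monte_carlo(eta=0.05)
print(pe_noisy, pe_clean)   # pe falls sharply as Es/eta grows
```

This reproduces the qualitative behaviour derived below: for fixed M, the error probability drops rapidly as Es/η increases.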
[Plot: error probability pe versus Es/(η log2 M) for several values of M, including M = 1024 and M = 2048; as M → ∞ the curves approach a step that falls abruptly at Es/(η log2 M) = ln 2 ≈ 0.69.]
The error probability pe depends on Es/η and M. Put

Es = Si T

where Si is the received signal power. Then

Es/η = Si T/η

Put (log2 M)/T = R, the rate at which information is transmitted. Then

Es/η = (Si/(ηR)) log2 M

so that, for a fixed Si/(ηR), Es/η increases as log2 M increases. As (Si/(ηR)) log2 M → ∞, pe → 0.
For fixed M and R, pe decreases as the noise power spectral density η decreases.
For fixed M and η, pe decreases as the signal power Si goes up.
For fixed Si, η and M, pe decreases as we allow more time T for the transmission of a single message, i.e., as the rate R is decreased.
For fixed Si, η and T, pe decreases as M decreases.
As M, the number of messages, increases (with the rate R held fixed), the error probability can be made arbitrarily small.
However, pe → 0 as M → ∞ only provided that

Si/(ηR) > ln 2

The limiting rate Rmax is therefore given by

Si/(ηRmax) = ln 2

Rmax = Si/(η ln 2) = 1.44 Si/η

The maximum rate obtained for this M-ary orthogonal (FSK) signalling is the same as that obtained from Shannon's theorem, C∞ = 1.44 S/η.