
Information Theory

Mohamed Hamada
Software Engineering Lab
The University of Aizu
Email: hamada@u-aizu.ac.jp
URL: http://www.u-aizu.ac.jp/~hamada

Today’s Topics

• Communication Channel
• Noiseless binary channel
• Binary Symmetric Channel (BSC)
• Symmetric Channel
• Mutual Information
• Channel Capacity
Digital Communication Systems

[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → User of Information]

Source coding methods:
1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.

Information source models:
1. Memoryless
2. Stochastic
3. Markov
4. Ergodic

INFORMATION TRANSFER ACROSS CHANNELS

Sent messages → Source coding → Channel coding → Channel → Channel decoding → Source decoding → Received messages

• Source coding / decoding: Compression / Decompression (Source Entropy; Rate vs Distortion)
• Channel coding / decoding: Error Correction (Channel Capacity; Capacity vs Efficiency)

Communication Channel

A (discrete) channel is a system consisting of an input alphabet X, an output alphabet Y, and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that we send the symbol x.

Examples of channels: CDs, CD-ROMs, DVDs, phones, Ethernet, video cassettes, etc.

Communication Channel

Channel: input x → p(y|x) → output y (transition probabilities)

Memoryless:
- the output depends only on the current input
- the input and output alphabets are finite

Noiseless binary channel

0 → 0
1 → 1

Transition Matrix:

              y=0   y=1
p(y|x) =  x=0   1     0
          x=1   0     1
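To make the transition-matrix view concrete, here is a minimal Python sketch (the helper name and the use of numpy are my own illustration, not part of the lecture) that represents a discrete memoryless channel by its matrix p(y|x) and samples an output symbol for a given input.

import numpy as np

# Rows are inputs x, columns are outputs y; each row of p(y|x) sums to 1.
NOISELESS_BINARY = np.array([[1.0, 0.0],
                             [0.0, 1.0]])

def send(x, transition_matrix, rng=np.random.default_rng()):
    """Sample one output symbol y ~ p(y | x) for the given input symbol x."""
    return rng.choice(len(transition_matrix[x]), p=transition_matrix[x])

# For the noiseless binary channel the output always equals the input.
assert send(0, NOISELESS_BINARY) == 0
assert send(1, NOISELESS_BINARY) == 1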

Binary Symmetric Channel (BSC)
(Noisy channel)

The BSC can be modelled with an error source e:

    Input xi → ⊕ → Output yi,   where yi = xi ⊕ e

and the error bit e is 1 with probability p, 0 with probability 1-p.

BSC Channel (crossover probability p):

    0 → 0 with probability 1-p,   0 → 1 with probability p
    1 → 1 with probability 1-p,   1 → 0 with probability p
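A small sketch of the error-source view of the BSC (pure Python; the function name is my own): each transmitted bit is XORed with an error bit that is 1 with probability p.

import random

def bsc(bits, p, rng=random.Random(0)):
    """Pass a sequence of bits through a binary symmetric channel
    with crossover probability p: y_i = x_i XOR e_i, P(e_i = 1) = p."""
    return [x ^ (1 if rng.random() < p else 0) for x in bits]

sent = [0, 1, 1, 0, 1, 0, 0, 1] * 1000
received = bsc(sent, p=0.1)
errors = sum(s != r for s, r in zip(sent, received))
print(f"empirical error rate: {errors / len(sent):.3f}")  # close to 0.1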

Symmetric Channel
(Noisy channel)

Channel: X → Y

In the transition matrix of this channel, all the rows are permutations of each other, and so are the columns.

Example: Transition Matrix

               y1    y2    y3
p(y|x) =  x1   0.3   0.2   0.5
          x2   0.5   0.3   0.2
          x3   0.2   0.5   0.3
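As a sanity check on the definition, here is a short sketch (my own helper, not from the slides) that tests whether every row and every column of a transition matrix is a permutation of the first row/column.

def is_symmetric_channel(matrix):
    """A channel is symmetric if all rows are permutations of each other
    and all columns are permutations of each other."""
    rows = [sorted(row) for row in matrix]
    cols = [sorted(col) for col in zip(*matrix)]
    return all(r == rows[0] for r in rows) and all(c == cols[0] for c in cols)

P = [[0.3, 0.2, 0.5],
     [0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3]]
print(is_symmetric_channel(P))  # True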

Mutual Information (MI)

Channel: X → p(y|x) → Y

MI is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other.

For two random variables X and Y with a joint probability p(x,y) and marginal probabilities p(x) and p(y), the mutual information I(X,Y) is the relative entropy between the joint distribution and the product distribution p(x) p(y), i.e.

    I(X,Y) = Σx Σy p(x,y) log2( p(x,y) / (p(x) p(y)) )

Relationship between Entropy and Mutual Information

    I(X,Y) = H(Y) - H(Y|X)

Proof:

    I(X,Y) = Σx Σy p(x,y) log2( p(y|x) / p(y) )

           = - Σx Σy p(x,y) log2 p(y) + Σx Σy p(x,y) log2 p(y|x)

           = - Σy p(y) log2 p(y) + Σx p(x) Σy p(y|x) log2 p(y|x)

           = H(Y) - H(Y|X)

Note that I(X,Y) ≥ 0.

I(X,Y) is the reduction in the description length of X given Y, or: the amount of information that Y gives about X.
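A minimal sketch (numpy-based, with my own function names) that computes I(X,Y) directly from a joint distribution via the relative-entropy definition above and checks it against H(Y) - H(Y|X); the joint distribution is an assumed toy example.

import numpy as np

def H(p):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(joint):
    """I(X,Y) = Σx Σy p(x,y) log2( p(x,y) / (p(x) p(y)) )."""
    joint = np.asarray(joint, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / np.outer(px, py)[mask]))

# Assumed toy joint distribution p(x,y).
pxy = np.array([[0.30, 0.10],
                [0.05, 0.55]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
H_Y_given_X = sum(px[x] * H(pxy[x] / px[x]) for x in range(len(px)))
print(I(pxy), H(py) - H_Y_given_X)  # the two values agree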

Mutual Information (MI)

Note that:

• I(X,Y) = H(Y) - H(Y|X)
• I(X,Y) = H(X) - H(X|Y)
• I(X,Y) = H(X) + H(Y) - H(X,Y)
• I(X,Y) = I(Y,X)
• I(X,X) = H(X)

[Venn diagram: H(X) and H(Y) overlap in I(X,Y); the non-overlapping parts are H(X|Y) and H(Y|X).]

Mutual Information with 2 channels

Channel 1: X → p(y|x) → Y      Channel 2: Y → p(z|y) → Z

Let X, Y and Z form a Markov chain X → Y → Z, so that Z is independent of X given Y, i.e.

    p(x,y,z) = p(x) p(y|x) p(z|y)

The amount of information that Y gives about X given Z is:

    I(X, Y|Z) = H(X|Z) - H(X|YZ)
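To make the two-channel setup concrete, here is a sketch (my own toy numbers and helper names, not from the lecture) that builds the joint distribution p(x,y,z) = p(x) p(y|x) p(z|y) for binary X, Y, Z and evaluates I(X,Y|Z) = H(X|Z) - H(X|YZ).

import numpy as np

def H(p):
    """Entropy in bits of a flattened probability array (zeros skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed toy distributions: binary X and two cascaded BSC-like channels.
px  = np.array([0.5, 0.5])                    # p(x)
pyx = np.array([[0.9, 0.1], [0.1, 0.9]])      # p(y|x), rows indexed by x
pzy = np.array([[0.8, 0.2], [0.2, 0.8]])      # p(z|y), rows indexed by y

# Joint p(x,y,z) = p(x) p(y|x) p(z|y)
pxyz = np.einsum('x,xy,yz->xyz', px, pyx, pzy)

pxz = pxyz.sum(axis=1)   # p(x,z)
pz  = pxz.sum(axis=0)    # p(z)
pyz = pxyz.sum(axis=0)   # p(y,z)

H_X_given_Z  = H(pxz) - H(pz)     # H(X|Z)  = H(X,Z) - H(Z)
H_X_given_YZ = H(pxyz) - H(pyz)   # H(X|YZ) = H(X,Y,Z) - H(Y,Z)
print("I(X,Y|Z) =", H_X_given_Z - H_X_given_YZ)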

Mutual Information with 2 channels

Theorem: I(X,Y) ≥ I(X,Z)

(i.e. multiple channels may destroy information)

Proof:

    I(X, (Y,Z)) = H(Y,Z) - H(Y,Z|X)
                = H(Y) + H(Z|Y) - H(Y|X) - H(Z|YX)
                = I(X,Y) + I(X,Z|Y)

    I(X, (Y,Z)) = H(X) - H(X|YZ)
                = H(X) - H(X|Z) + H(X|Z) - H(X|YZ)
                = I(X,Z) + I(X,Y|Z)

Now I(X,Z|Y) = 0 (independence), so I(X,Y) = I(X,Z) + I(X,Y|Z) ≥ I(X,Z).

Thus: I(X,Y) ≥ I(X,Z)
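A short numerical check of this inequality on the cascaded-channel example (same assumed toy distributions as in the previous sketch): I(X,Y) should never be smaller than I(X,Z).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I_from_joint(pab):
    """Mutual information of a 2-D joint distribution, in bits."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

px  = np.array([0.5, 0.5])
pyx = np.array([[0.9, 0.1], [0.1, 0.9]])      # channel 1: p(y|x)
pzy = np.array([[0.8, 0.2], [0.2, 0.8]])      # channel 2: p(z|y)

pxyz = np.einsum('x,xy,yz->xyz', px, pyx, pzy)
pxy, pxz = pxyz.sum(axis=2), pxyz.sum(axis=1)

print("I(X,Y) =", I_from_joint(pxy))   # larger
print("I(X,Z) =", I_from_joint(pxz))   # smaller: the second channel loses information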

Transmission efficiency

I need on average H(X) bits/source output to describe the source symbols X. After observing Y, I need H(X|Y) bits/source output.

    X → Channel → Y

The reduction in description length is called the transmitted information:

    Transmitted R = I(X,Y) = H(X) - H(X|Y)
                           = H(Y) - H(Y|X)   (from earlier calculations)

We can maximize R by changing the input probabilities. The maximum is called the CAPACITY.

Channel Capacity

The channel capacity C is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low probability of error:

    C = max_{p(x)} I(X,Y)

where the maximum is taken over all possible input distributions p(x).

Note that:
• During data compression, we remove all redundancy in the data to form the most compressed version possible.
• During data transmission, we add redundancy in a controlled fashion to combat errors in the channel.
• Capacity depends on the input probabilities because the transition probabilities are fixed.
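The definition C = max_{p(x)} I(X,Y) can be explored numerically. The sketch below (my own brute-force approach, not an algorithm from the lecture) sweeps the input distribution of a binary-input channel and records the largest mutual information; for a BSC with crossover probability 0.1 the maximum appears at the uniform input and matches 1 - H(0.1).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(px, pyx):
    """I(X,Y) for input distribution px and transition matrix pyx = p(y|x)."""
    joint = px[:, None] * pyx            # p(x,y) = p(x) p(y|x)
    return H(joint.sum(axis=1)) + H(joint.sum(axis=0)) - H(joint)

pyx = np.array([[0.9, 0.1],              # BSC with crossover probability 0.1
                [0.1, 0.9]])

best = max((mutual_information(np.array([q, 1 - q]), pyx), q)
           for q in np.linspace(0.001, 0.999, 999))
print("capacity ≈ %.4f bits at p(x=0) = %.3f" % best)
print("1 - H(0.1) = %.4f bits" % (1 - H([0.1, 0.9])))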

Channel Capacity

Example 1: Noiseless binary channel

Input probabilities: p(x=0) = 1/2, p(x=1) = 1/2

Transition Matrix:

              y=0   y=1
p(y|x) =  x=0   1     0
          x=1   0     1

Here the entropy is:

    H(X) = H(p, 1-p) = H(1/2, 1/2) = -1/2 log 1/2 - 1/2 log 1/2 = 1 bit

    H(X|Y) = -1 log 1 - 0 log 0 - 0 log 0 - 1 log 1 = 0 bit

    I(X,Y) = H(X) - H(X|Y) = 1 - 0 = 1 bit (maximum information)

Channel Capacity

Example 2: (Noisy) Binary Symmetric Channel

If P is the probability of an error (noise):

    0 → 0 with probability 1-P,   0 → 1 with probability P
    1 → 1 with probability 1-P,   1 → 0 with probability P

Transition Matrix:

              y=0   y=1
p(y|x) =  x=0  1-P    P
          x=1   P    1-P

Here the mutual information is:

    I(X,Y) = H(Y) - H(Y|X) = H(Y) - Σx p(x) H(Y|X=x)
           = H(Y) - Σx p(x) H(P)
           = H(Y) - H(P) Σx p(x)      (note that Σx p(x) = 1)
           = H(Y) - H(P)              (note that H(Y) ≤ 1)
           ≤ 1 - H(P)

Hence the information capacity of the BSC channel is

    C = 1 - H(P) bits
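A tiny sketch that evaluates the BSC capacity formula C = 1 - H(P) for a few error probabilities (pure Python; the helper name is my own).

from math import log2

def binary_entropy(p):
    """H(P) = -P log2 P - (1-P) log2 (1-P), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(f"P = {p:.2f}  ->  C = 1 - H(P) = {1 - binary_entropy(p):.3f} bits")
# P = 0.5 gives C = 0: the output is independent of the input.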

Channel Capacity

Example 3: (Noisy) Symmetric channel

Channel: X → Y

The capacity of such a channel can be found as follows. Let r be a row of the transition matrix:

    I(X,Y) = H(Y) - H(Y|X) = H(Y) - Σx p(x) H(Y|X=x)
           = H(Y) - Σx p(x) H(r)
           = H(Y) - H(r) Σx p(x)      (note that Σx p(x) = 1)
           = H(Y) - H(r)
           ≤ log |Y| - H(r)           (Y is the output alphabet)

The bound is achieved by the uniform input distribution, which makes the output Y uniform. Hence the capacity C of such a channel is:

    C = max I(X,Y) = log |Y| - H(r)

Note that: in general, for a symmetric channel the capacity is

    C = log |Y| - H(r)

where Y is the output alphabet and r is any row of the transition matrix.

For example, consider a symmetric channel with the following transition matrix:

               y1    y2    y3
p(y|x) =  x1   0.3   0.2   0.5
          x2   0.5   0.3   0.2
          x3   0.2   0.5   0.3

It is clear that this channel is symmetric since its rows (and columns) are permutations of each other. Hence the capacity of this channel is:

    C = log 3 - H(0.2, 0.3, 0.5) = 0.1 bits (approx.)
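A quick numerical check of C = log |Y| - H(r) on the example matrix above (numpy; the helper name is my own).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P = np.array([[0.3, 0.2, 0.5],    # transition matrix p(y|x) of the example
              [0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3]])

r = P[0]                           # any row works for a symmetric channel
C = np.log2(P.shape[1]) - H(r)     # log|Y| - H(r)
print(f"C = {C:.3f} bits")         # about 0.1 bits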
