
Information Theory

Mohamed Hamada
Software Engineering Lab
The University of Aizu
Email: hamada@u-aizu.ac.jp
URL: http://www.u-aizu.ac.jp/~hamada

Today’s Topics

• Communication Channel
• Noiseless binary channel
• Binary Symmetric Channel (BSC)
• Symmetric Channel
• Mutual Information
• Channel Capacity
Digital Communication Systems

[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → User of Information]

Source coding methods:
1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.

Information source models:
1. Memoryless
2. Stochastic
3. Markov
4. Ergodic

INFORMATION TRANSFER ACROSS CHANNELS

Sent messages → Source coding → Channel coding → Channel → Channel decoding → Source decoding → Received messages

• Source coding / decoding: Compression / Decompression (Source Entropy; Rate vs Distortion)
• Channel coding / decoding: Error Correction (Channel Capacity; Capacity vs Efficiency)

Communication Channel

A (discrete) channel is a system consisting of an input alphabet X, an output alphabet Y, and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that we send the symbol x.

Examples of channels: CDs, CD-ROMs, DVDs, phones, Ethernet, video cassettes, etc.

Communication Channel

Channel: input x → p(y|x) → output y (transition probabilities)

Memoryless:
- the output depends only on the current input
- the input and output alphabets are finite

Noiseless binary channel

0 → 0
1 → 1

Transition Matrix:

              y=0   y=1
p(y|x) =  x=0   1     0
          x=1   0     1
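To make the transition-matrix view concrete, here is a minimal Python sketch (the helper name and the use of numpy are my own illustration, not part of the lecture) that represents a discrete memoryless channel by its matrix p(y|x) and samples an output symbol for a given input.

import numpy as np

# Rows are inputs x, columns are outputs y; each row of p(y|x) sums to 1.
NOISELESS_BINARY = np.array([[1.0, 0.0],
                             [0.0, 1.0]])

def send(x, transition_matrix, rng=np.random.default_rng()):
    """Sample one output symbol y ~ p(y | x) for the given input symbol x."""
    return rng.choice(len(transition_matrix[x]), p=transition_matrix[x])

# For the noiseless binary channel the output always equals the input.
assert send(0, NOISELESS_BINARY) == 0
assert send(1, NOISELESS_BINARY) == 1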

Binary Symmetric Channel (BSC)
(Noisy channel)

The BSC can be modelled with an error source e:

    Input xi → ⊕ → Output yi,   where yi = xi ⊕ e

and the error bit e is 1 with probability p, 0 with probability 1-p.

BSC Channel (crossover probability p):

    0 → 0 with probability 1-p,   0 → 1 with probability p
    1 → 1 with probability 1-p,   1 → 0 with probability p
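A small sketch of the error-source view of the BSC (pure Python; the function name is my own): each transmitted bit is XORed with an error bit that is 1 with probability p.

import random

def bsc(bits, p, rng=random.Random(0)):
    """Pass a sequence of bits through a binary symmetric channel
    with crossover probability p: y_i = x_i XOR e_i, P(e_i = 1) = p."""
    return [x ^ (1 if rng.random() < p else 0) for x in bits]

sent = [0, 1, 1, 0, 1, 0, 0, 1] * 1000
received = bsc(sent, p=0.1)
errors = sum(s != r for s, r in zip(sent, received))
print(f"empirical error rate: {errors / len(sent):.3f}")  # close to 0.1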

Symmetric Channel
(Noisy channel)

Channel: X → Y

In the transition matrix of this channel, all the rows are permutations of each other, and so are the columns.

Example: Transition Matrix

               y1    y2    y3
p(y|x) =  x1   0.3   0.2   0.5
          x2   0.5   0.3   0.2
          x3   0.2   0.5   0.3
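As a sanity check on the definition, here is a short sketch (my own helper, not from the slides) that tests whether every row and every column of a transition matrix is a permutation of the first row/column.

def is_symmetric_channel(matrix):
    """A channel is symmetric if all rows are permutations of each other
    and all columns are permutations of each other."""
    rows = [sorted(row) for row in matrix]
    cols = [sorted(col) for col in zip(*matrix)]
    return all(r == rows[0] for r in rows) and all(c == cols[0] for c in cols)

P = [[0.3, 0.2, 0.5],
     [0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3]]
print(is_symmetric_channel(P))  # True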

Mutual Information (MI)

Channel: X → p(y|x) → Y

MI is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other.

For two random variables X and Y with a joint probability p(x,y) and marginal probabilities p(x) and p(y), the mutual information I(X,Y) is the relative entropy between the joint distribution and the product distribution p(x) p(y), i.e.

    I(X,Y) = Σx Σy p(x,y) log2( p(x,y) / (p(x) p(y)) )

Relationship between Entropy and Mutual Information

    I(X,Y) = H(Y) - H(Y|X)

Proof:

    I(X,Y) = Σx Σy p(x,y) log2( p(y|x) / p(y) )

           = - Σx Σy p(x,y) log2 p(y) + Σx Σy p(x,y) log2 p(y|x)

           = - Σy p(y) log2 p(y) + Σx p(x) Σy p(y|x) log2 p(y|x)

           = H(Y) - H(Y|X)

Note that I(X,Y) ≥ 0.

I(X,Y) is the reduction in the description length of X given Y, or: the amount of information that Y gives about X.
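A minimal sketch (numpy-based, with my own function names) that computes I(X,Y) directly from a joint distribution via the relative-entropy definition above and checks it against H(Y) - H(Y|X); the joint distribution is an assumed toy example.

import numpy as np

def H(p):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(joint):
    """I(X,Y) = Σx Σy p(x,y) log2( p(x,y) / (p(x) p(y)) )."""
    joint = np.asarray(joint, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / np.outer(px, py)[mask]))

# Assumed toy joint distribution p(x,y).
pxy = np.array([[0.30, 0.10],
                [0.05, 0.55]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
H_Y_given_X = sum(px[x] * H(pxy[x] / px[x]) for x in range(len(px)))
print(I(pxy), H(py) - H_Y_given_X)  # the two values agree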

Mutual Information (MI)

Note that:

• I(X,Y) = H(Y) - H(Y|X)
• I(X,Y) = H(X) - H(X|Y)
• I(X,Y) = H(X) + H(Y) - H(X,Y)
• I(X,Y) = I(Y,X)
• I(X,X) = H(X)

[Venn diagram: H(X) and H(Y) overlap in I(X,Y); the non-overlapping parts are H(X|Y) and H(Y|X).]

Mutual Information with 2 channels

Channel 1: X → p(y|x) → Y      Channel 2: Y → p(z|y) → Z

Let X, Y and Z form a Markov chain X → Y → Z, so that Z is independent of X given Y, i.e.

    p(x,y,z) = p(x) p(y|x) p(z|y)

The amount of information that Y gives about X given Z is:

    I(X, Y|Z) = H(X|Z) - H(X|YZ)
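To make the two-channel setup concrete, here is a sketch (my own toy numbers and helper names, not from the lecture) that builds the joint distribution p(x,y,z) = p(x) p(y|x) p(z|y) for binary X, Y, Z and evaluates I(X,Y|Z) = H(X|Z) - H(X|YZ).

import numpy as np

def H(p):
    """Entropy in bits of a flattened probability array (zeros skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed toy distributions: binary X and two cascaded BSC-like channels.
px  = np.array([0.5, 0.5])                    # p(x)
pyx = np.array([[0.9, 0.1], [0.1, 0.9]])      # p(y|x), rows indexed by x
pzy = np.array([[0.8, 0.2], [0.2, 0.8]])      # p(z|y), rows indexed by y

# Joint p(x,y,z) = p(x) p(y|x) p(z|y)
pxyz = np.einsum('x,xy,yz->xyz', px, pyx, pzy)

pxz = pxyz.sum(axis=1)   # p(x,z)
pz  = pxz.sum(axis=0)    # p(z)
pyz = pxyz.sum(axis=0)   # p(y,z)

H_X_given_Z  = H(pxz) - H(pz)     # H(X|Z)  = H(X,Z) - H(Z)
H_X_given_YZ = H(pxyz) - H(pyz)   # H(X|YZ) = H(X,Y,Z) - H(Y,Z)
print("I(X,Y|Z) =", H_X_given_Z - H_X_given_YZ)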

Mutual Information with 2 channels

Theorem: I(X,Y) ≥ I(X,Z)

(i.e. multiple channels may destroy information)

Proof:

    I(X, (Y,Z)) = H(Y,Z) - H(Y,Z|X)
                = H(Y) + H(Z|Y) - H(Y|X) - H(Z|YX)
                = I(X,Y) + I(X,Z|Y)

    I(X, (Y,Z)) = H(X) - H(X|YZ)
                = H(X) - H(X|Z) + H(X|Z) - H(X|YZ)
                = I(X,Z) + I(X,Y|Z)

Now I(X,Z|Y) = 0 (independence), so I(X,Y) = I(X,Z) + I(X,Y|Z) ≥ I(X,Z).

Thus: I(X,Y) ≥ I(X,Z)
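A short numerical check of this inequality on the cascaded-channel example (same assumed toy distributions as in the previous sketch): I(X,Y) should never be smaller than I(X,Z).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I_from_joint(pab):
    """Mutual information of a 2-D joint distribution, in bits."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

px  = np.array([0.5, 0.5])
pyx = np.array([[0.9, 0.1], [0.1, 0.9]])      # channel 1: p(y|x)
pzy = np.array([[0.8, 0.2], [0.2, 0.8]])      # channel 2: p(z|y)

pxyz = np.einsum('x,xy,yz->xyz', px, pyx, pzy)
pxy, pxz = pxyz.sum(axis=2), pxyz.sum(axis=1)

print("I(X,Y) =", I_from_joint(pxy))   # larger
print("I(X,Z) =", I_from_joint(pxz))   # smaller: the second channel loses information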

Transmission efficiency

I need on average H(X) bits/source output to describe the source symbols X. After observing Y, I need H(X|Y) bits/source output.

    X → Channel → Y

The reduction in description length is called the transmitted information:

    Transmitted R = I(X,Y) = H(X) - H(X|Y)
                           = H(Y) - H(Y|X)   (from earlier calculations)

We can maximize R by changing the input probabilities. The maximum is called the CAPACITY.

Channel Capacity

The channel capacity C is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low probability of error:

    C = max_{p(x)} I(X,Y)

where the maximum is taken over all possible input distributions p(x).

Note that:
• During data compression, we remove all redundancy in the data to form the most compressed version possible.
• During data transmission, we add redundancy in a controlled fashion to combat errors in the channel.
• Capacity depends on the input probabilities because the transition probabilities are fixed.
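The definition C = max_{p(x)} I(X,Y) can be explored numerically. The sketch below (my own brute-force approach, not an algorithm from the lecture) sweeps the input distribution of a binary-input channel and records the largest mutual information; for a BSC with crossover probability 0.1 the maximum appears at the uniform input and matches 1 - H(0.1).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(px, pyx):
    """I(X,Y) for input distribution px and transition matrix pyx = p(y|x)."""
    joint = px[:, None] * pyx            # p(x,y) = p(x) p(y|x)
    return H(joint.sum(axis=1)) + H(joint.sum(axis=0)) - H(joint)

pyx = np.array([[0.9, 0.1],              # BSC with crossover probability 0.1
                [0.1, 0.9]])

best = max((mutual_information(np.array([q, 1 - q]), pyx), q)
           for q in np.linspace(0.001, 0.999, 999))
print("capacity ≈ %.4f bits at p(x=0) = %.3f" % best)
print("1 - H(0.1) = %.4f bits" % (1 - H([0.1, 0.9])))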

Channel Capacity

Example 1: Noiseless binary channel

Input probabilities: p(x=0) = 1/2, p(x=1) = 1/2

Transition Matrix:

              y=0   y=1
p(y|x) =  x=0   1     0
          x=1   0     1

Here the entropy is:

    H(X) = H(p, 1-p) = H(1/2, 1/2) = -1/2 log 1/2 - 1/2 log 1/2 = 1 bit

    H(X|Y) = -1 log 1 - 0 log 0 - 0 log 0 - 1 log 1 = 0 bit

    I(X,Y) = H(X) - H(X|Y) = 1 - 0 = 1 bit (maximum information)

Channel Capacity

Example 2: (Noisy) Binary Symmetric Channel

If P is the probability of an error (noise):

    0 → 0 with probability 1-P,   0 → 1 with probability P
    1 → 1 with probability 1-P,   1 → 0 with probability P

Transition Matrix:

              y=0   y=1
p(y|x) =  x=0  1-P    P
          x=1   P    1-P

Here the mutual information is:

    I(X,Y) = H(Y) - H(Y|X) = H(Y) - Σx p(x) H(Y|X=x)
           = H(Y) - Σx p(x) H(P)
           = H(Y) - H(P) Σx p(x)      (note that Σx p(x) = 1)
           = H(Y) - H(P)              (note that H(Y) ≤ 1)
           ≤ 1 - H(P)

Hence the information capacity of the BSC channel is

    C = 1 - H(P) bits
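A tiny sketch that evaluates the BSC capacity formula C = 1 - H(P) for a few error probabilities (pure Python; the helper name is my own).

from math import log2

def binary_entropy(p):
    """H(P) = -P log2 P - (1-P) log2 (1-P), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(f"P = {p:.2f}  ->  C = 1 - H(P) = {1 - binary_entropy(p):.3f} bits")
# P = 0.5 gives C = 0: the output is independent of the input.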

Channel Capacity

Example 3: (Noisy) Symmetric channel

Channel: X → Y

The capacity of such a channel can be found as follows. Let r be a row of the transition matrix:

    I(X,Y) = H(Y) - H(Y|X) = H(Y) - Σx p(x) H(Y|X=x)
           = H(Y) - Σx p(x) H(r)
           = H(Y) - H(r) Σx p(x)      (note that Σx p(x) = 1)
           = H(Y) - H(r)
           ≤ log |Y| - H(r)           (Y is the output alphabet)

The bound is achieved by the uniform input distribution, which makes the output Y uniform. Hence the capacity C of such a channel is:

    C = max I(X,Y) = log |Y| - H(r)

Note that: in general, for a symmetric channel the capacity is

    C = log |Y| - H(r)

where Y is the output alphabet and r is any row of the transition matrix.

For example, consider a symmetric channel with the following transition matrix:

               y1    y2    y3
p(y|x) =  x1   0.3   0.2   0.5
          x2   0.5   0.3   0.2
          x3   0.2   0.5   0.3

It is clear that this channel is symmetric since its rows (and columns) are permutations of each other. Hence the capacity of this channel is:

    C = log 3 - H(0.2, 0.3, 0.5) = 0.1 bits (approx.)
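A quick numerical check of C = log |Y| - H(r) on the example matrix above (numpy; the helper name is my own).

import numpy as np

def H(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P = np.array([[0.3, 0.2, 0.5],    # transition matrix p(y|x) of the example
              [0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3]])

r = P[0]                           # any row works for a symmetric channel
C = np.log2(P.shape[1]) - H(r)     # log|Y| - H(r)
print(f"C = {C:.3f} bits")         # about 0.1 bits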
