Jasim
1-Information Theory and Coding
Concept of Probability: If an experiment has outcomes A1, A2, A3, … An, then each probability satisfies P(Ai) ≥ 0 and ∑i P(Ai) = 1.
Joint Probability: If we have two experiments A & B, where experiment A has outcomes A1, A2, A3, … An and experiment B has outcomes B1, B2, B3, … Bm, then:
P(Ai, Bj) = joint probability that event Ai occurs from experiment A and event Bj occurs from experiment B. Note that:
∑i ∑j P(Ai, Bj) = 1, ∑j P(Ai, Bj) = P(Ai), and ∑i P(Ai, Bj) = P(Bj)
Example: From the previous example, assume that the 21st letter is A in order to have 20 pairs.
Then
Or in matrix form
Conditional Probability: Consider two experiments A & B whose outcomes affect each other.
P(Ai/Bj) = conditional probability of Ai given that Bj has already occurred in experiment B.
P(Bj/Ai) = conditional probability of Bj given that Ai has already occurred in experiment A.
Note that: P(Ai, Bj) = P(Ai) . P(Bj/Ai) = P(Bj) . P(Ai/Bj)
Both P(Ai/Bj) and P(Bj/Ai) can be written in matrix form.
Note that ∑i P(Ai/Bj) = 1 and ∑j P(Bj/Ai) = 1
Example: From the previous example, we have
In matrix form:
Statistical Independence: If Ai has no effect on the probability of Bj, then the two events are called
independent, and P(Ai/Bj) = P(Ai), P(Bj/Ai) = P(Bj), and P(Ai, Bj) = P(Ai) . P(Bj)
Example: The two experiments A & B have the joint probability matrix:
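The joint probability matrix of this example did not survive extraction, so as a sketch, the marginal and independence checks can be carried out numerically with an assumed joint matrix (the numbers below are illustrative only):

```python
# Assumed joint probability matrix P(Ai, Bj) -- the example's own matrix was
# lost, so these numbers are illustrative only.
P = [[0.24, 0.06, 0.10],
     [0.36, 0.09, 0.15]]

# Marginals: P(Ai) = sum over j, P(Bj) = sum over i.
P_A = [sum(row) for row in P]
P_B = [sum(row[j] for row in P) for j in range(len(P[0]))]

# Independence holds when P(Ai, Bj) = P(Ai) . P(Bj) for every pair.
independent = all(abs(P[i][j] - P_A[i] * P_B[j]) < 1e-12
                  for i in range(len(P)) for j in range(len(P[0])))
```

For this assumed matrix every entry factors into the product of its marginals, so the two experiments come out independent.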
Self-Information: Suppose that the source of information produces a finite set of messages x1, x2,
x3, … xn with probabilities p(x1), p(x2), p(x3), … p(xn) such that ∑i p(xi) = 1.
The amount of information gained from knowing that the source produces the symbol xi is
related to p(xi) as follows:
1) Information is zero if p (xi) =1 (certain event).
2) Information increases as p (xi) decreases to zero.
3) Information is a positive quantity.
The function that relates p(xi) to the information of xi is: I(xi) = -log_a p(xi)
The units of I(xi) depend on the base (a):
i. if a = 2, then I(xi) has units of bits.
ii. if a = e = 2.718, I(xi) has units of nats.
iii. if a = 10, I(xi) has units of hartleys.
Source Entropy H(X): If I(xi), i = 1, 2, 3, … n are different for a source producing symbols with
unequal probabilities, then the statistical average of I(xi) gives the average amount of
uncertainty associated with the source X. This average is called the source entropy and is denoted by
H(X). This is given by:
H(X) = -∑i p(xi) log2 p(xi)  (bits/symbol)
Example: Find the entropy of the source producing the symbols: p(x) = [0.25 0.1 0.15 0.5]
Solution:
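The worked solution was lost in extraction, but the arithmetic can be checked numerically; a minimal sketch:

```python
from math import log2

def entropy(probs):
    """Source entropy H(X) = -sum p(xi) log2 p(xi), in bits/symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

H = entropy([0.25, 0.1, 0.15, 0.5])   # ~1.743 bits/symbol
```

Note the two boundary cases from the list above: a certain event contributes zero information, and a uniform binary source gives exactly 1 bit/symbol.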
Source Entropy Rate R(X): It is the average amount of information produced per second:
R(X) = H(X) / τ̄  (bits/sec), where τ̄ = ∑i p(xi) τi is the average time duration of a symbol.
Example: A source produces dots and dashes with p(dot) = 0.65. If the time duration of a dot is
200 msec and that of a dash is 800 msec, find the average source entropy rate.
Solution:
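The solution itself was lost in extraction; a numerical sketch of the same computation:

```python
from math import log2

p_dot, p_dash = 0.65, 0.35
t_dot, t_dash = 0.2, 0.8                 # durations in seconds

H = -(p_dot * log2(p_dot) + p_dash * log2(p_dash))   # bits/symbol
tau = p_dot * t_dot + p_dash * t_dash                # average duration, sec
R = H / tau                                          # entropy rate, bits/sec
```

The average symbol duration comes out 0.41 sec, giving a rate of roughly 2.28 bits/sec.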
Mutual Information I(xi, yj): Consider a source producing the set of symbols x1, x2, x3, … xn
while the receiver receives y1, y2, … ym. Theoretically, if the noise is zero then set X = set Y and
n = m; otherwise there is a conditional probability p(yj/xi). The amount of information that yj
provides about xi is called the mutual information between xi and yj:
I(xi, yj) = log2 [p(xi/yj) / p(xi)]  (bits)
Average Mutual Information I(X, Y): This is the statistical average over all the pairs:
I(X, Y) = ∑i ∑j p(xi, yj) log2 [p(xi/yj) / p(xi)]  (bits/symbol)
Marginal Entropies: A term usually used to denote both the source entropy H(X), defined as
before, and the receiver entropy H(Y) = -∑j p(yj) log2 p(yj).
Joint and Conditional Entropies H(X,Y), H(Y/X), H(X/Y): The average amount of
information associated with the pair (xi, yj) is called the joint (system) entropy H(X,Y):
H(X,Y) = -∑i ∑j p(xi, yj) log2 p(xi, yj)  (bits/symbol)
The average amounts of information associated with the pairs (yj/xi) and (xi/yj) are called the
conditional entropies H(Y/X) and H(X/Y). They are given by:
H(Y/X) = -∑i ∑j p(xi, yj) log2 p(yj/xi)  (bits/symbol), which is defined as the noise entropy
H(X/Y) = -∑i ∑j p(xi, yj) log2 p(xi/yj)  (bits/symbol), which is defined as the losses entropy
Since p(xi, yj) = p(xi) . p(yj/xi):
H(X,Y) = -∑i ∑j p(xi, yj) log2 p(xi) - ∑i ∑j p(xi, yj) log2 p(yj/xi)
Also we have ∑j p(xi, yj) = p(xi)
Then
H(X,Y) = -∑i p(xi) log2 p(xi) - ∑i ∑j p(xi, yj) log2 p(yj/xi) = H(X) + H(Y/X)
Similarly, H(X,Y) = H(Y) + H(X/Y).
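These entropies and the identities relating them can be checked numerically. A sketch with an assumed 2x2 joint matrix (the numbers are illustrative only):

```python
from math import log2

# Assumed joint probability matrix p(xi, yj); rows index x, columns index y.
P = [[0.3, 0.1],
     [0.1, 0.5]]

px = [sum(row) for row in P]
py = [sum(P[i][j] for i in range(2)) for j in range(2)]

Hx  = -sum(p * log2(p) for p in px)                   # source entropy H(X)
Hy  = -sum(p * log2(p) for p in py)                   # receiver entropy H(Y)
Hxy = -sum(p * log2(p) for row in P for p in row)     # joint entropy H(X,Y)

# Conditional entropies computed directly from their definitions:
H_noise  = -sum(P[i][j] * log2(P[i][j] / px[i]) for i in range(2) for j in range(2))
H_losses = -sum(P[i][j] * log2(P[i][j] / py[j]) for i in range(2) for j in range(2))

I = Hx + Hy - Hxy    # average mutual information I(X,Y)
```

The direct computations of H(Y/X) and H(X/Y) agree with H(X,Y) - H(X) and H(X,Y) - H(Y), confirming the chain rules derived above.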
This TSC (ternary symmetric channel) is symmetric but not very practical, since in practice x1 has
little effect on x3; i.e., there is almost no chance that x1 is received as y3 or x3 as y1, while x2 can
affect both x1 and x3 equally. Hence a non-symmetric but more practical channel is shown below.
Lossless Channel: This channel has only one nonzero element in each column of the p(y/x) matrix
(channel transition matrix). The following channel is an example of this type of channel:
This channel has zero losses entropy, i.e. H(X/Y) = 0 and I(X, Y) = H(X).
Deterministic Channel: This has only one nonzero element in each row of p(y/x). It has zero
noise entropy, H(Y/X) = 0, and I(X, Y) = H(Y). The following channel is an example of this type of
channel.
Noiseless Channel: This channel has n = m and p(y/x) is an identity matrix.
Channel Capacity C: C = max I(X,Y), where the maximization of I(X,Y) is done with respect to the
input (or output) probabilities while the channel transition probability, p(y/x), is held constant.
Example: Find the channel capacity for the BSC shown.
Solution:
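The BSC figure and its crossover probability were lost in extraction. As a sketch, the BSC capacity formula C = 1 - H2(p) (achieved with equiprobable inputs) can be evaluated for an assumed crossover probability p = 0.1:

```python
from math import log2

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p:
    C = 1 - H2(p), where H2 is the binary entropy function."""
    if p in (0.0, 1.0):
        return 1.0          # a noiseless (or deterministic-flip) channel
    return 1.0 + p * log2(p) + (1 - p) * log2(1 - p)

C = bsc_capacity(0.1)       # assumed crossover probability; figure lost
```

At p = 0.5 the output is independent of the input and the capacity drops to zero, while p = 0 gives the full 1 bit per use.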
Cascading of Channels: If two channels are cascaded, then the overall transition matrix is the
product of the two transition matrices
Example: Find the transition matrix P(z/x) for the cascaded channel shown:
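The cascaded-channel figure is not recoverable, so as a sketch, the product rule can be demonstrated with two assumed binary channels:

```python
def cascade(P1, P2):
    """Overall transition matrix of two cascaded channels:
    P(z/x) = P(y/x) . P(z/y), an ordinary matrix product."""
    return [[sum(P1[i][j] * P2[j][l] for j in range(len(P2)))
             for l in range(len(P2[0]))] for i in range(len(P1))]

# Two assumed binary symmetric stages (illustrative numbers only):
Pyx = [[0.9, 0.1], [0.1, 0.9]]
Pzy = [[0.8, 0.2], [0.2, 0.8]]
Pzx = cascade(Pyx, Pzy)
```

Each row of the result still sums to 1, as any transition matrix must; cascading two BSCs yields another BSC with a larger effective error probability (here 0.26).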
Source Coding: A discrete source is a source that produces a finite set of messages x1, x2, x3,
… xn with probabilities p(x1), p(x2), p(x3), … p(xn).
A source coder will transform each message into a finite sequence of digits, called the codeword of
the message. If binary digits are used in this codeword, then we obtain what is called binary
source coding. Ternary source coding is also possible.
For a set of symbols represented by binary codewords with lengths lk (binary digits),
the overall code length Lc is defined as the average codeword length:
Lc = ∑k p(xk) lk  (bits/symbol), and the code efficiency is η = H(X) / Lc.
Coding Methods
1- Shannon Code: For messages x1, x2, x3, … xn with probabilities P(x1), P(x2), P(x3), … P(xn).
Example: Develop the Shannon code for the following set of messages, p(x) = [0.3 0.2 0.15 0.12 0.1
0.08 0.05], then find: a. Code efficiency, b. P(0) at the encoder output.
Solution:
1- In encoding, messages must be arranged in a decreasing order of probabilities.
2- Compute the cumulative probability Fi = p(x1) + p(x2) + … + p(x(i-1)), with F1 = 0.
3- Choose the codeword length li as the integer satisfying -log2 p(xi) ≤ li < -log2 p(xi) + 1.
4- The codeword of xi is the binary expansion of Fi taken to li digits.
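A sketch of the Shannon procedure in code, using the probabilities of the example above (cumulative probabilities are kept as exact fractions so the binary expansions are not disturbed by round-off):

```python
from math import ceil, log2
from fractions import Fraction

def shannon_code(probs):
    """Binary Shannon code: probabilities must already be sorted in
    decreasing order; li = ceil(-log2 pi), and the codeword is the first
    li bits of the binary expansion of the cumulative probability Fi."""
    F = Fraction(0)
    codes = []
    for p in probs:
        li = ceil(-log2(p) - 1e-12)       # guard against float round-off
        bits, f = "", F
        for _ in range(li):               # binary expansion of Fi to li digits
            f *= 2
            if f >= 1:
                bits, f = bits + "1", f - 1
            else:
                bits += "0"
        codes.append(bits)
        F += Fraction(p).limit_denominator(10**6)
    return codes

p = [0.3, 0.2, 0.15, 0.12, 0.1, 0.08, 0.05]
codes = shannon_code(p)
Lc = sum(pi * len(c) for pi, c in zip(p, codes))                 # = 3.1
H = -sum(pi * log2(pi) for pi in p)
efficiency = H / Lc                                              # ~84%
P0 = sum(pi * c.count("0") for pi, c in zip(p, codes)) / Lc      # P(0) at output
```

The resulting code is prefix-free by construction, with an average length of 3.1 binary digits and an efficiency of about 84%.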
Example: Repeat the previous example using ternary coding.
Solution:
2- Shannon-Fano Code (Fano Code)
1) Arrange messages in decreasing order of probabilities
2) Divide the set into r groups (r depends on the type of coding), with approximately equal
total probability in each group.
3) Assign a different code symbol to each group.
4) Repeat steps (2 & 3) within each group as many times as needed until all the messages are separated.
r = 2 for binary coding while r = 3 for ternary coding
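The steps above can be sketched recursively for the binary (r = 2) case, choosing at each level the split point that makes the two group probabilities as equal as possible:

```python
from math import log2

def shannon_fano(probs):
    """Binary (r = 2) Shannon-Fano code for probabilities already sorted in
    decreasing order: split into two groups of nearly equal total
    probability, prefix one group with '0' and the other with '1', recurse."""
    if len(probs) == 1:
        return [""]
    total, best, best_diff = sum(probs), 1, float("inf")
    for i in range(1, len(probs)):        # pick the most balanced split
        diff = abs(2 * sum(probs[:i]) - total)
        if diff < best_diff:
            best_diff, best = diff, i
    return (["0" + c for c in shannon_fano(probs[:best])] +
            ["1" + c for c in shannon_fano(probs[best:])])

p = [0.35, 0.2, 0.15, 0.12, 0.1, 0.08]    # probabilities of the example below
codes = shannon_fano(p)
Lc = sum(pi * len(c) for pi, c in zip(p, codes))          # average length
efficiency = -sum(pi * log2(pi) for pi in p) / Lc         # ~98%
```

For these probabilities the splits fall cleanly (0.55 vs 0.45, then 0.27 vs 0.18 on the right), giving two 2-digit and four 3-digit codewords.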
Example: Develop the Shannon - Fano code for the following set of messages,
P(x) = [ 0.35 0.2 0.15 0.12 0.1 0.08 ] then find the code Efficiency.
Solution: r = 2
Example: Repeat the previous example using r = 3.
Solution:
3-Huffman Code: The Huffman coding algorithm comprises two steps, reduction and splitting.
1) Reduction
a) List the symbols in descending order of probability.
b) Reduce the r least probable symbols to one symbol with a probability equal to their combined
probability
c) Reorder in descending order of probability at each stage.
d) Repeat the reduction step until only r symbols remain.
2) Splitting
a) Assign the r code symbols (1, 0 for binary; 0, 1, 2 for ternary) to the final r symbols and work backwards.
b) Expand or lengthen the code to cope with each successive split.
Note: For the n symbols to be decodable using r-ary (r = 2 or 3) Huffman coding, (n - r)/(r - 1)
must be an integer; otherwise, add redundant symbols with probabilities equal to zero until the
condition is satisfied.
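The binary reduction/splitting procedure can be sketched with a priority queue. The probabilities below are assumed, since the example's symbol set was lost in extraction:

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable groups
    (reduction), prefixing their codewords with '0' and '1' (splitting)."""
    tick = count()      # tie-breaker so heapq never compares the lists
    heap = [(p, next(tick), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p0, _, s0 = heapq.heappop(heap)   # least probable group -> prefix '0'
        p1, _, s1 = heapq.heappop(heap)   # next group           -> prefix '1'
        for i in s0:
            codes[i] = "0" + codes[i]
        for i in s1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(tick), s0 + s1))
    return codes

probs = [0.4, 0.2, 0.2, 0.1, 0.1]         # assumed example probabilities
codes = huffman(probs)
Lc = sum(p * len(c) for p, c in zip(probs, codes))   # average length = 2.2
kraft = sum(2 ** -len(c) for c in codes)             # = 1 for a Huffman code
```

Ties in the merge order can change individual codewords, but the average length (here 2.2 digits) is the same for every valid Huffman code of a given distribution.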
Example: Develop the Huffman code for the following set of symbols
Solution:
So we obtain the following code
Channel Coding: The purpose of channel coding is error detection and correction, i.e., protecting
data from channel noise and distortion.
Error Correcting Codes: These codes are used to detect and correct errors.
Linear Block Codes (LBC)
Note: If the information bits are separated from the correction bits, then the code is called systematic;
otherwise, it is called non-systematic. For example, the code C = [d1 d2 p1 d3 p2 d4] is a (6,4)
non-systematic code.
Generation of LBC: The correction (parity, check) bits [p1 p2 p3 … pr] are generated from the data
bits [d1 d2 … dk] using the parity check matrix H.
Example: A linear block code has a parity check matrix
where ωi = Hamming weight = number of 1's in the output codeword Ci, taken over all nonzero output codewords.
(ωi)min = minimum Hamming weight = 3
Hamming Distance: The Hamming distance between two codewords Ci and Cj of the same length
n is the number of corresponding bits that are different. This is denoted by dij.
Minimum Hamming Distance: If the code consists of words C1, C2 , C3 , ….Cn then
Minimum Hamming Distance = H . D min = (dij)min
Example: C1 = [10110], C2 = [00101], C3 = [10001], and C4 = [11010]; then
d12 = 3, d13 = 3, d14 = 2, d23 = 2, d24 = 5, and d34 = 3. So H.Dmin = 2.
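The pairwise distances of this example are easy to verify programmatically:

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

words = ["10110", "00101", "10001", "11010"]   # C1..C4 from the example
dmin = min(hamming_distance(words[i], words[j])
           for i in range(len(words)) for j in range(i + 1, len(words)))
```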
Generation of LBC Using the Generator Matrix: If D = [d1 d2 d3 … dk] is the information word, then the
output systematic LBC can be generated using the matrix equation C = DG,
where G = generator matrix of the form G = [ Ikxk  Pkxr ]
G can be found from H and H can be found from G: H = [ Prxk  Irxr ], where Prxk = (Pkxr)T.
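The construction C = DG and the check H.CT = 0 can be sketched over GF(2). The parity submatrix below is assumed (it gives the standard (7,4) Hamming code), since the example's own matrices were lost:

```python
def mat_mul_mod2(A, B):
    """Matrix product with all arithmetic over GF(2) (modulo 2)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

# Assumed (7,4) parity submatrix P (illustrative only):
P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + P[i] for i in range(4)]       # [I4 | P]
H = [[P[i][r] for i in range(4)] + [int(r == j) for j in range(3)]   # [P^T | I3]
     for r in range(3)]

# All 16 codewords C = D.G, each of which must satisfy H.C^T = 0:
codewords = [mat_mul_mod2([[(d >> (3 - b)) & 1 for b in range(4)]], G)[0]
             for d in range(16)]
all_valid = all(mat_mul_mod2(H, [[c] for c in cw]) == [[0], [0], [0]]
                for cw in codewords)
w_min = min(sum(cw) for cw in codewords if any(cw))   # minimum Hamming weight
```

For a linear code the minimum Hamming distance equals the minimum weight of the nonzero codewords, which is 3 here.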
Example:
where S is called the syndrome, used to locate the error. The error detection and correction
capabilities of a LBC are:
number of detectable errors = (ωi)min - 1, number of correctable errors = integer part of [((ωi)min - 1)/2]
S can be used directly to correct the most probable (single) errors by comparing S with all columns of the H
matrix; the column that matches S gives the position of the error.
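This column-matching rule can be sketched directly. The parity-check matrix below is assumed (illustrative only), since the example's matrices were lost:

```python
def correct_single_error(R, H):
    """Syndrome decoding: compute S = H.R^T; if S equals column j of H,
    flip the j-th bit of R."""
    S = [sum(hij * rj for hij, rj in zip(row, R)) % 2 for row in H]
    R = R[:]
    if any(S):
        for j in range(len(R)):
            if [H[i][j] for i in range(len(H))] == S:
                R[j] ^= 1
                break
    return R, S

# Assumed (7,4) parity-check matrix H = [P^T | I3] (illustrative only):
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

C = [1, 0, 1, 1, 0, 1, 0]   # a valid codeword for this H
R = [1, 1, 1, 1, 0, 1, 0]   # C with a single error in the 2nd bit
corrected, S = correct_single_error(R, H)
```

The syndrome of R equals the 2nd column of H, so the 2nd bit is flipped and the transmitted codeword is recovered; a valid codeword gives the all-zero syndrome.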
Considerations in Choosing H Matrix
1) No repeated columns.
2) No zero columns.
Example: The generator matrix of a LBC is given by:
Find:
a) Code table, (ωi )min , error detection and correction capabilities.
b) Syndrome for single error in last position.
c) Syndrome for double errors in first and last positions.
d) If the received word is R= [0101010] , find the word sent
e) Design a logic circuit for generation of syndrome vector.
Solution:
a) D = [d1 d2 d3 d4]
This equals the 3rd column of the H matrix, so the 3rd bit is in error and the correct word is
[0111010].
e) R= [ r1 r2 r3 r4 r5 r6 r7 ]
S=HRT
Cyclic Codes: This is a subclass of the LBC. It is called cyclic since any cyclic shift of a
codeword gives another codeword.
1) Non-systematic Cyclic Code: This code is generated by multiplying the data polynomial D(x)
by the generator polynomial g(x). To find D(x) from the data word D = [a1 a2 a3 … ak]:
D(x) = ak + a(k-1) x + a(k-2) x^2 + … + a1 x^(k-1)
Example:
The generator polynomial g(x) is a primitive polynomial of order r = n - k, obtained from tables
for different orders.
For example, if r = 3 the following two generator polynomials are given:
g(x) = 1 + x + x^3, g(x) = 1 + x^2 + x^3
The output code polynomial (non-systematic) is C(x) = D(x). g(x)
Example: Find the code table for the (7, 4) non-systematic cyclic code with generator polynomial
g(x) = 1 + x + x^3
Solution:
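One row of the code table can be sketched as a GF(2) polynomial multiplication. Note the coefficient lists here are in ascending powers (leftmost entry is the constant term), which is an assumed convention differing from the notes' bit-to-power mapping:

```python
def poly_mul_mod2(a, b):
    """Product of two GF(2) polynomials; coefficient lists are in ascending
    powers: [c0, c1, c2, ...] means c0 + c1 x + c2 x^2 + ..."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

g = [1, 1, 0, 1]            # g(x) = 1 + x + x^3
d = [1, 0, 1, 1]            # sample data polynomial D(x) = 1 + x^2 + x^3
c = poly_mul_mod2(d, g)     # non-systematic codeword C(x) = D(x) g(x)
```

For this sample data word the product works out to the all-ones codeword of length n = 7.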
2) Systematic Cyclic Code: D is again converted into D(x) and the g(x) polynomial is used, but with
a different procedure:
a) Multiply D(x) by x^r, where r = n - k.
b) Divide x^r D(x) by g(x) and take the remainder ρ(x).
c) The output code polynomial is C(x) = x^r D(x) + ρ(x).
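A sketch of systematic encoding (multiply by x^r, divide by g(x), append the remainder), again using the assumed ascending-coefficient convention:

```python
def poly_mod2_rem(dividend, divisor):
    """Remainder of GF(2) polynomial division (ascending coefficients)."""
    rem = dividend[:]
    dr = len(divisor) - 1
    for shift in range(len(rem) - 1 - dr, -1, -1):
        if rem[shift + dr]:                      # cancel the leading term
            for i, coef in enumerate(divisor):
                rem[shift + i] ^= coef
    return rem[:dr]

def systematic_cyclic_encode(data, g):
    """C(x) = x^r D(x) + rem[x^r D(x) / g(x)], with r = deg g(x)."""
    r = len(g) - 1
    shifted = [0] * r + data                     # x^r . D(x)
    rho = poly_mod2_rem(shifted, g)              # parity polynomial
    return rho + data                            # parity bits, then data bits

g = [1, 0, 1, 1]                                 # g(x) = 1 + x^2 + x^3
c = systematic_cyclic_encode([1, 0, 0, 0], g)    # sample data word, D(x) = 1
```

The data bits appear unchanged in the codeword (that is what makes the code systematic), and every codeword is exactly divisible by g(x).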
Example: Find the code table for the (7, 4) systematic cyclic code with generator polynomial
g(x) = 1 + x^2 + x^3
Solution:
Implementation of the Systematic Cyclic Encoder: This requires an r-bit shift register. The
implementation depends on the generator polynomial g(x) = g0 + g1 x + g2 x^2 + … + gr x^r, where g0,
g1, g2, … gr are binary coefficients.
Note: g0 = gr = 1. The implementation in general is shown below.
Switch S: at position (1) for k clock pulses (Z = 1), at position (2) for r clock pulses (Z = 0). The
shift register is initially loaded with zeros.
Example: let g(x) = 1 + x^2 + x^3, r = 3 bits, g0 = 1, g1 = 0, g2 = 1 and g3 = 1
Decoding of the Systematic Cyclic Code: If the received word is R = C + E, where C is the
transmitted word and E is the error word, then dividing R(x) by g(x) and taking the remainder gives
the syndrome polynomial: S(x) = rem[R(x)/g(x)] = rem[E(x)/g(x)], since C(x) is divisible by g(x).
To find E(x), the receiver first prepares a syndrome table, starting from the most probable errors (single
errors) and ending with the number of bits that can be corrected, using the equation above.
From this table, E(x) can be found.
Example: Find the syndrome table for the (7, 4) systematic cyclic code with generator polynomial
g(x) = 1 + x^2 + x^3, then find the corrected word at the receiver if the received word is [1100110].
Solution:
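A sketch of the table-based correction. The received word is read here with the leftmost bit as the x^0 coefficient, an assumed convention that differs from the notes' mapping (leftmost bit to highest power), so the corrected word below is illustrative of the method rather than the notes' own answer:

```python
def poly_mod2_rem(dividend, divisor):
    """Remainder of GF(2) polynomial division; coefficient lists are in
    ascending powers: [c0, c1, c2, ...] means c0 + c1 x + c2 x^2 + ..."""
    rem = dividend[:]
    dr = len(divisor) - 1
    for shift in range(len(rem) - 1 - dr, -1, -1):
        if rem[shift + dr]:
            for i, coef in enumerate(divisor):
                rem[shift + i] ^= coef
    return rem[:dr]

g, n = [1, 0, 1, 1], 7               # g(x) = 1 + x^2 + x^3

# Syndrome table for single errors: S(x) = rem[x^i / g(x)]
table = {}
for i in range(n):
    e = [0] * n
    e[i] = 1
    table[tuple(poly_mod2_rem(e, g))] = i

# Received word (leftmost bit = x^0 coefficient, assumed convention):
R = [1, 1, 0, 0, 1, 1, 0]
S = poly_mod2_rem(R, g)
if any(S):
    R[table[tuple(S)]] ^= 1          # flip the bit located by the table
```

All seven single-error syndromes are distinct and nonzero, so every single error is correctable; after correction the word divides g(x) exactly.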
If the received word is R = [1100110], then the receiver will find
Implementation of the Systematic Cyclic Decoder: To generate S from R, the following logic
circuit is used.
Z = 1 for n clock pulses, Z = 0 for r clock pulses.
For g(x) = 1 + x^2 + x^3: