Jasim
1-Information Theory and Coding
Concept of Probability: If an experiment has outcomes A1, A2, A3, … An, then each probability satisfies P(Ai) ≥ 0 and ∑i P(Ai) = 1.
Joint Probability: If we have two experiments A & B, where experiment A has outcomes A1, A2, A3, … An and experiment B has outcomes B1, B2, B3, … Bm, then:
P(Ai, Bj) = joint probability that event Ai occurs from experiment A and event Bj occurs from experiment B. Note that:
∑i ∑j P(Ai, Bj) = 1, ∑j P(Ai, Bj) = P(Ai), and ∑i P(Ai, Bj) = P(Bj)
Example: From the previous example, assume that the 21st letter is A in order to have 20 pairs.
Then
Or in matrix form
Conditional Probability: Consider two experiments A & B whose outcomes affect each other.
P(Ai/Bj) = conditional probability of Ai given that Bj has already occurred in experiment B.
P(Bj/Ai) = conditional probability of Bj given that Ai has already occurred in experiment A.
Note that: P(Ai, Bj) = P(Ai) . P(Bj/Ai) = P(Bj) . P(Ai/Bj)
Both P(Ai/Bj) and P(Bj/Ai) can be written in matrix form.
Note that ∑i P(Ai/Bj) = 1 and ∑j P(Bj/Ai) = 1
Example: From the previous example, we have
In matrix form:
Statistical Independence: If Ai has no effect on the probability of Bj, then the two events are called
independent, and P(Ai/Bj) = P(Ai), P(Bj/Ai) = P(Bj), and P(Ai, Bj) = P(Ai) . P(Bj)
Example: The two experiments A & B have the joint probability matrix:
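The joint probability matrix of this example did not survive extraction, so as a sketch, the marginal and independence checks can be carried out numerically with an assumed joint matrix (the numbers below are illustrative only):

```python
# Assumed joint probability matrix P(Ai, Bj) -- the example's own matrix was
# lost, so these numbers are illustrative only.
P = [[0.24, 0.06, 0.10],
     [0.36, 0.09, 0.15]]

# Marginals: P(Ai) = sum over j, P(Bj) = sum over i.
P_A = [sum(row) for row in P]
P_B = [sum(row[j] for row in P) for j in range(len(P[0]))]

# Independence holds when P(Ai, Bj) = P(Ai) . P(Bj) for every pair.
independent = all(abs(P[i][j] - P_A[i] * P_B[j]) < 1e-12
                  for i in range(len(P)) for j in range(len(P[0])))
```

For this assumed matrix every entry factors into the product of its marginals, so the two experiments come out independent.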
Self-Information: Suppose that the source of information produces a finite set of messages x1, x2,
x3, … xn with probabilities p(x1), p(x2), p(x3), … p(xn) such that ∑i p(xi) = 1.
The amount of information gained from knowing that the source produces the symbol xi is
related to p(xi) as follows:
1) Information is zero if p (xi) =1 (certain event).
2) Information increases as p (xi) decreases to zero.
3) Information is a positive quantity.
The function that relates p(xi) to the information of xi is: I(xi) = -log_a p(xi)
The units of I(xi) depend on the base (a):
i. if a = 2, then I(xi) has units of bits.
ii. if a = e = 2.718, I(xi) has units of nats.
iii. if a = 10, I(xi) has units of hartleys.
Source Entropy H(X): If I(xi), i = 1, 2, 3, … n are different for a source producing symbols with
unequal probabilities, then the statistical average of I(xi) gives the average amount of
uncertainty associated with the source X. This average is called the source entropy and is denoted by
H(X). This is given by:
H(X) = -∑i p(xi) log2 p(xi)  (bits/symbol)
Example: Find the entropy of the source producing the symbols: p(x) = [0.25 0.1 0.15 0.5]
Solution:
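The worked solution was lost in extraction, but the arithmetic can be checked numerically; a minimal sketch:

```python
from math import log2

def entropy(probs):
    """Source entropy H(X) = -sum p(xi) log2 p(xi), in bits/symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

H = entropy([0.25, 0.1, 0.15, 0.5])   # ~1.743 bits/symbol
```

Note the two boundary cases from the list above: a certain event contributes zero information, and a uniform binary source gives exactly 1 bit/symbol.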
Source Entropy Rate R(X): It is the average amount of information produced per second:
R(X) = H(X) / τ̄  (bits/sec), where τ̄ = ∑i p(xi) τi is the average time duration of a symbol.
Example: A source produces dots and dashes with p(dot) = 0.65. If the time duration of a dot is
200 msec and that of a dash is 800 msec, find the average source entropy rate.
Solution:
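The solution itself was lost in extraction; a numerical sketch of the same computation:

```python
from math import log2

p_dot, p_dash = 0.65, 0.35
t_dot, t_dash = 0.2, 0.8                 # durations in seconds

H = -(p_dot * log2(p_dot) + p_dash * log2(p_dash))   # bits/symbol
tau = p_dot * t_dot + p_dash * t_dash                # average duration, sec
R = H / tau                                          # entropy rate, bits/sec
```

The average symbol duration comes out 0.41 sec, giving a rate of roughly 2.28 bits/sec.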
Mutual Information I(xi, yj): Consider a source producing the set of symbols x1, x2, x3, … xn
while the receiver receives y1, y2, … ym. Theoretically, if the noise is zero then set X = set Y and
n = m; otherwise there is a conditional probability p(yj/xi). The amount of information that yj
provides about xi is called the mutual information between xi and yj:
I(xi, yj) = log2 [p(xi/yj) / p(xi)]  (bits)
Average Mutual Information I(X, Y): This is the statistical average over all the pairs:
I(X, Y) = ∑i ∑j p(xi, yj) log2 [p(xi/yj) / p(xi)]  (bits/symbol)
Marginal Entropies: A term usually used to denote both the source entropy H(X), defined as
before, and the receiver entropy H(Y) = -∑j p(yj) log2 p(yj).
Joint and Conditional Entropies H(X,Y), H(Y/X), H(X/Y): The average amount of
information associated with the pair (xi, yj) is called the joint (system) entropy H(X,Y):
H(X,Y) = -∑i ∑j p(xi, yj) log2 p(xi, yj)  (bits/symbol)
The average amounts of information associated with the pairs (yj/xi) and (xi/yj) are called the
conditional entropies H(Y/X) and H(X/Y). They are given by:
H(Y/X) = -∑i ∑j p(xi, yj) log2 p(yj/xi)  (bits/symbol), which is defined as the noise entropy
H(X/Y) = -∑i ∑j p(xi, yj) log2 p(xi/yj)  (bits/symbol), which is defined as the losses entropy
Since p(xi, yj) = p(xi) . p(yj/xi):
H(X,Y) = -∑i ∑j p(xi, yj) log2 p(xi) - ∑i ∑j p(xi, yj) log2 p(yj/xi)
Also we have ∑j p(xi, yj) = p(xi)
Then
H(X,Y) = -∑i p(xi) log2 p(xi) - ∑i ∑j p(xi, yj) log2 p(yj/xi) = H(X) + H(Y/X)
Similarly, H(X,Y) = H(Y) + H(X/Y).
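These entropies and the identities relating them can be checked numerically. A sketch with an assumed 2x2 joint matrix (the numbers are illustrative only):

```python
from math import log2

# Assumed joint probability matrix p(xi, yj); rows index x, columns index y.
P = [[0.3, 0.1],
     [0.1, 0.5]]

px = [sum(row) for row in P]
py = [sum(P[i][j] for i in range(2)) for j in range(2)]

Hx  = -sum(p * log2(p) for p in px)                   # source entropy H(X)
Hy  = -sum(p * log2(p) for p in py)                   # receiver entropy H(Y)
Hxy = -sum(p * log2(p) for row in P for p in row)     # joint entropy H(X,Y)

# Conditional entropies computed directly from their definitions:
H_noise  = -sum(P[i][j] * log2(P[i][j] / px[i]) for i in range(2) for j in range(2))
H_losses = -sum(P[i][j] * log2(P[i][j] / py[j]) for i in range(2) for j in range(2))

I = Hx + Hy - Hxy    # average mutual information I(X,Y)
```

The direct computations of H(Y/X) and H(X/Y) agree with H(X,Y) - H(X) and H(X,Y) - H(Y), confirming the chain rules derived above.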
This TSC (ternary symmetric channel) is symmetric but not very practical, since in practice x1 has
little effect on x3; i.e., there is almost no chance that x1 is received as y3 or x3 as y1, while x2 can
affect both x1 and x3 equally. Hence a non-symmetric but more practical channel is shown below.
Lossless Channel: This channel has only one nonzero element in each column of the p(y/x) matrix
(channel transition matrix). The following channel is an example of this type of channel:
This channel has zero losses entropy, i.e. H(X/Y) = 0 and I(X, Y) = H(X).
Deterministic Channel: This has only one nonzero element in each row of p(y/x). It has zero
noise entropy, H(Y/X) = 0, and I(X, Y) = H(Y). The following channel is an example of this type of
channel.
Noiseless Channel: This channel has n = m and p(y/x) is an identity matrix.
Channel Capacity C: C = max I(X,Y), where the maximization of I(X,Y) is done with respect to the
input (or output) probabilities while the channel transition probability, p(y/x), is held constant.
Example: Find the channel capacity for the BSC shown.
Solution:
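The BSC figure and its crossover probability were lost in extraction. As a sketch, the BSC capacity formula C = 1 - H2(p) (achieved with equiprobable inputs) can be evaluated for an assumed crossover probability p = 0.1:

```python
from math import log2

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p:
    C = 1 - H2(p), where H2 is the binary entropy function."""
    if p in (0.0, 1.0):
        return 1.0          # a noiseless (or deterministic-flip) channel
    return 1.0 + p * log2(p) + (1 - p) * log2(1 - p)

C = bsc_capacity(0.1)       # assumed crossover probability; figure lost
```

At p = 0.5 the output is independent of the input and the capacity drops to zero, while p = 0 gives the full 1 bit per use.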
Cascading of Channels: If two channels are cascaded, then the overall transition matrix is the
product of the two transition matrices
Example: Find the transition matrix P(z/x) for the cascaded channel shown:
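The cascaded-channel figure is not recoverable, so as a sketch, the product rule can be demonstrated with two assumed binary channels:

```python
def cascade(P1, P2):
    """Overall transition matrix of two cascaded channels:
    P(z/x) = P(y/x) . P(z/y), an ordinary matrix product."""
    return [[sum(P1[i][j] * P2[j][l] for j in range(len(P2)))
             for l in range(len(P2[0]))] for i in range(len(P1))]

# Two assumed binary symmetric stages (illustrative numbers only):
Pyx = [[0.9, 0.1], [0.1, 0.9]]
Pzy = [[0.8, 0.2], [0.2, 0.8]]
Pzx = cascade(Pyx, Pzy)
```

Each row of the result still sums to 1, as any transition matrix must; cascading two BSCs yields another BSC with a larger effective error probability (here 0.26).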
Source Coding: A discrete source is a source that produces a finite set of messages x1, x2, x3,
… xn with probabilities p(x1), p(x2), p(x3), … p(xn).
A source coder will transform each message into a finite sequence of digits, called the codeword of
the message. If binary digits are used in this codeword, then we obtain what is called binary
source coding. Ternary source coding is also possible.
For a set of symbols represented by binary codewords with lengths lk (binary digits),
the overall code length Lc is defined as the average codeword length:
Lc = ∑k p(xk) lk  (bits/symbol), and the code efficiency is η = H(X) / Lc.
Coding Methods
1- Shannon Code: For messages x1, x2, x3, … xn with probabilities P(x1), P(x2), P(x3), … P(xn).
Example: Develop the Shannon code for the following set of messages, p(x) = [0.3 0.2 0.15 0.12 0.1
0.08 0.05], then find: a. Code efficiency, b. P(0) at the encoder output.
Solution:
1- In encoding, messages must be arranged in a decreasing order of probabilities.
2- Compute the cumulative probability Fi = p(x1) + p(x2) + … + p(x(i-1)), with F1 = 0.
3- Choose the codeword length li as the integer satisfying -log2 p(xi) ≤ li < -log2 p(xi) + 1.
4- The codeword of xi is the binary expansion of Fi taken to li digits.
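A sketch of the Shannon procedure in code, using the probabilities of the example above (cumulative probabilities are kept as exact fractions so the binary expansions are not disturbed by round-off):

```python
from math import ceil, log2
from fractions import Fraction

def shannon_code(probs):
    """Binary Shannon code: probabilities must already be sorted in
    decreasing order; li = ceil(-log2 pi), and the codeword is the first
    li bits of the binary expansion of the cumulative probability Fi."""
    F = Fraction(0)
    codes = []
    for p in probs:
        li = ceil(-log2(p) - 1e-12)       # guard against float round-off
        bits, f = "", F
        for _ in range(li):               # binary expansion of Fi to li digits
            f *= 2
            if f >= 1:
                bits, f = bits + "1", f - 1
            else:
                bits += "0"
        codes.append(bits)
        F += Fraction(p).limit_denominator(10**6)
    return codes

p = [0.3, 0.2, 0.15, 0.12, 0.1, 0.08, 0.05]
codes = shannon_code(p)
Lc = sum(pi * len(c) for pi, c in zip(p, codes))                 # = 3.1
H = -sum(pi * log2(pi) for pi in p)
efficiency = H / Lc                                              # ~84%
P0 = sum(pi * c.count("0") for pi, c in zip(p, codes)) / Lc      # P(0) at output
```

The resulting code is prefix-free by construction, with an average length of 3.1 binary digits and an efficiency of about 84%.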
Example: Repeat the previous example using ternary coding.
Solution:
2- Shannon-Fano Code (Fano Code)
1) Arrange messages in decreasing order of probabilities
2) Divide the set into r groups (r depends on the type of coding), with approximately equal
total probability in each group.
3) Assign a different code symbol to each group.
4) Repeat steps (2 & 3) within each group as many times as needed until all the messages are separated.
r = 2 for binary coding while r = 3 for ternary coding
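The steps above can be sketched recursively for the binary (r = 2) case, choosing at each level the split point that makes the two group probabilities as equal as possible:

```python
from math import log2

def shannon_fano(probs):
    """Binary (r = 2) Shannon-Fano code for probabilities already sorted in
    decreasing order: split into two groups of nearly equal total
    probability, prefix one group with '0' and the other with '1', recurse."""
    if len(probs) == 1:
        return [""]
    total, best, best_diff = sum(probs), 1, float("inf")
    for i in range(1, len(probs)):        # pick the most balanced split
        diff = abs(2 * sum(probs[:i]) - total)
        if diff < best_diff:
            best_diff, best = diff, i
    return (["0" + c for c in shannon_fano(probs[:best])] +
            ["1" + c for c in shannon_fano(probs[best:])])

p = [0.35, 0.2, 0.15, 0.12, 0.1, 0.08]    # probabilities of the example below
codes = shannon_fano(p)
Lc = sum(pi * len(c) for pi, c in zip(p, codes))          # average length
efficiency = -sum(pi * log2(pi) for pi in p) / Lc         # ~98%
```

For these probabilities the splits fall cleanly (0.55 vs 0.45, then 0.27 vs 0.18 on the right), giving two 2-digit and four 3-digit codewords.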
Example: Develop the Shannon - Fano code for the following set of messages,
P(x) = [ 0.35 0.2 0.15 0.12 0.1 0.08 ] then find the code Efficiency.
Solution: r = 2
Example: Repeat the previous example using r = 3.
Solution:
3-Huffman Code: The Huffman coding algorithm comprises two steps, reduction and splitting.
1) Reduction
a) List the symbols in descending order of probability.
b) Reduce the r least probable symbols to one symbol with a probability equal to their combined
probability
c) Reorder in descending order of probability at each stage.
d) Repeat the reduction step until only r symbols remain.
2) Splitting
a) Assign the r code symbols (1, 0 for binary; 0, 1, 2 for ternary) to the final r symbols and work backwards.
b) Expand or lengthen the code to cope with each successive split.
Note: For the n symbols to be decodable using r-ary (r = 2 or 3) Huffman coding, (n - r)/(r - 1)
must be an integer; otherwise, add redundant symbols with probabilities equal to zero until the
condition is satisfied.
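The binary reduction/splitting procedure can be sketched with a priority queue. The probabilities below are assumed, since the example's symbol set was lost in extraction:

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable groups
    (reduction), prefixing their codewords with '0' and '1' (splitting)."""
    tick = count()      # tie-breaker so heapq never compares the lists
    heap = [(p, next(tick), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p0, _, s0 = heapq.heappop(heap)   # least probable group -> prefix '0'
        p1, _, s1 = heapq.heappop(heap)   # next group           -> prefix '1'
        for i in s0:
            codes[i] = "0" + codes[i]
        for i in s1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(tick), s0 + s1))
    return codes

probs = [0.4, 0.2, 0.2, 0.1, 0.1]         # assumed example probabilities
codes = huffman(probs)
Lc = sum(p * len(c) for p, c in zip(probs, codes))   # average length = 2.2
kraft = sum(2 ** -len(c) for c in codes)             # = 1 for a Huffman code
```

Ties in the merge order can change individual codewords, but the average length (here 2.2 digits) is the same for every valid Huffman code of a given distribution.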
Example: Develop the Huffman code for the following set of symbols
Solution:
So we obtain the following code
Channel Coding: The purpose of channel coding is error detection and correction, i.e., protecting
data from channel noise and distortion.
Error Correcting Codes: These codes are used to detect and correct errors.
Linear Block Codes (LBC)
Note: If the information bits are separated from the correction bits, then the code is called systematic;
otherwise, it is called non-systematic. For example, the code C = [d1 d2 p1 d3 p2 d4] is a (6,4)
non-systematic code.
Generation of LBC: The correction (parity, check) bits [p1 p2 p3 … pr] are generated from the data
bits [d1 d2 … dk] using the parity check matrix H.
Example: A linear block code has a parity check matrix
where ωi = Hamming weight = number of 1's in the output codeword Ci, taken over all nonzero output codewords.
(ωi)min = minimum Hamming weight = 3
Hamming Distance: The Hamming distance between two codewords Ci and Cj of the same length
n is the number of corresponding bits that are different. This is denoted by dij.
Minimum Hamming Distance: If the code consists of words C1, C2 , C3 , ….Cn then
Minimum Hamming Distance = H . D min = (dij)min
Example: C1 = [10110], C2 = [00101], C3 = [10001], and C4 = [11010]; then
d12 = 3, d13 = 3, d14 = 2, d23 = 2, d24 = 5, and d34 = 3. So H.Dmin = 2.
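The pairwise distances of this example are easy to verify programmatically:

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

words = ["10110", "00101", "10001", "11010"]   # C1..C4 from the example
dmin = min(hamming_distance(words[i], words[j])
           for i in range(len(words)) for j in range(i + 1, len(words)))
```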
Generation of LBC Using the Generator Matrix: If D = [d1 d2 d3 … dk] is the information word, then the
output systematic LBC can be generated using the matrix equation C = DG,
where G = generator matrix of the form G = [ Ikxk  Pkxr ]
G can be found from H and H can be found from G: H = [ Prxk  Irxr ], where Prxk = (Pkxr)T.
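The construction C = DG and the check H.CT = 0 can be sketched over GF(2). The parity submatrix below is assumed (it gives the standard (7,4) Hamming code), since the example's own matrices were lost:

```python
def mat_mul_mod2(A, B):
    """Matrix product with all arithmetic over GF(2) (modulo 2)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

# Assumed (7,4) parity submatrix P (illustrative only):
P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + P[i] for i in range(4)]       # [I4 | P]
H = [[P[i][r] for i in range(4)] + [int(r == j) for j in range(3)]   # [P^T | I3]
     for r in range(3)]

# All 16 codewords C = D.G, each of which must satisfy H.C^T = 0:
codewords = [mat_mul_mod2([[(d >> (3 - b)) & 1 for b in range(4)]], G)[0]
             for d in range(16)]
all_valid = all(mat_mul_mod2(H, [[c] for c in cw]) == [[0], [0], [0]]
                for cw in codewords)
w_min = min(sum(cw) for cw in codewords if any(cw))   # minimum Hamming weight
```

For a linear code the minimum Hamming distance equals the minimum weight of the nonzero codewords, which is 3 here.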
Example:
where S is called the syndrome, used to locate the error. The error detection and correction
capabilities of a LBC are:
number of detectable errors = (ωi)min - 1, number of correctable errors = integer part of [((ωi)min - 1)/2]
S can be used directly to correct the most probable (single) errors by comparing S with all columns of the H
matrix; the column that matches S gives the position of the error.
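This column-matching rule can be sketched directly. The parity-check matrix below is assumed (illustrative only), since the example's matrices were lost:

```python
def correct_single_error(R, H):
    """Syndrome decoding: compute S = H.R^T; if S equals column j of H,
    flip the j-th bit of R."""
    S = [sum(hij * rj for hij, rj in zip(row, R)) % 2 for row in H]
    R = R[:]
    if any(S):
        for j in range(len(R)):
            if [H[i][j] for i in range(len(H))] == S:
                R[j] ^= 1
                break
    return R, S

# Assumed (7,4) parity-check matrix H = [P^T | I3] (illustrative only):
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

C = [1, 0, 1, 1, 0, 1, 0]   # a valid codeword for this H
R = [1, 1, 1, 1, 0, 1, 0]   # C with a single error in the 2nd bit
corrected, S = correct_single_error(R, H)
```

The syndrome of R equals the 2nd column of H, so the 2nd bit is flipped and the transmitted codeword is recovered; a valid codeword gives the all-zero syndrome.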
Considerations in Choosing H Matrix
1) No repeated columns.
2) No zero columns.
Example: The generator matrix of a LBC is given by:
Find:
a) Code table, (ωi )min , error detection and correction capabilities.
b) Syndrome for single error in last position.
c) Syndrome for double errors in first and last positions.
d) If the received word is R= [0101010] , find the word sent
e) Design a logic circuit for generation of syndrome vector.
Solution:
a) D = [d1 d2 d3 d4]
This equals the 3rd column of the H matrix, so the 3rd bit is in error and the correct word is
[0111010].
e) R= [ r1 r2 r3 r4 r5 r6 r7 ]
S=HRT
Cyclic Codes: This is a subclass of the LBC. It is called cyclic since any cyclic shift of a
codeword gives another codeword.
1) Non-systematic Cyclic Code: This code is generated by multiplying the data polynomial D(x)
by the generator polynomial g(x). To find D(x) from the data word D = [a1 a2 a3 … ak]:
D(x) = ak + a(k-1) x + a(k-2) x^2 + … + a1 x^(k-1)
Example:
The generator polynomial g(x) is a primitive polynomial of order r = n - k, obtained from tables
for different orders.
For example, if r = 3 the following two generator polynomials are given:
g(x) = 1 + x + x^3, g(x) = 1 + x^2 + x^3
The output code polynomial (non-systematic) is C(x) = D(x). g(x)
Example: Find the code table for the (7, 4) non-systematic cyclic code with generator polynomial
g(x) = 1 + x + x^3
Solution:
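One row of the code table can be sketched as a GF(2) polynomial multiplication. Note the coefficient lists here are in ascending powers (leftmost entry is the constant term), which is an assumed convention differing from the notes' bit-to-power mapping:

```python
def poly_mul_mod2(a, b):
    """Product of two GF(2) polynomials; coefficient lists are in ascending
    powers: [c0, c1, c2, ...] means c0 + c1 x + c2 x^2 + ..."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

g = [1, 1, 0, 1]            # g(x) = 1 + x + x^3
d = [1, 0, 1, 1]            # sample data polynomial D(x) = 1 + x^2 + x^3
c = poly_mul_mod2(d, g)     # non-systematic codeword C(x) = D(x) g(x)
```

For this sample data word the product works out to the all-ones codeword of length n = 7.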
2) Systematic Cyclic Code: D is again converted into D(x) and the g(x) polynomial is used, but with
a different procedure:
a) Multiply D(x) by x^r, where r = n - k.
b) Divide x^r D(x) by g(x) and take the remainder ρ(x).
c) The output code polynomial is C(x) = x^r D(x) + ρ(x).
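A sketch of systematic encoding (multiply by x^r, divide by g(x), append the remainder), again using the assumed ascending-coefficient convention:

```python
def poly_mod2_rem(dividend, divisor):
    """Remainder of GF(2) polynomial division (ascending coefficients)."""
    rem = dividend[:]
    dr = len(divisor) - 1
    for shift in range(len(rem) - 1 - dr, -1, -1):
        if rem[shift + dr]:                      # cancel the leading term
            for i, coef in enumerate(divisor):
                rem[shift + i] ^= coef
    return rem[:dr]

def systematic_cyclic_encode(data, g):
    """C(x) = x^r D(x) + rem[x^r D(x) / g(x)], with r = deg g(x)."""
    r = len(g) - 1
    shifted = [0] * r + data                     # x^r . D(x)
    rho = poly_mod2_rem(shifted, g)              # parity polynomial
    return rho + data                            # parity bits, then data bits

g = [1, 0, 1, 1]                                 # g(x) = 1 + x^2 + x^3
c = systematic_cyclic_encode([1, 0, 0, 0], g)    # sample data word, D(x) = 1
```

The data bits appear unchanged in the codeword (that is what makes the code systematic), and every codeword is exactly divisible by g(x).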
Example: Find the code table for the (7, 4) systematic cyclic code with generator polynomial
g(x) = 1 + x^2 + x^3
Solution:
Implementation of the Systematic Cyclic Encoder: This requires an r-bit shift register. The
implementation depends on the generator polynomial g(x) = g0 + g1 x + g2 x^2 + … + gr x^r, where g0,
g1, g2, … gr are binary coefficients.
Note: g0 = gr = 1. The implementation in general is shown below.
Switch S: at position (1) for k clock pulses (Z = 1), at position (2) for r clock pulses (Z = 0). The
shift register is initially loaded with zeros.
Example: let g(x) = 1 + x^2 + x^3, r = 3 bits, g0 = 1, g1 = 0, g2 = 1 and g3 = 1
Decoding of the Systematic Cyclic Code: If the received word is R = C + E, where C is the
transmitted word and E is the error word, then dividing R(x) by g(x) and taking the remainder gives
the syndrome polynomial: S(x) = rem[R(x)/g(x)] = rem[E(x)/g(x)], since C(x) is divisible by g(x).
To find E(x), the receiver first prepares a syndrome table, starting from the most probable errors (single
errors) and ending with the number of bits that can be corrected, using the equation above.
From this table, E(x) can be found.
Example: Find the syndrome table for the (7, 4) systematic cyclic code with generator polynomial
g(x) = 1 + x^2 + x^3, then find the corrected word at the receiver if the received word is [1100110].
Solution:
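A sketch of the table-based correction. The received word is read here with the leftmost bit as the x^0 coefficient, an assumed convention that differs from the notes' mapping (leftmost bit to highest power), so the corrected word below is illustrative of the method rather than the notes' own answer:

```python
def poly_mod2_rem(dividend, divisor):
    """Remainder of GF(2) polynomial division; coefficient lists are in
    ascending powers: [c0, c1, c2, ...] means c0 + c1 x + c2 x^2 + ..."""
    rem = dividend[:]
    dr = len(divisor) - 1
    for shift in range(len(rem) - 1 - dr, -1, -1):
        if rem[shift + dr]:
            for i, coef in enumerate(divisor):
                rem[shift + i] ^= coef
    return rem[:dr]

g, n = [1, 0, 1, 1], 7               # g(x) = 1 + x^2 + x^3

# Syndrome table for single errors: S(x) = rem[x^i / g(x)]
table = {}
for i in range(n):
    e = [0] * n
    e[i] = 1
    table[tuple(poly_mod2_rem(e, g))] = i

# Received word (leftmost bit = x^0 coefficient, assumed convention):
R = [1, 1, 0, 0, 1, 1, 0]
S = poly_mod2_rem(R, g)
if any(S):
    R[table[tuple(S)]] ^= 1          # flip the bit located by the table
```

All seven single-error syndromes are distinct and nonzero, so every single error is correctable; after correction the word divides g(x) exactly.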
If the received word is R = [1100110], then the receiver will find
Implementation of the Systematic Cyclic Decoder: To generate S from R, the following logic
circuit is used.
Z = 1 for n clock pulses, Z = 0 for r clock pulses.
For g(x) = 1 + x^2 + x^3: