You are on page 1of 18

Generating code words for groups or

sequences of symbols
A unique identifier or tag is generated for
the sequence to be encoded
This tag is a fraction and it is binary
encoded
The set of tags for representing symbols
are the numbers in the unit interval [0,1)
Number of numbers in the unit interval is
infinite.
Unique tag can be generated for each
sequence.

Cumulative distribution function of the random


variables associated with the source is used for
mapping the sequence to a tag.
Generating the tag:
Example:
Consider a three-letter alphabet A = {a1, a2, a3}
with
P(a1) = 0.7,P(a2) = 0.1, and P(a3) = 0.2.
Fx(1) = 0.7, Fx (2) = 0.8, and Fx(3) = 1. This
partitions
the unit interval as shown in Figure.

0.0

a1

0.7

0.49

0.546

a1

a1

a1

0.49
a2

0.8

0.5558

0.539
a2

0.56
a3

1.0

0.0

0.546
a3
0.70

a2
a3
0.56

a2
0.5572
a3
0.56

The interval in which the tag for a particular


sequence resides is disjoint from all
intervals in which the tag for any other
sequence may reside.
Any number in this interval can be used as
the tag(say, mid point/lower value).
the tag forms a unique representation for
the sequence. This means that the binary
representation of the tag forms a unique
binary code for the sequence.

We can show that for any sequence


x = (x1, x2,.xn)

l ( n ) l ( n 1) (u ( n 1) l ( n 1) ) FX ( xn 1)
u ( n ) l ( n 1) (u ( n 1) l ( n 1) ) FX ( xn )
u (n) l (n)
TX ( x )
2
The only information required by the tag
generation procedure is the cdf of the source,
which can be obtained directly from the probability
model.

Given a DMS A = {a1, a2, a3}


P(a1) = 0.8,
P(a2) =0.02
Obtain tag for a1 a3 a2 a1.
Ans. Tag = 0.772352

P(a3) =0.18

Given Tag Value, Probability model & length


of the block
Tag value = 0.772352

The interval containing this tag value is the


subset of every interval obtained in the
encoding process. While decoding, decode
the elements in such a way that upper and
lower limits always contain this tag value.

1.
2.
3.

4.
5.

Start with l(0) =0, u(0) =1


Let the first element is decoded as x1
Then l(1)= 0 +(1-0)Fx(x1-1)=Fx(x1-1)
u(1) =0 +(1-0)Fx(x1)=Fx(x1)
Find the value of x1 for which this interval
contains the tag value.
If we pick x1=1, the interval is [0,0.8). If
we pick x1=2, the interval is [0.8,0.82),
and if we pick x1=3, the interval is
[0.82,1.0). As 0.77232 lies in the interval
[0.0,0.8), we choose x1=1

Let the second element is x2.


l(2)
=
0 +(0.8-0)Fx(x2-1)=0.8 Fx(x2-1)
u(2)
=
0 +(0.8-0)Fx(x2)=0.8 Fx(x2)
If we pick x2=1, the updated interval is [0, 0.64,),
which does not contain the tag.
If we pick x2=2, the updated interval is [0.64,
0.656), which also does not contain the tag.

If we pick x2=3, the updated interval is [0.656,


0.8) which contains the tag.

Therefore second element is 3.

Let the third element is x3.


l (3)
u (3)

= 0.656 +(0.8-0.656) Fx(x3-1) = 0.656 + 0.144 Fx(x3-1)


= 0.656 +(0.8-0.656) Fx(x3) = 0.656 + 0.144 Fx(x3)

Subtract the value of l

(2)

from both the limits and the tag.

Find the value of x3 for which the interval [0.144 x Fx (x3-1),


0.144 x Fx (x3)) contains 0.772352 - 0.656 (Residual tag)
Divide the residual tag value by 0.144 to get 0.808.
Find the value of x3 for which 0.808 falls in the interval
[Fx(x3-1), Fx(x3) ). It is possible if x3 =2.

Then update the intervals.

Let the fourth element is x4.

(4)

(4)

Subtract the value of l

0.7712 +(0.77408-0.7712) Fx(x4-1) =


0.7712 + 0.00288 Fx(x 4-1)
0.7712 +(0.77408-0.7712) Fx(x4) =
0.7712 + 0.00288 Fx(x 4)
(3)

from both the limits and the tag.

(0.772352 - 0.7712 =0.001152)

Find the value of x4 for which the interval [0. 00288 x Fx (x4-1),
0. 00288 x Fx (x4)) contains 0.001152 (Residual tag)
Divide the residual tag value by 0. 00288 to get 0.4.
Find the value of x4 for which 0.4 falls in the interval
[Fx(x4-1), Fx(x4) ). It is possible if x4 =1.

Data is decoded as 1321 i.e., a1 a3 a2 a1

1.
2.
3.
4.
5.

Initialize l(0) = 0 & u(0) = 1


For each k find t* = (tag - l(k-1))/ (u(k-1)- l(k-1))
Fin the value of xk for which Fx (xk-1)

t*
Fx (xk)
Update u(k) and l(k)
Continue until the entire sequence has been
decoded.

T x( x )

log(1 / p( x)) 1

Eg: A ={a1, a2, a3,a4}


P(a1) = , P(a2) = , P(a3) =

Symbol

Fx

, P(a4) =

Tag

In binary

1
log p( x) 1

Code

0.5

.25

.010

01

0.75

.625

.101

101

0.875

.8125

.1101

1101

1.0

.9375

.1111

1111

Binary code obtained by truncating is uniquely decodable.


Proof: T x( x ) is a unique identifier.
It lies in the interval [Fx(x-1), Fx(x))
We have to show that T ( x) is also contained in this
x
l ( x)
interval
Since we are truncating the binary representation of T x( x )
to T ( x) , T ( x)
is less than or equal to T x( x )

l ( x)

l ( x)

1

0 Tx ( x) Tx ( x) l ( x )

l ( x) 2
Since
< Fx(x), < Fx(x).

T x( x )

Tx ( x)
l ( x)

We have to show that T ( x)


is greater than or equal to
x

l ( x)
Fx(x-1)
1
1

1
2l ( x )
log
1
p( x)

2
1

1
log

p( x)

1
p( x)

1
2
2
p( x)

p( x)
Tx ( x) Fx ( x 1)
2

1
Tx ( x) Fx ( x 1) l ( x )
2

or Tx ( x) Fx ( x 1)

l ( x)

Therefore the code Tx ( x) is a unique

l ( x)

representation of

1
2n

T x( x )

Average codeword length (Rate)

2
H ( X ) lA H ( X )
m