
# Generating code words for groups or sequences of symbols

A unique identifier, or tag, is generated for the sequence to be encoded.
This tag is a fraction, and it is binary encoded.
The set of tags for representing sequences of symbols is the set of numbers in the unit interval [0, 1).
The number of numbers in the unit interval is infinite, so a unique tag can be generated for each sequence.

## The cumulative distribution function (cdf) of the random variable associated with the source is used for mapping the sequence to a tag.

Generating the tag:

Example: Consider a three-letter alphabet A = {a1, a2, a3} with P(a1) = 0.7, P(a2) = 0.1, and P(a3) = 0.2. Then $F_X(1) = 0.7$, $F_X(2) = 0.8$, and $F_X(3) = 1$. This partitions the unit interval as shown in the figure.

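[Figure: successive restriction of the interval containing the tag for the sequence a1 a2 a3. The unit interval [0, 1.0) is split at 0.7 and 0.8; the interval [0, 0.7) is split at 0.49 and 0.56; the interval [0.49, 0.56) is split at 0.539 and 0.546; the interval [0.546, 0.56) is split at 0.5558 and 0.5572.]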
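The partition of the unit interval follows directly from the cumulative sums of the symbol probabilities. The sketch below is a minimal illustration, not part of the original slides; the function name `cdf_partition` is hypothetical.

```python
from itertools import accumulate

def cdf_partition(probs):
    """Return one (low, high) subinterval of [0, 1) per symbol.

    probs[i] is the probability of symbol a_(i+1); the subinterval of
    symbol x is [F_X(x - 1), F_X(x)).
    """
    uppers = list(accumulate(probs))   # F_X(1), F_X(2), ...
    lowers = [0.0] + uppers[:-1]       # F_X(0) = 0, F_X(1), ...
    return list(zip(lowers, uppers))

# Three-letter example from the text: P = 0.7, 0.1, 0.2
for i, (low, high) in enumerate(cdf_partition([0.7, 0.1, 0.2]), start=1):
    print(f"a{i}: [{low:.2f}, {high:.2f})")
```

The cumulative sums are floating-point values, so the boundaries are only accurate to rounding error.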

## The interval in which the tag for a particular sequence resides is disjoint from all intervals in which the tag for any other sequence may reside.

Any number in this interval can be used as the tag (say, the midpoint or the lower limit).
The tag forms a unique representation for the sequence. This means that the binary representation of the tag forms a unique binary code for the sequence.

## We can show that for any sequence $x = (x_1, x_2, \ldots, x_n)$

$$l^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n - 1)$$

$$u^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n)$$

$$\bar{T}_X(x) = \frac{u^{(n)} + l^{(n)}}{2}$$

The only information required by the tag generation procedure is the cdf of the source, which can be obtained directly from the probability model.
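The recursion above can be sketched in a few lines of Python. This is an illustrative floating-point version only; the name `generate_tag` and the use of 1-based symbol indices are assumptions made for the example, and a practical coder would use scaled integer arithmetic rather than floats.

```python
from itertools import accumulate

def generate_tag(sequence, probs):
    """Return (low, high, midpoint tag) for a sequence of 1-based symbol indices."""
    cdf = [0.0] + list(accumulate(probs))   # cdf[i] = F_X(i), with F_X(0) = 0
    low, high = 0.0, 1.0                    # l(0) = 0, u(0) = 1
    for x in sequence:
        width = high - low
        # Both updates use the previous lower limit, so update high first.
        high = low + width * cdf[x]
        low = low + width * cdf[x - 1]
    return low, high, (low + high) / 2

# Sequence a1 a2 a3 with P = 0.7, 0.1, 0.2 (the three-letter example)
low, high, tag = generate_tag([1, 2, 3], [0.7, 0.1, 0.2])
print(f"interval [{low:.4f}, {high:.4f}), tag {tag:.4f}")
```

For the three-letter example this narrows the interval to approximately [0.546, 0.56), matching the last stage of the figure.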

## Given a DMS A = {a1, a2, a3}

P(a1) = 0.8, P(a2) = 0.02, P(a3) = 0.18

Obtain the tag for a1 a3 a2 a1.

Ans. Tag = 0.772352
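The stated answer can be checked by applying the interval updates one symbol at a time, starting from $l^{(0)} = 0$ and $u^{(0)} = 1$, with $F_X(0) = 0$, $F_X(1) = 0.8$, $F_X(2) = 0.82$, and $F_X(3) = 1$, and taking the midpoint of the final interval as the tag:

$$
\begin{aligned}
x_1 = 1:&\quad l^{(1)} = 0 + (1)(0) = 0, &\quad u^{(1)} &= 0 + (1)(0.8) = 0.8 \\
x_2 = 3:&\quad l^{(2)} = 0 + (0.8)(0.82) = 0.656, &\quad u^{(2)} &= 0 + (0.8)(1) = 0.8 \\
x_3 = 2:&\quad l^{(3)} = 0.656 + (0.144)(0.8) = 0.7712, &\quad u^{(3)} &= 0.656 + (0.144)(0.82) = 0.77408 \\
x_4 = 1:&\quad l^{(4)} = 0.7712 + (0.00288)(0) = 0.7712, &\quad u^{(4)} &= 0.7712 + (0.00288)(0.8) = 0.773504
\end{aligned}
$$

$$\bar{T}_X(x) = \frac{0.7712 + 0.773504}{2} = 0.772352$$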

## Given the tag value, the probability model, and the length of the block

Tag value = 0.772352

## The interval containing this tag value is a subset of every interval obtained in the encoding process.

While decoding, decode the elements in such a way that the upper and lower limits always contain this tag value.

1. Let the first element be decoded as $x_1$.
2. Then $l^{(1)} = 0 + (1 - 0) F_X(x_1 - 1) = F_X(x_1 - 1)$ and $u^{(1)} = 0 + (1 - 0) F_X(x_1) = F_X(x_1)$.
3. Find the value of $x_1$ for which this interval contains the tag value.
4. If we pick $x_1 = 1$, the interval is [0, 0.8). If we pick $x_1 = 2$, the interval is [0.8, 0.82), and if we pick $x_1 = 3$, the interval is [0.82, 1.0).
5. As 0.772352 lies in the interval [0.0, 0.8), we choose $x_1 = 1$.

## Let the second element be $x_2$.

$$l^{(2)} = 0 + (0.8 - 0) F_X(x_2 - 1) = 0.8\, F_X(x_2 - 1)$$

$$u^{(2)} = 0 + (0.8 - 0) F_X(x_2) = 0.8\, F_X(x_2)$$

If we pick $x_2 = 1$, the updated interval is [0, 0.64), which does not contain the tag.
If we pick $x_2 = 2$, the updated interval is [0.64, 0.656), which also does not contain the tag.
If we pick $x_2 = 3$, the updated interval is [0.656, 0.8), which contains the tag, so $x_2 = 3$.

$$l^{(3)} = 0.656 + (0.8 - 0.656) F_X(x_3 - 1) = 0.656 + 0.144\, F_X(x_3 - 1)$$

$$u^{(3)} = 0.656 + (0.8 - 0.656) F_X(x_3) = 0.656 + 0.144\, F_X(x_3)$$

Find the value of $x_3$ for which the interval $[0.144 \times F_X(x_3 - 1),\ 0.144 \times F_X(x_3))$ contains the residual tag 0.772352 - 0.656 = 0.116352.
Divide the residual tag value by 0.144 to get 0.808.
Find the value of $x_3$ for which 0.808 falls in the interval $[F_X(x_3 - 1), F_X(x_3))$. This holds for $x_3 = 2$.

$$l^{(4)} = 0.7712 + (0.77408 - 0.7712) F_X(x_4 - 1) = 0.7712 + 0.00288\, F_X(x_4 - 1)$$

$$u^{(4)} = 0.7712 + (0.77408 - 0.7712) F_X(x_4) = 0.7712 + 0.00288\, F_X(x_4)$$

Find the value of $x_4$ for which the interval $[0.00288 \times F_X(x_4 - 1),\ 0.00288 \times F_X(x_4))$ contains the residual tag 0.772352 - 0.7712 = 0.001152.
Divide the residual tag value by 0.00288 to get 0.4.
Find the value of $x_4$ for which 0.4 falls in the interval $[F_X(x_4 - 1), F_X(x_4))$. This holds for $x_4 = 1$.

In summary, the decoding procedure is:

1. Initialize $l^{(0)} = 0$ and $u^{(0)} = 1$.
2. For each $k$, find $t^* = \dfrac{\text{tag} - l^{(k-1)}}{u^{(k-1)} - l^{(k-1)}}$.
3. Find the value of $x_k$ for which $F_X(x_k - 1) \le t^* < F_X(x_k)$.
4. Update $u^{(k)}$ and $l^{(k)}$.
5. Continue until the entire sequence has been decoded.
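The five steps above can be sketched as follows. This is an illustrative floating-point version; the function name `decode_tag` and the 1-based symbol indices are assumptions made for the example, not part of the original material.

```python
from itertools import accumulate

def decode_tag(tag, probs, length):
    """Decode `length` symbols (1-based indices) from a tag in [0, 1)."""
    cdf = [0.0] + list(accumulate(probs))   # cdf[i] = F_X(i), with F_X(0) = 0
    cdf[-1] = 1.0                           # guard against rounding in the last sum
    low, high = 0.0, 1.0                    # step 1: l(0) = 0, u(0) = 1
    decoded = []
    for _ in range(length):
        width = high - low
        t_star = (tag - low) / width        # step 2: rescaled tag
        # step 3: smallest x with t_star < F_X(x), so F_X(x-1) <= t_star < F_X(x)
        x = next(i for i in range(1, len(cdf)) if t_star < cdf[i])
        decoded.append(x)
        # step 4: update the limits (high first, since both use the old low)
        high = low + width * cdf[x]
        low = low + width * cdf[x - 1]
    return decoded

print(decode_tag(0.772352, [0.8, 0.02, 0.18], 4))  # [1, 3, 2, 1]
```

The result corresponds to the sequence a1 a3 a2 a1 from the worked example. The block length must be supplied because the tag alone does not say when to stop.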

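The binary representation of the tag $\bar{T}_X(x)$ is truncated to

$$l(x) = \left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1$$

bits.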

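## Example: A = {a1, a2, a3, a4}

P(a1) = 1/2, P(a2) = 1/4, P(a3) = 1/8, P(a4) = 1/8

| Symbol | $F_X$ | Tag $\bar{T}_X$ | In binary | $\lceil \log_2 \frac{1}{p(x)} \rceil + 1$ | Code |
|--------|-------|-----------------|-----------|-------------------------------------------|------|
| 1      | 0.5   | 0.25            | .010      | 2                                         | 01   |
| 2      | 0.75  | 0.625           | .101      | 3                                         | 101  |
| 3      | 0.875 | 0.8125          | .1101     | 4                                         | 1101 |
| 4      | 1.0   | 0.9375          | .1111     | 4                                         | 1111 |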
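The code column can be reproduced with a short sketch, assuming the symbol probabilities 0.5, 0.25, 0.125, 0.125 implied by the cdf values 0.5, 0.75, 0.875, 1.0. The function name `truncated_tag_codes` is hypothetical. The probabilities here are exact binary fractions; for other values, floating-point rounding in `log2` and in the tag could affect the result.

```python
from math import ceil, log2

def truncated_tag_codes(probs):
    """Return the truncated midpoint-tag codeword (as a bit string) for each symbol."""
    codes = []
    cum = 0.0                                   # F_X(x - 1)
    for p in probs:
        tag = cum + p / 2                       # midpoint of [F_X(x-1), F_X(x))
        length = ceil(log2(1 / p)) + 1          # l(x) = ceil(log2(1/p(x))) + 1
        # Keep the first `length` bits of the binary expansion of the tag.
        codes.append(format(int(tag * 2 ** length), f"0{length}b"))
        cum += p
    return codes

print(truncated_tag_codes([0.5, 0.25, 0.125, 0.125]))
# ['01', '101', '1101', '1111']
```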

## Binary code obtained by truncating is uniquely decodable.

Proof: $\bar{T}_X(x)$ is a unique identifier. It lies in the interval $[F_X(x-1), F_X(x))$.
We have to show that the truncated value $\lfloor \bar{T}_X(x) \rfloor_{l(x)}$ is also contained in this interval.

Since we are truncating the binary representation of $\bar{T}_X(x)$ to obtain $\lfloor \bar{T}_X(x) \rfloor_{l(x)}$, the truncated value is less than or equal to $\bar{T}_X(x)$:

$$0 \le \bar{T}_X(x) - \lfloor \bar{T}_X(x) \rfloor_{l(x)} < \frac{1}{2^{l(x)}}$$

Since $\bar{T}_X(x) < F_X(x)$, it follows that $\lfloor \bar{T}_X(x) \rfloor_{l(x)} < F_X(x)$.

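## We have to show that $\lfloor \bar{T}_X(x) \rfloor_{l(x)}$ is greater than or equal to $F_X(x-1)$

$$\frac{1}{2^{l(x)}} = \frac{1}{2^{\left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1}} \le \frac{1}{2^{\log_2 \frac{1}{p(x)} + 1}} = \frac{1}{2 \cdot \frac{1}{p(x)}} = \frac{p(x)}{2}$$

Because the tag is the midpoint of $[F_X(x-1), F_X(x))$,

$$\frac{p(x)}{2} = \bar{T}_X(x) - F_X(x-1)$$

so

$$\bar{T}_X(x) - F_X(x-1) \ge \frac{1}{2^{l(x)}}$$

Truncation lowers the tag by less than $\frac{1}{2^{l(x)}}$, so the truncated value cannot fall below $F_X(x-1)$, or

$$\lfloor \bar{T}_X(x) \rfloor_{l(x)} \ge F_X(x-1)$$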

## Therefore the code $\lfloor \bar{T}_X(x) \rfloor_{l(x)}$ is a unique representation of $\bar{T}_X(x)$

$$\frac{1}{2^{n}}$$

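## Average codeword length (Rate)

$$H(X) \le l_A < H(X) + \frac{2}{m}$$

Here $l_A$ is the average code length per source symbol and $m$ is the length of the coded sequence.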
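As a numeric check of this bound, the sketch below computes the entropy and the average truncated-tag code length for the four-symbol example with probabilities 0.5, 0.25, 0.125, 0.125, coding one symbol at a time (so the block length is assumed to be 1). The function name `entropy_and_rate` is hypothetical.

```python
from math import ceil, log2

def entropy_and_rate(probs):
    """Return (H(X), average truncated-tag code length), both in bits per symbol."""
    entropy = -sum(p * log2(p) for p in probs)
    avg_len = sum(p * (ceil(log2(1 / p)) + 1) for p in probs)
    return entropy, avg_len

h, rate = entropy_and_rate([0.5, 0.25, 0.125, 0.125])
m = 1  # assumption: each symbol is coded on its own
print(h, rate, h <= rate < h + 2 / m)  # 1.75 2.75 True
```

The average length of 2.75 bits lies between the entropy of 1.75 bits and the upper limit of 3.75 bits. Coding longer blocks shrinks the 2/m term and so tightens the upper limit.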