
Arithmetic Coding

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006 2


How can we do better than Huffman? - I
As we have seen, the main drawback of the Huffman scheme is that it performs poorly when one symbol has a very high probability
Recall the redundancy bound for static Huffman coding:

$\text{redundancy} \le p_1 + 0.086$

where $p_1$ is the probability of the most likely symbol
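To make the bound concrete, here is a minimal sketch for a hypothetical two-symbol source (the probabilities 0.99 and 0.01 are an assumption chosen for illustration, not taken from the slides). It compares the one bit per symbol that any Huffman code must spend on a two-symbol alphabet with the entropy of the source.

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical source: one symbol is overwhelmingly likely.
probs = [0.99, 0.01]

# With two symbols, a Huffman code assigns exactly one bit to each.
huffman_bits = 1.0
redundancy = huffman_bits - entropy(probs)
bound = max(probs) + 0.086

print(f"entropy    = {entropy(probs):.4f} bits/symbol")
print(f"redundancy = {redundancy:.4f} bits/symbol")
print(f"bound      = {bound:.4f} bits/symbol")
```

The redundancy stays below the bound, but it is more than ten times the entropy itself, which is the inefficiency the following slides set out to remove.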
How can we do better than Huffman? - II
The only way to overcome this limitation is to use blocks of several characters as symbols.
In this way the per-symbol inefficiency is spread over the whole block

However, blocking is difficult to implement: there must be a block for every possible combination of symbols, so the number of blocks grows exponentially with the block length
How can we do better than Huffman? - III
Huffman coding is optimal within its framework:
static model
one symbol, one codeword
Alternatives that relax these constraints:
adaptive Huffman
blocking
arithmetic coding
The key idea
Arithmetic coding completely bypasses the
idea of replacing an input symbol with a
specific code.
Instead, it takes a stream of input symbols and replaces it with a single floating point number in [0,1)
The longer and more complex the message, the more bits are needed to represent the output number
The key idea - II
The output of an arithmetic coder is, as usual, a stream of bits
However, we can imagine a prefix "0." in front of it, so that the stream represents a fractional binary number between 0 and 1

01101010 → 0.01101010

To explain the algorithm, numbers will be shown in decimal, but in practice they are always binary
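Reading a bit stream as a binary fraction can be sketched as follows. The helper name `bits_to_fraction` is hypothetical, and exact rationals are used only to keep the illustration free of rounding.

```python
from fractions import Fraction

def bits_to_fraction(bits: str) -> Fraction:
    """Interpret the bit string b1 b2 ... bn as the binary fraction 0.b1b2...bn."""
    return Fraction(int(bits, 2), 2 ** len(bits))

value = bits_to_fraction("01101010")
print(value, float(value))
```

Each additional bit halves the spacing between representable values, which is why a narrower target interval needs a longer bit string.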
Consider a ternary alphabet $A = \{a_1, a_2, a_3\}$ with $p(a_1) = 0.7$, $p(a_2) = 0.1$, $p(a_3) = 0.2$. Using the cumulative distribution function $F_X(i) = \sum_{k=1}^{i} p(a_k)$, we have $F_X(1) = 0.7$, $F_X(2) = 0.8$, $F_X(3) = 1.0$. These values partition the unit interval as shown in Figure 1. Now let us consider the input sequence $(a_1, a_2, a_3)$.
The partition in which the tag resides depends on the first symbol of the sequence. If the first symbol is $a_1$, then the tag lies in the interval [0.0, 0.7); if the first symbol is $a_2$, then the tag lies in the interval [0.7, 0.8); if the first symbol is $a_3$, then the tag lies in the interval [0.8, 1.0). Once the interval containing the tag has been determined, the rest of the unit interval is discarded, and this restricted interval is again divided in the same proportions as the original interval.

Now the input sequence is $(a_1, a_2, a_3)$. The first symbol is $a_1$, so the tag is contained in the subinterval [0.0, 0.7). This subinterval is then subdivided in exactly the same proportions as the original interval, yielding the subintervals [0.00, 0.49), [0.49, 0.56), and [0.56, 0.70).
The second symbol in the sequence is $a_2$. The tag value is then restricted to lie in the interval [0.49, 0.56). We now partition this interval in the same proportions as the original interval to obtain the subinterval [0.49, 0.539) corresponding to the symbol $a_1$, the subinterval [0.539, 0.546) corresponding to the symbol $a_2$, and the subinterval [0.546, 0.560) corresponding to the symbol $a_3$.
If the third symbol is $a_3$, then the tag will be restricted to the interval [0.546, 0.560), which can be subdivided further by following the procedure described above.

Note that the appearance of each new symbol restricts the tag to a subinterval that is disjoint from any other subinterval that may have been generated using this process. For the sequence $(a_1, a_2, a_3)$, since the third symbol is $a_3$, the tag is restricted to the subinterval [0.546, 0.560). Had the third symbol been $a_1$ instead of $a_3$, the tag would have resided in the subinterval [0.49, 0.539), which is disjoint from the subinterval [0.546, 0.560).
The first partition [0.00, 0.49) corresponds to the symbol $a_1$, the second partition [0.49, 0.56) corresponds to the symbol $a_2$, and the third partition [0.56, 0.70) corresponds to the symbol $a_3$.
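The repeated subdivision described for the ternary example can be sketched as follows. The function name `partition` and the use of exact rationals are choices made for this illustration; the probabilities are the ones given in the example.

```python
from fractions import Fraction

# Static model from the example: p(a1)=0.7, p(a2)=0.1, p(a3)=0.2.
PROBS = [Fraction(7, 10), Fraction(1, 10), Fraction(2, 10)]

def partition(low, high):
    """Split [low, high) into one subinterval per symbol, in the model's proportions."""
    width = high - low
    bounds = [low]
    for p in PROBS:
        bounds.append(bounds[-1] + width * p)
    return list(zip(bounds[:-1], bounds[1:]))

low, high = Fraction(0), Fraction(1)
for symbol in (1, 2, 3):                     # the sequence (a1, a2, a3)
    parts = partition(low, high)
    print([f"[{float(a):.3f}, {float(b):.3f})" for a, b in parts])
    low, high = parts[symbol - 1]            # keep only the chosen subinterval

print(f"tag interval: [{float(low):.3f}, {float(high):.3f})")
```

Each pass prints the three candidate subintervals at that level and then discards all but the one selected by the next symbol.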
An example - I
String bccb from the alphabet {a,b,c}
The zero-frequency problem is solved by initializing all character counters to 1
When the first b is to be coded, all symbols have a 33% probability (why?)

The arithmetic coder maintains two numbers, low and high, which represent a subinterval [low,high) of the range [0,1)
Initially low=0 and high=1
An example - II
The range between low and high is divided among the symbols of the alphabet according to their probabilities

symbol probability range
a 1/3 [0, 0.3333)
b 1/3 [0.3333, 0.6667)
c 1/3 [0.6667, 1)
An example - III
Coding b selects the subinterval [0.3333, 0.6667)

low = 0.3333
high = 0.6667

new probabilities
P[a]=1/4
P[b]=2/4
P[c]=1/4
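An example - IV
The interval [0.3333, 0.6667) is divided using P[a]=1/4, P[b]=2/4, P[c]=1/4

symbol probability range
a 1/4 [0.3333, 0.4167)
b 2/4 [0.4167, 0.5834)
c 1/4 [0.5834, 0.6667)

Coding c selects the subinterval [0.5834, 0.6667)

low = 0.5834
high = 0.6667

new probabilities
P[a]=1/5
P[b]=2/5
P[c]=2/5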
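An example - V
The interval [0.5834, 0.6667) is divided using P[a]=1/5, P[b]=2/5, P[c]=2/5

symbol probability range
a 1/5 [0.5834, 0.6001)
b 2/5 [0.6001, 0.6334)
c 2/5 [0.6334, 0.6667)

Coding c selects the subinterval [0.6334, 0.6667)

low = 0.6334
high = 0.6667

new probabilities
P[a]=1/6
P[b]=2/6
P[c]=3/6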
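An example - VI
The interval [0.6334, 0.6667) is divided using P[a]=1/6, P[b]=2/6, P[c]=3/6

symbol probability range
a 1/6 [0.6334, 0.6390)
b 2/6 [0.6390, 0.6501)
c 3/6 [0.6501, 0.6667)

Coding b selects the subinterval [0.6390, 0.6501)

low = 0.6390
high = 0.6501

Final interval
[0.6390,0.6501)
we can send 0.64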
An example - summary
Starting from the range between 0 and 1, we restrict ourselves each time to the subinterval that encodes the given symbol
At the end, the whole sequence can be encoded by any number in the final range (but mind the brackets: low is included, high is not)
An example - summary
symbol coded / probabilities used (a, b, c) / interval after coding
(start) / - / [0, 1)
b / 1/3, 1/3, 1/3 / [0.3333, 0.6667)
c / 1/4, 2/4, 1/4 / [0.5834, 0.6667)
c / 1/5, 2/5, 2/5 / [0.6334, 0.6667)
b / 1/6, 2/6, 3/6 / [0.6390, 0.6501)

Final interval [0.6390, 0.6501), so we can send 0.64
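The whole bccb walk-through can be reproduced with a short sketch. It assumes, as in the example, that every counter starts at 1 and that the counter of a symbol is incremented right after the symbol is coded; the function name `encode_adaptive` is hypothetical. Exact rationals are used, so the result is [23/36, 13/20), roughly [0.6389, 0.6500), which differs from the figures on the slides only in the last rounded digit.

```python
from fractions import Fraction

def encode_adaptive(message, alphabet):
    """Narrow [0, 1) symbol by symbol, using counts that all start at 1."""
    counts = {ch: 1 for ch in alphabet}
    low, high = Fraction(0), Fraction(1)
    for ch in message:
        total = sum(counts.values())
        width = high - low
        # Sum the counts of the symbols that precede ch in alphabet order.
        cum = 0
        for sym in alphabet:
            if sym == ch:
                break
            cum += counts[sym]
        # Both new bounds are computed from the old low.
        high = low + width * Fraction(cum + counts[ch], total)
        low = low + width * Fraction(cum, total)
        counts[ch] += 1                      # adapt the model
    return low, high

low, high = encode_adaptive("bccb", "abc")
print(low, high, float(low), float(high))
```

Any number in the returned half-open interval, 0.64 for instance, identifies the message, provided the decoder uses the same model and knows when to stop.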
Another example - I
Consider encoding the name BILL GATES
Again, we need the frequency of all the
characters in the text.


chr freq.
space 0.1
A 0.1
B 0.1
E 0.1
G 0.1
I 0.1
L 0.2
S 0.1
T 0.1
Another example - II
character probability range
space 0.1 [0.00, 0.10)
A 0.1 [0.10, 0.20)
B 0.1 [0.20, 0.30)
E 0.1 [0.30, 0.40)
G 0.1 [0.40, 0.50)
I 0.1 [0.50, 0.60)
L 0.2 [0.60, 0.80)
S 0.1 [0.80, 0.90)
T 0.1 [0.90, 1.00)
Another example - III
chr low high
(start) 0.0 1.0
B 0.2 0.3
I 0.25 0.26
L 0.256 0.258
L 0.2572 0.2576
Space 0.25720 0.25724
G 0.257216 0.257220
A 0.2572164 0.2572168
T 0.25721676 0.2572168
E 0.257216772 0.257216776
S 0.2572167752 0.2572167756

The final low value, 0.2572167752, uniquely encodes the name BILL GATES
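The table above can be checked with a small sketch that applies the static ranges character by character. The dictionary `RANGES` copies the range column of the model table; the function name `encode_static` is hypothetical, and exact rationals are used so that the ten decimal digits are not affected by floating point rounding.

```python
from fractions import Fraction

# Static model: each character owns a fixed slice [low, high) of [0, 1).
RANGES = {
    " ": ("0.0", "0.1"), "A": ("0.1", "0.2"), "B": ("0.2", "0.3"),
    "E": ("0.3", "0.4"), "G": ("0.4", "0.5"), "I": ("0.5", "0.6"),
    "L": ("0.6", "0.8"), "S": ("0.8", "0.9"), "T": ("0.9", "1.0"),
}

def encode_static(message):
    """Return the final [low, high) interval for message under RANGES."""
    low, high = Fraction(0), Fraction(1)
    for ch in message:
        width = high - low
        sym_low, sym_high = (Fraction(x) for x in RANGES[ch])
        low, high = low + width * sym_low, low + width * sym_high
        print(f"{ch!r:5} {float(low):.10f} {float(high):.10f}")
    return low, high

low, high = encode_static("BILL GATES")
```

The printed rows should match the low and high columns of the table, one row per character.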
Decoding - I
Suppose we have to decode 0.64
The decoder needs the symbol probabilities, as it simulates what the encoder must have been doing
It starts with low=0 and high=1 and divides the interval in exactly the same manner as the encoder (a in [0, 1/3), b in [1/3, 2/3), c in [2/3, 1))
Decoding - II
The transmitted number falls in the interval corresponding to b, so b must have been the first symbol encoded
Then the decoder evaluates the new values for
low (0.3333) and for high (0.6667), updates
symbol probabilities and divides the range
from low to high according to these new
probabilities
Decoding proceeds until the full string has
been reconstructed
Decoding - III
0.64 in [0.3333, 0.6667) → b
0.64 in [0.5834, 0.6667) → c

and so on...
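The decoding loop can be sketched as below. It mirrors the adaptive model of the bccb example (counters start at 1 and are updated after each symbol). One assumption not stated on the slides is that the decoder is told the message length in advance; an end-of-message symbol would be another option. The function name `decode_adaptive` is hypothetical.

```python
from fractions import Fraction

def decode_adaptive(value, length, alphabet):
    """Recover `length` symbols from a number in [0, 1), mirroring the encoder."""
    counts = {ch: 1 for ch in alphabet}
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        total = sum(counts.values())
        width = high - low
        cum = 0
        for sym in alphabet:
            sym_low = low + width * Fraction(cum, total)
            sym_high = low + width * Fraction(cum + counts[sym], total)
            if sym_low <= value < sym_high:  # value falls in this symbol's slice
                out.append(sym)
                low, high = sym_low, sym_high
                counts[sym] += 1             # same model update as the encoder
                break
            cum += counts[sym]
    return "".join(out)

decoded = decode_adaptive(Fraction("0.64"), 4, "abc")
print(decoded)
```

Because the decoder updates its counters in lockstep with the encoder, the two always agree on how the current interval is divided.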
Why does it work?
More bits are necessary to express a number in a smaller interval
High-probability events shrink the interval only slightly, while low-probability events result in a much smaller next interval
The number of digits needed is proportional to the negative logarithm of the size of the interval
Why does it work?
The size of the final interval is the product of the probabilities of the symbols coded, so the logarithm of this product is the sum of the logarithms of the individual terms
So a symbol s with probability Pr[s] contributes

$-\log_2 \Pr[s]$

bits to the output, which is exactly the information content (uncertainty) of the symbol!
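As a quick numerical check of this argument, the sketch below uses the probabilities that the adaptive model assigned to b, c, c, b in the earlier bccb example (1/3, 1/4, 2/5, 2/6). It compares the negative logarithm of the final interval width with the sum of the per-symbol information contents.

```python
import math
from fractions import Fraction

# Probabilities the adaptive model assigned to b, c, c, b in the bccb example.
probs = [Fraction(1, 3), Fraction(1, 4), Fraction(2, 5), Fraction(2, 6)]

width = math.prod(probs)                     # size of the final interval
bits_from_width = -math.log2(width)
bits_from_symbols = sum(-math.log2(p) for p in probs)

print(width, bits_from_width, bits_from_symbols)
```

The two quantities agree up to floating point rounding, and the width 1/90 matches the size of the final interval [23/36, 13/20) of that example.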
Why does it work?
For this reason arithmetic coding is nearly optimal in the number of output bits, and it can code very high probability events in just a fraction of a bit
In practice, the algorithm is not exactly optimal because of the use of limited-precision arithmetic, and because transmission requires sending a whole number of bits
A trick - I
As the algorithm has been described so far, the whole output is available only when encoding is finished
In practice, it is possible to output bits during encoding, which avoids the need for higher and higher arithmetic precision
The trick is to observe that when low and high are close, they may share a common prefix
A trick - II
This prefix will remain in the two values forever, so we can transmit it and remove it from low and high
For example, during the encoding of bccb, after the third character the range is low=0.6334, high=0.6667
We can remove the common prefix, sending 6 to the output and transforming low and high into 0.334 and 0.667
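The prefix-shifting trick can be sketched in decimal, matching the example. This is an illustration only: a real coder works on binary digits and must also handle the case where low and high straddle a digit boundary while being very close, which is not covered here. The function name `shift_common_digits` is hypothetical.

```python
from fractions import Fraction

def shift_common_digits(low, high):
    """Emit leading decimal digits shared by low and high, then rescale both."""
    emitted = []
    while low < high and int(low * 10) == int(high * 10):
        digit = int(low * 10)                # the shared leading digit
        emitted.append(str(digit))
        low = low * 10 - digit               # drop that digit from both values
        high = high * 10 - digit
    return "".join(emitted), low, high

digits, low, high = shift_common_digits(Fraction("0.6334"), Fraction("0.6667"))
print(digits, float(low), float(high))
```

After the shift the interval is ten times wider, so the same fixed precision can keep subdividing it.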
The encoding step
To code symbol s, where symbols are numbered from 1 to n and symbol i has probability Pr[i]

low_bound = $\sum_{i=1}^{s-1} \Pr[i]$
high_bound = $\sum_{i=1}^{s} \Pr[i]$
range = high - low
high = low + range * high_bound
low = low + range * low_bound

(high is updated first, so that both new values are computed from the old low)
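A minimal sketch of this single encoding step follows. The function name `encode_step` and the list layout (`pr[i - 1]` holds Pr[i]) are assumptions made for the illustration. Returning both bounds at once sidesteps the ordering pitfall of overwriting low before high has been computed.

```python
from fractions import Fraction

def encode_step(low, high, pr, s):
    """One interval update for symbol s (1-based); pr[i - 1] holds Pr[i]."""
    low_bound = sum(pr[:s - 1])              # Pr[1] + ... + Pr[s-1]
    high_bound = low_bound + pr[s - 1]       # Pr[1] + ... + Pr[s]
    width = high - low
    # Both results use the old low, so the order of evaluation does not matter.
    return low + width * low_bound, low + width * high_bound

pr = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]
low, high = encode_step(Fraction(1, 3), Fraction(2, 3), pr, 3)
print(low, high)
```

The call shown corresponds to coding the first c of bccb, starting from the interval [1/3, 2/3).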



Implementing arithmetic coding
As mentioned earlier, arithmetic coding uses binary fractional numbers with unlimited arithmetic precision
Working with finite precision (16 or 32 bits) makes compression slightly worse than the entropy bound
It is also possible to build coders based on integer arithmetic, with a further small degradation in compression
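To give an idea of what an integer-based coder looks like, here is a sketch of one interval update on 16-bit integer registers. It follows a commonly used convention in which high is inclusive and symbol statistics are integer cumulative counts; the register width, the function name `integer_step`, and this convention are assumptions, not details from the slides. The renormalization that shifts out settled bits, which a complete coder needs, is deliberately left out.

```python
BITS = 16                                    # assumed register width

def integer_step(low, high, cum_low, cum_high, total):
    """One interval update on integer registers; high is inclusive here."""
    width = high - low + 1
    # Integer division truncates, which is the source of the small loss
    # in compression compared with exact arithmetic.
    new_high = low + width * cum_high // total - 1
    new_low = low + width * cum_low // total
    return new_low, new_high

low, high = 0, (1 << BITS) - 1               # the full range stands for [0, 1)
# Code b with counts a=1, b=1, c=1: cumulative counts 1 (below b) and 2.
low, high = integer_step(low, high, cum_low=1, cum_high=2, total=3)
print(low, high)
```

Dividing the results by 65536 gives roughly 0.3333 and 0.6667, the same slice that b receives in the floating point walk-through.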
Arithmetic coding vs. Huffman coding
In typical English text, the space character is the most common, with a probability of about 18%, so the Huffman redundancy is quite small.

By contrast, on black and white images arithmetic coding is much better than Huffman coding, unless a blocking technique is used
Arithmetic coding requires less memory, as the symbol representation is calculated on the fly
Arithmetic coding is more suitable for high-performance models, which make confident predictions
Arithmetic coding vs. Huffman coding
Huffman decoding is generally faster than arithmetic decoding
In arithmetic coding it is not easy to start decoding in the middle of the stream, while in Huffman coding we can use starting points
In large collections of text and images, Huffman coding is likely to be used for the text, and arithmetic coding for the images