Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

Jarek Duda

Center for Science of Information, Purdue University, W. Lafayette, IN 47907, U.S.A.

email: dudaj@purdue.edu

Abstract

Modern data compression is mainly based on two approaches to entropy coding: Huffman coding (HC) and arithmetic/range coding (AC). The former is much faster, but approximates probabilities with powers of 2, usually leading to relatively poor compression rates. The latter uses nearly exact probabilities, easily approaching the theoretical compression rate limit (Shannon entropy), but at the cost of much higher computational cost.

Asymmetric numeral systems (ANS) is a new approach to accurate entropy coding which allows this tradeoff between speed and rate to be ended: the recent implementation [1] provides about 50% faster decoding than HC for a 256 size alphabet, with compression rate similar to that provided by AC. This advantage is due to being simpler than AC: using a single natural number as the state, instead of two numbers representing a range. Beside simplifying renormalization, it allows the entire behavior for a given probability distribution to be put into a relatively small table, defining an entropy coding automaton. The memory cost of such a table for a 256 size alphabet is a few kilobytes. There is a large freedom while choosing a specific table - using a pseudorandom number generator initialized with a cryptographic key for this purpose allows us to simultaneously encrypt the data.

This article also introduces and discusses many other variants of this new entropy coding approach, which can provide direct alternatives for standard AC, for large alphabet range coding, or for approximated quasi arithmetic coding.

1 Introduction

In standard numeral systems different digits are treated as containing the same amount of information: 1 bit in the binary system, or generally lg(b) bits (lg ≡ log_2) in the base b numeral system. However, an event of probability p contains lg(1/p) bits of information. So while standard numeral systems are optimal for uniform digit probability distributions, to optimally encode general distributions, which is the heart of data compression, we should try to somehow asymmetrize this concept. We can obtain arithmetic/range coding (AC) ([10], [9]) or the recent asymmetric numeral systems (ANS) ([2], [3]) this way, depending on the position at which we add succeeding digits.


Specifically, in the standard binary numeral system, having information stored in a natural number x (having m digits), we can add information from a digit s ∈ {0, 1} in basically two different ways: at the most significant position (x → x + 2^m s) or at the least significant position (x → 2x + s). The former means that the new digit chooses between large ranges - we can asymmetrize it by changing the proportions of these ranges, getting arithmetic/range coding. However, while in standard numeral systems we would need to remember the position of this most significant digit (m), AC requires specifying the range at a given moment - its current state is two numbers representing the range.

In contrast, while adding information at the least significant position, the current state is just a single natural number. This advantage remains for the asymmetrization: ANS. While in the standard binary system x becomes the x-th appearance of the corresponding even (s = 0) or odd (s = 1) number, this time we would like to redefine this splitting of N into even and odd numbers, such that they are still "uniformly distributed", but with different densities - corresponding to the symbol probability distribution we would like to encode, like in Fig. 1.

Let us look at both approaches from the information theory point of view. For AC, knowing that we are currently in a given half of the range is worth 1 bit of information, so the current content is lg(size of range/size of subrange) bits of information. While encoding a symbol of probability p, the size of the subrange is multiplied by p, increasing the informational content by lg(1/p) bits as expected. From the ANS point of view, seeing x as containing lg(x) bits of information, the informational content should increase to lg(x) + lg(1/p) = lg(x/p) bits while adding a symbol of probability p. So we need to ensure an x → ≈ x/p transition relation. Intuitively, the rule "x becomes the x-th appearance of the s-th subset" fulfills this relation if this subset is uniformly distributed with density Pr(s).

Figure 1: Two ways to asymmetrize the binary numeral system. Having some information stored in a natural number x, to attach information from a 0/1 symbol s, we can add it at the most significant position (x′ = x + s·2^m), where s chooses between ranges, or at the least significant position (x′ = 2x + s), where s chooses between even and odd numbers. The former asymmetrizes to AC by changing the range proportions. The latter asymmetrizes to ABS by redefining even/odd numbers, such that they are still uniformly distributed, but with different densities. Now x′ is the x-th element of the s-th subset.


Figure 2: Left: construction of Huffman coding while grouping two symbols from the (1, 1, 1)/3 probability distribution: we have 9 equiprobable possibilities. This grouping reduces the distance from the Shannon entropy: ∆H. The standard base 3 numeral system would give ∆H = 0 here. Right: ∆H for grouping m symbols. For example, operating on 3^10 = 59049 possibilities allows getting a loss of ∆H ≈ 0.003 bits/symbol. In comparison, ∆H for ANS using just 16 or 17 states for the last four cases is correspondingly ≈ 0.00065, 0.00122, 0.00147, 0.00121 bits/symbol.

Huffman coding [6] can be seen as a memoryless coder: a set of rules of type "symbol → bit sequence". It transforms every symbol into a natural number of bits, approximating their probabilities with powers of 1/2. Theoretically it allows us to approach the capacity (Shannon entropy) as closely as we want by grouping multiple symbols. However, we can see in Fig. 2 that this convergence is relatively slow: getting ∆H ≈ 0.001 bits/symbol would be completely impractical here. A precise analysis can be found in [11].

Let us look generally at the cost of approximating probabilities. If we are using a coder which encodes perfectly the (q_s) symbol distribution to encode a (p_s) symbol sequence, we would use on average ∑_s p_s lg(1/q_s) bits per symbol, while only the Shannon entropy is needed: ∑_s p_s lg(1/p_s). The difference between them is called the Kullback-Leibler distance:

∆H = ∑_s p_s lg(p_s/q_s) ≈ ∑_s (−p_s/ln 2) [ (1 − q_s/p_s) − (1/2)(1 − q_s/p_s)² ] ≈ 0.72 ∑_s (ε_s)²/p_s        (1)

(the first order terms cancel since ∑_s q_s = ∑_s p_s = 1), where ε_s = q_s − p_s will be referred to as the inaccuracy.
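As a quick numerical illustration (not part of the original text), the following Python sketch compares the exact Kullback-Leibler penalty of (1) with its quadratic approximation; the concrete pair of distributions below is an arbitrary assumption:

from math import log2

def kl_bits(p, q):
    # exact Kullback-Leibler distance, in bits/symbol
    return sum(ps * log2(ps / qs) for ps, qs in zip(p, q))

p = [0.7, 0.3]            # assumed true symbol probabilities
q = [0.75, 0.25]          # e.g. a power-of-2 (Huffman-like) approximation
eps = [qs - ps for ps, qs in zip(p, q)]

print(kl_bits(p, q))                                     # exact loss, roughly 0.009 bits/symbol
print(0.72 * sum(e * e / ps for e, ps in zip(eps, p)))   # quadratic approximation, close to the exact value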

So for better than Huffman convergence, we need more accurate coders - we need to handle fractional numbers of bits. It can be done by adding a memory/buffer to the coder, containing a noninteger number of bits. The coder becomes a finite state automaton, with coding rules of type (symbol, state) → (bit sequence, new state).

As was discussed, the current state contains lg(size of range/size of subrange) bits in the case of AC, or lg(x) bits in the case of ANS. Allowing such a state to contain a large number of bits would require operating with large precision in AC, or large arithmetic in ANS - which is impractical. To prevent that, in AC we regularly perform renormalization: if the subrange is in one half of the range, we can send a single bit to the stream (pointing at this half), and rescale this half up to the whole range. Analogously we need to gather accumulated bits in ANS: transfer to the stream some least significant bits of the state,


Figure 3: A 4-state encoding and decoding automaton for ANS with the Pr(a) = 3/4, Pr(b) = 1/4 probability distribution. Upper edges of the encoding picture are transitions for symbol "a", lower for symbol "b". Some edges contain digits produced during the corresponding transition. Intuitively, x is a buffer containing lg(x) ∈ [2, 3) bits of information - symbol "b" always produces 2 bits, while "a" accumulates in the buffer. Decoding is unique because each state corresponds to a fixed symbol and number of digits to process: 1 for state 4, 2 for state 5, none for 6, 7. The stationary probability distribution for an i.i.d. source is also written, which allows finding the average number of bits used per symbol: ≈ 2·(1/4) + 1·(3/4)·(0.241 + 0.188) ≈ 0.82, which is larger than the Shannon entropy by ∆H ≈ 0.01 bits/symbol. Increasing the number of states to 8 allows reducing it to ∆H ≈ 0.0018 bits/symbol.

such that we will return to a fixed range (I = {l, .., 2l − 1}) after encoding a given symbol. In contrast to AC, the number of such bits can be easily determined here. An example of such a 4-state ANS based automaton for I = {4, 5, 6, 7} can be seen in Fig. 3: while symbol "b" of probability 1/4 always produces 2 bits of information, symbol "a" of probability 3/4 usually accumulates information by increasing the state, finally producing a complete bit of information. While decoding, every state knows the number of bits to use.

This article introduces and discusses many variants of applying the ANS approach

or its binary case: asymmetric binary system (ABS). Most of them have nearly a direct

alternative in arithmetic coding family:

                                                    AC             ANS
 very accurate, formula for binary alphabet         AC             uABS, rABS
 fast, tabled for binary alphabet                   M coder        tABS
 covering all probabilities with small automata     quasi AC       qABS
 very accurate, formula for large alphabet          Range Coding   rANS
 tabled for large alphabet                          -              tANS

These variants will now be briefly described. The initial advantage of the ANS approach is much simpler renormalization. In AC it is usually required to use slow branches to extract single bits, and additionally there is an issue when the range contains the middle point. These computationally costly problems disappear in ANS: we can quickly deduce the number of bits to use in a given step from x, or store it in the table - we just directly transfer the whole block of bits once per step, making it faster.

uABS stands for the direct arithmetic formula for uniformly distributing symbols. rABS stands for distributing symbols in ranges - leading to a still direct formula, a bit less accurate but simpler to calculate. Its main advantage is allowing for a large alphabet version: rANS, which can be seen as a direct alternative for Range Coding, but with a small advantage: instead of 2 multiplications per step, it requires only a single one. We can also put e.g. uABS behavior into tables, getting tABS. It can be applied similarly to one of many AC approximations, for example the M coder used in CABAC [8] in modern video compression. Matt Mahoney's implementations of tABS (fpaqb) and uABS (fpaqc) are available in [7].

While those very accurate versions can easily work with a loss many orders of magnitude below ∆H ≈ 0.001 bits/symbol, qABS resembles the quasi arithmetic coding [5] approximation: we would like to construct a family of small entropy coding automata, such that we can cover the whole space of (binary) probability distributions by switching between them. For qABS it is 5 automata with 5 states for ∆H ≈ 0.01 bits/symbol loss, or about 20 automata with 16 states for ∆H ≈ 0.001 bits/symbol.

While the above possibilities might not bring a really essential advantage, due to a much smaller state space, tabled ANS puts the whole behavior for a large alphabet into a relatively small coding table, which would be rather too demanding for the AC approach. An interactive demonstration of this tANS is available: [4]. The number of states of such an automaton should be about 2-4 times larger than the size of the alphabet for ∆H ≈ 0.01 bits/symbol loss, or about 8-16 times for ∆H ≈ 0.001 bits/symbol. For a 256 size alphabet it means a 1-16kB memory cost per such coding table. Generally ∆H drops with the square of the number of states: doubling the number of states means about 4 times smaller ∆H.

As in the title, in this way we obtain extremely fast entropy coding: the recent Yann Collet implementation [1] has 50% faster decoding than a fast implementation of Huffman coding, with nearly optimal compression rate - combining the advantages of Huffman and arithmetic coding.

There is a large freedom while choosing the coding table for given parameters: every symbol distribution defines an essentially different coding. We will also discuss three reasons for the chaotic behavior of this entropy coder: asymmetry, ergodicity and diffusion, which make any attempt at tracing the state without complete knowledge extremely difficult. These reasons suggest using it also for cryptographic applications: use a pseudorandom number generator initialized with a cryptographic key to choose the exact symbol distribution/coding. This way we simultaneously get a decent encryption of the encoded data.

2 Basic concepts and versions

We will now introduce basic concepts for encoding information in natural numbers and

ﬁnd analytic formulas for uABS, rABS and rANS cases.

2.1 Basic concepts

There is given an alphabet A = {0, .., n−1} and an assumed probability distribution {p_s}_{s∈A}, ∑_s p_s = 1. The state x ∈ N in this section will contain the entire already processed symbol sequence.

We need to find encoding (C) and decoding (D) functions. The former takes a state x ∈ N and a symbol s ∈ A, and transforms them into x′ ∈ N storing information from both of them. Seeing x as the possibility of choosing a number from the {0, 1, .., x−1} interval, it contains lg(x) bits of information. Symbol s of probability p_s contains lg(1/p_s) bits of information, so x′ should contain approximately lg(x) + lg(1/p_s) = lg(x/p_s) bits of information: x′ should be approximately x/p_s, allowing to choose a value from a larger interval {0, 1, .., x′−1}. Finally we will have functions defining single steps:

C(s, x) = x′,   D(x′) = (s, x),   such that   D(C(s, x)) = (s, x),   C(D(x′)) = x′,   x′ ≈ x/p_s

For the standard binary system we have C(s, x) = 2x + s, D(x′) = (mod(x′, 2), ⌊x′/2⌋), and we can see s as choosing between even and odd numbers. If we imagine that x chooses between even (or odd) numbers in some interval, x′ = 2x + s chooses between all numbers in this interval.

For the general case, we will have to redefine the subsets corresponding to different s: like even/odd numbers they should still uniformly cover N, but this time with different densities: {p_s}_{s∈A}. We can define this split of N by a symbol distribution s : N → A:

{0, 1, .., x′−1} = ⋃_s { x ∈ {0, 1, .., x′−1} : s(x) = s }

While in the standard binary system x′ is the x-th appearance of an even/odd number, this time it will be the x-th appearance of the s-th subset. So the decoding function will be

D(x) = (s(x), x_{s(x)})   where   x_s := |{ y ∈ {0, 1, .., x − 1} : s(y) = s }|        (2)

and C(s, x_s) = x is its inverse. Obviously we have x = ∑_s x_s.

As lg(x/x_s) is the number of bits we currently use to encode symbol s, to reduce the inaccuracy and so ∆H, we would like the approximation x_s ≈ x·p_s to be as close as possible - which intuitively means that symbols are nearly uniformly distributed with {p_s} densities. We will now find formulas for the binary case by just taking x_1 := ⌈xp⌉, and in Section 4 we will focus on finding such nearly uniform distributions on a fixed interval for larger alphabets.

Finally, we can imagine that x is a stack of symbols, C is the push operation and D is the pop operation. The current encoding algorithm would be: start e.g. with x = 1 and then use C with succeeding symbols. It leads to a large natural number, from which we can extract all the symbols in reversed order using D. To prevent inconvenient operations on large numbers, in the next section we will discuss the stream version, in which accumulated complete bits are extracted so that x remains in a fixed interval I.

2.2 Uniform asymmetric binary systems (uABS)

We will now find some explicit formulas for the binary case and a nearly uniform distribution of symbols: A = {0, 1}.

Denote p := p_1, p̃ := 1 − p = p_0. To obtain x_s ≈ x·p_s we can for example choose

x_1 := ⌈xp⌉   (or alternatively x_1 := ⌊xp⌋)        (3)
x_0 = x − x_1 = x − ⌈xp⌉   (or x_0 = x − ⌊xp⌋)        (4)

Now s(x) = 1 if there is a jump of ⌈xp⌉ at the succeeding position:

s := ⌈(x + 1)p⌉ − ⌈xp⌉   (or s := ⌊(x + 1)p⌋ − ⌊xp⌋)        (5)

This way we have found a decoding function: D(x) = (s, x_s).

We will now find the corresponding encoding function: for given s and x_s we want to find x. Denote r := ⌈xp⌉ − xp ∈ [0, 1). Then

s(x) = s = ⌈(x + 1)p⌉ − ⌈xp⌉ = ⌈(x + 1)p − ⌈xp⌉⌉ = ⌈(x + 1)p − xp − r⌉ = ⌈p − r⌉        (6)

s = 1 ⇔ r < p

• s = 1: x_1 = ⌈xp⌉ = xp + r, so

x = (x_1 − r)/p = ⌊x_1/p⌋

as x is a natural number and 0 ≤ r < p.

• s = 0: p ≤ r < 1, so p̃ ≥ 1 − r > 0 and

x_0 = x − ⌈xp⌉ = x − xp − r = x·p̃ − r

x = (x_0 + r)/p̃ = (x_0 + 1)/p̃ − (1 − r)/p̃ = ⌈(x_0 + 1)/p̃⌉ − 1

Finally the encoding is:

C(s, x) = ⌈(x + 1)/(1 − p)⌉ − 1  if s = 0,    ⌊x/p⌋  if s = 1
   or   = ⌊x/(1 − p)⌋  if s = 0,    ⌈(x + 1)/p⌉ − 1  if s = 1        (7)

For p = 1/2 it is the standard binary numeral system with switched digits.
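The formulas (5) and (7) can be turned directly into code. Below is a minimal Python sketch (an illustration, not taken from the paper) of the first variant, using exact rational arithmetic to avoid floating-point rounding issues; it checks that the decoding step inverts the encoding step:

from fractions import Fraction
from math import floor, ceil

p = Fraction(3, 10)                 # Pr(symbol 1); an assumed example value

def C(s, x):
    # uABS encoding step, first variant of eq. (7)
    return floor(x / p) if s == 1 else ceil((x + 1) / (1 - p)) - 1

def D(x):
    # uABS decoding step, eq. (5): s = ceil((x+1)p) - ceil(xp), then x_s
    s = ceil((x + 1) * p) - ceil(x * p)
    return (s, ceil(x * p)) if s == 1 else (s, x - ceil(x * p))

# D should be the exact inverse of C
for x in range(1, 1000):
    for s in (0, 1):
        assert D(C(s, x)) == (s, x)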

The starting values for this formula and p = 0.3 are presented in the bottom-right of Fig. 1. Here is an example of the encoding process by inserting succeeding symbols (the label over each arrow is the encoded symbol):

1 →[1] 3 →[0] 5 →[0] 8 →[1] 26 →[0] 38 →[1] 128 →[0] 184 →[0] 264 ...        (8)

We could directly encode the final x using ⌊lg(x)⌋ + 1 bits. Let us look at the growth of lg(x) while encoding: symbol s transforms the state from x_s to x:

−lg(x_s/x) = −lg(p_s + ε_s(x)) = −lg(p_s) − ε_s(x)/(p_s ln 2) + O((ε_s(x))²)

where ε_s(x) = x_s/x − p_s describes the inaccuracy. In the found coding we have |x_s − x·p_s| < 1 and so |ε_s(x)| < 1/x. While encoding a symbol sequence with the {p_s}_{s∈A} probability distribution, the sum of the above expansion says that we need on average H = −∑_s p_s lg(p_s) bits/symbol plus higher order terms. As x grows exponentially while encoding and |ε_s(x)| < 1/x, these corrections are O(1) for the whole sequence.


2.3 Range variants (rABS, rANS)

We will now introduce an alternative approach, in which we place symbol appearances in ranges. It is less accurate, but can be less expensive to perform. Most importantly, it allows for large alphabets, getting a direct alternative for Range Coding - this time using a single multiplication per symbol instead of the two required in Range Coding. This approach can be seen as taking a standard numeral system and merging some of its succeeding digits. We will start with a binary example (rABS), and then define general formulas.

Example: Looking at the (1, 3)/4 probability distribution, we would like to take a standard base 4 numeral system: using digits 0, 1, 2, 3, and just merge 3 of its digits - let us say 1, 2 and 3. So while the symbol distribution for the standard base 4 system would be s(x) = mod(x, 4): cyclic (0123), we will asymmetrize it to the (0111) cyclic distribution:

s(x) = 0 if mod(x, 4) = 0, else s(x) = 1

Behavior for s = 0 is exactly like in the base 4 system. In contrast, for s = 1 we will go to the ⌊x/3⌋-th (0111) quadruple:

C(0, x) = 4x,    C(1, x) = 4⌊x/3⌋ + mod(x, 3) + 1

where the "+1" is to shift over the single "0" appearance. Decoding:

if (mod(x, 4) = 0) {s = 0; x = ⌊x/4⌋}  else  {s = 1; x = 3⌊x/4⌋ + mod(x, 4) − 1}

Analogously we can define coding/decoding functions for different fractions, which is a special case of the large alphabet formulas we will find now: rANS. Assume that the probability distribution is a fraction of the form (l_0, l_1, ..., l_{n−1})/m, where l_s ∈ N, m = ∑_s l_s.

Now let us analogously imagine that we start with the base m numeral system and merge the corresponding numbers of succeeding digits together: the symbol distribution is cyclic (00..011..1..."n−1") with l_s appearances of symbol s. Let us define s(x) as the symbol at position x ∈ [0, .., m−1] in this cycle:

s(x) = s̄(mod(x, m))   where   s̄(x) = min { s : x < ∑_{i=0}^{s} l_i }

and b_s := ∑_{i=0}^{s−1} l_i is the starting position of symbol s in this cycle. These l_s, b_s and s(x) should be tabled in the coder. Putting l(x) = l_{s(x)} and b(x) = b_{s(x)} into tables would reduce the number of table uses per step to one.

Now the encoding and decoding steps are:

C(s, x) = m⌊x/l_s⌋ + b_s + mod(x, l_s)
D(x) = (s, l_s⌊x/m⌋ + mod(x, m) − b_s)   where   s = s̄(mod(x, m))

If we choose m as a power of 2, multiplying and dividing by it can be done by bit shifts, and mod(x, m) by a bitwise AND with a mask - decoding requires only a single multiplication per step. In contrast, Range Coding needs two - so this approach should be faster.
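A minimal Python sketch of these step formulas (without the stream renormalization of Section 3) may make them more concrete; the quantized distribution (3, 3, 2)/8 below is an arbitrary assumption for illustration:

freqs = [3, 3, 2]                                    # l_s, so m = 8
m = sum(freqs)
bs = [sum(freqs[:s]) for s in range(len(freqs))]     # starting positions b_s
sym = [s for s, ls in enumerate(freqs) for _ in range(ls)]   # tabled s(x) for x in [0, m)

def C(s, x):
    # encoding step: x' = m*floor(x/l_s) + b_s + mod(x, l_s)
    return m * (x // freqs[s]) + bs[s] + x % freqs[s]

def D(x):
    # decoding step: read the symbol from the cyclic position, then invert C
    s = sym[x % m]
    return s, freqs[s] * (x // m) + x % m - bs[s]

x = 1
for s in (0, 2, 1, 1, 0):       # encode a few symbols
    x = C(s, x)
for _ in range(5):              # decode them back: symbols come out in reversed order
    s, x = D(x)
    print(s, end=" ")           # prints: 0 1 1 2 0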


Let us find a bound on its inaccuracy ε_s(x) = x_s/x − p_s:

|x_s/x − p_s| = | (l_s⌊x/m⌋ + mod(x, m) − b_s)/x − l_s/m | = | ((1 − l_s/m)·mod(x, m) − b_s)/x | < m/x

as ⌊x/m⌋ = (x − mod(x, m))/m and b_s < m.

This inaccuracy bound is m times larger than for uABS. However, as this approach is intended for directly applying the above arithmetic formulas, we can use relatively large x, making this inaccuracy negligible.

3 Stream version - encoding ﬁnite-state automaton

Arithmetic coding requires renormalization to work with finite precision. Analogously, to prevent the ANS state from growing to infinity, we will enforce x to remain in some chosen range I = {l, .., 2l − 1}. This way x can be seen as a buffer containing between lg(l) and lg(l) + 1 bits of information. So while we operate on symbols containing a non-integer number of bits, as they accumulate inside the buffer, we can extract complete bits of information to remain in its operating region. These extracted bits are transferred to the bitstream and used while decoding. Finally the situation will intuitively look like in Fig. 4.

3.1 Algorithm

To make these considerations more general, we will extract digits in a base 2 ≤ b ∈ N numeral system. Usually we will use bits: b = 2, but sometimes a larger b could be more convenient, for example b = 2^k allows extracting k bits at once. Extracting the least significant base b digit means: x → ⌊x/b⌋ and mod(x, b) goes to the bitstream.

Observe that taking an interval of the form (l ∈ N):

I := {l, l + 1, .., lb − 1}        (9)

for any x ∈ N exactly one of three cases holds:

• x ∈ I, or
• x > lb − 1, then ∃! k ∈ N: ⌊x/b^k⌋ ∈ I, or
• x < l, then for every digit sequence (d_i)_i ∈ {0, .., b−1}^N, ∃! k ∈ N: x·b^k + d_1·b^{k−1} + .. + d_k ∈ I.

We will refer to this kind of interval as b-unique, as by eventually inserting (x → bx + d) or removing (x → ⌊x/b⌋) some of the least significant digits of x, we always finally get to I in a unique way. Let us also define

I_s = {x : C(s, x) ∈ I},   so   I = ⋃_s C(s, I_s)        (10)

We can now define single steps for stream decoding (use the D function and then get the least significant digits from the stream until we get back to I) and for stream encoding (send the least significant digits to the stream until we can use the C function):


Figure 4: Schematic picture of stream encoding/decoding: the encoder travels right, transforming symbols containing lg(1/p_s) bits of information into digits containing lg(b) bits each. When information accumulates in the buffer x, digits are produced such that x remains in I = {l, .., bl − 1}. The decoder analogously travels left.

Stream decoding:
  { (s, x) = D(x);
    use s;   (e.g. to generate symbol)
    while (x ∉ I)  x = x·b + 'digit from input' }

Stream encoding(s):
  { while (x ∉ I_s)
      { put mod(x, b) to output;  x = ⌊x/b⌋ }
    x = C(s, x) }

The above functions are inverses of each other only if the I_s are also b-unique: there exist {l_s}_{s∈A} such that

I_s = {l_s, ..., l_s·b − 1}        (11)

We need to ensure that this necessary condition is fulfilled, which is not automatically true, as we will see in the example. A more compact form of this condition will be found in Section 3.3, and for larger alphabets we will directly enforce its fulfillment in Section 4.

In practice we do not need to transfer these digits one by one as in the pseudocode above, which needs multiple steps of slow branching and is usually required in arithmetic coding. While using coding tables we can directly store also the number of bits to transfer and perform the whole transfer in a single unconditional operation, which is used in the fast implementation of Yann Collet [1]. For b = 2 and l being a power of 2, the number of bits to transfer in a decoding step is the number of bits in the buffer minus the current bit length of x (implemented in modern processors). For encoding a given symbol, the corresponding number of bits to transfer can take only one of two succeeding values.

3.2 Example

Taking uABS for p = 0.3 as in Fig. 1, let us transform it into the stream version for b = 2 (we transfer single least significant bits). Choosing l = 8, we have x ∈ I = {8, .., 15}. We can see from this figure that I_0 = {5, .., 10}, I_1 = {3, 4} are the ranges from which we get to I after encoding correspondingly 0 or 1. These ranges are not b-unique, so for example if we would like to encode s = 1 from x = 10, while transferring the least significant bits we would first reduce it to x = 5 and then to x = 2, never getting into the I_1 range.

However, if we choose l = 9, I = {9, .., 17}, we get I_0 = {6, .., 11} and I_1 = {3, 4, 5}, which are b-unique. Here is the encoding process for s = 0, 1: first we transfer some of the least significant bits to the stream, until we reduce x to the I_s range. Then we use the ABS formula (like in Fig. 1):

 x ∈ I                      9    10   11   12    13    14    15    16    17
 bit transfer for s = 0     -    -    -    0     1     0     1     0     1
 reduced x′ ∈ I_0           9    10   11   6     6     7     7     8     8
 new state C(0, x′)         14   15   17   9     9     11    11    12    12
 bit transfer for s = 1     1    0    1    0,0   1,0   0,1   1,1   0,0   1,0
 reduced x′ ∈ I_1           4    5    5    3     3     3     3     4     4
 new state C(1, x′)         13   16   16   10    10    10    10    13    13

Here is an example of the evolution of (state, bit sequence) while using this table (the label over each arrow is the encoded symbol):

(9, −) →[1] (13, 1) →[0] (9, 11) →[0] (14, 11) →[1] (10, 1101) →[0] (15, 1101) →[1] (10, 110111) ...

The decoder will first use D(x) to get the symbol and the reduced x′, then add the least significant bits from the stream (in reversed order).
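For concreteness, here is a minimal Python sketch (an illustration, not the paper's implementation) of the stream version for exactly this p = 0.3, l = 9, b = 2 example, built on the uABS formulas of Section 2.2; the round trip recovers both the message and the initial state:

from fractions import Fraction
from math import floor, ceil

p, l, b = Fraction(3, 10), 9, 2       # Pr(1) = 0.3, I = {9, ..., 17}
I = range(l, b * l)

def C(s, x):                          # uABS encoding step, eq. (7)
    return floor(x / p) if s == 1 else ceil((x + 1) / (1 - p)) - 1

def D(x):                             # uABS decoding step, eq. (5)
    s = ceil((x + 1) * p) - ceil(x * p)
    return s, (ceil(x * p) if s == 1 else x - ceil(x * p))

def encode(symbols, x=l):
    bits = []
    for s in symbols:
        while C(s, x) not in I:       # renormalize: emit least significant bits
            bits.append(x % b)
            x //= b
        x = C(s, x)
    return x, bits                    # final state + bitstream

def decode(x, bits, n):
    out = []
    for _ in range(n):                # symbols are recovered in reversed order
        s, x = D(x)
        out.append(s)
        while x not in I:             # read bits back, most recently written first
            x = x * b + bits.pop()
    return x, out[::-1]

msg = [1, 0, 0, 1, 0, 1]
x, bits = encode(msg)
assert decode(x, bits, len(msg)) == (l, msg)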

To find the expected number of bits/symbol used by such a process, let us first find its stationary probability distribution, assuming an i.i.d. input source. Denoting by C(s, x) the state to which we go from state x due to symbol s (like in the table above), this stationary probability distribution has to fulfill the equation

Pr(x) = ∑_{s,y: C(s,y)=x} Pr(y)·p_s        (12)

i.e. it is the dominant eigenvector of the corresponding stochastic matrix. Such a numerically found distribution is written in Fig. 3. Here is the approximate distribution for the currently considered automaton and a comparison with the close Pr(x) ∝ 1/x distribution, which will be motivated in Section 3.5:

 x          9       10      11      12      13      14      15      16      17
 Pr(x)      0.1534  0.1240  0.1360  0.1212  0.0980  0.1074  0.0868  0.0780  0.0952
 1.3856/x   0.1540  0.1386  0.1260  0.1155  0.1066  0.0990  0.0924  0.0866  0.0815

We can now find the expected number of bits/symbol used by this automaton by summing the number of bits used in the encoding table:

( ∑_{x=12}^{17} Pr(x) )·p_0 + ( Pr(9) + Pr(10) + Pr(11) + 2·∑_{x=12}^{17} Pr(x) )·p_1 ≈ 0.88658 bits/symbol

For comparison, the Shannon entropy is −p·lg(p) − (1 − p)·lg(1 − p) ≈ 0.88129 bits/symbol, so this automaton uses about ∆H ≈ 0.00529 bits/symbol more than required.
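The stationary distribution and the resulting bit cost can also be found numerically. The sketch below (an illustration under the same p = 0.3, l = 9, b = 2 assumptions, reusing the uABS formulas) iterates equation (12) to its fixed point and should reproduce roughly the 0.8866 bits/symbol figure above:

from fractions import Fraction
from math import floor, ceil

p, l, b = Fraction(3, 10), 9, 2
I = range(l, b * l)
prob = {0: float(1 - p), 1: float(p)}

def C(s, x):                                   # uABS step, eq. (7)
    return floor(x / p) if s == 1 else ceil((x + 1) / (1 - p)) - 1

def step(s, x):                                # stream step: (number of emitted bits, next state)
    k = 0
    while C(s, x) not in I:
        x //= b
        k += 1
    return k, C(s, x)

pr = {x: 1.0 / len(I) for x in I}              # iterate eq. (12)
for _ in range(200):
    new = {x: 0.0 for x in I}
    for x in I:
        for s in (0, 1):
            new[step(s, x)[1]] += pr[x] * prob[s]
    pr = new

bits = sum(pr[x] * prob[s] * step(s, x)[0] for x in I for s in (0, 1))
print(bits)                                    # approximately 0.8866 bits/symbol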


The found encoding table fully determines the encoding process, and analogously we can generate the table for the decoding process, defining an automaton like in Fig. 3.

Observe that having a few such encoders operating on the same range I, we can use them consecutively in some order - if the decoder uses them in reversed order, it will still retrieve the encoded message. Such a combination of multiple different encoders can be useful for example when the probability distribution (or variant, or even alphabet size) varies depending on context, or for cryptographic purposes.

3.3 Necessary condition and remarks for stream formulas

For unique decoding we need all I_s to be b-unique, which is not always true, as we have seen in the example above. In the next section we will enforce it by construction for larger alphabets. We will now check when the uABS, rABS and rANS formulas fulfill this condition.

In the uABS case we need to ensure that both I_0 and I_1 are b-absorbing: denoting I_s = {l_s, .., u_s}, we need to check that

u_s = b·l_s − 1   for s = 0, 1

We have l_s = |{x < l : s(x) = s}|, u_s = |{x < bl : s(x) = s}| − 1, so ∑_s l_s = l and ∑_s u_s = bl − 2. It means that fulfilling one of these conditions implies the second one.

Let us check when u_1 = b·l_1 − 1. We have l_1 = ⌈lp⌉, u_1 = ⌈blp⌉ − 1, so the condition is:

Condition: Stream uABS can be used when

b·⌈lp⌉ = ⌈blp⌉.        (13)

The basic situation in which this condition is fulfilled is when lp ∈ N, which means that p is defined with 1/l precision.
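A two-line check of condition (13) explains the l = 8 versus l = 9 behavior seen in Section 3.2 (a small illustration, not from the paper):

from fractions import Fraction
from math import ceil

p, b = Fraction(3, 10), 2
for l in (8, 9):
    print(l, b * ceil(l * p) == ceil(b * l * p))   # 8 -> False (not usable), 9 -> True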

For rABS and rANS, for simplicity let us assume that m divides l: l = km (there can also be other possibilities). In this case, k·l_s is the first appearance of symbol s in I, and b·k·l_s − 1 is the last one, as required.

Condition: If m divides l, rABS and rANS can be used in the stream version.

There are two basic possibilities for using the coding formulas ([7] implementations):

• use them directly, for example in 32 bit arithmetic (fpaqc). As l > 2^20, the inaccuracy becomes negligible, and we can use a large b to extract multiple bits at once. However, the multiplication can make it a bit slower,

• store the behavior on some chosen range (fpaqb) - it is less accurate, but can be a bit faster and leaves freedom to choose the exact encoding - we will explore this possibility with tANS.

If the probability varies, in the former case we just use the formula for the current p, while in the latter case we should have prepared tables for different probability distributions (e.g. quantized) that we could use for approximation. Such varying of the probability distribution is allowed as long as I remains fixed.


3.4 Analysis of a single step

Let us now take a closer look at a single step of stream encoding and decoding. Observe that the number of digits to transfer to get from x to I_s = {l_s, .., b·l_s − 1} is k = ⌊log_b(x/l_s)⌋, so the (new state, digit sequence) produced by a stream coding step is:

x →[s] ( C(s, ⌊x/b^k⌋), mod(x, b^k) )   where   k = ⌊log_b(x/l_s)⌋        (14)

This mod(x, b^k) = x − b^k·⌊x/b^k⌋ is the information lost from x during the digit transfer - this value is sent to the stream in the base b numeral system. The original algorithm produces these digits in reversed order: from less to more significant.

Observe that for a given symbol s, the number of digits to transfer can take only one of two values: k_s − 1 or k_s, for k_s := ⌊log_b(l/l_s)⌋ + 1. The smallest x requiring the transfer of k_s digits is X_s := l_s·b^{k_s}. So finally the number of bits to transfer while encoding is k_s − 1 for x ∈ {l, .., X_s − 1} and k_s for x ∈ {X_s, .., lb − 1}, like in Fig. 5.

While stream decoding, the current state x defines the currently produced symbol s. In the example in Fig. 3, the state also defined the number of digits to take: 1 for x = 4, 2 for x = 5 and 0 for x = 6, 7.

However, in the example from Section 3.2, state x = 13 is an exception: it is decoded to s = 1 and x′ = 4. Now if the first digit is 1, the state becomes x = 2·4 + 1 = 9 ∈ I. But if the first digit is 0, it becomes 2·4 = 8 ∉ I and so we need to take another digit, finally getting to x = 16 or 17. We can also see such a situation in Fig. 5.

This issue - that decoding from a given state may require the transfer of k_s − 1 or k_s digits - means that for some x ∈ N we have bx < l while bx + b − 1 ≥ l. It is possible only if b does not divide l.

To summarize: if b divides l, decoding requires transferring a number of digits which is fixed for every state x. So we can treat these digits (mod(x, b^k)) as blocks and store them in forward or in backward order. However, if b does not divide l, the state itself sometimes does not determine the number of digits: we should first take the k_s − 1 first digits, and if we are still not in I, add one more. We see that in this case it is essential that digits are in the proper (reversed) order: from the least to the most significant.

Figure 5: Example of a stream coding/decoding step for b = 2, k_s = 3, l_s = 13, l = 9·4 + 3·8 + 6 = 66, p_s = 13/66, x = 19, b^{k_s−1}·x + 3 = 79 = 66 + 2 + 2·4 + 3.


3.5 Stationary probability distribution of states

In Section 3.2 we mentioned finding the stationary probability distribution of states used while encoding (and so also while decoding) - fulfilling condition (12). We will now motivate that this distribution is approximately Pr(x) ∝ 1/x.

Let us transform I into the [0, 1) range (informational content):

y = log_b(x/l) ∈ [0, 1)        (15)

Encoding a symbol of probability p_s increases y by approximately log_b(1/p_s). However, to remain in the range, sometimes we need to transfer some number of digits before. Each digit transfer reduces x to ⌊x/b⌋ and so y by approximately 1. Finally a single step is:

y →[s] ≈ { y + log_b(1/p_s) }

where {a} = a − ⌊a⌋ denotes the fractional part.

While neighboring values in arithmetic coding remain close to each other during the addition of succeeding symbols (ranges are further compressed), here we have much more chaotic behavior, making it more appropriate for cryptographic applications. There are three sources of this chaotic behavior, visualized in Fig. 6:

• asymmetry: different symbols can have different probabilities and so different shifts,
• ergodicity: log_b(1/p_s) is usually irrational, so even a single symbol should lead to a uniform covering of the range,
• diffusion: log_b(1/p_s) only approximates the exact shift and this approximation varies with position - leading to pseudorandom diffusion around the expected position.

These reasons suggest that the probability distribution while encoding is nearly uniform for the y variable: Pr(y ≤ a) ≈ a ∈ [0, 1], which is confirmed by numerical simulations. Transforming it back to the x variable, the probability of using state x ∈ I is approximately proportional to 1/x:

Pr(x) ∝ 1/x  (approximately)        (16)

It means that the informational content of being in state x is lg(1/Pr(x)) ≈ lg(x) + const, which was the initial concept behind ANS.

Figure 6: Three sources of chaotic behavior of y = log_b(x/l), leading to a nearly uniform distribution on the [0, 1] range.


Figure 7: Left: ∆H for p = 0.3 uABS and different l being multiples of 10. The red line is formula (17). Additionally, the irregular yellow line shows the situation for these l when we approximate some irrational number using fractions with denominator l. Right: situation for fixed l = 100 and p = i/100 for i = 1, 2, .., 99. The discontinuity for i = 50 is because ∆H = 0 there.

3.6 Bound for ∆H

We will now find an upper bound for the distance from the Shannon entropy, ∆H. As finding the exact stationary probability distribution is a complex task and this distribution can be disturbed by eventual correlations, we will use the absolute bound for the inaccuracy: |ε_s(x)| = |x_s/x − p_s| ≤ 1/x for uABS (m/x for rANS) to find an independent bound.

Generally, assuming that the probability distribution is indeed {p_s}, but we pay lg(1/q_s) bits per symbol s, the cost of inaccuracy is the Kullback-Leibler distance:

∆H = ∑_s p_s lg(1/q_s) − ∑_s p_s lg(1/p_s) = −∑_s p_s lg( 1 − (1 − q_s/p_s) )
   = ∑_s (p_s/ln 2) [ (1 − q_s/p_s) + (1/2)(1 − q_s/p_s)² + O((1 − q_s/p_s)³) ]
   ≈ ∑_s (p_s − q_s)²/(p_s ln 4)

since the first order terms cancel. In our case we use lg(x/x_s) bits to encode symbol s from state x_s, so (p_s − q_s)² corresponds to |ε_s(x)|² < 1/l² for x ∈ I = {l, .., lb − 1}. The above sum should also be averaged over the encoder state, but as we are using a general upper bound for ε, we can bound the expected capacity loss of uABS on the I = {l, .., bl − 1} range by at most:

∆H ≤ (1/(l² ln 4)) ∑_{s=0,1} 1/p_s + O(l⁻³)  bits/symbol        (17)

From Fig. 7 we see that it is a rough bound, but it reflects the general behavior well. For rANS this bound should be multiplied by m.

3.7 Initial state and direction of encoding/decoding

Encoding and decoding in the discussed methods go in reversed directions - which is perfect if we need a stack of symbols, but often may be an inconvenience. One issue is the need to store the final state of encoding, which is required to start decoding - the cost is usually negligible: a few bits of information for every data block.

However, if this cost is essential, we can avoid it by encoding information also in this initial state. For this purpose, instead of starting for example with x = l, we can:

• start with x having one or more symbols directly stored in its least significant bits, like x = l + s,
• while using formulas, start with x = 1 and encode succeeding symbols until getting close to I, then use the obtained x as the initial state (it can be a bit smaller than l),
• we can do analogously while using tables, but it requires more memory to define them also for the {2, .., l − 1} range.

Another issue is when probabilities depend on already processed data - in this case we can encode the data in the backward direction, using information available during the later decoding in the forward direction. For example for a Markov source of symbols s_1..s_m, we would encode them in the backward direction: from the m-th to the 1-st, but the probability used for s_k would depend on s_{k−1}. For adaptive encoding, we would need to process the whole data block in the forward direction, assigning to each position the probability to be used while decoding, then encode in the backward direction using these probabilities. Thanks to that, we can later use standard Markov or adaptive forward decoding.

However, if it is really necessary to encode and decode in the same direction, it is possible to change the direction of encoding or decoding, but it is much more costly. We can see an example of a few steps of such a process in Fig. 8.

Figure 8: Example of encoding (left) and decoding (right) in the reversed direction for the encoder from Section 3.2 - it is much more costly, but still possible. We find some output sequence such that later correspondingly decoding or encoding in the same direction will produce the input sequence. In reversed encoding we can start with all possible states, then in every step remove those not corresponding to the given symbol and make a step for every possible digit. In reversed decoding, here from a single state, we try to encode all possible symbols from the given state (arrow up: a, arrow down: b) and check if the input bits agree with the least significant bits. The small numbers written are after removing this least significant bit. Finally, in both cases we choose a single path. There is always such a path, as we could perform this encoding/decoding in the standard direction starting from any state.


4 Tabled asymmetric numeral systems (tANS)

We will now focus on probably the most significant advantage of the ANS approach over AC - thanks to the smaller state, storing the entire behavior for a large alphabet in a relatively small coding table, getting usually faster coding than Huffman, with accuracy/compression rate like in arithmetic coding. Instead of splitting a large-alphabet symbol into binary choices, we can process it in a simple unconditional loop with a single table use and a few binary operations. Also, instead of slow bit-by-bit AC renormalization, all bits can now be transferred at once. The cost is storing the tables: let us say 1-16kB for a 256 size alphabet. We can store many of them for various contexts/probability distributions and alphabet sizes and, as long as they use the same state space I, just switch between them.

Instead of trying to find large alphabet analogues of the uABS formulas, we will directly generate the tables used for encoding/decoding, like in the example from Section 3.2. There is a large freedom of choice for such a table, as every symbol distribution corresponds to a different coding. The distance from the Shannon limit depends on how well we approximate the x_s/x ≈ p_s relation. We will now focus on doing it in a very accurate way to get very close to the capacity limit. However, this process can for example be disturbed in a pseudorandom way using a cryptographic key to additionally encrypt the message.

4.1 Precise initialization algorithm

Assume we are given l, b and a probability distribution of n symbols: 0 < p_1, .., p_n < 1, ∑_s p_s = 1. For simplicity, let us assume for now that the probabilities are defined with 1/l accuracy:

l_s := l·p_s ∈ N        (18)

For general probabilities, in Section 4.3 we will see that we can "tune" the coder by shifting symbol appearances to make it optimal for slightly different probability distributions. Shifting symbol appearances right (to larger x) generally corresponds to increasing their informational content and so decreasing the probability (≈ x_s/x).

The encoding is defined by distributing symbols in a nearly uniform way on the I = {l, .., bl − 1} range: (b − 1)·l_s appearances of symbol s. Then for every symbol s we enumerate its appearances by succeeding numbers from the I_s = {l_s, .., b·l_s − 1} range, like in Fig. 9.

Unfortunately, finding the optimal symbol distribution is not an easy task. We can do it by checking all symbol distributions and finding ∆H for each of them - for example for l_1 = 10, l_2 = 5, l_3 = 2 there are 17!/(10!·5!·2!) = 408408 ways to distribute the symbols and each of them corresponds to a different coding. While testing all these possibilities it turns out that the minimal value of ∆H among them, ∆H ≈ 0.00121 bits/symbol, is obtained by 32 different distributions - they are presented in the right hand side part of Fig. 9. They can be obtained by switching pairs of neighboring nodes at some 5 positions (marked with thick lines) - the bit sequences produced by them differ by corresponding bit flips.

We will now introduce and analyze a heuristic algorithm which directly chooses a symbol distribution in a nearly uniform way. An example of its application is presented in Fig. 9. In this case it finds one of the optimal distributions, but this is not always true - sometimes there are slightly better distributions than the generated one. So if capacity is a top priority while designing the coder, it can be reasonable to check all symbol distributions, and generally we should search for better distributing algorithms.

To formulate the heuristic algorithm, let us first define:

N_s := { 1/(2p_s) + i/p_s : i ∈ N }        (19)

These n sets are uniformly distributed with the required densities, but they go outside the set N - to define ANS we need to shift them there, like in Fig. 9. Specifically, to choose a symbol for the succeeding position, we can take the smallest not yet used element of N_1, .., N_n. This simple algorithm requires a priority queue to retrieve the smallest one among the currently considered n values - if using a heap for this purpose, the computational cost grows like lg(n) (instead of linearly for greedy search). Beside initialization, this queue needs two instructions: put((v, s)) inserts a pair (v, s) with value v ∈ N_s pointing at the expected succeeding position for symbol s; the second instruction, getmin, removes and returns the pair which is the smallest with respect to the relation (v, s) ≤ (v′, s′) ⇔ v ≤ v′.

Figure 9: Left, from top: precise initialization algorithm for the (10, 5, 2)/17 probability distribution, the corresponding encoding table and the stationary probability distribution. Right: among the 17!/(10!·5!·2!) possibilities to choose the symbol distribution here, the minimal ∆H turns out to be obtained by 32 of them - presented in graphical form. The sixth from the top is the one generated by our algorithm. These 32 possibilities turn out to differ by switching two neighboring positions in 5 locations (2^5 = 32) - these pairs are marked by thick lines. We can see that these 5 positions correspond to the largest deviations from the ∝ 1/x expected behavior of the stationary probability distribution.


Finally the algorithm is:

Precise initialization of the decoding or encoding function/table:

  for s = 0 to n − 1 do { put((0.5/p_s, s)); x_s = l_s };
  for x = l to bl − 1 do
    { (v, s) = getmin;  put((v + 1/p_s, s));
      D[x] = (s, x_s)  or  C[s, x_s] = x;
      x_s++ }
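A Python sketch of this precise initialization using a binary heap as the priority queue is given below (an illustration; the tie-breaking question discussed next is not addressed here - ties simply fall back to the smaller symbol index):

import heapq

def precise_init(freqs, b=2):
    # spread symbols over I = {l, ..., b*l - 1} for a quantized distribution l_s = freqs[s]
    l = sum(freqs)
    heap = [(0.5 * l / ls, s) for s, ls in enumerate(freqs)]   # v = 1/(2 p_s) with p_s = l_s/l
    heapq.heapify(heap)
    nxt = list(freqs)                  # next appearance number x_s, starting from l_s
    D = {}                             # decoding table: x -> (s, x_s)
    for x in range(l, b * l):
        v, s = heapq.heappop(heap)     # getmin
        heapq.heappush(heap, (v + l / freqs[s], s))            # put((v + 1/p_s, s))
        D[x] = (s, nxt[s])
        nxt[s] += 1
    return D

# example: the (10, 5, 2)/17 distribution of Fig. 9
D = precise_init([10, 5, 2])
print("".join(str(D[x][0]) for x in range(17, 34)))   # the generated symbol spread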

A question remains: which symbol should be chosen if two of them have the same position? Experiments suggest choosing the least probable symbol among them first, like in the example in Fig. 9 (it can be realized for example by adding some small values to N_s). However, examples can be found where the opposite approach, the most probable first, brings a slightly smaller ∆H.

4.2 Inaccuracy bound

Let us now find a bound for the inaccuracy ε_s(x) = x_s/x − p_s of this algorithm, to get an upper bound for ∆H. Observe that the symbol sequence it produces has period l, so for simplicity we can imagine that it starts with x = 0 instead of x = l.

From the definition, #(N_s ∩ [0, x − 1/(2p_s)]) = ⌊xp_s⌋. As ⌊xp_s⌋ ≤ xp_s ≤ ⌊xp_s + 1⌋ and ∑_s p_s = 1, we also have ∑_s ⌊xp_s⌋ ≤ x ≤ ∑_s ⌊xp_s + 1⌋. Defining p = min_s p_s, we can check that in the [0, x − 1/(2p)] range there are at most x symbols from all the N_s, and in the [0, x + 1/(2p)] range there are at least x symbols:

∑_s #( N_s ∩ [0, x − 1/(2p)] ) ≤ ∑_s #( N_s ∩ [0, x − 1/(2p_s)] ) = ∑_s ⌊xp_s⌋ ≤ x

x ≤ ∑_s ⌊xp_s + 1⌋ = ∑_s #( N_s ∩ [0, x + 1/(2p_s)] ) ≤ ∑_s #( N_s ∩ [0, x + 1/(2p)] )

Figure 10: Left: numerical values and comparison with formula (21) for the (0.1, 0.4, 0.5) probability distribution and l being a multiple of 10. Right: comparison for a larger alphabet of size m. The probability distribution is (1, 2, .., m)/l, where l = ∑_{i=1}^{m} i = m(m + 1)/2.


So the x-th symbol found by the algorithm had to be chosen from [x − 1/(2p), x + 1/(2p)] ∩ ⋃_s N_s. From the definition, the x_s-th value of N_s is 1/(2p_s) + x_s/p_s. If this value corresponds to x, it had to be in the [x − 1/(2p), x + 1/(2p)] range:

| 1/(2p_s) + x_s/p_s − x | ≤ 1/(2p)   ⇒   | 1/(2x) + x_s/x − p_s | ≤ p_s/(2xp)

|ε_s(x)| = | x_s/x − p_s | ≤ ( p_s/(2p) + 1/2 ) / x        (20)

Finally, in analogy to (17), we get:

∆H ≤ (1/(l² ln 4)) ∑_s (1/p_s) ( p_s/(2 min_s p_s) + 1/2 )² + O(l⁻³)  bits/symbol        (21)

From Fig. 10 we can see that it is a very rough bound. Figure 11 shows results of simulations for a practical application - tANS behavior for random probability distributions while using a number of states being a small multiple of the alphabet size. While the behavior is quite complex, the general suggestion is that using k times more states than the alphabet

Figure 11: ∆H for different alphabet sizes. The three graphs correspond to using a number of states (l, b = 2) being 2, 4 and 8 times larger than the alphabet size (n = 2, .., 256). For each of the 3·255 cases, 100 random distributions were taken, generated by the procedure: start with l_s = 1 for all symbols, then repeat l − n times: increase l_s for a random s by 1. The yellow lines mark powers of 2 - there is some self-similarity relation there, probably caused by using b = 2.


size gives ∆H ≈ 0.5/k². So if we want to work in the ∆H ≈ 0.01 bits/symbol regime, we should use k = 2 or 4; for ∆H ≈ 0.001 bits/symbol it should rather be 8 or 16.

These graphs do not take into account the fact that l·p_s are usually not natural numbers - some additional approximation is needed here. However, from Fig. 7 we can see that such approximation can work both ways. We will now see that additionally we can tune the symbol distribution to design coders with a properly shifted optimal probability distribution.

4.3 Quasi ABS (qABS) and tuning

Let us now change the way of thinking for a moment: instead of focusing on the symbol distribution for probabilities approximated as l_s/l, let us test ∆H for all probability distributions and different fixed coders (symbol distributions) - some results can be seen in Figure 12. Surprisingly, for 5 state automata, all symbol distributions turn out to be useful in some narrow range of probabilities - choosing between such 5 automata, we can work in the ∆H ≈ 0.01 bits/symbol regime. For ∆H ≈ 0.001 bits/symbol, we should choose approximately 20 of the shown 43 automata for 16 states - they have a deeper single ∆H minimum, but also a narrower one.

This kind of approach is very similar to Howard-Vitter quasi arithmetic coding - Example 10 from [5] presents analogous switching for 6 state automata. Calculating ∆H for i.i.d. static sources of different probabilities, it turns out that we obtain exactly the same graph as the 5 state case from Fig. 12 - this time ANS requires 1 state less. This agreement and further comparison of both approaches require further investigation. However, the fact that ANS uses a single-number state, while AC needs a 2-number state

Figure 12: Left: symbol distributions and ∆H for different symbol probabilities and all 4 state automata with more "0" symbol appearances than "1". Right: analogously for 16 states, but this time presenting only a single representative for each curve, the one which is the lowest for some p - there are 43 of them.


for similar precision, suggests that the advantage of using ANS will grow.

The symbol distributions in Fig. 12 suggest that shifting symbol appearances toward lower x generally shifts the optimal probability right. It can be understood that lowering x here means an increase of x_s/x, which is the probability we approximate. We could try to correspondingly modify the precise initialization this way: tune it when p_s ≠ l_s/l, for example modifying the initial values of N_s, like using the real 1/(2p_s) there instead of the value approximated by l_s/l. However, the general behavior is very complex and understanding it requires further work. For specific purposes we can find optimized symbol distributions, for example by exhaustive search like in Fig. 9, and store them inside the compressor.

The fact that the (000..01) symbol distribution corresponds to the coder with the lowest optimal probability (left-most curves in Fig. 12) suggests how to build cheap and simple coders for a situation that is difficult to handle optimally: when p ≈ 0. The automaton for this kind of distribution is: the more probable symbol (1 − p) increases the state by one (x++), until getting to the 2l − 2 state, from which (and from state 2l − 1) it goes to the first state (l), producing a single bit. The less probable symbol (p) always goes to the last state (2l − 1), producing all but the most significant bit of the current state. It produces approximately lg(l) bits, plus 1 in the succeeding step. Comparing this number to lg(1/p), we obtain that we should use l ≈ 0.5/p.

4.4 Combining tANS with encryption

We will now briefly discuss using tANS to simultaneously encrypt the data, or as a part of a cryptosystem. While standard cryptography usually operates on constant length bit blocks, encryption based on an entropy coder has the advantage of using blocks whose lengths vary in a pseudorandom way. The inability to directly divide the bit sequence into the blocks used in single steps makes cryptanalysis much more difficult.

As was mentioned in Section 3.5, the behavior of the ANS state is chaotic - someone who does not know the exact decoding tables would quickly lose track of the current state. When decoding from two neighboring states, they often correspond to different symbols and so their further behavior will be very different (asymmetry). In comparison, in arithmetic coding nearby values are compressed further - they remain nearby.

The crucial property making ANS well suited for cryptographic applications is that, in contrast to arithmetic coding, it has a huge freedom of choice for the exact coding - every symbol distribution defines a different encoding. We can use a pseudorandom number generator (PRNG) initialized with a cryptographic key to choose the exact distribution, for example by disturbing the precise initialization algorithm. One way could be: instead of choosing the pair with the smallest v by the getmin operation, use the PRNG to choose between the s which would be originally chosen and the second one in this order. This way we would get a bit worse accuracy and so ∆H, but we have 2^{l(b−1)} different possibilities among which we choose the coding according to the cryptographic key.

This philosophy of encryption uses the cryptographic key to generate a table which will later be used for coding. Observe that if we made the generation of this table more computationally demanding, such an approach would be more resistant to brute force attacks. Specifically, in standard cryptography the attacker can just start decryption with succeeding keys to test if a given key is the correct one. If we enforce some computationally demanding task to generate the decoding tables (requiring for example 1ms of calculations), while decoding itself can be fast as we just use the table, testing a large number of cryptographic keys becomes much more computationally demanding - this encryption philosophy can be made more resistant to the unavoidable brute force attacks.

The safety of using ANS based cryptography depends on the way we use it and on the concrete scenario. Having access to both input and output of the discussed automata, one could usually deduce the used coding tables - some additional protection would be needed to prevent that. However, if a secure PRNG was used to generate them, obtaining the tables does not compromise the key. So initializing the PRNG with the cryptographic key and additionally some number (e.g. of the data block) would allow choosing many independent encodings for this key.

Beside combining entropy coding with encryption, ANS can also be seen as a cheap and simple building block for stronger cryptographic systems, like a replacement for the memoryless S-box. An example of such cryptography could be: use the PRNG to choose an intermediate symbol probability distribution and two ANS coders for this distribution. Encoding of a bit sequence would use the first encoder to produce a symbol in this distribution, and immediately apply the second decoder to get a new bit sequence.

5 Conclusions

A simpler alternative to arithmetic coding was presented: using a single natural number as the state, instead of two numbers representing a range. Its multiple ways of use give alternatives for different variants of AC. This simplicity brings some advantages:

• instead of the complex AC renormalization procedure, we can just transfer bit blocks of known size (once per symbol),
• an additional variant appears (tANS): putting a coder for a large alphabet into a table, combining the speed of Huffman coding with the compression rate of AC,
• the huge freedom while choosing the exact encoder and the chaotic state behavior make it also well suited to simultaneously encrypt the data, or to serve as an inexpensive nonlinear building block of cryptosystems.

The disadvantages are:

• tANS requires building and storing coding tables: about 1-16kB for a 256 size alphabet,
• decoding is in the opposite direction to encoding, which requires storing the final state and may be an inconvenience, especially in adaptive applications.

While maintaining the state space (I), we can freely change between different probability distributions, alphabets or even ABS/ANS variants. For example for complex tasks like video compression, let us say the first 32 coding tables used could be for different quantized probabilities of a binary alphabet, while succeeding tables would be for static probability distributions on larger alphabets for various specific contexts, like for quantized DCT coefficients. The final memory cost would be a few kilobytes times the number of tables/contexts.

Many research questions remain regarding this novel approach. Especially how to effectively generate good tANS coding tables for a given probability distribution (including tuning). We should also add correlated sources to the considerations and generally improve the theoretical analysis and the understanding of behavior. Another research topic regards the cryptographic capabilities of this new approach - to understand how to use it and combine it with other methods to obtain the required level of cryptographic security.

Finally, we should understand fundamental questions regarding low state entropy coding automata - is this method optimal, are there maybe more convenient families?

References

[1] Y. Collet, Finite State Entropy, https://github.com/Cyan4973/FiniteStateEntropy ,

[2] J. Duda, Optimal encoding on discrete lattice with translational invariant constrains

using statistical algorithms, arXiv:0710.3861,

[3] J. Duda, Asymmetric numerical systems, arXiv:0902.0271,

[4] J. Duda, Data Compression Using Asymmetric Numeral Systems, Wolfram Demon-

stration Project,

http://demonstrations.wolfram.com/author.html?author=Jarek+Duda ,

[5] P. G. Howard, J. S. Vitter, Practical Implementations of Arithmetic Coding, Image

and Text Compression, 1992, 85-112,

[6] D.A. Huﬀman, A Method for the Construction of Minimum-Redundancy Codes, Pro-

ceedings of the I.R.E., September 1952, pp 1098-1102,

[7] M. Mahoney, Data Compression Programs website, http://mattmahoney.net/dc/ ,

[8] D. Marpe, H. Schwarz, T. Wiegand, Context-Based Adaptive Binary Arithmetic Cod-

ing in the H.264/AVC Video Compression Standard, IEEE Transactions on Circuits

and Systems for Video Technology, Vol. 13, No. 7, pp. 620-636, July 2003,

[9] G.N.N. Martin, Range encoding: an algorithm for removing redundancy from a digi-

tized message, Video and Data Recording Conf., Southampton, England, July 1979,

[10] J.J. Rissanen, Generalized Kraft inequality and arithmetic coding, IBM J. Res. De-

velop., vol. 20, no. 3, pp. 198-203, May 1976,

[11] W. Szpankowski, Asymptotic Average Redundancy of Huﬀman (and Other) Block

Codes, IEEE Trans. Information Theory, 46 (7) (2000) 2434-2443.
