Arithmetic Supplementary

CMPT 365 Multimedia Systems
Arithmetic Coding
Additional Material
Spring 2015
CMPT365 Multimedia Systems 1

Outline – Part I
- Introduction
- Basic Encoding and Decoding
- Scaling and Incremental Coding
- Integer Implementation
- Adaptive Arithmetic Coding
- Binary Arithmetic Coding
- Applications
  • JBIG, H.264, JPEG 2000

Limitations of Huffman Code
- Needs a probability distribution
- Hard to adapt to changing statistics
- Minimum codeword length is 1 bit
  • Serious penalty for high-probability symbols
  • Example: binary source, P(0) = 0.9
    • Entropy: -0.9*log2(0.9) - 0.1*log2(0.1) = 0.469 bit
    • Huffman code: 0, 1 → Avg. code length: 1 bit
    • Joint coding is not practical for a large alphabet.
- Arithmetic coding:
  • Can resolve all of these problems.
  • Codes a sequence of symbols without having to generate codes for all sequences of that length.

Introduction
- Recall table look-up decoding of Huffman code
  • N: alphabet size
  • L: max codeword length
  • Divide [0, 2^L) into N intervals
  • One interval for one symbol
  • Interval size is roughly proportional to symbol probability
(Figure: Huffman code tree with codewords 1, 00, 010, 011; with L = 3 their look-up intervals start at 000, 010, 011 and 100.)

- Arithmetic coding applies this idea recursively
  • Normalizes the range [0, 2^L) to [0, 1).
  • Maps a sequence to a unique tag in [0, 1).
(Figure: sequences such as abcd… and dcba… are mapped to different tags in [0, 1).)

Arithmetic Coding
- Disjoint and complete partition of the range [0, 1), e.g., for symbols a, b, c:
  [0, 0.8), [0.8, 0.82), [0.82, 1)
- Each interval corresponds to one symbol
- Interval size is proportional to symbol probability
- The first symbol restricts the tag position to be in one of the intervals
- The reduced interval is partitioned recursively as more symbols are processed.
- Observation: once the tag falls into an interval, it never gets out of it

Some Questions to Think About:
- Why is compression achieved this way?
- How can it be implemented efficiently?
- How is the sequence decoded?
- Why is it better than Huffman code?

Example:

Symbol  Prob.
1       0.8
2       0.02
3       0.18

- Map to real line range [0, 1): split at 0.8 and 0.82
- Order does not matter
  • Decoder needs to use the same order
- Disjoint but complete partition:
  • 1: [0, 0.8): 0, 0.799999…9
  • 2: [0.8, 0.82): 0.8, 0.819999…9
  • 3: [0.82, 1): 0.82, 0.999999…9
  • (Think about the impact on the integer implementation)

Encoding  Input sequence: “1321”
1 2 3
Range 1
0 0.8 0.82 1.0
1 2 3
Range 0.8
0 0.64 0.656 0.8
1 2 3
Range 0.144
0.656 0.7712 0.77408 0.8
1 2 3
Range 0.00288
0.7712 0.773504 0.7735616 0.77408
Final range: [0.7712, 0.773504): Encode 0.7712
Difficulties: 1. Shrinking of interval requires high precision for long sequence.
2. No output is generated until the entire sequence has been processed.

Cumulative Distribution Function (CDF)
- For a continuous distribution (probability density function p):
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\,dt$$
- For a discrete distribution (probability mass function P(X = k)):
$$F_X(i) = P(X \le i) = \sum_{k=1}^{i} P(X = k)$$
(Figure: an example probability mass function over X = 1, 2, 3, 4 and its staircase CDF, which rises to 1.0.)
- Properties of the discrete CDF:
  • Non-decreasing
  • Piece-wise constant
  • Each segment is closed at the lower end.

Encoder Pseudo Code
- Keep track of LOW, HIGH, RANGE
  • Any two are sufficient, e.g., LOW and RANGE.

    LOW = 0.0; HIGH = 1.0;
    while (not EOF) {
        n = ReadSymbol();
        RANGE = HIGH - LOW;
        HIGH = LOW + RANGE * CDF(n);     // uses the old LOW
        LOW  = LOW + RANGE * CDF(n-1);
    }
    output LOW;

Input    HIGH                           LOW                          RANGE
Initial  1.0                            0.0                          1.0
1        0.0+1.0*0.8=0.8                0.0+1.0*0=0.0                0.8
3        0.0+0.8*1=0.8                  0.0+0.8*0.82=0.656           0.144
2        0.656+0.144*0.82=0.77408       0.656+0.144*0.8=0.7712       0.00288
1        0.7712+0.00288*0.8=0.773504    0.7712+0.00288*0=0.7712      0.002304
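
To check the table, the loop can be run with exact rational arithmetic. The sketch below is Python (an assumption; the slides only give pseudo code) and uses fractions.Fraction instead of floats, so endpoints such as 0.773504 come out exactly. The names CDF and encode_interval are illustrative.

```python
from fractions import Fraction

# CDF(0..3) for symbols 1, 2, 3 with P = 0.8, 0.02, 0.18 (CDF(0) = 0).
CDF = [Fraction(0), Fraction(8, 10), Fraction(82, 100), Fraction(1)]


def encode_interval(symbols, cdf=CDF):
    """Return the final [LOW, HIGH) interval after coding all symbols."""
    low, high = Fraction(0), Fraction(1)
    for n in symbols:
        rng = high - low
        # HIGH is computed from the old LOW, before LOW is updated.
        high = low + rng * cdf[n]
        low = low + rng * cdf[n - 1]
    return low, high


low, high = encode_interval([1, 3, 2, 1])
print(float(low), float(high))  # 0.7712 0.773504
```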

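Decoding
Receive 0.7712
- Thresholds 0, 0.8, 0.82, 1.0: 0.7712 in [0, 0.8) → Decode 1
- Thresholds 0, 0.64, 0.656, 0.8: 0.7712 in [0.656, 0.8) → Decode 3
- Thresholds 0.656, 0.7712, 0.77408, 0.8: 0.7712 in [0.7712, 0.77408) → Decode 2
- Thresholds 0.7712, 0.773504, 0.7735616, 0.77408: 0.7712 in [0.7712, 0.773504) → Decode 1
Drawback: need to recalculate all thresholds each time.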

Simplified Decoding
- Normalize RANGE to [0, 1) each time: x ← (x - low) / range, where [low, low + range) is the interval of the decoded symbol
- No need to recalculate the thresholds: always compare x with 0, 0.8, 0.82, 1.0.
Receive 0.7712
- x = 0.7712 in [0, 0.8): Decode 1
- x = (0.7712 - 0) / 0.8 = 0.964 in [0.82, 1): Decode 3
- x = (0.964 - 0.82) / 0.18 = 0.8 in [0.8, 0.82): Decode 2
- x = (0.8 - 0.8) / 0.02 = 0 in [0, 0.8): Decode 1

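Decoder Pseudo Code
    x = Encoded_number;
    for (i = 0; i < NumSymbols; i++) {   // the decoder must know the sequence length
        n = DecodeOneSymbol(x);           // n such that CDF(n-1) <= x < CDF(n)
        output symbol n;
        x = (x - CDF(n-1)) / (CDF(n) - CDF(n-1));
    }
Note: a loop condition such as "while (x ≠ low)" is not a valid stopping rule. In the example above, x becomes 0 = low before the last symbol 1 is decoded, and a tag of 0 would decode to an endless run of 1's. The decoder must know the length, or the alphabet must contain an end-of-sequence symbol.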
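
A minimal Python sketch of the normalized decoder. It assumes the symbol count is passed in explicitly (the hypothetical num_symbols parameter), because the tag alone cannot mark the end of the sequence. Exact fractions are used so that x lands exactly on 0.8 and 0, as on the slide.

```python
from fractions import Fraction

CDF = [Fraction(0), Fraction(8, 10), Fraction(82, 100), Fraction(1)]


def decode(x, num_symbols, cdf=CDF):
    """Decode num_symbols symbols from tag x, renormalizing x after each one."""
    out = []
    for _ in range(num_symbols):
        # Find n with CDF(n-1) <= x < CDF(n); assumes 0 <= x < 1.
        n = 1
        while x >= cdf[n]:
            n += 1
        out.append(n)
        # Map the chosen interval back onto [0, 1).
        x = (x - cdf[n - 1]) / (cdf[n] - cdf[n - 1])
    return out


print(decode(Fraction("0.7712"), 4))  # [1, 3, 2, 1]
```

Passing the length is one common choice. The alternative is to reserve an end-of-sequence symbol in the alphabet.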

Outline
- Introduction
- Basic Encoding and Decoding
- Applications
  • JBIG, H.264, JPEG 2000

Scaling and Incremental Coding
- Problems of the previous example:
  • Needs high precision
  • No output is generated until the entire sequence is encoded
- Key observation:
  • As the RANGE shrinks, many MSBs of LOW and HIGH become identical:
    • Example: binary forms of 0.7712 and 0.773504: 0.1100010..., 0.1100011...
  • We can output the identical MSBs and re-scale the rest:
    → Incremental encoding
  • This also lets us achieve effectively unlimited precision with finite-precision integers.
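
The key observation can be checked by expanding both endpoints in binary and keeping their common leading bits. The Python sketch below uses Fraction so that the expansion is exact. The helper names binary_digits and common_msbs are illustrative.

```python
from fractions import Fraction


def binary_digits(x, count):
    """First `count` bits after the binary point of x in [0, 1)."""
    bits = []
    for _ in range(count):
        x *= 2
        bit = int(x >= 1)
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


def common_msbs(low, high, count=20):
    """Leading bits shared by LOW and HIGH; these can be sent immediately."""
    prefix = []
    for p, q in zip(binary_digits(low, count), binary_digits(high, count)):
        if p != q:
            break
        prefix.append(p)
    return "".join(prefix)


print(common_msbs(Fraction("0.7712"), Fraction("0.773504")))  # 110001
```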

E1 and E2 Scaling
- E1: [LOW, HIGH) in [0, 0.5)
  • LOW: 0.0xxxxxxx (binary),
  • HIGH: 0.0xxxxxxx.
  • Output 0, then shift left by 1 bit
  • [0, 0.5) → [0, 1): E1(x) = 2x
- E2: [LOW, HIGH) in [0.5, 1)
  • LOW: 0.1xxxxxxx,
  • HIGH: 0.1xxxxxxx.
  • Output 1, subtract 0.5, shift left by 1 bit
  • [0.5, 1) → [0, 1): E2(x) = 2(x - 0.5)

Encoding with E1 and E2

Symbol  Prob.
1       0.8
2       0.02
3       0.18

- Input 1: [0, 0.8)
- Input 3: [0.656, 0.8)
  • E2: Output 1, 2(x - 0.5) → [0.312, 0.6)
- Input 2: thresholds 0.312, 0.5424, 0.54816, 0.6 → [0.5424, 0.54816)
  • E2: Output 1 → [0.0848, 0.09632)
  • E1: 2x, Output 0 → [0.1696, 0.19264)
  • E1: Output 0 → [0.3392, 0.38528)
  • E1: Output 0 → [0.6784, 0.77056)
  • E2: Output 1 → [0.3568, 0.54112)
- Input 1: [0.3568, 0.504256)
  • Encode any value in the tag interval, e.g., 0.5: Output 1
All outputs: 1100011

To Verify
- LOW = 0.5424 (0.10001010... in binary),
  HIGH = 0.54816 (0.10001100... in binary).
- So we can send out 10001 (0.53125)
  • Equivalent to E2 E1 E1 E1 E2
- After a left shift by 5 bits:
  • LOW = (0.5424 - 0.53125) x 32 = 0.3568
  • HIGH = (0.54816 - 0.53125) x 32 = 0.54112
  • Same as the result on the previous slide.

- Note: complete all possible scaling before encoding the next symbol.

Comparison with Huffman
- Input symbol 1 does not cause any output
- Input symbol 3 generates 1 bit
- Input symbol 2 generates 5 bits
- Symbols with larger probabilities generate fewer bits.
  • Sometimes no bit is generated at all
  • Advantage over Huffman coding
- Large probabilities are desirable in arithmetic coding
  • A context-adaptive method can be used to create larger probabilities and to improve the compression ratio.

Incremental Decoding
Input 1100011
- Decoding the first symbol needs ≥ 5 bits (verify). Read 6 bits:
  Tag: 110001 (0.765625)
- Threshold 0.8: Decode 1 → [0, 0.8)
- Thresholds 0.64, 0.656: Decode 3 → [0.656, 0.8), E2 scaling → [0.312, 0.6)
  Tag: 100011 (0.546875)
- Thresholds 0.5424, 0.54816: Decode 2 → [0.5424, 0.54816), E2 scaling → [0.0848, 0.09632)
  Tag: 000110 (0.09375)
- E1 → [0.1696, 0.19264), Tag: 001100 (0.1875)
- E1 → [0.3392, 0.38528), Tag: 011000 (0.375)
- E1 → [0.6784, 0.77056), Tag: 110000 (0.75)
- E2 → [0.3568, 0.54112), Tag: 100000 (0.5)
- Decode 1
- Summary: complete all possible scaling before further decoding.
  Adjust LOW, HIGH and Tag together.

Summary – Part I
- Introduction
- Encoding and Decoding
- E1, E2
- Next:
  • Integer Implementation
    • E3 scaling
  • Adaptive Arithmetic Coding
  • Binary Arithmetic Coding
  • Applications
    • JBIG, H.264, JPEG 2000

Outline – Part II
- Review
- Integer representation
- E3 Scaling
- Minimum word length


Encoding Pseudo Code with E1, E2
(For floating-point implementation)

Symbol  Prob.  CDF
1       0.8    0.8
2       0.02   0.82
3       0.18   1

    EncodeSymbol(n) {
        //Update variables
        RANGE = HIGH - LOW;
        HIGH = LOW + RANGE * CDF(n);
        LOW = LOW + RANGE * CDF(n-1);
        //keep scaling before encoding next symbol
        while LOW, HIGH in [0, 0.5) or [0.5, 1) {
            send 0 for E1 and 1 for E2
            scale LOW, HIGH
        }
    }
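
Below is a sketch of this pseudo code in Python. It uses Fraction in place of floats so the interval endpoints match the slides exactly. The E1 test is written as HIGH <= 0.5 because HIGH is an exclusive bound. The function name encode_e1e2 is illustrative.

```python
from fractions import Fraction

CDF = [Fraction(0), Fraction(8, 10), Fraction(82, 100), Fraction(1)]
HALF = Fraction(1, 2)


def encode_e1e2(symbols, cdf=CDF):
    """Incremental encoding with E1/E2 scaling; returns (bits, LOW, HIGH)."""
    low, high = Fraction(0), Fraction(1)
    bits = []
    for n in symbols:
        rng = high - low
        high = low + rng * cdf[n]
        low = low + rng * cdf[n - 1]
        # Complete all possible scaling before the next symbol.
        while True:
            if high <= HALF:              # E1: [LOW, HIGH) inside [0, 0.5)
                bits.append("0")
                low, high = 2 * low, 2 * high
            elif low >= HALF:             # E2: [LOW, HIGH) inside [0.5, 1)
                bits.append("1")
                low, high = 2 * (low - HALF), 2 * (high - HALF)
            else:
                break
    return "".join(bits), low, high


bits, low, high = encode_e1e2([1, 3, 2, 1])
print(bits, float(low), float(high))  # 110001 0.3568 0.504256
```

The result is the six bits produced during encoding. Appending one more bit, 1 (the value 0.5, which lies in the final interval [0.3568, 0.504256)), gives the output 1100011 of the earlier example.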


Decoding Pseudo Code with E1, E2
(For floating-point implementation)

Symbol  Prob.  CDF
1       0.8    0.8
2       0.02   0.82
3       0.18   1

    DecodeSymbol(Tag) {
        RANGE = HIGH - LOW;
        n = 1;
        while ( (Tag - LOW) / RANGE >= CDF(n) ) {
            n++;
        }
        HIGH = LOW + RANGE * CDF(n);
        LOW = LOW + RANGE * CDF(n-1);
        //keep scaling before decoding next symbol
        while LOW, HIGH in [0, 0.5) or [0.5, 1) {
            scale LOW, HIGH by E1 or E2 rule
            Left shift Tag and read one more bit to LSB
        }
        return n;
    }
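
Below is a matching decoder sketch in Python. It assumes a 6-bit tag register (the window parameter, following the "Read 6 bits" step of the example) and pads the input with 0 bits after the last received bit. Both choices are assumptions for illustration; too small a window could make the symbol search ambiguous.

```python
from fractions import Fraction

CDF = [Fraction(0), Fraction(8, 10), Fraction(82, 100), Fraction(1)]
HALF = Fraction(1, 2)


def decode_e1e2(bitstring, num_symbols, window=6, cdf=CDF):
    """Incremental decoding with E1/E2 scaling and a `window`-bit tag register."""
    stream = iter(bitstring)

    def next_bit():
        return int(next(stream, "0"))       # bits past the end are read as 0

    mask = (1 << window) - 1
    tag = 0
    for _ in range(window):                  # fill the tag register
        tag = (tag << 1) | next_bit()
    low, high, out = Fraction(0), Fraction(1), []
    for _ in range(num_symbols):
        rng = high - low
        t = Fraction(tag, 1 << window)       # tag value in [0, 1)
        n = 1
        while (t - low) / rng >= cdf[n]:
            n += 1
        out.append(n)
        high = low + rng * cdf[n]
        low = low + rng * cdf[n - 1]
        # Scale LOW, HIGH and Tag together before decoding the next symbol.
        while high <= HALF or low >= HALF:
            if high <= HALF:                 # E1: x -> 2x
                low, high = 2 * low, 2 * high
            else:                            # E2: x -> 2(x - 0.5)
                low, high = 2 * (low - HALF), 2 * (high - HALF)
            # Dropping the tag's MSB (0 for E1, 1 for E2) applies the same map.
            tag = ((tag << 1) & mask) | next_bit()
    return out


print(decode_e1e2("1100011", 4))  # [1, 3, 2, 1]
```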

Outline
- Review
- E3 Scaling
- Complete Algorithm
- Applications
  • JBIG, H.264, JPEG 2000

Integer Implementation
- Old formulas:
  HIGH ← LOW + RANGE * CDF(n);
  LOW ← LOW + RANGE * CDF(n-1);
- Integer approximation of CDF( ):
  • The number of occurrences of each symbol is usually collected by a counter.
  • Allows adaptive arithmetic coding

k   P(k)   n_k   Cum(k)
0   -      -     0
1   0.8    40    40
2   0.02   1     41
3   0.18   9     50

$$F_X(i) = \frac{\sum_{k=1}^{i} n_k}{N} = \frac{\mathrm{Cum}(i)}{N}, \qquad N = \mathrm{Cum}(3) = 50$$
(Figure: the integer scale 0 to 50 split at 40 and 41 into the ranges of symbols 1, 2, 3.)
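
Integer Implementation
- Old:
  HIGH ← LOW + RANGE * CDF(n);
  LOW ← LOW + RANGE * CDF(n-1);
- New (integer):
$$\mathrm{RANGE} \leftarrow \mathrm{HIGH} - \mathrm{LOW} + 1$$
$$\mathrm{HIGH} \leftarrow \mathrm{LOW} + \left\lfloor \frac{\mathrm{RANGE} \cdot \mathrm{Cum}(n)}{N} \right\rfloor - 1$$
$$\mathrm{LOW} \leftarrow \mathrm{LOW} + \left\lfloor \frac{\mathrm{RANGE} \cdot \mathrm{Cum}(n-1)}{N} \right\rfloor$$
- Why + 1 in RANGE and - 1 in HIGH?
  • HIGH should be less than the LOW of the next interval
  • The best integer value is HIGH = (next LOW) - 1
  • [HIGH, HIGH + 1) still belongs to the current interval, although we cannot represent it explicitly.
(Figure: interval n-1 is [LOW1, HIGH1] and the next interval n starts at LOW2 = HIGH1 + 1.)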

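Example

k   P(k)   n_k   Cum(k)
0   -      -     0
1   0.8    40    40
2   0.02   1     41
3   0.18   9     50

- For 8-bit integers, initial LOW = 0, HIGH = 255 → RANGE = HIGH - LOW + 1 = 256
$$n = 1:\ \mathrm{HIGH} = 0 + \left\lfloor \frac{256 \cdot 40}{50} \right\rfloor - 1 = 203, \quad \mathrm{LOW} = 0 + \left\lfloor \frac{256 \cdot 0}{50} \right\rfloor = 0$$
$$n = 2:\ \mathrm{HIGH} = 0 + \left\lfloor \frac{256 \cdot 41}{50} \right\rfloor - 1 = 208, \quad \mathrm{LOW} = 0 + \left\lfloor \frac{256 \cdot 40}{50} \right\rfloor = 204$$
$$n = 3:\ \mathrm{HIGH} = 0 + \left\lfloor \frac{256 \cdot 50}{50} \right\rfloor - 1 = 255, \quad \mathrm{LOW} = 0 + \left\lfloor \frac{256 \cdot 41}{50} \right\rfloor = 209$$
- Resulting partition: 1 → [0, 203], 2 → [204, 208], 3 → [209, 255]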
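
The three cases can be reproduced with integer floor division, which performs the rounding down in the formulas. Below is a minimal Python sketch; update is a hypothetical helper name.

```python
CUM = [0, 40, 41, 50]   # Cum(0..3) from the counts 40, 1, 9
N = CUM[-1]


def update(low, high, n, cum=CUM, total=N):
    """Integer update of [LOW, HIGH] (both ends inclusive) for symbol n."""
    rng = high - low + 1
    new_high = low + rng * cum[n] // total - 1    # floor, then -1
    new_low = low + rng * cum[n - 1] // total     # floor
    return new_low, new_high


for n in (1, 2, 3):
    print(n, update(0, 255, n))
# 1 (0, 203)
# 2 (204, 208)
# 3 (209, 255)
```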

E1 Scaling for Integer
- E1 scaling: [0, 0.5) → [0, 1), E1(x) = 2x.
- LOW = 0xxxxxxx, HIGH = 0xxxxxxx
  • Output the MSB value 0, then shift left by 1 bit
  • Important trick: shift in 1 to HIGH and 0 to LOW
    HIGH: 0xxxxxxx → xxxxxxx1    HIGH = (HIGH << 1) + 1;
    LOW:  0xxxxxxx → xxxxxxx0    LOW = LOW << 1;
- Always assume HIGH ends with an infinite number of 1's:
    HIGH  0.0xxxxxxx111...
    LOW   0.0xxxxxxx000...
  • So that it approximates the LOW of the next interval.
- This also ensures the RANGE is doubled after scaling:
  HIGH - LOW + 1 → (2 x HIGH + 1) - 2 x LOW + 1 = 2(HIGH - LOW + 1)

E2 Scaling for Integer
- E2 scaling: [0.5, 1) → [0, 1), E2(x) = 2(x - 0.5)
- LOW = 1xxxxxxx, HIGH = 1xxxxxxx
  • Output the MSB, then shift left by 1 bit (multiply by 2); the MSB shifted out of the m-bit register is dropped, which performs the subtraction of 0.5
  • Same trick: shift in 1 to HIGH and 0 to LOW
    HIGH: 1xxxxxxx → xxxxxxx1    HIGH = (HIGH << 1) + 1;
    LOW:  1xxxxxxx → xxxxxxx0    LOW = LOW << 1;
    HIGH  0.1xxxxxxx111...
    LOW   0.1xxxxxxx000...

Integer Encoding
- [0, 0.8) → LOW = 0, HIGH = 203.
- [0.8, 0.82) → LOW = 204, HIGH = 208.
- Can we represent an interval in [203, 204)? (Sequence 1333333……)
- Input 1: [0, 203]
- Input 3: [167, 203]
  • E2: Output 1 → [78, 151]
  • LOW: 167 (10100111), HIGH: 203 (11001011)
  • After E2 (shift in a 1 to HIGH and a 0 to LOW):
    LOW: 1(01001110) → 78 (8-bit), HIGH: 1(10010111) → 151 (8-bit)
- In 8.1 format (8 bits for the integer part, 1 bit for the fractional part):
  LOW: (10100111.0) = 167, HIGH: (11001011.1) = 203.5
  • By shifting in a 1 to HIGH, we can cover the range [203, 203.5].
  • The entire range [203, 204) can be covered by always shifting in 1 to HIGH.
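
In software with wider integers, the 8-bit register has to be emulated by masking. The mask is what drops the MSB in E2. Below is a small Python sketch under that assumption; e1_integer and e2_integer are illustrative names.

```python
M_BITS = 8
MASK = (1 << M_BITS) - 1      # 0xFF: keep the register at m bits


def e1_integer(low, high):
    """E1: MSB is 0, shift in 0 to LOW and 1 to HIGH."""
    return low << 1, (high << 1) | 1


def e2_integer(low, high):
    """E2: drop the MSB (1), shift in 0 to LOW and 1 to HIGH."""
    return (low << 1) & MASK, ((high << 1) | 1) & MASK


low, high = e2_integer(167, 203)
print(low, high, format(low, "08b"), format(high, "08b"))
# 78 151 01001110 10010111
```

The RANGE doubles as claimed: 203 - 167 + 1 = 37 becomes 151 - 78 + 1 = 74.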

Outline
- Review
- E3 Scaling


E3 Scaling: [0.25, 0.75) → [0, 1)
- If the RANGE straddles 1/2, E1 and E2 cannot be applied, but the range can be quite small
  • Example: LOW = 0.4999, HIGH = 0.5001
    Binary: LOW = 0.01111…, HIGH = 0.1000…
  • We may not have enough bits to represent the interval.
- E3 scaling:
  [0.25, 0.75) → [0, 1): E3(x) = 2(x - 0.25)
(Figure: an interval around 0.5 inside [0.25, 0.75) is stretched onto [0, 1).)

Integer Implementation of E3
- Same trick: shift in 1 to HIGH and 0 to LOW
    HIGH = ((HIGH - QUARTER) << 1) + 1;
    LOW = (LOW - QUARTER) << 1;
  QUARTER = 2^(m - 2) for an m-bit integer (64 for m = 8 bits)

      LOW            HIGH
        01xxxxxx       10xxxxxx
      - 01000000     - 01000000
        00xxxxxx       01xxxxxx
      x 2            x 2
        0xxxxxx0       1xxxxxx0
                     + 1
                       1xxxxxx1

  LOW: 01xxxxxx → 0xxxxxx0, HIGH: 10xxxxxx → 1xxxxxx1
- Another way to implement E3 (Sayood book, p. 99):
  • Left shift old LOW and HIGH, complement the new MSB.
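
The two E3 implementations can be compared exhaustively for m = 8. This Python sketch assumes the register holds values in [QUARTER, 3 x QUARTER) whenever E3 applies, and it uses a bit mask to model the 8-bit shift. The function names are illustrative.

```python
M_BITS = 8
MASK = (1 << M_BITS) - 1
HALF = 1 << (M_BITS - 1)       # 128
QUARTER = 1 << (M_BITS - 2)    # 64


def e3_subtract(x, shift_in):
    """E3 as on the slide: subtract QUARTER, double, shift in a bit."""
    return ((x - QUARTER) << 1) | shift_in


def e3_complement(x, shift_in):
    """Alternative: shift left within m bits, then complement the new MSB."""
    return (((x << 1) & MASK) ^ HALF) | shift_in


# Both forms agree for every register value where E3 can apply.
assert all(e3_subtract(x, b) == e3_complement(x, b)
           for x in range(QUARTER, 3 * QUARTER) for b in (0, 1))
print(e3_subtract(78, 0), e3_subtract(151, 1))  # 28 175
```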

Signaling of E3
- What should we send when E3 is used?
  • Recall: we send 1 if E2 is used, and 0 if E1 is used
- Important relationships (www.copro.org/download/ac_en.pdf):
$$E_1 \circ (E_3)^n = (E_2)^n \circ E_1$$
$$E_2 \circ (E_3)^n = (E_1)^n \circ E_2$$
  • E_i ∘ (E_j)^n: apply n E_j scalings, followed by one E_i scaling.
- What do they mean?
  • A series of E3 followed by an E1 is equivalent to an E1 followed by a series of E2.
  • A series of E3 followed by an E2 is equivalent to an E2 followed by a series of E1.
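
E1, E2 and E3 are affine maps, so the identities can be spot-checked with exact fractions. The Python below is a numerical sanity check at sample points, not a proof. The function names are illustrative.

```python
from fractions import Fraction


def e1(x):
    return 2 * x


def e2(x):
    return 2 * (x - Fraction(1, 2))


def e3(x):
    return 2 * (x - Fraction(1, 4))


def repeat(f, n, x):
    """Apply the scaling f to x, n times."""
    for _ in range(n):
        x = f(x)
    return x


points = [Fraction(k, 97) for k in range(97)]
for n in range(6):
    for x in points:
        # E1 o (E3)^n == (E2)^n o E1: n E3's then E1 equals E1 then n E2's
        assert e1(repeat(e3, n, x)) == repeat(e2, n, e1(x))
        # E2 o (E3)^n == (E1)^n o E2
        assert e2(repeat(e3, n, x)) == repeat(e1, n, e2(x))
print("identities hold")
```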

Example
- Previous example without E3:
  • Input 1: [0, 0.8)
  • Input 3: [0.656, 0.8), E2: Output 1 → [0.312, 0.6)
  • Input 2: thresholds 0.312, 0.5424, 0.54816, 0.6 → [0.5424, 0.54816)
    E2: Output 1 → [0.0848, 0.09632)
    E1: Output 0 → [0.1696, 0.19264)
- With E3:
  • [0.312, 0.6) → E3: (x - 0.25) x 2 → [0.124, 0.7)
  • Input 2: thresholds 0.124, 0.5848, 0.59632, 0.7 → [0.5848, 0.59632)
    E2: Output 1 → [0.1696, 0.19264)
- The range after E2 ∘ E3 is the same as that after E1 ∘ E2

Another View of the Equivalence
- Scaling of a range in [0.5, 0.75) with E1 ∘ E2:
  [0.5, 0.75) → E2 → [0, 0.5) → E1 → [0, 1)
- Equivalent scaling of the range in [0.5, 0.75) with E2 ∘ E3:
  [0.5, 0.75) → E3 → [0.5, 1) → E2 → [0, 1)

A Simple Proof of E2 ∘ E3 = E1 ∘ E2
- Given an original range r:
  • After applying E2: [0.5, 1) → [0, 1), the range becomes
    r1 = (r - 0.5) x 2
  • After applying E1: [0, 0.5) → [0, 1), the range becomes
    r2 = r1 x 2 = ((r - 0.5) x 2) x 2
- Given the same range r:
  • After applying E3: [0.25, 0.75) → [0, 1), the range becomes
    r3 = (r - 0.25) x 2
  • After applying E2, the range becomes
    r4 = (r3 - 0.5) x 2 = ((r - 0.25) x 2 - 0.5) x 2
       = (r - 0.5) x 2 x 2
       = r2
- For a formal proof: www.copro.org/download/ac_en.pdf

Encoding Operation with E3
- Without E3:
  • Input 2: [0.5424, 0.54816)
    E2: Output 1 → [0.0848, 0.09632)
    E1: Output 0 → [0.1696, 0.19264)
- With E3:
  • [0.312, 0.6) → E3 (no output here) → [0.124, 0.7)
  • Input 2: [0.5848, 0.59632)
    E2: Output 1, then output 0 here! → [0.1696, 0.19264)
- Don't send anything when E3 is used, but send a 0 after the E2:
  • The bit stream is identical to that of the old method
  • Subsequent encoding is also the same because the final interval is the same

Decoding for E3
Input 1100011
- Read 6 bits: Tag: 110001 (0.765625)
- Decode 1 → [0, 0.8)
- Decode 3, E2 scaling → [0.312, 0.6), Tag: 100011 (0.546875)
- Without E3:
  • Thresholds 0.312, 0.5424, 0.54816, 0.6: Decode 2, E2 scaling → [0.0848, 0.09632), Tag: 000110 (0.09375)
  • E1 → [0.1696, 0.19264), Tag: 001100 (0.1875)
- With E3: apply E3 whenever it is possible; nothing else is needed
  • [0.312, 0.6), Tag: 100011 (0.546875)
  • E3 → [0.124, 0.7), Tag: 100110 (0.59375)
  • Thresholds 0.124, 0.5848, 0.59632, 0.7: Decode 2, E2 scaling → [0.1696, 0.19264), Tag: 001100 (0.1875)
- Same status as the old method: LOW, HIGH, RANGE, Tag.
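
Summary of Different Scalings
- [LOW, HIGH) inside [0, 0.5): need E1 scaling
- [LOW, HIGH) inside [0.5, 1): need E2 scaling
- [LOW, HIGH) inside [0.25, 0.75) and straddling 0.5: need E3 scaling
- Otherwise (LOW < 0.25 and HIGH > 0.5, or LOW < 0.5 and HIGH > 0.75): no scaling is required; ready to encode/decode the next symbol.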

Outline
- Review
- E3 Scaling


Encoding Pseudo Code with E1, E2, E3
(For integer implementation)
    EncodeSymbol(n) {
        //Update variables; integer division rounds down
        RANGE = HIGH - LOW + 1;
        HIGH = LOW + RANGE * Cum(n) / N - 1;   //uses the old LOW
        LOW = LOW + RANGE * Cum(n-1) / N;
        //Scaling before encoding next symbol
        EncoderScaling(); //see next slide
    }

Encoding Pseudo Code with E1, E2, E3
    EncoderScaling( ) {
        while (E1, E2 or E3 is possible) {
            if (E3 is possible) {
                HIGH = ((HIGH - QUARTER) << 1) + 1;
                LOW = (LOW - QUARTER) << 1;
                Scale3++;                 //Save number of E3, but send nothing
            }
            if (E1 or E2 is possible) {
                Let b = 0 for E1 and b = 1 for E2
                send b
                HIGH = (HIGH << 1) + 1;   //keep m bits: for E2 this drops the MSB
                LOW = (LOW << 1);
                while (Scale3 > 0) {      //send info about E3 now
                    send complement of b  //e.g. E2 ∘ (E3)^n = (E1)^n ∘ E2
                    Scale3--;             //Send one bit for each E3
                }
            }
        }
    }

Decoding Pseudo Code with E1, E2, E3
(For integer implementation; intervals in the example: [0, 203], [204, 208], [209, 255])
    DecodeSymbol(Tag) {
        RANGE = HIGH - LOW + 1;
        n = 1;
        //Round off to integer: compare with the HIGH of each interval
        while (Tag > LOW + RANGE * Cum(n) / N - 1) {
            n++;
        }
        HIGH = LOW + RANGE * Cum(n) / N - 1;
        LOW = LOW + RANGE * Cum(n-1) / N;
        //keep scaling before decoding next symbol
        DecoderScaling(Tag); //next slide
        return n;
    }

Decoding Pseudo Code with E1, E2, E3
    DecoderScaling(Tag) {
        while (E1, E2 or E3 is possible) {
            if (E1 or E2 is possible) {
                LOW = LOW << 1;           //keep m bits: for E2 this drops the MSB
                HIGH = (HIGH << 1) + 1;
                Tag = Tag << 1;           //same for Tag
                Tag = Tag | ReadBits(1);
            }
            if (E3 is possible) {
                LOW = (LOW - QUARTER) << 1;
                HIGH = ((HIGH - QUARTER) << 1) + 1;
                Tag = (Tag - QUARTER) << 1;
                Tag = Tag | ReadBits(1);
            }
        }
    }

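Integer Encoding with E1, E2, E3
- Input 1: [0, 203]
- Input 3: [167, 203]
  • E2: Output 1 → [78, 151]
  • E3: Scale3 = 1 → [28, 175]
- Input 2: [146, 148]
  • E2: Output 1, then Output 0 for the pending E3, Scale3 = 0 → [36, 41]
  • E1: Output 0 → [72, 83]
  • E1: Output 0 → [144, 167]
  • E2: Output 1 → [32, 79]
  • E1: Output 0 → [64, 159]
  • E3: Scale3 = 1 → [0, 191]
- Input 1: [0, 152]
  • Terminate by sending the 8 bits of LOW (00000000): Output 0 (its MSB), then Output 1 for the pending E3 (Scale3 = 0), then output 7 more 0's
- Final output: 11000100 10000000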
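
The integer encoder, including the delayed E3 bits, can be sketched in Python as follows. It assumes m = 8 and the counts from the example. It tests E1/E2 before E3, whereas the pseudo code tests E3 first; when both apply, the identities on the E3 signaling slide give the same bits and the same interval. It terminates as in the example, by sending the m bits of LOW with the pending E3 bits after the MSB. The function name encode_integer is illustrative.

```python
M_BITS = 8                       # register width m (as in the example)
MASK = (1 << M_BITS) - 1         # 255
HALF = 1 << (M_BITS - 1)         # 128
QUARTER = 1 << (M_BITS - 2)      # 64
CUM = [0, 40, 41, 50]            # Cum(0..3) for symbols 1, 2, 3
N = CUM[-1]


def encode_integer(symbols, cum=CUM, total=N):
    """Integer arithmetic encoder with E1, E2, E3 scaling; returns a bit string."""
    low, high = 0, MASK
    scale3 = 0
    bits = []

    def send(b):
        nonlocal scale3
        bits.append(str(b))
        # Pending E3's become complement bits right after this E1/E2 bit.
        bits.extend([str(1 - b)] * scale3)
        scale3 = 0

    for n in symbols:
        rng = high - low + 1
        high = low + rng * cum[n] // total - 1       # uses the old LOW
        low = low + rng * cum[n - 1] // total
        while True:
            if high < HALF:                              # E1
                send(0)
                low, high = low << 1, (high << 1) | 1
            elif low >= HALF:                            # E2: mask drops the MSB
                send(1)
                low, high = (low << 1) & MASK, ((high << 1) | 1) & MASK
            elif low >= QUARTER and high < 3 * QUARTER:  # E3: send nothing yet
                scale3 += 1
                low, high = (low - QUARTER) << 1, ((high - QUARTER) << 1) | 1
            else:
                break
    # Termination: MSB of LOW, pending E3 bits, then the other m - 1 bits of LOW.
    send(low >> (M_BITS - 1))
    bits.append(format(low, f"0{M_BITS}b")[1:])
    return "".join(bits)


print(encode_integer([1, 3, 2, 1]))  # 1100010010000000
```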
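
Integer Decoding with E1, E2, E3
Input: 11000100 10000000
- Read 8 bits: Tag = 11000100 (196)
- [0, 255]: Decode 1 → [0, 203]
- Decode 3 → [167, 203]
  • E2: Tag = 10001001 (137) → [78, 151]
  • E3: Tag = 10010010 (146) → [28, 175]
- Decode 2 → [146, 148]
  • E2: Tag = 00100100 (36) → [36, 41]
- Decode 1
  (Completing the remaining scalings E1, E1, E2, E1, E3 first gives [0, 191] with Tag = 00000000 (0), which also decodes to 1.)
- Tag = LOW: stop.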
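
Below is a decoder sketch that mirrors the integer encoder. It again assumes m = 8, the example's counts, 0 padding after the last received bit, and a known symbol count. It applies exactly the same scalings to LOW, HIGH and Tag; decode_integer is an illustrative name.

```python
M_BITS = 8
MASK = (1 << M_BITS) - 1
HALF = 1 << (M_BITS - 1)
QUARTER = 1 << (M_BITS - 2)
CUM = [0, 40, 41, 50]
N = CUM[-1]


def decode_integer(bitstring, num_symbols, cum=CUM, total=N):
    """Integer arithmetic decoder with E1, E2, E3 scaling."""
    stream = iter(bitstring)

    def read_bit():
        return int(next(stream, "0"))      # 0's past the end of the input

    tag = 0
    for _ in range(M_BITS):                 # read the first m bits
        tag = (tag << 1) | read_bit()
    low, high, out = 0, MASK, []
    for _ in range(num_symbols):
        rng = high - low + 1
        n = 1
        while tag > low + rng * cum[n] // total - 1:
            n += 1
        out.append(n)
        high = low + rng * cum[n] // total - 1
        low = low + rng * cum[n - 1] // total
        while True:
            if high < HALF or low >= HALF:                   # E1 or E2
                low = (low << 1) & MASK
                high = ((high << 1) | 1) & MASK
                tag = ((tag << 1) & MASK) | read_bit()
            elif low >= QUARTER and high < 3 * QUARTER:      # E3
                low = (low - QUARTER) << 1
                high = ((high - QUARTER) << 1) | 1
                tag = ((tag - QUARTER) << 1) | read_bit()
            else:
                break
    return out


print(decode_integer("1100010010000000", 4))  # [1, 3, 2, 1]
```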

Outline
- Review
- E3 Scaling


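How to decide the word length m?
- Need to guarantee a non-zero interval for all symbols in the worst case, e.g., symbol 2 below (count 1, Cum from 40 to 41):

k   P(k)   n_k   Cum(k)
0   -      -     0
1   0.8    40    40
2   0.02   1     41
3   0.18   9     50

$$\mathrm{HIGH} \leftarrow \mathrm{LOW} + \left\lfloor \frac{\mathrm{RANGE} \cdot \mathrm{Cum}(n)}{N} \right\rfloor - 1, \qquad \mathrm{LOW} \leftarrow \mathrm{LOW} + \left\lfloor \frac{\mathrm{RANGE} \cdot \mathrm{Cum}(n-1)}{N} \right\rfloor$$
- Need
$$\left\lfloor \frac{\mathrm{RANGE} \cdot \mathrm{Cum}(n)}{N} \right\rfloor > \left\lfloor \frac{\mathrm{RANGE} \cdot \mathrm{Cum}(n-1)}{N} \right\rfloor$$
  even when Cum(n) - Cum(n-1) = 1. Otherwise HIGH < LOW.
- RANGE cannot be too small at any time. Intuitive!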
 RANGE cannot be too small at any time. Intuitive!

How to decide the word length m?
- When do we have the smallest RANGE without triggering a scaling?
  M = 2^m, with marks at M/4, M/2 and 3M/4
  • When the interval is slightly larger than [M/4, M/2] or [M/2, 3M/4]
  • None of E1, E2, and E3 can be applied
- Condition: (1/4)(2^m) > N
  • Example: N = 50 → min m = 8 (M/4 = 64)
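
Below is a small Python sketch of this rule, together with a check of the non-empty-interval condition from the previous slide. The helper names are illustrative. RANGE = 66 corresponds to LOW = 63, HIGH = 128 with m = 8: an interval slightly larger than [M/4, M/2] that triggers no scaling.

```python
def min_word_length(total):
    """Smallest m with 2**m / 4 > N (the condition on this slide)."""
    m = 2
    while (1 << (m - 2)) <= total:
        m += 1
    return m


def all_nonempty(rng, cum):
    """True if every symbol still gets a non-empty integer sub-interval."""
    total = cum[-1]
    return all(rng * cum[n] // total - 1 >= rng * cum[n - 1] // total
               for n in range(1, len(cum)))


CUM = [0, 40, 41, 50]
print(min_word_length(50))      # 8
print(all_nonempty(66, CUM))    # True
print(all_nonempty(40, CUM))    # False: symbol 2 gets no integer
```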

Outline
- Review

Binary Arithmetic Coding
- Arithmetic coding is slow in general:
  To decode a symbol, we need a series of decisions and multiplications:
    While (Tag > LOW + RANGE * Cum(n) / N - 1) {
        n++;
    }
- The complexity is greatly reduced if we have only two symbols: 0 and 1.
  • Only two intervals in [0, 1): [0, x) for symbol 0 and [x, 1) for symbol 1

Encoding of Binary Arithmetic Coding
Prob(0) = 0.6. Sequence: 0110
- Start: LOW = 0, HIGH = 1
- 0: LOW = 0, HIGH = 0.6
- 1: LOW = 0.36, HIGH = 0.6
- 1: LOW = 0.504, HIGH = 0.6
- 0: LOW = 0.504, HIGH = 0.5616
Only need to update LOW or HIGH for each symbol.
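
Below is a Python sketch of the binary encoder with exact fractions; encode_binary is an illustrative name. Each symbol moves only one endpoint, to the split point LOW + RANGE x P(0).

```python
from fractions import Fraction


def encode_binary(bits, p0):
    """Binary arithmetic coding: symbol 0 moves HIGH, symbol 1 moves LOW."""
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        split = low + (high - low) * p0     # boundary between 0 and 1
        if b == 0:
            high = split
        else:
            low = split
    return low, high


low, high = encode_binary([0, 1, 1, 0], Fraction(6, 10))
print(float(low), float(high))  # 0.504 0.5616
```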

Decoding of Binary Arithmetic Coding
- Only one decision to make (compare the Tag with one split point, e.g., 0.6 in [0, 1)). Instead of the general search
    While (Tag > LOW + RANGE * Cum(n) / N - 1) {
        n++;
    }
  a single comparison is enough:
    if (Tag > LOW + RANGE * Cum(Symbol 0) / N - 1) {
        n = 1;
    } else {
        n = 0;
    }
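
The same single-comparison idea is shown below as a Python sketch. It uses exact fractions and the illustrative name decode_binary, and it decodes a tag chosen inside the final interval [0.504, 0.5616) of the encoding example.

```python
from fractions import Fraction


def decode_binary(tag, count, p0):
    """One comparison per symbol: is the tag below the 0/1 split point?"""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        split = low + (high - low) * p0
        if tag < split:
            out.append(0)
            high = split
        else:
            out.append(1)
            low = split
    return out


print(decode_binary(Fraction("0.52"), 4, Fraction(6, 10)))  # [0, 1, 1, 0]
```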

Applications of Binary Arithmetic Coding
- Increasingly popular:
  • JBIG, JBIG2, JPEG2000, H.264
- Convert non-binary signals into binary first
  • H.264: binarization with unary and Exp-Golomb codes
  • Bit-plane coding
- Various simplifications to avoid multiplication:
  • H.264: Table look-up for RANGE * Cum(n) / N
  • JBIG: Eliminate multiplication by assuming the RANGE is close to 1. Scale if the RANGE is too small.

Outline
- Review

Adaptive Arithmetic Coding
- Observation: the partition of [0, 1) can be different from symbol to symbol
  • The bit stream can be decoded perfectly as long as the encoder and decoder are synchronized (use the same partition).
- General approach:
  • Start from a pre-defined probability distribution
  • Update the probability after each symbol
- This is very difficult for Huffman coding:
  • The codebook has to be redesigned when the probabilities change

Example
- Binary sequence: 01111
- Initial counters for 0's and 1's: C(0) = C(1) = 1 → P(0) = P(1) = 0.5
  • Encode 0: [0, 0.5)
- After encoding 0: C(0) = 2, C(1) = 1 → P(0) = 2/3, P(1) = 1/3
  • Encode 1: [0.3333, 0.5)
- After encoding 01: C(0) = 2, C(1) = 2 → P(0) = 1/2, P(1) = 1/2
  • Encode 1: [0.4167, 0.5)
- After encoding 011: C(0) = 2, C(1) = 3 → P(0) = 2/5, P(1) = 3/5
  • Encode 1: [0.45, 0.5)
- After encoding 0111: C(0) = 2, C(1) = 4 → P(0) = 1/3, P(1) = 2/3
  • Encode 1: [0.4667, 0.5). Encode 0.4667.

Decoding
- Input 0.4667.
- Initial counters: C(0) = C(1) = 1 → P(0) = P(1) = 0.5
  • Split of [0, 1) at 0.5: Decode 0
- After decoding 0: C(0) = 2, C(1) = 1 → P(0) = 2/3, P(1) = 1/3
  • Split of [0, 0.5) at 0.3333: Decode 1
- After decoding 01: P(0) = 1/2, P(1) = 1/2
  • Split of [0.3333, 0.5) at 0.4167: Decode 1
- After decoding 011: P(0) = 2/5, P(1) = 3/5
  • Split of [0.4167, 0.5) at 0.45: Decode 1
- After decoding 0111: C(0) = 2, C(1) = 4 → P(0) = 1/3, P(1) = 2/3
  • Split of [0.45, 0.5) at 0.4667: Decode 1
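
The encoder and decoder of this example can be sketched together in Python. The counters start at C(0) = C(1) = 1 and are updated after each symbol on both sides, which is what keeps the two in sync. The function names are illustrative, and exact fractions stand in for the finite-precision, scaled implementation discussed earlier.

```python
from fractions import Fraction


def adaptive_encode(bits):
    """Adaptive binary arithmetic coding with counters C(0) = C(1) = 1 initially."""
    counts = [1, 1]
    low, high = Fraction(0), Fraction(1)
    for b in bits:
        split = low + (high - low) * Fraction(counts[0], sum(counts))
        low, high = (low, split) if b == 0 else (split, high)
        counts[b] += 1                 # update after coding, as the decoder will
    return low, high


def adaptive_decode(tag, count):
    counts = [1, 1]
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        split = low + (high - low) * Fraction(counts[0], sum(counts))
        b = 0 if tag < split else 1
        low, high = (low, split) if b == 0 else (split, high)
        counts[b] += 1                 # same update keeps both sides in sync
        out.append(b)
    return out


low, high = adaptive_encode([0, 1, 1, 1, 1])
print(low, high)                 # 7/15 1/2
print(adaptive_decode(low, 5))   # [0, 1, 1, 1, 1]
```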

Context-adaptive Arithmetic Coding
- In many cases, a sample is strongly correlated with its near neighbors.
- Idea:
  • Collect the conditional probability distribution of a symbol for given neighboring symbols (context):
    P(x(n) | x(n-1), x(n-2), … x(n-k))
  • Use this conditional probability to encode the next symbol
  • A more skewed probability distribution can be obtained (desired by arithmetic coding)
(Figure: a 1-D context template over a symbol sequence such as abcbcab, and a 2-D context template over neighboring rows.)

Example
- Binary sequence: 0, 1 (e.g., 0110101)
- Neighborhood (template) size: 3
  • 2^3 = 8 possible combinations (contexts) of 3 neighbors (x(n-1), x(n-2), x(n-3)).
- Collect frequencies of 0's and 1's under each context

Context      C(0)  C(1)
(0, 0, 0)    9     2
(0, 0, 1)    3     6
(0, 1, 0)    10    10
(0, 1, 1)    …     …
(1, 0, 0)    …     …
(1, 0, 1)    …     …
(1, 1, 0)    …     …
(1, 1, 1)    …     …

- Each symbol is coded with the probability distribution associated with its context.

Encoding Pseudo Code
InitProbModel(ProbModel);
While (not EOF) {
n = ReadSymbol( );
//Determine the neighboring combination (context)

context = GetContext();
//Encode with the corresponding probability model

EncodeSymbol(ProbModel[context], n);
//update probability counters for this context

UpdateProb(ProbModel[context], n);
}

Decoding Pseudo Code
InitProbModel(ProbModel);
While (not EOF) {
//Determine the neighboring combination (context)
context = GetContext();
//Decode with the corresponding probability model

n = DecodeSymbol(ProbModel[context]);
//update probability counters for this context

UpdateProb(ProbModel[context], n);
}
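
Below is a compact Python sketch of the pseudo code for binary symbols, with hypothetical stand-ins for its helpers. get_context plays the role of GetContext (the 3 previous bits, with 0's assumed before the start). A dictionary of [C(0), C(1)] counters created on first use plays the role of InitProbModel, and the counter increment plays the role of UpdateProb. For brevity, it codes with exact fractions and no E1/E2/E3 scaling.

```python
from fractions import Fraction

ORDER = 3   # template size: the 3 previous bits (x(n-1), x(n-2), x(n-3))


def get_context(history):
    """Return (x(n-1), x(n-2), x(n-3)); positions before the start count as 0."""
    padded = [0] * ORDER + list(history)
    return tuple(reversed(padded[-ORDER:]))


def split_point(low, high, counts):
    """Boundary between symbol 0 and symbol 1 for the given counters."""
    c0, c1 = counts
    return low + (high - low) * Fraction(c0, c0 + c1)


def context_encode(bits):
    models = {}                     # context -> [C(0), C(1)], starting at [1, 1]
    low, high = Fraction(0), Fraction(1)
    for i, b in enumerate(bits):
        counts = models.setdefault(get_context(bits[:i]), [1, 1])
        split = split_point(low, high, counts)
        low, high = (low, split) if b == 0 else (split, high)
        counts[b] += 1              # only this context's counters change
    return low, high


def context_decode(tag, count):
    models, out = {}, []
    low, high = Fraction(0), Fraction(1)
    for _ in range(count):
        counts = models.setdefault(get_context(out), [1, 1])
        split = split_point(low, high, counts)
        b = 0 if tag < split else 1
        low, high = (low, split) if b == 0 else (split, high)
        counts[b] += 1              # same update as the encoder keeps them in sync
        out.append(b)
    return out


data = [0, 1, 1, 0, 1, 0, 1] * 4
low, high = context_encode(data)
assert context_decode(low, len(data)) == data
```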

Performance of Arithmetic Coding
- For a sequence of length m:
  • H(X) <= Average Bit Rate <= H(X) + 2/m
  • Can approach the entropy very quickly.
- Huffman coding (applied to blocks of m symbols):
  • H(X) <= Average Bit Rate <= H(X) + 1/m
  • Impractical: need to generate codewords for all sequences of length m.

Summary – Part II
- Arithmetic coding:
  • Partition the range [0, 1) recursively according to symbol probabilities.
- Incremental encoding and decoding
  • E1, E2, E3 scaling
- Context-adaptive arithmetic coding
- Next: Quantization (lossy compression)
