
Universidad Autónoma de Nuevo León

Facultad de Ingeniería Mecánica y Eléctrica

Semester August-June 2022

Information theory and coding

Fundamental Activity 2.2


Adaptive Huffman Coding

Instructor: José Ramon Rodríguez Cruz

Student                            Student ID (Matrícula)
Alfredo de Alba Alvarado           1794507
Eduardo Melendrez Escobedo         1884202
Kenya Giselle Martinez Puente      1862460

Degree program: Electronics and Communications Engineering


Day and time: Tuesday, M1-M3

Date of delivery: 28-February-2022


Introduction

In this activity we examine the algorithmic description of the information source we chose: identifying the number of samples or data points obtained from the source, further characteristics of the system that produces it, and the mathematical functions in MATLAB that the activity requires.

In this work we discuss Adaptive Huffman compression: the circumstances in which it is useful and those in which it is not. We examine the algorithms used and study this tool in more depth, for its understanding and implementation in class, divided into the following sections:
- Algorithmic Description
- Algorithm Comparison

These two sections are developed from the team's understanding of the models explained in class: how the data is compressed and which mathematical principles lie behind the algorithms. Likewise, we analyze and argue the usefulness of this method for carrying out our project.
Algorithmic Description

Huffman coding requires knowledge of the probabilities of the source sequence. If this knowledge is not available, Huffman coding becomes a two-pass procedure: the statistics are collected in the first pass, and the source is encoded in the second pass. To convert this into a one-pass procedure, we could in principle recompute the code from the statistics of the first k symbols each time the (k+1)-th symbol is transmitted; since rebuilding the Huffman code for every symbol is impractical, adaptive Huffman coding instead maintains a single code tree and updates it incrementally. To do so, we add two other parameters to the binary tree: the weight of each node, which is written as a number inside the node, and a node number. The weight of each external node (leaf) is simply the number of times the symbol corresponding to that leaf has been encountered. The weight of each internal node is the sum of the weights of its offspring. The node number yi is a unique number assigned to each internal and external node. If we have an alphabet of size n, then the tree has 2n − 1 internal and external nodes, which can be numbered as y1, ..., y2n−1 such that if xj is the weight of node yj, we have x1 ≤ x2 ≤ ··· ≤ x2n−1; this ordering is known as the sibling property. For example, with n = 3 symbols the tree has 2n − 1 = 5 nodes, and their weights read in node-number order must be nondecreasing.

The function of the update procedure is to preserve this sibling property. So that the update procedures at the transmitter and receiver both operate on the same information, the tree at the transmitter is updated after each symbol is encoded, and the tree at the receiver is updated after each symbol is decoded. If the symbol to be encoded or decoded has occurred for the first time, a new external node is assigned to the symbol and a new NYT (not yet transmitted) node is appended to the tree. Both the new external node and the new NYT node are offspring of the old NYT node, and we increment the weight of the new external node to one.
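
A minimal MATLAB sketch of this split, assuming the tree is stored as parallel arrays parent/weight/isNYT (a storage layout we choose here for illustration, not one prescribed by the algorithm):

    % Initial tree: a single NYT node (index 1, no parent, weight 0).
    parent = 0; weight = 0; isNYT = true;

    % On the first occurrence of a symbol, the old NYT node becomes internal
    % and acquires two children: a fresh NYT node and the symbol's external node.
    old = find(isNYT);                                   % current NYT node
    parent(end+1) = old; weight(end+1) = 0; isNYT(end+1) = true;   % new NYT node
    parent(end+1) = old; weight(end+1) = 1; isNYT(end+1) = false;  % new external node, weight 1
    isNYT(old) = false;                                  % old NYT is now internal

The subsequent weight increments along the path from the new external node up to the root are handled by the update procedure itself.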
At the start of transmission, the tree at both the encoder and the decoder consists of a single node, the NYT node. Therefore, the codeword for the very first symbol that appears is a previously agreed-upon fixed code. After the very first symbol, whenever we have to encode a symbol that is being encountered for the first time, we send the code for the NYT node, followed by the previously agreed-upon fixed code for the symbol. The code for the NYT node is obtained by traversing the Huffman tree from the root to the NYT node. This alerts the receiver to the fact that the symbol whose code follows does not yet have a node in the Huffman tree. If a symbol to be encoded already has a corresponding node in the tree, then the code for the symbol is generated by traversing the tree from the root to the external node corresponding to that symbol.
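
The traversal itself is simple; below is a minimal MATLAB sketch, again assuming a parent array plus an isleft flag per node as the storage layout, with the (arbitrary) convention that a left edge emits a 0 and a right edge a 1:

    function code = node_code(idx, parent, isleft)
        % Codeword for node idx: walk up to the root, collecting one bit per edge.
        code = [];
        while parent(idx) ~= 0                    % 0 marks the root's "parent"
            code = [double(~isleft(idx)), code];  % left edge -> 0, right edge -> 1
            idx = parent(idx);
        end
    end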

As we read the received binary string, we traverse the tree in a manner identical to that used in the encoding procedure. Once a leaf is encountered, the symbol corresponding to that leaf is decoded. If the leaf is the NYT node, the bits that follow are the agreed-upon fixed code: for an alphabet of size m we pick e and r such that m = 2^e + r with 0 ≤ r < 2^e, so that the first 2r symbols receive (e+1)-bit codes and the remaining symbols receive e-bit codes. We therefore check the next e bits to see whether the resulting number is less than r; if it is less than r, we read in one more bit to complete the code for the symbol. The index of the symbol is then obtained from the decoded value (adding one in the (e+1)-bit case, and adding r + 1 in the e-bit case). Once the symbol has been decoded, the tree is updated and the next received bit is used to start another traversal down the tree.
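
As a concrete check of this fixed code in MATLAB (the size-26 alphabet, with symbol indices k = 1..26 for a..z, is just an illustrative choice):

    m = 26;                          % alphabet size, e.g. the lowercase letters a..z
    e = floor(log2(m));              % e = 4
    r = m - 2^e;                     % r = 10, so m = 2^e + r
    % Symbols with index k <= 2r = 20 get the (e+1)-bit code of k-1;
    % the rest get the e-bit code of k-r-1.
    code_a = dec2bin(1 - 1, e + 1)   % 'a' (k = 1)  -> 00000
    code_v = dec2bin(22 - r - 1, e)  % 'v' (k = 22) -> 1011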

The Golomb-Rice codes belong to a family of codes designed to encode integers under the assumption that the larger an integer is, the lower its probability of occurrence. The simplest code for this situation is the unary code. The unary code for a positive integer n is simply n 1s followed by a 0. Thus, the code for 4 is 11110, and the code for 7 is 11111110. The unary code is the same as the Huffman code for the semi-infinite alphabet {1, 2, 3, ...} with the probability model

P[k] = 1/2^k.

Because the Huffman code is optimal, the unary code is also optimal for this probability model.
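
A minimal MATLAB sketch of these codes follows; unary_encode and rice_encode are our own illustrative names, and rice_encode uses the standard Golomb construction (quotient in unary, remainder in binary), which the text above does not spell out. We assume a power-of-two parameter 2^k with k ≥ 1:

    function bits = unary_encode(n)
        % Unary code for a positive integer n: n ones followed by a zero.
        bits = [ones(1, n), 0];                 % unary_encode(4) -> 1 1 1 1 0
    end

    function bits = rice_encode(n, k)
        % Rice code with parameter 2^k: quotient floor(n/2^k) in unary,
        % remainder in k bits (most significant bit first).
        q = floor(n / 2^k);
        r = mod(n, 2^k);
        bits = [ones(1, q), 0, bitget(r, k:-1:1)];
    end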
The Rice code can be viewed as an adaptive Golomb code. In the Rice code, a sequence of nonnegative integers is divided into blocks of J integers apiece. Each block is then coded using one of several options, most of which are forms of Golomb codes. Each block is encoded with each of these options, and the option resulting in the least number of coded bits is selected. The particular option used is indicated by an identifier attached to the code for each block.
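
A sketch of that per-block selection in MATLAB, assuming for illustration that the candidate options are plain Rice codes with parameters k = 1..4 (real option sets, such as the CCSDS one discussed below, are richer):

    J = 16;
    block = randi([0, 40], 1, J);          % a block of J nonnegative integers
    ks = 1:4;                              % candidate Rice parameters (assumed)
    % Bits for value n under parameter k: floor(n/2^k) unary ones + 1 zero + k.
    costs = arrayfun(@(k) sum(floor(block / 2^k) + 1 + k), ks);
    [~, best] = min(costs);                % identifier transmitted with the block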

As an application of the Rice algorithm, let's briefly look at the algorithm for lossless data compression recommended by the CCSDS (Consultative Committee for Space Data Systems). The preprocessor removes correlation from the input and generates a sequence of nonnegative integers with the property that smaller values are more probable than larger values. The binary coder then generates a bitstream to represent this integer sequence; the binary coder is our main focus at this point.
The Tunstall code is an important exception: all of its codewords are of equal length, but each codeword represents a different number of source letters. Consider, for example, a 2-bit Tunstall code for an alphabet A = {A, B}, as opposed to a 2-bit code that is not a Tunstall code.

The design of a code that has a fixed codeword length but a variable number of symbols per codeword should satisfy the following conditions (a construction sketch follows the list):
1. We should be able to parse a source output sequence into sequences of symbols that appear in the codebook.
2. We should maximize the average number of source symbols represented by each codeword.
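
A minimal MATLAB sketch of the standard Tunstall construction that satisfies both conditions, assuming a Bernoulli source with P(A) = 0.7 and P(B) = 0.3 (our own illustrative probabilities): start from the single letters and repeatedly expand the most probable codebook entry while a full expansion still fits in the 2^nbits slots.

    p = [0.7, 0.3];                         % assumed letter probabilities
    letters = {'A', 'B'};
    nbits = 2;                              % fixed codeword length
    entries = letters; probs = p;           % start from the single letters
    % Each expansion replaces one entry with numel(letters) longer ones,
    % a net gain of numel(letters) - 1 codebook slots.
    while numel(entries) + numel(letters) - 1 <= 2^nbits
        [~, i] = max(probs);                % expand the most probable entry
        grown = strcat(entries{i}, letters);          % e.g. 'A' -> {'AA','AB'}
        newp  = probs(i) * p;
        entries = [entries(1:i-1), entries(i+1:end), grown];
        probs   = [probs(1:i-1),  probs(i+1:end),  newp];
    end
    codewords = cellstr(dec2bin(0:numel(entries) - 1, nbits));
    disp([entries(:), codewords]);          % {'B','AB','AAA','AAB'} -> 00..11

With these probabilities the resulting 2-bit codebook is {B, AB, AAA, AAB}: every source sequence can be parsed into these entries, and the most probable entries carry the most letters per codeword.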

Algorithm Comparison

Golomb Code
Alphabets that follow a geometric distribution will have a Golomb code as their optimal prefix code, which makes Golomb encoding well suited for situations where small values are significantly more likely to occur in the input stream than large values. The Run-Length Golomb-Rice (RLGR) code accomplishes this by using a very simple algorithm that adjusts the Golomb-Rice parameter up or down depending on the last encoded symbol. A decoder can follow the same rule to track the variation of the encoding parameter, so no additional information needs to be transmitted, just the encoded data.

Rice Code
"Rice code" can refer either to that adaptive scheme or to the use of a particular subset of Golomb codes. While a Golomb code has an adjustable parameter that can be any positive integer value, Rice codes are those where the adjustable parameter is a power of two. This makes Rice codes convenient to use on a computer, since multiplication and division by 2 can be implemented more efficiently in binary arithmetic. Rice was motivated to propose this simpler subset by the fact that geometric distributions are often time-varying, not precisely known, or both, so selecting the apparently optimal code might not be very advantageous. Rice coding is used as an entropy coding stage in various lossless image and audio data compression methods.

Tunstall Code
There is a method for constructing a Tunstall code that is linear time in the number of output items. This is an improvement on the state of the art for non-Bernoulli sources, including Markov sources, which require a (suboptimal) generalization of Tunstall's algorithm proposed by Savari and analytically examined by Tabus and Rissanen. In general, the construction is analyzed in terms of n, the total number of output leaves across all Tunstall trees, s, the number of trees (states), and D, the number of leaves of each internal node.

Conclusions

During this activity we managed to understand the algorithmic descriptions, among them adaptive Huffman coding: how it can be useful in some scenarios and not viable in others. We were also able to appreciate the use of this tool for compression and its implementation during the class, presented in two segments, the algorithmic description and the comparison of algorithms, which were carried out through the team's understanding of the models seen during the activity, examining how the data is compressed and the mathematical principles behind these algorithms, as well as their usefulness for our work. Likewise, we observed that Huffman coding requires knowledge of the probabilities of the source sequence and that, when this knowledge is not available, the procedure splits into two passes, as seen during the activity: in the first pass the statistics of the data are compiled, and in the second pass the source is encoded.

References
[1] A. Kiely, "Golomb parameter selection in Rice coding," 2004. [Online]. [Accessed: 26 February 2022].

[2] K. Sayood, Introduction to Data Compression, 2006. [Online]. [Accessed: 26 February 2022].

[3] M. B. Baer, "Efficient Implementation of the Generalized Tunstall Code Generation Algorithm," 8 May 2009. [Online]. Available: https://arxiv.org/pdf/0809.0949.pdf. [Accessed: 26 February 2022].
