CHAPTER 1
INTRODUCTION
The process of reducing the size of a data file is often referred to as data
compression. In the context of data transmission, it is called source coding: encoding
done at the source of the data before it is stored or transmitted. Source coding should
not be confused with channel coding (for error detection and correction) or line coding
(the means for mapping data onto a signal). Compression is useful because it reduces
the resources required to store and transmit data. Computational resources are
consumed in the compression and decompression processes. Data compression is
subject to a space–time complexity trade-off. For instance, a compression scheme for
video may require expensive hardware for the video to be decompressed fast enough
to be viewed as it is being decompressed, and the option to decompress the video in
full before watching it may be inconvenient or require additional storage. The design
of data compression schemes involves trade-offs among various factors, including the
degree of compression, the amount of distortion introduced (when using lossy data
compression), and the computational resources required to compress and decompress
the data.
1.1.1 Lossless compression
Lossless data compression algorithms usually exploit statistical
redundancy to represent data without losing any information, so that the process is
reversible. Lossless compression is possible because most real-world data exhibits
statistical redundancy. For example, an image may have areas of color that do not
change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be
encoded as "279 red pixels". This is a basic example of run-length encoding; there are
many schemes to reduce file size by eliminating redundancy.
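As an illustration, the following is a minimal MATLAB sketch of such a run-length encoder (MATLAB being the language used for this project; the function and variable names are our own):

% A basic run-length encoder: each run of identical values becomes a
% (value, count) pair, e.g., [5 5 5 7] becomes [5 3 7 1].
function out = rle_basic(in)
    out = [];
    i = 1;
    while i <= numel(in)
        n = 1;
        while i + n <= numel(in) && in(i + n) == in(i)
            n = n + 1;              % extend the current run
        end
        out = [out, in(i), n];      % emit the value and its run length
        i = i + n;
    end
end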
The Lempel–Ziv (LZ) compression methods are among the most popular
algorithms for lossless storage. DEFLATE is a variation on LZ optimized for
decompression speed and compression ratio, but compression can be slow. In the mid-
1980s, following work by Terry Welch, the Lempel–Ziv–Welch (LZW) algorithm
rapidly became the method of choice for most general-purpose compression systems.
LZW is used in GIF images, programs such as PKZIP, and hardware devices such as
modems. LZ methods use a table-based compression model where table entries are
substituted for repeated strings of data. For most LZ methods, this table is generated
dynamically from earlier data in the input. The table itself is often Huffman encoded.
Grammar-based codes can compress highly repetitive input extremely
effectively; for instance, a biological data collection of the same or closely related
species, a huge versioned document collection, or an internet archive. The basic task of
grammar-based codes is constructing a context-free grammar deriving a single string.
Other practical grammar compression algorithms include Sequitur and Re-Pair.
Arithmetic coding applies especially well to adaptive data compression tasks where the statistics
vary and are context-dependent, as it can be easily coupled with an adaptive model of
the probability distribution of the input data. An early example of the use of
arithmetic coding was in an optional (but not widely used) feature of the JPEG image
coding standard. It has since been applied in various other designs including H.263,
H.264/MPEG-4 AVC and HEVC for video coding.
Audio compression is likewise widely used: in internet telephony, for example, and
for CD ripping, where the compressed audio is decoded by the audio players.
The last phase is Entropy Coding (EC), which compresses the data efficiently.
The input data might be of any form, i.e., text, images, video, audio, etc., and the
output is a compressed data stream which, when decompressed, gives us the
original data stream or an equivalent representation.
1.1.4 Compression ratio
Data compression ratio, also known as compression power, is a measurement
of the relative reduction in size of data representation produced by a data compression
algorithm. It is typically expressed as the division of uncompressed size by
compressed size.
Compression ratio = (size of original file) / (size of compressed file)

Space saving = 1 − (size of compressed file) / (size of original file)
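In MATLAB terms (the variable names here are our own), both quantities can be computed directly from the file sizes in bytes:

% A small sketch: the two metrics of section 1.1.4, using for
% illustration the sizes from the sample run of Chapter 6.
origBytes = 55890;                    % size of the original file
compBytes = 21368;                    % size of the compressed file
ratio  = origBytes / compBytes;       % compression ratio, about 2.62
saving = 1 - compBytes / origBytes;   % space saving, about 0.62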
Our objective is to achieve better compression ratios for medical images using the BWT as the
main transformation, and to reduce the time taken to compress data using BWT in a
different setup as compared to the conventional one.
1.3 Methodology
We base our work on the initially proposed Burrows–Wheeler Compression Algorithm (BWCA)
described by Michael Burrows and D. J. Wheeler in their 1994 paper titled “A block-sorting
lossless data compression algorithm” [2]. We obtain the compression ratio according
to the formula given in section 1.1.4. We implement the project in MATLAB,
since it handles arrays of multiple dimensions naturally; the project involves
a large amount of array manipulation.
Chapter 2 contains the details of the literature survey conducted regarding our
project.
Chapter 3 contains a formal introduction to the Burrows-Wheeler Transform
and focuses on its uses in text compression, how the actual transformation
takes place, and some analysis of the algorithm.
Chapter 4 deals with the proposed model and explains all the algorithms used
in the project using a running example. It also deals with the fundamentals of
an image and formally defines an image.
Chapter 5 contains a brief overview of MATLAB, its features, and
specifications. We focus mainly on the implementation of our project in
MATLAB.
Chapters 6 and 7 contain the results obtained and conclusions drawn by the
members from the presented work.
CHAPTER 2
LITERATURE SURVEY
2.1 Introduction
The Move-To-Front (MTF) algorithm serves as the
GST stage in the original BWCA scheme. The MTF stage is a List Update Algorithm
(LUA), which replaces the input symbols with corresponding ranking values. Just like
the BWT stage, the LUA stage does not alter the number of symbols.
The last stage is the Entropy Coding (EC) stage, which compresses the
symbols by using an adapted model. We focus on lossless compression due to the
aimed applications in medical field, nevertheless this scheme can be considered for
lossless image compression as well as for lossy image compression. In lossy
configuration a pre-processing based on DCT is added to compression.
The main function of the RLE is to support the probability estimation of the
next stage. Long runs of identical values tend to lead to an overestimate of the global
symbol probability, which leads to lower compression. Balkenhol and Shtarkov call this
phenomenon "the pressure of runs" [6]. The RLE stage helps to decrease this
pressure. To improve the probability estimation of the EC stage, common BWCA
schemes position the RLE stage directly in front of the EC stage.
One common RLE stage for BWT-based compressors is Run Length
Encoding Zero (RLE-0). Wheeler suggested coding only the runs of the 0
symbol and no runs of other symbols, since 0 is the symbol with the most runs.
To this end, an offset of 1 is added to symbols greater than 0. The run length is
incremented by one, and all bits of its binary representation except the most
significant bit – which is always 1 – are stored with the symbols 0 and 1. Some
authors have suggested an RLE stage before the BWT stage for speed optimization
and for reducing the BWT input, but such a stage generally deteriorates the compression
ratio. Instead, specific sorting algorithms are used to arrange the runs of symbols
in practically linear time.
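A minimal MATLAB sketch of the RLE-0 idea described above (our own illustration, not code from the cited papers):

% RLE-0: only runs of the symbol 0 are coded. Non-zero symbols get an
% offset of +1, freeing the values 0 and 1 to carry run-length bits.
function out = rle0_encode(in)
    out = [];
    i = 1;
    while i <= numel(in)
        if in(i) == 0
            n = 0;                          % count the run of zeros
            while i <= numel(in) && in(i) == 0
                n = n + 1;
                i = i + 1;
            end
            bits = dec2bin(n + 1) - '0';    % binary digits of (run length + 1)
            out = [out, bits(2 : end)];     % store all bits except the MSB
        else
            out = [out, in(i) + 1];         % offset symbols greater than 0
            i = i + 1;
        end
    end
end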
Another type of run-length encoding is RLE-2s, which has been used by Abel [1].
The RLE-2s stage replaces all runs of two or more symbols by a run consisting of
exactly two symbols. In contrast to other approaches, the length of the run is not
placed behind the two symbols inside the symbol stream but transmitted in a
separate data stream, so the length information does not disturb the context of the
main data stream.
Most GST stages use a recency ranking scheme for the List Update problem, like the
Move-To-Front (MTF) algorithm, which is used in the original BWCA approach of
Burrows and Wheeler. Many authors have presented improved MTF stages, which are
based on a delayed behaviour, such as the MTF-1 and MTF-2 approaches of
Balkenhol et al. or a sticky version by Fenwick [5]. Another approach, which
achieves a much better compression ratio than MTF stages, is the Weighted
Frequency Count (WFC) stage presented by Deorowicz [7]; this scheme, however,
has a very high computational cost. Other GST schemes, like Inversion Frequencies
(IF) [6], use a distance measurement between occurrences of the same symbol. Like
the WFC stage of Deorowicz, the Incremental Frequency Count (IFC) stage presented
by Abel uses a list of counters [1]; it differs from the WFC stage mainly in the
reduced amount of calculation.
The very first proposition of Burrows and Wheeler was to use the Huffman
coder as the last stage; it is fast and simple, but the arithmetic coder is a better choice
to achieve a better compression ratio. Abel modified the arithmetic coding because the
coding type of the IFC output inside the EC stage has a strong influence on the
compression rate; indeed, it is not sufficient to compress the index stream by a
simple arithmetic coder with a common order-n context. The index frequency of the
IFC output has a nonlinear decay. Even after the use of an RLE-2 stage, the index 0 is
still the most common index symbol on average.
Originally, Burrows and Wheeler, in their 1994 paper titled “A block-sorting
lossless data compression algorithm” [2], proposed a coding scheme based on Huffman coding,
which is a variable length code that encodes highly probable symbols with minimal
length codes and those that are less probable with maximal length codes. These codes
followed the prefix property. Prefix property means that no code word assigned to a
symbol using the algorithm is a prefix of another code word. It is also known as the
prefix free property.
2.6 Conclusion
In this chapter we discussed the conventional use of BWT in text
compression and shed some light on some improvements and additions to the
classical BWCA.
CHAPTER 3
BURROWS-WHEELER TRANSFORM
3.1 Introduction
The most widely used data compression algorithms are based on the sequential
data compressors of Lempel and Ziv. Statistical modelling techniques may produce
superior compression, but are significantly slower. In this chapter, we present a
technique that achieves compression within a percent or so of that achieved by
statistical modelling techniques, but at speeds comparable to those of algorithms
based on Lempel and Ziv’s.
Our algorithm does not process its input sequentially, but instead processes a
block of text as a single unit. The idea is to apply a reversible transformation to a
block of text to form a new block that contains the same characters, but is easier to
compress by simple compression algorithms. The transformation tends to group
characters together so that the probability of finding a character close to another
instance of the same character is increased substantially. Text of this kind can easily
be compressed with fast locally-adaptive algorithms, such as move-to-front coding in
combination with Huffman or arithmetic coding.
The sorting operation brings together rotations with the same initial characters.
Since the initial characters of the rotations are adjacent to the final characters,
consecutive characters in L are adjacent to similar strings in S. If the context of a
character is a good predictor for the character, L will be easy to compress with a
simple locally-adaptive compression algorithm.
In the following sections, we describe the transformation in more detail, and
show that it can be inverted. We explain more carefully why this transformation tends
to group characters to allow a simple compression algorithm to work more
effectively. We then describe efficient techniques for implementing the
transformation and its inverse, allowing this algorithm to be competitive in speed with
Lempel-Ziv-based algorithms, but achieving better compression. Finally, we give the
performance of our implementation of this algorithm, and compare it with well-
known compression programs.
So, here our example string S = ‘a b r a c a d a b r a a b r a c a d a b r a’ translates
to its hexadecimal equivalent, given by taking the value mapped to each character of
S from the ASCII table and converting it to hexadecimal. Now, S = 61 62 72 61 63 61
64 61 62 72 61 61 62 72 61 63 61 64 61 62 72 61.
Step 1: [sort rotations]
In our example, the index is I and the matrix M is of order N x N, i.e., 22 x 22.
position a b r a c a d a b r a a b r a c a d a b r a
0 a b r a c a d a b r a a b r a c a d a b r a
1 b r a c a d a b r a a b r a c a d a b r a a
2 r a c a d a b r a a b r a c a d a b r a a b
3 a c a d a b r a a b r a c a d a b r a a b r
4 c a d a b r a a b r a c a d a b r a a b r a
5 a d a b r a a b r a c a d a b r a a b r a c
6 d a b r a a b r a c a d a b r a a b r a c a
7 a b r a a b r a c a d a b r a a b r a c a d
8 b r a a b r a c a d a b r a a b r a c a d a
9 r a a b r a c a d a b r a a b r a c a d a b
10 a a b r a c a d a b r a a b r a c a d a b r
11 a b r a c a d a b r a a b r a c a d a b r a
12 b r a c a d a b r a a b r a c a d a b r a a
13 r a c a d a b r a a b r a c a d a b r a a b
14 a c a d a b r a a b r a c a d a b r a a b r
15 c a d a b r a a b r a c a d a b r a a b r a
16 a d a b r a a b r a c a d a b r a a b r a c
17 d a b r a a b r a c a d a b r a a b r a c a
18 a b r a a b r a c a d a b r a a b r a c a d
19 b r a a b r a c a d a b r a a b r a c a d a
20 r a a b r a c a d a b r a a b r a c a d a b
21 a a b r a c a d a b r a a b r a c a d a b r
Now we need to sort the rows of this matrix in lexicographical order, i.e., in
alphabetical order, considering each row as a word formed by its characters. We
observe that each row and column of M is a permutation of S. In the table below, we
present M after the rows are sorted.
position F L
10 a a b r a c a d a b r a a b r a c a d a b r
21 a a b r a c a d a b r a a b r a c a d a b r
18 a b r a a b r a c a d a b r a a b r a c a d
7 a b r a a b r a c a d a b r a a b r a c a d
0 a b r a c a d a b r a a b r a c a d a b r a
11 a b r a c a d a b r a a b r a c a d a b r a
3 a c a d a b r a a b r a c a d a b r a a b r
14 a c a d a b r a a b r a c a d a b r a a b r
5 a d a b r a a b r a c a d a b r a a b r a c
16 a d a b r a a b r a c a d a b r a a b r a c
8 b r a a b r a c a d a b r a a b r a c a d a
19 b r a a b r a c a d a b r a a b r a c a d a
1 b r a c a d a b r a a b r a c a d a b r a a
12 b r a c a d a b r a a b r a c a d a b r a a
4 c a d a b r a a b r a c a d a b r a a b r a
15 c a d a b r a a b r a c a d a b r a a b r a
6 d a b r a a b r a c a d a b r a a b r a c a
17 d a b r a a b r a c a d a b r a a b r a c a
9 r a a b r a c a d a b r a a b r a c a d a b
20 r a a b r a c a d a b r a a b r a c a d a b
2 r a c a d a b r a a b r a c a d a b r a a b
13 r a c a d a b r a a b r a c a d a b r a a b
Step 2: [find last characters of rotations]
Let the string L be the last column of M, with characters L[0], …, L[N – 1]
(equal to M [0, N – 1], …, M [N – 1, N – 1]). The output of the transformation is the
pair (L, I).
This step calculates the first column F of the matrix M of Algorithm 1. This is
done by sorting the characters of L to form F. We observe that any column of the
matrix M is a permutation of the original string S, and therefore of one another.
Furthermore, because the rows of M are sorted, and F is the first column of M, the
characters in F are also sorted.
F = 61 61 61 61 61 61 61 61 61 61 62 62 62 62 63 63 64 64 72 72 72 72.
To assist our explanation, we describe this step in terms of the contents of the
matrix M. The reader should remember that the complete matrix is not available to the
decompressor; only the strings F, L, and the index I (from the input) are needed by
this step.
Consider the rows of the matrix M that start with some given character ch.
Algorithm 1 ensured that the rows of matrix M are sorted lexicographically, so the
rows that start with ch are ordered lexicographically.
position F L
0 a a b r a c a d a b r a a b r a c a d a b r
1 a a b r a c a d a b r a a b r a c a d a b r
2 a b r a a b r a c a d a b r a a b r a c a d
3 a b r a a b r a c a d a b r a a b r a c a d
4 a b r a c a d a b r a a b r a c a d a b r a
5 a b r a c a d a b r a a b r a c a d a b r a
6 a c a d a b r a a b r a c a d a b r a a b r
7 a c a d a b r a a b r a c a d a b r a a b r
8 a d a b r a a b r a c a d a b r a a b r a c
9 a d a b r a a b r a c a d a b r a a b r a c
10 b r a a b r a c a d a b r a a b r a c a d a
11 b r a a b r a c a d a b r a a b r a c a d a
12 b r a c a d a b r a a b r a c a d a b r a a
13 b r a c a d a b r a a b r a c a d a b r a a
14 c a d a b r a a b r a c a d a b r a a b r a
15 c a d a b r a a b r a c a d a b r a a b r a
16 d a b r a a b r a c a d a b r a a b r a c a
17 d a b r a a b r a c a d a b r a a b r a c a
18 r a a b r a c a d a b r a a b r a c a d a b
19 r a a b r a c a d a b r a a b r a c a d a b
20 r a c a d a b r a a b r a c a d a b r a a b
21 r a c a d a b r a a b r a c a d a b r a a b
Considering only
those rows in M ‘ that start with a character ch, they must appear in lexicographical
order relative to one another; they have been sorted lexicographically starting with
their second characters, and their first characters are all the same and so do not affect
the sort order. Therefore, for any given character ch, the rows in M that begin with ch
appear in the same order as the rows in M ‘ that begin with ch.
position
0 r a a b r a c a d a b r a a b r a c a d a b
1 r a a b r a c a d a b r a a b r a c a d a b
2 d a b r a a b r a c a d a b r a a b r a c a
3 d a b r a a b r a c a d a b r a a b r a c a
4 a a b r a c a d a b r a a b r a c a d a b r
5 a a b r a c a d a b r a a b r a c a d a b r
6 r a c a d a b r a a b r a c a d a b r a a b
7 r a c a d a b r a a b r a c a d a b r a a b
8 c a d a b r a a b r a c a d a b r a a b r a
9 c a d a b r a a b r a c a d a b r a a b r a
10 a b r a a b r a c a d a b r a a b r a c a d
11 a b r a a b r a c a d a b r a a b r a c a d
12 a b r a c a d a b r a a b r a c a d a b r a
13 a b r a c a d a b r a a b r a c a d a b r a
14 a c a d a b r a a b r a c a d a b r a a b r
15 a c a d a b r a a b r a c a d a b r a a b r
16 a d a b r a a b r a c a d a b r a a b r a c
17 a d a b r a a b r a c a d a b r a a b r a c
18 b r a a b r a c a d a b r a a b r a c a d a
19 b r a a b r a c a d a b r a a b r a c a d a
20 b r a c a d a b r a a b r a c a d a b r a a
21 b r a c a d a b r a a b r a c a d a b r a a
M ‘ is defined as the matrix formed by rotating each row of M one character to
the right, so for each i = 0, …, N – 1, and each j = 0, …, N – 1,
M ‘[i, j] = M[i, (j – 1) mod N].
In our example, this fact is demonstrated by the rows that begin with ‘a’. The
rows 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 in M correspond to the rows 4, 5, 10, 11, 12, 13, 14,
15, 16, and 17 in M ‘.
In our example, T is: (18, 19, 16, 17, 0, 1, 20, 21, 14, 15, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13).
Now, for each i = 0, …, N – 1, the characters L[i] and F[i] are the last and
the first characters of the row i of M. Since each row is a rotation of S, the character
L[i] cyclically precedes the character F[i] in S. From the construction of T, we have
F[T[j]] = L[j]. Substituting i = T[j], we find that L[T[j]] cyclically precedes L[j] in S.
The index I is defined by Algorithm 1 such that row I of M is S. Thus, the last
character of S is L[I]. We use the vector T to give the predecessors of each character:
for each i = 0, …, N – 1, S[N – 1 – i] = L[T^i[I]], where T^0[x] = x and
T^(i+1)[x] = T[T^i[x]]. This yields S, the original input to the compressor. In our
example, S = ‘a b r a c a d a b r a a b r a c a d a b r a’.
We could have defined T so that the string S would be generated from front to
back, rather than the other way around.
The sequence T^i[I] for i = 0, …, N – 1 is not necessarily a permutation of the
numbers 0, …, N – 1. If the original string is of the form Z^p for some substring Z and
some p > 1, then the sequence T^i[I] for i = 0, …, N – 1 will also be of the form Z1^p
for some subsequence Z1. That is, the repetitions in S will be generated by visiting the
same elements of T repeatedly. For example, if S = ‘c a n c a n’, Z = ‘c a n’ and p =
2, the sequence T^i[I] for i = 0, …, N – 1 will be (2, 4, 0, 2, 4, 0).
In our example, T is: (4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 8, 9, 2,
3, 0, 1, 6, 7).
The index I is defined by Algorithm 1 such that row I of M is S. Thus, the
first character of S is F[I]. We use the vector T to get the successors of each character:
for each i = 0, …, N – 1, S[i] = F[T^i[I]], where T^0[x] = x and T^(i+1)[x] = T[T^i[x]].
This yields S, the original input to the compressor. In our example, S = ‘a b r a c a d a
b r a a b r a c a d a b r a’.
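A minimal MATLAB sketch of this front-to-back reconstruction (indices here are 1-based, unlike the 0-based notation of the text; F, T, and I are assumed to be already computed):

% Rebuild S from the first column F and the successor vector T,
% starting from the primary index I: S(i) = F(T^(i-1)(I)).
function S = ibwt_from_T(F, T, I)
    N = numel(F);
    S = blanks(N);              % preallocate a character row vector
    for i = 1 : N
        S(i) = F(I);            % F(I) is the next character of S
        I = T(I);               % follow the successor chain
    end
end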
Algorithm 1 sorts the rotations of an input string S, and generates the string L
consisting of the last character of each rotation.
To see why this might lead to effective compression, consider the effect on a
single letter in a common word in a block of English text. We will use the example of
the letter ‘t’ in the word ‘the’, and assume an input string containing many instances
of ‘the’.
When the list of rotations of the input is sorted, all the rotations starting with
‘he ’ will sort together; a large proportion of them are likely to end in ‘t’. One region
of the string L will therefore contain a disproportionately large number of ‘t’
characters, intermingled with other characters that can precede ‘he ’ in English, such
as space, ‘s’, ‘T’, and ‘S’.
The same argument can be applied to all characters in all words, so any
localized region of the string L is likely to contain a large number of a few distinct
characters. The overall effect is that the probability that a given character ch will occur
at a given point in L is very high if ch occurs near that point in L, and is low
otherwise. This property is exactly the one needed for effective compression by a
move-to-front coder, which encodes an instance of character ch by the count of
distinct characters seen since the next previous occurrence of ch. When applied to the
string L, the output of a move-to-front coder will be dominated by low numbers,
which can be efficiently encoded with a Huffman or arithmetic coder.
3.4 Conclusion
CHAPTER 4
IMAGE COMPRESSION USING BWT
4.1 Introduction
Fig 4.1: An image of the human eye.[8]
f(x, y) = g,
where g is the gray-level intensity of the image at the spatial co-ordinate (x, y).
The number of bits required to store a single pixel depends on the gray-level
resolution of the image. Gray-level resolution of an image is the number of distinct
gray-levels that a single pixel of that image could be assigned. For example, a
standard grayscale image having 256 distinct gray-levels for each pixel requires 8 bits
per pixel, because 8 bits are required to encode 256 (= 2^8) different levels.
Fig 4.2: A grayscale image and its representation with gray-level values. [9]
Raster images have a finite set of digital values, called picture elements or
pixels. The digital image contains a fixed number of rows and columns of pixels.
Pixels are the smallest individual element in an image, holding quantized values that
represent the brightness of a given color at any specific point. Typically, the pixels are
stored in computer memory as a raster image or raster map, a two-dimensional array
of small integers.
As we can see in parts (b) and (c) of the figure, the compressed image is different when
compared to the original (a). This is due to the fact that the image is compressed using
a lossy algorithm. These algorithms exploit the psychovisual redundancy in the
images and they concentrate primarily on obtaining a better compression ratio.
However, there is a trade-off between the compression ratio and the quality of the
image. Obtaining a high compression ratio often leads to the image losing its ability to
please or convince the observer visually, as in (c), whereas low compression ratio
means most of the image quality is preserved except for a very few changes compared
to the original image, i.e., the image is visually lossless as in (b).
Coming to lossless compression, one cannot simply view the compressed
image since the form in which it is stored is not a direct representation of the original
image. One needs to apply a decompression algorithm which reverts the compressed
data into a presentable form, and then one can view the image as it was before
compression, i.e., lossless both visually and literally.
[Fig: Storing and retrieving an image — the image passes through the compressor to produce a compressed file, from which the decompressor later retrieves the image.]
When we have an input image to store, we feed the image into the
compressor and obtain a compressed stream of data as the output; we store that
data stream instead of the original data, which saves storage space and transmission
bandwidth when we need to send this data wirelessly.
Similarly, when we want to look at the image, we retrieve it by decompressing
the stored compressed data stream using the decompressor. The output of the
decompressor can be ultimately compared to the original data to prove the fact that
the whole process is lossless. The compression ratio and space saving are calculated
as specified in the section 1.1.4 to get a grasp of the efficiency and effectiveness of
the compression algorithms used in the compressor.
4.3 Compressor
[Fig: Block diagram of the compressor — input image → reordering → BWT → MTF → RLE → Huffman coding → compressed data.]
In this section, we explain the block diagram of the compressor and the
algorithms used in each block.
This is where reordering comes into the picture. This conversion process from two-
dimensional representation to a one dimensional one is called reordering or path
scanning. Based on the type of path taken to cover the spatial plane, we have a
plethora of reordering techniques. The most popular and effective ones are shown in
the figure below.
Fig 4.6: Some of the reordering techniques available. (a) left, (b) left-
right, (c) up-down, (d) spiral, and (e) zig-zag.
The left scan just scans the entire image row after row and appends all of them
together, whereas the left-right scan scans odd-numbered rows from left to right and
even-numbered rows from right to left. The up-down method is similar to the left-right
method but instead of rows it scans columns, odd numbered columns are scanned
from top to bottom and even numbered columns from bottom to top. In the spiral
technique, the scan starts from one of the four extreme positions and spirals inward.
Whereas in zigzag method, the scan follows a zig-zag path from start to the end of the
image.
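As an illustration, a minimal MATLAB sketch of the up-down scan (the variant our implementation uses; the function and variable names are our own) is given below:

% Up-down scan: odd-numbered columns are read top to bottom,
% even-numbered columns bottom to top.
function r = updown_scan(img)
    [nr, nc] = size(img);
    r = zeros(1, nr * nc, 'like', img);
    for j = 1 : nc
        col = img(:, j);
        if mod(j, 2) == 0
            col = flipud(col);                 % even columns read bottom-up
        end
        r((j - 1) * nr + 1 : j * nr) = col.';  % append to the 1D output
    end
end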
So, one can say that the Burrows-Wheeler Transform is a pre-processing stage
before the actual compression whose objective is to turn the input array into a more
compression-friendly form. Finally, the outputs of this stage are L and I.
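A compact sketch of the forward transform in MATLAB (a naive rotation sort using O(N^2) space, for clarity; the appendix builds the rotations as strings instead):

% Forward BWT of a 1-by-N numeric row vector s: sort all cyclic
% rotations, output the last column L and the primary index I.
function [L, I] = bwt_forward(s)
    n = numel(s);
    rot = zeros(n, n, 'like', s);
    for i = 1 : n
        rot(i, :) = [s(i : end), s(1 : i - 1)];  % i-th cyclic rotation
    end
    [rot, idx] = sortrows(rot);   % sort rotations lexicographically
    L = rot(:, end).';            % last column of the sorted matrix
    I = find(idx == 1);           % row holding the unrotated string
end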
4.4.3 Stage 3: Move-to-Front coding
The move-to-front algorithm belongs to a family of algorithms called the
Global Structure Transforms. It basically transforms a local structure redundancy into
a global structure redundancy which can be exploited further by RLE and Huffman
coding.
Algorithm 3: MTF
This algorithm encodes the output L of stage 2, which is a string of length
N. It uses the move-to-front technique on each of the individual characters. The move-
to-front (MTF) transform is an encoding of data (typically a stream of bytes) designed
to improve the performance of entropy encoding techniques of compression. When
efficiently implemented, it is fast enough that its benefits usually justify including it
as an extra step in a data compression algorithm. We now define a vector of integers,
R[0], …, R[N – 1], which are the codes for the characters L[0], …, L[N – 1].
List Y:  a r r d d a a r r c c a a a a a a a a b b b
         b a a r r d d a a r r c c c c c c c c a a a
         c b b a a r r d d a a r r r r r r r r c c c
         d c c b b b b b b d d d d d d d d d d r r r
         r d d c c c c c c b b b b b b b b b b d d d
List L:  r r d d a a r r c c a a a a a a a a b b b b
List R:  4 0 4 0 2 0 2 0 4 0 2 0 0 0 0 0 0 0 4 0 0 0
Table 4.1: MTF encoding output.
As shown in the table, the List Y rows give the state of Y before encoding each value
of L (i.e., updated after encoding the previous value), and the output is stored in R.
For example, consider the first two elements, L[0] and L[1]. Y is initially as
given above; L[0] is ‘r’, which appears at index 4 in Y, so the output R[0] is 4 and the
list Y is updated by moving ‘r’ to the start of the list. So, now Y = {‘r’, ‘a’, ‘b’, ‘c’,
‘d’}. Now, for L[1] = ‘r’, the output R[1] is the position of ‘r’ in the updated list Y,
i.e., 0. In this way, every repetition of length j appears as j – 1 zeros in the output.
This transforms the local structure redundancy into a global one; this transformation
of redundancy in local areas of the data to global areas is key in GST algorithms.
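The whole encoding loop can be sketched in a few lines of MATLAB (a minimal illustration assuming the initial list Y = ‘abcdr’ of the running example):

% MTF encoding: output the 0-based rank of each character in Y,
% then move that character to the front of Y.
function R = mtf_encode(L, Y)
    R = zeros(1, numel(L));
    for i = 1 : numel(L)
        k = find(Y == L(i), 1);                   % position of L(i) in Y
        R(i) = k - 1;                             % emit the 0-based rank
        Y = [L(i), Y([1 : k - 1, k + 1 : end])];  % move it to the front
    end
end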
So, the main distinction between the input and the output is not the size of the
array, but the way repetitions of a character (runs, as we call them) are portrayed
in the array. As we can see, in the output array R all runs of different characters have
been reduced to runs of zeros. This drastically increases the probability of lower
indices and decreases the probability of higher indices, which comes in handy when
applying entropy coding algorithms like Huffman or arithmetic coding to reduce the
average word length of such data.
The input to this stage is L, the data stream that has been transformed using
Burrows-Wheeler transform and the output of this stage is R, the MTF encoded array.
Algorithm 4: RLE
This algorithm encodes the output R of stage 3, which is a string of length N.
It encodes the runs in the sequence by replacing them with their counts. The
main aim of this step is to shrink long runs of the same symbol. We now define E to be
an empty array.
Now we iterate over the array R and encode it in the following way. We start
at the beginning of the array R. We now append the first character, ch, of R to the
array E and now we start counting the subsequent occurrences of ch and append the
count to E and we move on to the next character ch1 (say) and continue the process
until the end of R is reached.
We repeat the above process until the array R is exhausted. The output of this
stage, E is as follows:
ORIGINAL STRING: R = 4 0 4 0 2 0 2 0 4 0 2 0 0 0 0 0 0 0 4 0 0 0, length = 22
(r1 = the first ‘4’, r2 = the run of seven ‘0’s, r3 = the final run of three ‘0’s)
ENCODED STRING: E = 4 0 0 0 4 0 0 0 2 0 0 0 2 0 0 0 4 0 0 0 2 0 0 6 4 0 0 2, length = 28
The encoded output turns out to be longer than the input because of the
presence of unit-length runs in excess. To solve this problem, we introduce a
variation of the RLE scheme in which the unit-length runs are not accounted for.
It is important to observe that the length of the output sequence of this stage is
not the same as the length of the input sequence. So, this is the first stage in which actual
compression of data is supposed to happen. But in our case, instead of compression,
the length of the data increased. So, to rectify this issue, we use a variant of RLE
called the RLE-2s to encode the output of stage 3. The working of RLE-2s is
described in the section below.
Now we iterate over the array R and encode it in the following way. We start
at the beginning of the array R. We now append the first character, ch, of R to the
array E and now we start counting the subsequent occurrences of ch and if the count is
greater than 1, then we append ch to E once more and then append the count. If the
count is 1, i.e., it is a unit length run, then we move on to the next run.
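A minimal MATLAB sketch of this RLE-2s encoder (our own illustration of the description above; it stores the full run length, as in the worked example):

% RLE-2s: runs of two or more symbols become (symbol, symbol, count);
% unit-length runs pass through as a single symbol.
function E = rle2s_encode(R)
    E = [];
    i = 1;
    while i <= numel(R)
        n = 1;                              % measure the current run
        while i + n <= numel(R) && R(i + n) == R(i)
            n = n + 1;
        end
        if n == 1
            E = [E, R(i)];                  % unit run: symbol only
        else
            E = [E, R(i), R(i), n];         % run: symbol twice, then count
        end
        i = i + n;
    end
end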
The output of this algorithm for the given input R is as follows.
ORIGINAL STRING: R = 4 0 4 0 2 0 2 0 4 0 2 0 0 0 0 0 0 0 4 0 0 0, length = 22
(r1 = the first ‘4’, r2 = the run of seven ‘0’s, r3 = the final run of three ‘0’s)
ENCODED STRING: E = 4 0 4 0 2 0 2 0 4 0 2 0 0 7 4 0 0 3, length = 18
As we can see, the length of the output sequence is 18, which is less than that
of the input sequence. The input r1 = ‘4’ is encoded as e1 = ‘4’ of length 1, r2 = ‘0 0
0 0 0 0 0’ is encoded as e2 = ‘0 0 7’, which is of length 3, and r3 = ‘0 0 0’ is encoded
as e3 = ‘0 0 3’ of length 3. So, one can say confidently that compression has been
achieved. This is because runs of unit length are encoded with a single
character.
The input to this stage is R, the MTF encoded output and the output of this
stage is E, the data stream encoded using RLE-2s algorithm.
4.4.5 Stage 5: Huffman coding
Huffman's method can be efficiently implemented, finding a code in time
linear to the number of input weights if these weights are sorted.
We now take a look at the first step using our running example. We need to
initially obtain a one-to-one mapping between symbols and their frequencies in E. We
initialize an array F containing all the unique characters in E, F = {0, 2, 3, 4, 7}. Now
we define P such that P[i] = frequency of F[i] in E, P = {9, 3, 1, 4, 1}. We need to
create a leaf node for each unique character and build a min heap of all leaf nodes
(a min heap is used as a priority queue; the value of the frequency field is used to
compare two nodes in the min heap, and initially the least frequent character is at the root).
Extract two nodes with the minimum frequency from the min heap. Create a new
internal node with a frequency equal to the sum of the two nodes’ frequencies. Make
the first extracted node its left child and the other extracted node its right child.
Add this node to the min heap. Repeat the above steps until the heap contains only
one node. The remaining node is the root node, and the tree is complete.
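Since our implementation (Chapter 5) relies on MATLAB’s huffmandict, the same construction can be sketched directly; note that huffmandict may assign different, but equally optimal, code words than the hand-built tree that follows:

% Build a Huffman dictionary for the running example's symbols F and
% frequencies P; frequencies are normalized to probabilities first.
F = [0 2 3 4 7];
P = [9 3 1 4 1];
dict = huffmandict(F, P / sum(P));  % N-by-2 cell: {symbol, code bits}
disp(dict)                          % inspect the generated code words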
We now apply the above steps to our example. Initially all the nodes in the
tree are leaves, and the least frequent character is at the root. This is shown below.
Each node is a circle divided into two parts: on the left we have the symbol and on
the right the frequency; internal nodes have the left side empty because internal
nodes do not represent any symbol.
[Fig 4.7: The initial leaf nodes (symbol, frequency): (3, 1), (7, 1), (2, 3), (4, 4), (0, 9), with the least frequent node at the root of the min heap.]
Now a new parent node, called the internal node, is created from the two
least-frequent leaves in the tree; we make the first extracted node the left child, and
the value in the frequency field of the new node is given by the sum of those of its
two children. A min heap is then formed from the internal node and the remaining
nodes, excluding the two children of the new node, and the process is repeated until
we get a single node whose value in the frequency field is the length of the array E.
The state of the binary tree after forming the first internal node is given below.
[Tree: internal node I1 (frequency 2) with children (3, 1) and (7, 1); remaining nodes: (2, 3), (4, 4), (0, 9).]
Fig 4.8: Tree after formation of first internal node.
Now we have the node {I1, 2} along with the remaining nodes. We form a new
parent node using the two nodes with the least frequency values, i.e., node I1 and the
node with ‘2’ in its symbol field. After this, the tree looks as follows.
[Tree: internal node I2 (frequency 5) with children I1 (2) and (2, 3); remaining nodes: (4, 4), (0, 9).]
We continue this until the root node has the length of the sequence E in its
frequency field. After this, we label the path from a parent node to its left child ‘0’
and the path to its right child ‘1’. Now, to obtain the code word for F[i], we traverse
from the root node to the leaf containing F[i] in its symbol field and append all the
labels on the paths followed. The final form of the Huffman tree is given below.
[Fig: Final Huffman tree — root I4 (18): left child I3 (9), right child leaf (0, 9) with code 1; I3: left child I2 (5), right child leaf (4, 4) with code 01; I2: left child I1 (2), right child leaf (2, 3) with code 001; I1: left child leaf (3, 1) with code 0000, right child leaf (7, 1) with code 0001.]
Given below is the table of symbols in F mapped to code words obtained from
the Huffman tree: 0 → 1, 2 → 001, 3 → 0000, 4 → 01, 7 → 0001.
We now define an array, C such that C[i] = code length of the symbol F[i]. So,
the array C is given by C = {1, 3, 4, 2, 4}. The word length of a symbol is defined as
the number of bits in the Huffman coded representation of the symbol. So, the average
codeword length, awl, is defined as the total number of bits in the input data coded
according to the Huffman tree divided by the length of the input data stream, M,
which is given by the formula

awl = (Σ P[i] × C[i]) / M = (9×1 + 3×3 + 1×4 + 4×2 + 1×4) / 18 = 34 / 18 ≈ 1.8889
We can see that the output stream requires far fewer bits than the input
stream, meaning the input to this stage has been compressed successfully. The outputs
of this stage are P, F, and H. These outputs constitute the compressed data block in
the flowchart of section 4.3.
This is a quiescent stage in the whole process, meaning that no active work
takes place in it. The compressed data from the Huffman coding stage of the
compressor block is given as input to this block, and the same is reproduced as the
output of this block, which in turn is the input to the decompressor block.
4.6 Decompressor
Its duty is to retrieve the original data stream from the compressed data. The
order of the algorithms to be applied on the compressed data in order to decompress it
is presented in the form of a flowchart in the current section.
[Fig: Block diagram of the decompressor — compressed data → Huffman decoding → RLE-2s decoding → MTF decoding → inverse BWT → undo reordering → output image.]
The input to the decompressor block is the output of the compressor block,
which for our example are the arrays P, F, and H. The output of this block is a
decoded data stream obtained from the Huffman tree constructed using P and F.
[The Huffman tree reconstructed from P and F, identical to the one built in the compressor; leaf codes: (0, 9) → 1, (4, 4) → 01, (2, 3) → 001, (7, 1) → 0001, (3, 1) → 0000.]
Fig 4.12: The generated Huffman tree.
Table 4.3: Mapping between symbols and code words.
F = Symbol P = Frequency Code
0 9 1
2 3 001
3 1 0000
4 4 01
7 1 0001
The mapping between symbols and their corresponding codes from the
Huffman tree are tabulated as shown above. Now, we have H = {0 1 1 0 1 1 0 0 1 1 0
0 1 1 0 1 1 0 0 1 1 1 0 0 0 1 0 1 1 1 0 0 0 0} as input; we need to decode this
using the above table and obtain the output in an array. Let us now define an empty
array IH to store the output of this stage. Since the P and F arrays here are the same
as the ones used in the construction of the Huffman tree in the last stage of the
compressor block, and since we use the same fixed rules for the construction of a
Huffman tree consistently throughout the program, the Huffman tree constructed here
is identical to the one generated in the Huffman coding stage of the compressor block.
H = 0 1 1 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 0 1 1 1 0 0 0 1 0 1 1 1 0 0 0 0
IH = 4 0 4 0 2 0 2 0 4 0 2 0 0 7 4 0 0 3
At index i = 0, s = ‘0’, and since ‘0’ is not a defined code word, we move on to
the next index, i = 1; here, s becomes s = ‘0 1’, and we observe that the symbol ‘4’ is
mapped to the code word represented by s according to the constructed Huffman tree.
So, we append ‘4’ to IH, reset s, and then move to the index i = 2. At i = 2, s = ‘1’,
and we see that ‘1’ is a defined code word that corresponds to the symbol ‘0’ in table
4.3; so, we append ‘0’ to IH and move to the next index. The process continues until
the end of the array H is reached.
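The decoding walk just described can be sketched as follows (a minimal illustration; the code-word table mirrors table 4.3, and the variable names are our own):

% Bit-by-bit Huffman decoding: grow a bit string s until it matches a
% code word, emit the corresponding symbol, then start over.
codes   = {'1', '001', '0000', '01', '0001'};
symList = [0, 2, 3, 4, 7];
H  = '0110110011001101100111000101110000';   % the 34 coded bits
IH = [];
s  = '';
for i = 1 : numel(H)
    s = [s, H(i)];                  % append the next bit
    k = find(strcmp(codes, s), 1);  % is s a complete code word?
    if ~isempty(k)
        IH = [IH, symList(k)];      % emit the decoded symbol
        s = '';                     % and start a new code word
    end
end
% IH is now 4 0 4 0 2 0 2 0 4 0 2 0 0 7 4 0 0 3, as derived above.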
From the above stated encoding rules, we can deduce that if we encounter two
consecutive identical characters at indices i and i + 1 of the array IH, then we need to
append that character, IH[i], to the output stream IE, IH[i + 2] times, and we move to
the index i + 3 of the array IH.
But if two consecutive elements of IH are dissimilar, the run of the character
IH[i] is of unit length. So, we append IH[i] to the output array IE and we move to the
index i + 1 in the array IH.
We continue the above stated process until we reach the end of the array IH.
Now, we apply these steps to our running example. Here, IH = ‘4 0 4 0 2 0 2 0 4 0 2 0
0 7 4 0 0 3’. The process is shown below.
ORIGINAL STRING: IH = 4 0 4 0 2 0 2 0 4 0 2 0 0 7 4 0 0 3, length = 18
(e1 = ‘4’, e2 = ‘0 0 7’, e3 = ‘0 0 3’)
When the index i reaches the value 11, IH[11] = ‘0’ and the next element,
IH[12], is also ‘0’. This implies that the string e2 corresponds to a run of the
character e2[0] of length e2[2], which means that, in our case, e2 represents a run of
the character ‘0’ which has a length of 7. So, we append IH[11] to IE, IH[13] = 7
times. We then move to index i + 3, equal to 14 here.
We continue the process until the end of array IH. Here the output of this stage
is the array IE, which for our example is given by IE = ‘4 0 4 0 2 0 2 0 4 0 2 0 0 0 0 0
0 0 4 0 0 0’ of length 22. We note that the array IE is equivalent to the array R which
is given as the input to the RLE-2s encoding stage of the compressor block. So, we
can say that RLE-2s is a reversible transformation.
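This decoding rule can be sketched in MATLAB as follows (a minimal illustration assuming a well-formed RLE-2s stream):

% RLE-2s decoding: a doubled symbol is followed by its run length;
% lone symbols pass through unchanged.
function IE = rle2s_decode(IH)
    IE = [];
    i = 1;
    while i <= numel(IH)
        if i + 1 <= numel(IH) && IH(i) == IH(i + 1)
            IE = [IE, repmat(IH(i), 1, IH(i + 2))];  % expand the run
            i = i + 3;
        else
            IE = [IE, IH(i)];                        % unit-length run
            i = i + 1;
        end
    end
end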
4.7.3 Stage 3: MTF decoding
The input to this stage is the array IE which is the output of the RLE-2s
decoding stage. We now need to decode that data, which is shown to be equivalent to
the output R of the MTF coding stage in the compressor block. MTF decoding mirrors
MTF encoding: the decoder maintains the same list Y and applies the same move-to-front
updates, but looks symbols up by index instead of by value. So, we apply this mirrored
procedure to the array IE.
To successfully decode the data stream, we need the character array defined in
the section 4.4.3, Y = {‘a’, ‘b’, ‘c’, ‘d’, ‘r’}. Provided Y, we can successfully
decode the data encoded by MTF algorithm.
We iterate over the array IE starting at i = 0 and each time, we append the
value at index IE[i] of the character array Y to the output IR. And we then move that
element in Y to the front of the array Y.
List Y:  a r r d d a a r r c c a a a a a a a a b b b
         b a a r r d d a a r r c c c c c c c c a a a
         c b b a a r r d d a a r r r r r r r r c c c
         d c c b b b b b b d d d d d d d d d d r r r
         r d d c c c c c c b b b b b b b b b b d d d
List IE: 4 0 4 0 2 0 2 0 4 0 2 0 0 0 0 0 0 0 4 0 0 0
List IR: r r d d a a r r c c a a a a a a a a b b b b
Table 4.4: MTF decoding output.
As shown in the table, the List Y rows give the state of Y before decoding each value
of IE, and the output is stored in IR.
For example, consider the first two elements, IE[0] and IE[1]. Y is initially as
given above; IE[0] is ‘4’, which means we append Y[4] to the output array IR, and
the list Y is updated by moving Y[4] = ‘r’ to the start of the list. So, now Y = {‘r’,
‘a’, ‘b’, ‘c’, ‘d’}. Now, for IE[1] = ‘0’, the output IR[1] is the element of Y at index
0, i.e., ‘r’. In this way, we decode till we reach the end of the array IE.
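The decoding loop can be sketched as follows (a minimal illustration; Y is the initial list ‘abcdr’, and the +1 converts the 0-based ranks of the text to MATLAB’s 1-based indexing):

% MTF decoding: look each rank up in Y, emit that character, then
% move it to the front of Y, mirroring the encoder's updates.
function IR = mtf_decode(IE, Y)
    IR = blanks(numel(IE));
    for i = 1 : numel(IE)
        k = IE(i) + 1;                            % 1-based position in Y
        IR(i) = Y(k);                             % emit the ranked symbol
        Y = [Y(k), Y([1 : k - 1, k + 1 : end])];  % move it to the front
    end
end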
Based on the scan path selected in the reordering stage of the compressor block, the
inverse reordering algorithm selects the path in which the two-dimensional output
array, RI (Reconstructed Image), is to be filled.
Once the array RI is filled, we can verify the sanity of our results by
comparing RI with the input image given to the reordering stage in the compressor
block. We find that both the arrays are identical. This means that the compression is
performed in a lossless manner.
4.8 Conclusion
We described, using a running example, all the forward and reverse algorithms used
in both the compressor and decompressor blocks of the block diagram.
CHAPTER 5
IMPLEMENTATION
5.1 Introduction
Although MATLAB is intended primarily for numeric computing, an optional
toolbox uses the MuPAD (Multi Processing Algebra Data) symbolic engine allowing
access to symbolic computing abilities. An additional package, Simulink, adds
graphical multi-domain simulation and model-based design for dynamic and
embedded systems. As of 2020, MATLAB has more than 4 million users worldwide.
They come from various backgrounds in engineering, science, and economics.
5.2 Specifications
This section deals with the software and hardware specifications required to
run the MATLAB code. It also covers the constraints on the input images.
The whole model is divided into two programs, one for compression and the
other for decompression. The input to the compression program is an image. The
input image to the compression program must be an 8-bit grayscale image encoded in
‘TIFF’ (Tag Image File Format, or tiff). The dimensions of the image must be smaller
than 400. The output of the compression program is data embedded in the form of tiff
images in a folder in the file system. We call this data compressed data.
The inputs to the decompression program are the outputs of the compression
program, and the output is an image constructed using the compressed data. The
output image is then, if needed, compared to the original image before compression,
to verify that the compression is lossless.
The essential image file contains data such as the dimensions of the image,
required by the last stage of the decompressor, i.e., the undo-reordering stage, to
correctly arrange the linear data into a two-dimensional array, which is then encoded
as a tiff image. It also contains the primary index required by the inverse BWT stage
of the decompression, along with the symbols and their frequencies in the data stream
before Huffman encoding was applied in the compressor program; this is required by
the Huffman decoding function in the decompression block to construct the Huffman
tree and decode the Huffman-encoded data stream.
Finally at the end of the compression program, the essential data and
compressed data are converted to 8-bit tiff images and are stored in a folder created
by the program. We also display the compression ratio, computed as the total input
image file size divided by the total size of the output images.
The output is converted into an 8-bit tiff image and stored in a folder created
by the program. Finally, we cross-check the output image obtained with the original
image which was given as input to the compression program; their equality verifies
that the compression has been done in a lossless fashion.
5.4 Conclusion
CHAPTER 6
RESULTS
We now give an 8-bit tiff image as input to the compression program and the
results are as follows. The name of the image is ‘lukas_2d_8_head_0_t.tiff’.
The compression ratio is the key metric that indicates the efficiency of our
program. We now give the compressed image and essential image files as inputs to
the decompression program and the results are as follows.
Fig 6.2: Images in compressed form.
Figure 6.3 shows the workspace of our compression program. We now walk
through the variables created at the various stages of the program. At the start of the
program, we import the input tiff image by setting the variable ImFileName to the
name of the image file; here ImFileName is ‘lukas_2d_8_head_0_t’. We then convert
the image into a 2-dimensional array named orgImg, which here has the dimensions
270x207. This array is given as input to the reordering function, whose output is a
1-dimensional array named reordImg with dimensions 1x55890. This is taken as input
by the forward BWT function, and the outputs are pidx, whose value is 53255 for this
image, and BWTEncData, which is the input encoded using BWT, with dimensions
1x55891. This array goes as input to the MTF coding block, and the output is a 1D
array named MTFEncData with dimensions 1x55891. This is then the input to the
RLE-2s encoding function, whose output is the array RLEncData with dimensions
1x28411. And finally, this is given as input to the Huffman coding block, whose
outputs are
HuffmanDict, which is a 256x2 cell that stores the mapping between symbols (0–255)
and their frequencies in RLEncData, which is used to construct the Huffman tree
needed for Huffman decoding, and HuffmanEncData, which is the array RLEncData
encoded using Huffman coding. The variable avgwl is the average word length
calculated using the formula presented in section 4.4.5. The dimensions of the
HuffmanEncData array are much larger than those of the input to that stage because
its elements are individual bits; eight such elements can be packed into one element
that stores a byte of data. So, we perform the operation of converting the binary array
HuffmanEncData into a decimal array, and the output is the array named HuffmanRed
with dimensions 1x20588. Now finally we create a folder named ‘<image file
name>_comp’ and write HuffmanRed into a tiff file named ‘<image file
name>_HuffmanRed.tiff’ and the data in HuffmanDict into a tiff file called ‘<image
file name>_Data.tiff’. For this image, the folder name is
‘lukas_2d_8_head_0_t_comp’, the HuffmanRed file is
‘lukas_2d_8_head_0_t_HuffmanRed.tiff’, and the file containing the HuffmanDict is
‘lukas_2d_8_head_0_t_Data.tiff’. And finally, the compression ratio is calculated by
the formula,
Compression ratio = (size of original file) / (size of compressed file)
Here, the size of the original file is given by the size of the 2D array orgImg =
270 × 207 bytes = 55890 bytes, and the compressed file size is given by the sum of
the sizes of the HuffmanRed array and the HuffmanDict cell = 20588 + 780 = 21368
bytes.

Compression ratio = 55890 / 21368 = 2.6156 ≈ 2.62
The output is an image that is an exact copy of the image which was given to
the compression program.
We now take a look at the workspace of the decompression program and
analyse its variables. The workspace of the program for this input is shown in the
figure 6.6.
Fig 6.6: Workspace of the decompression program.
In the figure above, the string inp_img is given as the input. Here, it is the
string ‘lukas_2d_8_head_0_t’. Now the program generates the string ‘<image file
name>_comp’ from the variable inp_img and here it is ‘lukas_2d_8_head_0_t_comp’
and this is the name of the folder in which the compressed data is stored. Now we
create variables HuffmanRed and HuffmanDict with the contents of the file
‘lukas_2d_8_head_0_t_HuffmanRed’ and ‘lukas_2d_8_head_0_t_Data’ respectively
and then give them as inputs to the Huffman decoding function, and the output is the
array HuffmanDecData with dimensions 1x20588. Now this array is given as the
input to the RLE-2s decoding function, and the output is the array RLDecData which
has dimensions 1x55891. This is given to the MTF decoding function as the input and
the output is the array MTFDecData of dimensions 1x55891. Now this along with the
pidx obtained from the data file are given to the inverse BWT function as inputs and
the output is the array BWTDecData of dimensions 1x55891. Now the dimensions of
the original image stored in the data file along with the BWTDecData are given as
inputs to the undo reordering function, and the output is a 2D array whose dimensions
are 270x207. The array is then written to a tiff file, in a folder created by the program
named ‘<image file name>_comp_decomp’ and the file is named as ‘<image file
name>_comp_orgimg.tiff’. Here, for this input, the folder name is
‘lukas_2d_8_head_0_t_comp_decomp’ and the name of the output file is
‘lukas_2d_8_head_0_t_comp_orgimg.tiff’.
We can compare this image to the original image from which the compressed
folder was formed and make sure that the image is retrieved successfully. The original
image is imported into the variable RefImg in the form of a 2D array. Here in our
program, the variable named check indicates whether the output image is the same as
the image given as input to the compression program: check = 1 means that the
retrieval has been successful; otherwise it has failed.
In the below table we summarize the results of the decompression program for
this input.
We now present a few sample cases with a few input images and finally
tabulate the results obtained for various input images. We have explained the first
sample input case in a very detailed fashion so that the reader knows what to
interpret from the program. From now on, we just present the input and output of the
program for the test cases.
CHAPTER 7
CONCLUSION
7.1 Conclusion
We presented the Burrows–Wheeler Transform (BWT), the state of the art in the
text compression field, and proposed a lossless medical image compression scheme
based on this transform. Our project mainly focuses on the use of BWT as a
pre-processing stage and the addition of RLE-2s to the BWCA for the compression
of medical images, to exploit the redundancies in them and to provide a good
compression ratio.
Generally, a compression ratio of 2 or above is said to be excellent for lossless
compression. In our project, as we can see, most of the sample images have a
compression ratio of 2 or higher; for some images it is lower than 2, but we observe
that those ratios are still very close to 2. This project is application-specific, meaning
we cannot use it as a general-purpose compression algorithm for all images, since
there might be other algorithms that perform better than the presented algorithm.
REFERENCES
APPENDIX
Compression program
clc;
close all;
clear;
% changing current working directory to our images folder
cd 'C:\Users\gvsrl\Documents\Academics\Literary_Documents\Major_Project\References\Datasets\lukas_2d_8_tif\Resized\'
imshow(orgImg);
% "0" is the special character, work on the special character being the
% least frequent one in the reordImg
ks = 0 : 255;
vs = zeros(1, 256);
D = containers.Map(ks, vs);
for i = 1 : size(reordImg, 2)
D(reordImg(i)) = D(reordImg(i)) + 1;
end
end
for i = 1 : size(reordImg, 2)
if reordImg(i) == 0
reordImg(i) = leastFreqEle;
else
if reordImg(i) == leastFreqEle
Splmtr(end + 1) = i; % may need static allocation for large arrays
end
end
end
MTFEncData = MTF_coding(BWTEncData);
% 1x37201 uint8
[MaxEleRLE, RLEncData] = RL2SEncoding(MTFEncData);
% [1x1 uint16, 1x22282 uint8]
% _______________________reordering up-down__________________________
r = uint8(zeros(1, nr * nc));
for i = 1 : nc
    % (loop body reconstructed per the up-down scan of section 4.3;
    % the source image array name 'img' is assumed)
    if mod(i, 2) == 1
        r((i - 1) * nr + 1 : i * nr) = img(:, i).';
    else
        r((i - 1) * nr + 1 : i * nr) = img(end : -1 : 1, i).';
    end
end
end
% need to declare size(ri, 2) in a variable for easier access
for i = 1 : size(ri, 2)
s = append(s, char(ri(i)));
% char(p) --> character mapped to value p in the ASCII table
end
for i = 1 : size(ri, 2)
l2(i) = append(s(i : end), s(1 : i - 1)); % rotate from left to right
end
for i = 1 : size(ri, 2)
s1 = char(l2(i));
end
for i = 1 : size(L, 2)
% find(Arr == val) returns array of all indices of val in arr
R(i) = find(Y == L(i)); % 1 to 256
R(i) = R(i) - 1; % 0 to 255
end
R = uint8(R);
end
%________________________Run-Length Encoding__________________________
for i = 1 : size(M, 2) - 1
if M(i) == M(i + 1)
c = c + 1;
else
R = [R, M(i), c];
c = 0;
end
end
for i = 2 : 2 : size(R, 2)
end
end
% RLE 2S is better than RLE because we code it with only one symbol for
% unit length runs and 3 symbols for others.
% So, we code 1 to 255 as runlengths 2 to 256.
function [Mx, R] = RL2SEncoding(M)
M = double(M); % to store -1 as termination character
M(end + 1) = -1; % character to terminate the count of last run
for i = 1 : size(M, 2) - 1
else
if c == 0
R = [R, M(i)]; % we append only the element if runlength = 1
else
R = [R, M(i), M(i), c]; % we append the element twice and the count if runlength > 1
end
end
end
for i = 1 : size(R, 2)
% disp(i)
% when count c of an element e > 256, we write it into two separate runs
% as [e, e, c0, e, e, c1], where c1c0 = count in base 256
% R = [R(1:i - 1) c1 R(i - 1) R(i - 1) c2 R(i + 1:end)];
Res(k) = c1;
Res(k + 1) = R(i - 1);
Res(k + 2) = R(i - 1);
Res(k + 3) = c2;
k = k + 4;
else
Res(k) = R(i);
k = k + 1;
end
end
end
%_________________________Huffman coding______________________________
D = containers.Map(ks, vs);
for i = 1 : size(R, 2)
D(R(i)) = D(R(i)) + 1; % frequency of elements in M
end
for i = 0 : 255
prbs(i + 1) = D(i) / size(R, 2); % probability = favourable / total
end
% huffmanenco encodes the signal stream sig based on the probabilities in huffmandict
H = huffmanenco(R, Hfd);
H = uint8(H);
if H(end) == 0
% storing the change from right most to help while decoding
H(end + 1) = 2 ^ (8 - mod(sz, wl)) - 1;
else
H(end + 1) = 0;
end
for i = 1 : size(R, 2) - 1
for j = 1 : 8
offs = (i - 1) * 8;
R(i) = R(i) + H(offs + j) * (2 ^ (8 - j)); % converting binary to decimal
end % write full 8 bit notations while reversing...
end
for i = 1 : 256
Hfd{i, 2} = uint8(Hfd{i, 2}); % converting Hfd to uint8 for efficient storage
end
end
function [compRatio] = StoreCompData(Name, M, pridx, Oc, nr, nc, lfe, Splmtr)
% convert to base 256, 3 digits each for nr, nc, lfe, pridx
% order of interpretation is Hfd, pridx, nr, nc, lfe, Splmtr
res = uint8(zeros(260, 3)); % the tiff file that needs to store the additional info
end
function b = Dec2Byte(n) % (header reconstructed; name assumed, mirroring Byte2Dec below)
b = zeros(1, 3);
rdx = 256; % the base is 256
for i = 3 : -1 : 1
b(i) = mod(n, rdx); % accumulate the remainder
n = floor(n / rdx); % n = quotient
end
b = uint8(b);
end
Decompression program
clc;
close all;
clear;
ks = 0 : 255;
vs = zeros(1, 256);
prbs = double(zeros(1, 256)); % double to be allowed in huffmandict(symbols, probabilities)
for i = 1 : 256
vs(i) = Byte2Dec(DataFileContents(i, :));
end
D = containers.Map(ks, vs);
% forming the probability vector
for i = 0 : 255
prbs(i + 1) = D(i) / Tot;
end
Splmtr = [];
MTFDecData = MTF_inverse(RLDecData);
% 1x37201 uint8
tfs = isequal(RefImg, decompressedImg);
if tfs
disp("Original image is retrieved");
else
disp("Decompression unsuccessful...!!");
end
% _________________________Huffman decoding__________________________
for i = 1 : size(M, 2)
decnum = M(i); % the decimal number to be converted is decnum
end
for i = 1 : 7
else
break % break when change is detected
end
end
for i = 1 : 256
Hfd{i, 2} = double(Hfd{i, 2}); % convert to double since huffmandeco takes only double
end
H = double(H); % huffmandeco takes double
H = huffmandeco(H, Hfd);
end
% runlen = 1
else
runlen = 1;
M(j : j + runlen - 1) = R(i);
i = i + 1;
end
end
max(M)
M = uint8(M);
end
% _______________________Move-To-Front decoding_______________________
function [L] = MTF_inverse(R)
Y = 0 : 255; % same as in MTF encoding stage
L = uint8(zeros(1, size(R, 2)));
for i = 1 : size(R, 2)
R(i) = R(i) + 1; % incrementing to undo the effect of the MTF encoding stage
end
for i = 1 : size(R, 2)
L(i) = Y(R(i));
% constructing L(i) from Y and R(i)
end
for i = 2 : size(F, 2)
if F(i - 1) ~= F(i)
ks(end + 1) = F(i); % can ignore since ks is of limited length
ps(psi) = i;
psi = psi + 1;
end
end
for i = 1 : size(L, 2)
T(i) = D(L(i));
D(L(i)) = D(L(i)) + 1;
end
for i = 1 : size(L, 2)
S(i) = F(T(I)); % We get S in the reverse order
I = T(I);
end
end
for i = 1 : size(ri, 2)
end
for i = 1 : nc
end
end
% specifying the folder and file name to store it as a tiff file
Fldr = append(Name, '_decomp');
mkdir(Fldr);
cd(Fldr);
imshow(ni);
end
function n = Byte2Dec(b) % (header reconstructed from the call sites above)
n = zeros(1, 1);
rdx = 256;
for i = 1 : 3
n = n + b(i) * (rdx ^ (3 - i)); % convert byte data to decimal
end
end