
Lossless Compression of Full-Surface Solar Magnetic Field Image Based on Huffman Coding

Yue Liu
School of Electronic and Information Engineering
Beijing Jiaotong University
Beijing, China
e-mail: liuyue@bjtu.edu.cn

Li Luo
School of Electronic and Information Engineering
Beijing Jiaotong University
Beijing, China
e-mail: lluo@bjtu.edu.cn

Abstract—With the development of astronomical research, more and more space telescope satellites will be launched in the future. High-quality space telescope images will inevitably produce a huge amount of data to be sent back. Because of the limited satellite data transmission resources, lossless compression of images on board the satellite is imperative. Huffman coding is a lossless compression algorithm which pursues the best coding based on the probability of characters. In this paper, the Huffman coding algorithm is combined with several preprocessing methods to compress the full-surface solar magnetic field image, and the compression ratio can be up to 40% higher than without preprocessing.

Keywords—full-surface solar magnetic field image; Huffman coding; lossless compression; preprocessing
I. INTRODUCTION

The sun is a huge magnetic plasma, and all the phenomena and processes on it are inseparable from the interaction of the magnetic field. The major achievements and main difficulties of solar physics are related to the observation and theoretical research of the solar magnetic field. But up to now, the intrinsic properties (such as the basic structure) of the solar magnetic field remain a major scientific problem that has not been completely solved in astrophysics. In order to observe high-resolution solar magnetic field images, solar physicists have for decades been trying to choose better observation sites and to use adaptive optics and image restoration methods to eliminate the disturbance of the Earth's atmosphere and observe the basic structure directly. However, they have not obtained satisfactory results. In addition, ground-based observation cannot achieve full-time coverage of the sun. Therefore, the telescope satellite has become an inevitable choice for the development of solar physics.

A telescope satellite can acquire high-quality astronomical images; however, sending the data back is a huge problem. For example, one 4K * 4K high-definition solar magnetic field image is about 32 MB, and every two minutes 192 MB of data are produced. If the telescope satellite works continuously for 24 hours, 135 GB of data would be produced. As the data transmission resources are limited, transmitting the images after lossless compression on the satellite is imperative [1].

This paper attempts to adopt several preprocessing methods for lossless compression based on Huffman coding to reduce the amount of coding data. Huffman coding is a method proposed by David A. Huffman, which uses a bottom-up approach to construct the binary coding tree. First, it scans the object to obtain the probabilities of the characters; then it assigns short code words to high-probability characters and long code words to low-probability characters, thus resulting in the best coding. Huffman coding is a common compression coding method and is often used for image compression [2]. For lossless compression of JPEG pictures, the maximum compression ratio can reach 1.407, and for vector maps the maximum compression ratio can reach 3.687. Its efficient and lossless features are widely used in audio [3], images and video.

II. COMPRESSION ALGORITHM

A. Principle of Compression

Image compression includes lossless compression and lossy compression [4]. Lossless compression does not allow data loss during compression: the compressed data can be decompressed to be 100% identical to the original data. There is no data distortion, but the compression ratio is not high. Lossy compression introduces distortion after decompression, but achieves a higher compression ratio. The full-surface solar vector magnetic field images belong to scientific data; in particular, the measurement accuracy of the solar magnetic field image is as high as 1:10000 (the ratio of the effective signal to the background signal). If the compressed data were distorted, the measurement accuracy would be affected. Therefore, lossless compression is the first choice for full-surface solar vector image compression.

The compression ratio is an important measure of whether a compression algorithm is good or bad. The compression ratio is defined by the following equation:

CR = Bo / Bc (1)

where CR is the compression ratio, Bo represents the number of bytes of the original image, and Bc represents the number of bytes of the compressed image.
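For example, with the values measured later in Table I, Bo = 31490048 bytes and Bc = 31038637 bytes, which gives CR = 31490048 / 31038637 ≈ 1.015.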
Image compression is essentially to remove the internal correlation of the image, that is, to remove the duplicate part, the redundancy [5]. Redundancy can be classified into spatial redundancy, time redundancy, structural redundancy, statistical redundancy and so on.

For Huffman coding, due to its optimal coding characteristics, there is almost no structural redundancy. So the preprocessing part needs to remove as much redundancy as possible according to the image features, in order to achieve the purpose of de-correlation.
B. Huffman Coding
Huffman coding belongs to the greedy algorithms: when facing various options at each step of solving the problem, it always makes the locally optimal choice and hopes that this will also lead to the globally optimal solution [6]. Huffman coding can be very effective for data compression, usually saving 20% to 90% of the space depending on the characteristics of the data. Huffman coding needs to construct the Huffman tree before encoding, assigning the code values "0" and "1" from the bottom to the top according to the probabilities, and then encoding the data from the top to the bottom. The specific steps to construct a Huffman code are as follows:
(1) Calculate the probability of each pixel value and sort the values by probability;

(2) Assign "0" and "1" to the two entries with the least probabilities and merge them, repeating until the probabilities sum to one;

(3) Encode the data according to the constructed Huffman tree.

All Huffman code words are different, and no code word is the prefix of another code word, which allows the code words to be transmitted together without additional separation symbols. As long as the transmission is error-free, the receiver can decode them correctly.
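As a concrete illustration of steps (1)-(3), here is a minimal MATLAB sketch that builds a Huffman dictionary from an image's pixel statistics and verifies lossless decoding. It is not the authors' code: the random 8-bit test image, the variable names and the use of the Communications Toolbox functions huffmandict, huffmanenco and huffmandeco are assumptions for illustration (the real magnetograms are 32-bit).

```matlab
% Step (1): empirical pixel distribution of a stand-in test image
img = uint8(randi([0 255], 64, 64));    % placeholder for a magnetogram
pix = double(img(:));                   % flatten to a 1-D symbol stream
[symbols, ~, idx] = unique(pix);        % distinct pixel values
prob = accumarray(idx, 1) / numel(pix); % probability of each value

% Steps (2)-(3): build the Huffman tree and encode the data
dict = huffmandict(symbols, prob);      % code table (Communications Toolbox)
code = huffmanenco(pix, dict);          % encoded bitstream

% Prefix-free code words decode unambiguously and losslessly
assert(isequal(huffmandeco(code, dict), pix));
CR = 8 * numel(pix) / numel(code);      % Eq. (1) on the data bits alone
```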

C. Preprocessing Method
Image preprocessing transforms and encodes the image. Transform coding uses a mathematical transformation to convert the image data into a transform domain and obtain the transformed data of the image. The entropy coding efficiency is improved by the decorrelation achieved in the transformation process. Available transforms include the discrete cosine transform (DCT) [7], the Walsh-Hadamard transform [8] and the wavelet transform (DWT) [9]. The former two methods and the early DWT used in image transformation are limited by the data width of the hardware system, so data round-off errors exist and they belong to lossy coding. In 1996, Sweldens [10] proposed the integer wavelet transform, making wavelet transforms available for lossless compression, but its algorithm is relatively complex and involves more transformations.
III. FULL-SURFACE SOLAR MAGNETIC FIELD IMAGE COMPRESSION BASED ON HUFFMAN CODING

The object of this study is the image of the full-surface solar magnetic field taken on the ground by the Huairou Solar Observatory of the National Astronomical Observatories. The image is recorded in a Flexible Image Transport System (FITS) format file, and each file contains only one HDU (Header/Data Unit) and no extensions. The image files use a 32-bit storage mode, and each of them stores two images of size 992 * 992, which record the left and right rotation magnetic field images of one component. In each two-minute observation, three FITS files are generated corresponding to the three components L, Q and U, which means six magnetic field images.
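Loading such a file in MATLAB can be sketched as follows; the file name and the assumption that the two 992 * 992 images are stacked in the primary HDU are illustrative, not the actual layout of the observatory's files:

```matlab
% Minimal sketch: read one component's FITS file with MATLAB's fitsread
raw = fitsread('solar_L_component.fits');  % hypothetical file name; primary HDU
imgLeft  = raw(1:992, :);                  % assumed layout: the two images
imgRight = raw(993:1984, :);               % stacked along the first dimension
```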
The solar image is shown in Fig. 1: the middle is bright and the surrounding region is dark.

Fig. 1. Full-surface solar magnetic field image.

In MATLAB, Huffman coding is applied directly to compress the original image. The results are shown in Table I.

TABLE I. THE RESULT OF HUFFMAN CODING ON IMAGE

Bo          Huffman tree bytes of Bc   data bytes of Bc   Bc         CR
31490048    23408598                   7630039            31038637   1.015

For the image, MATLAB is also used to compute statistics of the data; the result is shown in Fig. 2.

Fig. 2. Statistics of the original image's pixel values.

It can be seen that the data distribution is very uneven. The left side of the histogram is the dark part of the image, whose data values are small but numerous, while the rest of the histogram is the bright part of the image, whose data values are large and obey a normal distribution. From the table, the data coding takes up only about a quarter of the total amount of coding, while the Huffman tree coding takes up three quarters, resulting in the fact that the total compression ratio is not high.

In this circumstance, we consider preprocessing the image to narrow the range of the data values before compressing.

IV. FULL-SURFACE SOLAR MAGNETIC FIELD IMAGE COMPRESSION COMBINED WITH PREPROCESSING METHOD

It is not efficient to use only Huffman coding for the image compression, so the image should first be preprocessed against time and spatial redundancy and then compressed, so that the compression ratio can be improved.

Two sets of images taken at adjacent times, twelve images in total, are shown in Fig. 3.

Fig. 3. Two sets of images taken at adjacent times.

These twelve images are visually very similar, so it is possible that there is spatial and temporal redundancy among them. Therefore, the preprocessing proceeds from these two aspects.

In the choice of preprocessing method, due to the limitations of the processor carried by a space telescope satellite and its chip power requirements, the algorithm design has to minimize the amount of computation, improve speed and reduce power consumption. So the preprocessing starts from the simplest method, to achieve a balance between algorithm complexity and a higher compression ratio. The next step is to eliminate redundancy both between images and inside each single image. Several reversible preprocessing methods are used to perform lossless compression and are compared.

A. Preprocessing Method Based on Multiple Images


The full-surface solar magnetic field images of the three components L, Q and U are similar. Taking the image of the L component as an example, the position of a pixel in the two-dimensional image is represented on the horizontal axis of a one-dimensional coordinate, and the pixel value is represented on the vertical axis. Fig. 4 and Fig. 5 show two sets of image data at adjacent times; Fig. 6 shows the subtraction of the L component's two images at the same time and the subtraction of the corresponding images at adjacent times.

Fig. 4. A set of L component image data at the former time.

Fig. 5. A set of L component image data at the latter time.

Fig. 6. The subtraction results of L component images.

From the pixel values on the vertical axis in Fig. 6, after processing, the range of the data values reduces from more than three hundred thousand to about one hundred thousand or about twenty thousand. The reduction of the data value range means the probability of the same value occurring is higher, so the correlation of the data is stronger and the compression ratio can be increased.
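A minimal MATLAB sketch of this multi-image preprocessing; the file name and image layout follow the illustrative assumptions of the Section III sketch:

```matlab
% Load the two images of one component (assumed layout, see Section III)
raw = fitsread('solar_L_component.fits');
imgLeft  = raw(1:992, :);
imgRight = raw(993:1984, :);

D = imgRight - imgLeft;                    % residual of two correlated images
rangeOrig = max(imgRight(:)) - min(imgRight(:));
rangeDiff = max(D(:)) - min(D(:));         % expected to be much narrower
% Reconstruction is exact (imgRight = imgLeft + D), so nothing is lost.
```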
Since the magnetic field images of different components at the same time are also visually similar, these images should be preprocessed as well. Taking the corresponding images of the Q component and the L component at the same time as an example, Fig. 7 shows the Q component image data and the image data after preprocessing.

Fig. 7. Q component image data and subtraction result.

From the pixel values on the vertical axis of Fig. 7, after processing, the range of the data values reduces from more than three hundred thousand to less than two hundred thousand. Although the range of the data values has been reduced, the effect is not significant enough.

B. Preprocessing Method Based on Single Image

In addition to eliminating redundancy across multiple images, there is also much redundancy within one single image. From Fig. 2, it can be seen that the small-value pixels account for a large part, that is, the dark surrounding part of the image. As for the bright disk part, the data change relatively slowly. Therefore, three kinds of preprocessing methods are proposed according to these characteristics:

(1) Divide the image into blocks and compress the subtraction between adjacent blocks. As the data change relatively slowly, the subtraction results would be relatively small.

(2) Divide the image into blocks and compress the subtraction between adjacent pixels. This aims at making the subtraction results even smaller.

(3) Divide the image into blocks and compress the subtraction between the pixel values and the average value of each block.

The purpose of these three preprocessing methods is to reduce the data range and the correlation within the image, so that the compression ratio can be increased.
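A minimal MATLAB sketch of method (2), ignoring the blocking step for brevity; the row-wise scan order and the reconstruction by cumulative sum are illustrative assumptions rather than the authors' implementation:

```matlab
% Replace each pixel by its difference from the left neighbour,
% keeping the first column as reference values.
A = double(fitsread('magnetogram.fits'));  % hypothetical file name
D = A;
D(:, 2:end) = A(:, 2:end) - A(:, 1:end-1); % small residuals in smooth regions

% The transform is exactly reversible, so the compression stays lossless
assert(isequal(cumsum(D, 2), A));          % bit-exact reconstruction
```

Huffman coding is then applied to D instead of A; the narrower value range concentrates the histogram and shortens the average code word.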
V. EXPERIMENTAL RESULTS

Huffman coding is used for encoding after the six proposed preprocessing methods are applied. Then the amount of data coding, the amount of Huffman tree coding and the compression ratio are calculated. The results are shown in Table II.

TABLE II. THE RESULT OF HUFFMAN CODING ON IMAGE AFTER PREPROCESSING

method                            Huffman tree bytes of Bc   data bytes of Bc   Bc         CR
original                          23408598                   7630039            31038637   1.015
same component at the same time   14080905                   7279981            21360886   1.474
same component at adjacent time   18327211                   7429564            25756775   1.223
different component               20371463                   7570304            27941767   1.127
subtraction between blocks        20682826                   7601434            28284260   1.113
subtraction between pixels        20733034                   7630039            28363073   1.110
subtraction with average          22054593                   7497613            29552206   1.066

From Table II, it can be seen that for a group of six images at the same time, the subtraction of the same component's two images can be compressed to a large extent, so the enhancement of its compression ratio is the most significant. That is because the correlation between the left and right rotation images is strong. The subtraction between two components and the subtraction between images at adjacent times also contribute to the compression ratio, although the effect is not as good as that of the subtraction within the same component at the same time.

The three preprocessing methods for one single image also increase the compression ratio. In terms of the total coding amount, the most significant is the compression of the subtraction between adjacent blocks. But counting only the data coding, not the Huffman tree coding, the compression of the subtraction between adjacent pixels is the most significant.

For the data preprocessed by the above methods, MATLAB is used to obtain the extreme values and data distribution characteristics, summarized in Table III.

TABLE III. THE CHARACTERISTICS OF PREPROCESSED IMAGE DATA

method                            maximum   minimum   data range
original                          …         …         …
same component at the same time   …         …         …
same component at adjacent time   …         …         …
different component               …         …         …
subtraction between blocks        …         …         …
subtraction between pixels        …         …         …
subtraction with average          …         …         …

It can be seen from Table III that, after preprocessing on multiple images, the pixel value range of the same component at the same time and of the corresponding images at adjacent times reduces a lot, while the range for the method on different components does not reduce much. For the preprocessing methods on one single image, only the value range of the subtract-the-average method reduces; the value ranges of the other two methods do not change much.

For each preprocessing method, a histogram is drawn to see its distribution of data values, as shown in Fig. 8.
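Such histograms can be reproduced with MATLAB's histogram function, assuming D is a residual image from one of the sketches above:

```matlab
histogram(D(:), 200);                      % distribution of preprocessed values
xlabel('pixel value'); ylabel('count');
```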

Fig. 8. Distribution of preprocessed image data.

As shown in Fig. 8, both the methods based on multiple images and those based on one single image concentrate the data values in a smaller interval, and the data distribution becomes more centralized.

VI. CONCLUSION

Huffman coding is not the most suitable coding method for full-surface solar magnetic field image compression. Because of its double-scanning characteristic and the amount of Huffman tree coding, the compression ratio is difficult to improve. But according to the data characteristics, if the code table is stable and direct transmission of the encoded data is feasible, there can be considerable room for compression.

Combined with the statistics of the extreme values and data ranges of each preprocessing method, for multiple images, the method of subtraction of the two images within the same component performs best; for one single image, the method of subtraction between adjacent pixels performs best.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (U1431119).

REFERENCES

[1] A. S. Dawood, J. A. Williams, and S. J. Visser, "On-board satellite image compression using reconfigurable FPGAs," IEEE International Conference on Field-Programmable Technology, 2002, pp. 306-310.
[2] K. Sahnoun and N. Benabadji, "On-board satellite image compression using the Fourier transform and Huffman coding," 2013 World Congress on Computer and Information Technology (WCCIT), Sousse, 2013, pp. 1-5.
[3] G. Brzuchalski, "Huffman coding in advanced audio coding standard," Proceedings of SPIE - The International Society for Optical Engineering, vol. 8454, 2012, pp. 1-6.
[4] M. Sundaresan et al., "Lossy and lossless compression using various algorithms," International Journal of Computer Applications, vol. 65, no. 20, 2013, pp. 11-14.
[5] V. Bastani et al., "Image compression based on spatial redundancy removal and image inpainting," Journal of Zhejiang University - Science C, vol. 11, no. 2, 2010, pp. 92-100.
[6] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.
[7] K. Komatsu and K. Sezaki, "Reversible discrete cosine transform," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, WA, 1998, vol. 3, pp. 1769-1772.
[8] H.-Y. Jung, "A unified mathematical form of the Walsh-Hadamard transform for lossless image data compression," Signal Processing, vol. 63, 1997, pp. 35-43.
[9] A. R. Calderbank et al., "Lossless image compression using integer to integer wavelet transforms," Proceedings of the International Conference on Image Processing, 1997, vol. 1, pp. 596-599.
[10] W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Applied and Computational Harmonic Analysis, 1996.
