You are on page 1of 2




Abstract This paper serves as to give an outline on, and to explain how Lossy compression is used in information technology as a data encoding method that compresses data by discarding (losing) some of it. Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is required for text and data files, such as bank records and text articles. In many cases it is advantageous to make a master lossless file that can then be used to produce compressed files for different purposes; for example, a multi-megabyte file can be used at full size to produce a full-page advertisement in a glossy magazine, and a 10 kilobyte lossy copy can be made for a small image on a web page. Index Terms- lossy compression, scalar transformation coding, vector quantization. quantization,

I. INTRODUCTION This paper serves as to give an outline on, and to explain how Lossy compression is used in information technology as a data encoding method that compresses data by discarding (losing) some of it he procedure aims to minimize the amount of data that needs to be held, handled, and/or transmitted by a computer. Typically, a substantial amount of data can be discarded before the result is sufficiently degraded to be noticed by the user.[1] Lossy compression is compression in which some of the information from the original message sequence is lost. This means the original sequences cannot be regenerated from the compressed sequence. Just because information is lost doesnt mean the quality of the output is reduced. For example, random noise has very high information content, but when present in an image or a sound file, we would typically be perfectly happy to drop it. Also certain losses in images or sound might be completely imperceptible to a human viewer (e.g. the loss of very high frequencies). For this reason, lossy compression algorithms on images can often get a factor of 2 better compression than lossless algorithms with an imperceptible loss in quality. However, when quality does start degrading in a noticeable way, it is important to make sure it degrades in a way that is least objectionable to the viewer (e.g., dropping random pixels is probably more objectionable than dropping some color information). For these reasons, the way most lossy compression techniques are used are highly dependent on the media that is being compressed. Lossy compression for sound, for example, is very different than lossy compression for images.[2] II. LOSSY COMPRESSION TECHNIQUES These are general techniques that can be applied in various contexts when compressing data.

A. Scalar Quantization A simple way to implement lossy compression is to take the set of possible messages S and reduce it to a smaller set S by mapping each element of S to an element in S. For example we could take 8-bit integers and divide by 4 (i.e., drop the lower two bits), or take a character set in which upper and lowercase characters are distinguished and replace all the uppercase ones with lowercase ones. This general technique is called quantization. Since the mapping used in quantization is many-toone, it is irreversible and therefore lossy. In the case that the set S comes from a total order and the total order is broken up into regions that map onto the elements of S, the mapping is called scalar quantization. The example of dropping the lower two bits given in the previous paragraph is an example of scalar quantization. Applications of scalar quantization include reducing the number of color bits or grayscale levels in images (used to save memory on many computer monitors), and classifying the intensity of frequency components in images or sound into groups (used in JPEG compression). In fact we mentioned an example of quantization when talking about JPEG-LS. There quantization is used to reduce the number of contexts instead of the number of message values. In particular we categorized each of 3 gradients into one of 9 levels so that the context table needs only 93 entries (actually only (93 + 1)/2 due to symmetry). The term uniform scalar quantization is typically used when the mapping is linear. Again, the example of dividing 8-bit integers by 4 is a linear mapping. In practice it is often better to use a nonuniform scalar quantization. For example, it turns out that the eye is more sensitive to low values of red than to high values. Therefore we can get better quality compressed images by making the regions in the low values smaller than the regions in the high values. Another choice is to base the nonlinear mapping on the probability of different input values. In fact, this idea can be formalizedfor a given error metric and a given probability distribution over the input values, we want a mapping that will minimize the expected error. For certain error-metrics, finding this mapping might be hard. For the root-mean-squared error metric there is an iterative algorithm known as the Lloyd-Max algorithm that will find the optimal mapping. An interesting point is that finding this optimal mapping will have the effect of decreasing the effectiveness of any probability coder that is used on the output. This is because the mapping will tend to more evenly spread the probabilities in S.. B. Vector Quantization Scalar quantization allows one to separately map each color of a color image into a smaller set of output values. In practice, however, it can be much more effective tomap regions of 3-d color space. into output values. By more effective we mean that a better compression ratio can be achieved based on an equivalent loss of quality. The general idea of mapping a multidimensional space into a smaller set of messages S is called vector quantization. Vector quantization is typically

2 implemented by selecting a set of representatives from the input space, and then mapping all other points in the space to the closest representative. The representatives could be fixed for all time and part of the compression protocol, or they could be determined for each file (message sequence) and sent as part of the sequence. The most interesting aspect of vector quantization is how one selects the representatives. Typically it is implemented using a clustering algorithm that finds some number of clusters of points in the data. A representative is then chosen for each cluster by either selecting one of the points in the cluster or using some form of centroid for the cluster. Finding good clusters is a whole interesting topic on its own. Vector quantization is most effective when the variables along the dimensions of the space are correlated. C. Transform coding More generally, lossy compression can be thought of as an application of transform coding in the case of multimedia data, perceptual coding: it transforms the raw data to a domain that more accurately reflects the information content. For example, rather than expressing a sound file as the amplitude levels over time, one may express it as the frequency spectrum over time, which corresponds more accurately to human audio perception. While data reduction (compression, be it lossy or lossless) is a main goal of transform coding, it also allows other goals: one may represent data more accurately for the original amount of space[1] for example, in principle, if one starts with an analog or high-resolution digital master, an MP3 file of a given size should provide a better representation than a raw uncompressed audio in WAV or AIFF file of the same size. This is because uncompressed audio can only reduce file size by lowering bit rate or depth, whereas compressing audio can reduce size while maintaining bit rate and depth. This compression becomes a selective loss of the least significant data, rather than losing data across the board. Further, a transform coding may provide a better domain for manipulating or otherwise editing the data for example, equalization of audio is most naturally expressed in the frequency domain (boost the bass, for instance) rather than in the raw time domain.[3]

IV. CONCLUSION With lossy compression the compressed data is not the same as the original data, but a close approximation of it and Yields a much higher compression ratio than that of lossless compression. Lossy compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality. This is in contrast with lossless data compression, where data will not be lost via the use of such a procedure. Information-theoretical foundations for lossy data compression are provided by rate-distortion theory. Much like the use of probability in optimal coding theory, rate-distortion theory heavily draws on Bayesian estimation and decision theory in order to model perceptual distortion and even aesthetic judgment.

REFERENCES [1] [2] Introduction to Data Compression, Guy E. Blelloch Computer Science Department Carnegie Mellon University

III. TYPES There are two basic lossy compression schemes: In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded. In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.

In some systems the two techniques are combined, with transform codecs being used to compress the error signals generated by the predictive stage.