
# Discrete Cosine Transform and JPEG Files

Amber Habib

Mathematical Sciences Foundation Delhi www.mathscifound.org

**Abstract.** A digital image can be viewed as an array of numbers, each number representing the colour value of the corresponding pixel. In the JPEG format, these numbers are stored indirectly, via their discrete cosine transform. This makes compression, resizing and similar operations easy. For further savings, the array produced by the discrete cosine transform is stored using Huffman encoding. The calculations and plots were produced with Mathematica. These notes were prepared for MSF's Programme in Mathematical Simulation and IT. They formed the basis for student projects in image manipulation using Matlab, which used Fourier analysis as well as wavelets.

Contents

1. Discrete Fourier Transform
2. Discrete Cosine Transform
3. The Two-Dimensional Discrete Cosine Transform
4. Huffman Encoding

## 1 Discrete Fourier Transform

Consider the data depicted in the following graph:

[Figure: plot of the data points.]

To represent this data in a way that can be easily manipulated for different purposes, we wish to construct a function that passes through all the data points. More specifically, we construct a function of the form

$$f(x) = \frac{A_0}{2} + \sum_{k=1}^{5} A_k\cos(kx) + \sum_{k=1}^{4} B_k\sin(kx), \tag{1}$$

where the $A_k$'s and $B_k$'s are suitably chosen constants. This function is called the discrete Fourier transform of the data. Note that we have 10 data points (actually 11, but the last is just a repeat of the first) and 10 unknown constants. Further, substituting each data point into (1) gives a linear equation in the unknowns, so we can hope to solve this linear system and obtain a unique set of values for the $A_k$ and $B_k$. For our data, the $A_k$'s turn out to be given by the vector

A = (5.65486, −0.628319, −0.628319, −0.628319, −0.628319, −0.314159),

and the $B_k$'s by the vector

B = (−1.93377, −0.864806, −0.4565, −0.204153, 0).
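The notes do not say how the quoted coefficients were computed. As a minimal sketch of the linear-system idea only, the Python/NumPy code below assumes the ten distinct data points sit at $x_j = 2\pi j/10$, $j = 0, \dots, 9$ (consistent with the eleventh point repeating the first). Since the raw data values are not listed, it regenerates them from the quoted coefficients and then solves the 10 × 10 system to recover those coefficients.

```python
import numpy as np

# Coefficients quoted in the text: A_0..A_5 and B_1..B_4 (the listed fifth B is 0).
A = np.array([5.65486, -0.628319, -0.628319, -0.628319, -0.628319, -0.314159])
B = np.array([-1.93377, -0.864806, -0.4565, -0.204153])

# Assumption: the ten distinct data points sit at x_j = 2*pi*j/10, j = 0..9.
x = 2 * np.pi * np.arange(10) / 10


def design_matrix(x):
    """One row per data point, one column per unknown in equation (1)."""
    columns = [np.full_like(x, 0.5)]                    # multiplies A_0 (as A_0/2)
    columns += [np.cos(k * x) for k in range(1, 6)]     # multiply A_1..A_5
    columns += [np.sin(k * x) for k in range(1, 5)]     # multiply B_1..B_4
    return np.column_stack(columns)


M = design_matrix(x)                     # 10 equations, 10 unknowns
y = M @ np.concatenate([A, B])           # data values implied by the quoted coefficients

# Solve the square linear system: each data point contributed one equation.
solution = np.linalg.solve(M, y)
A_recovered, B_recovered = solution[:6], solution[6:]
print(np.round(A_recovered, 6))
print(np.round(B_recovered, 6))
```

With measured data one would place the measured values in `y` directly instead of regenerating them.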


Note: We have not said anything about how to find the coefficients $A_k$ and $B_k$. Our immediate interest is in seeing why knowing them is useful; we will see later how to obtain them. The discrete Fourier transform $f(x)$ passes exactly through the data points:

[Figure: $f(x)$ plotted together with the data points.]

Now we investigate the contribution of the different coefficients $A_k$ and $B_k$. Suppose we set $A_4$ and $B_4$ to zero. Then the function becomes:

[Figure: $f(x)$ with $A_4 = B_4 = 0$, plotted together with the data points.]

This function does not represent the data exactly, but it does roughly follow the general trend. Now let us instead drop the $A_2$ and $B_2$ terms:


[Figure: $f(x)$ with $A_2 = B_2 = 0$, plotted together with the data points.]

The loss in quality is much greater. This suggests that the "higher order" terms contribute less than the "lower order" terms, so we need not store them to the same accuracy. Suppose, then, that we round off the last few coefficients of A and B (leaving $A_0$ unchanged):

A = (5.65486, −0.628319, −0.628319, −0.628319, −0.6, −0.3),

B = (−1.93377, −0.864806, −0.4565, −0.2, 0).

This makes no noticeable difference to the accuracy of the interpolation:

[Figure: $f(x)$ with the rounded coefficients, plotted together with the data points.]
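To see the effect numerically, here is a minimal Python/NumPy sketch that evaluates (1) on a fine grid over $[0, 2\pi]$ (an assumed plotting range) and reports the largest deviation from the full interpolant for each modification described above. It uses the coefficient vectors quoted in the text; the rounded version keeps $A_0 = 5.65486$, since only the last few coefficients are rounded.

```python
import numpy as np

# Coefficient vectors quoted in the text for equation (1): A_0..A_5 and B_1..B_4.
A = np.array([5.65486, -0.628319, -0.628319, -0.628319, -0.628319, -0.314159])
B = np.array([-1.93377, -0.864806, -0.4565, -0.204153])


def f(x, A, B):
    """Evaluate f(x) = A_0/2 + sum_k A_k cos(kx) + sum_k B_k sin(kx)."""
    total = A[0] / 2 + np.zeros_like(x)
    for k in range(1, len(A)):
        total += A[k] * np.cos(k * x)
    for k in range(1, len(B) + 1):
        total += B[k - 1] * np.sin(k * x)
    return total


# Assumption: compare the curves on a fine grid over [0, 2*pi].
x = np.linspace(0, 2 * np.pi, 2001)
full = f(x, A, B)


def max_change(A_new, B_new):
    """Largest deviation of the modified curve from the full interpolant."""
    return np.max(np.abs(f(x, A_new, B_new) - full))


A_no4, B_no4 = A.copy(), B.copy()
A_no4[4], B_no4[3] = 0.0, 0.0                  # drop A_4 and B_4
A_no2, B_no2 = A.copy(), B.copy()
A_no2[2], B_no2[1] = 0.0, 0.0                  # drop A_2 and B_2
A_rounded = np.array([5.65486, -0.628319, -0.628319, -0.628319, -0.6, -0.3])
B_rounded = np.array([-1.93377, -0.864806, -0.4565, -0.2])

drop4 = max_change(A_no4, B_no4)
drop2 = max_change(A_no2, B_no2)
rounded = max_change(A_rounded, B_rounded)
print(f"drop A4, B4: {drop4:.3f}   drop A2, B2: {drop2:.3f}   rounded: {rounded:.4f}")
```

Dropping $A_2, B_2$ moves the curve by about 1.07 at worst and dropping $A_4, B_4$ by about 0.66, while the rounding changes it by less than 0.05.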

## 2 Discrete Cosine Transform

It is possible to manipulate the discrete Fourier transform of a set of data so that it consists only of cosine terms (all the $B_k$'s are zero). The benefit is simpler computation, especially when the data comes as arrays instead of lists. Thus, consider a string of data such as

Data = (123, 157, 142, 127, 131, 102, 99, 235).

Instead of placing these values at the evenly spaced points $2\pi k/8$, $k = 0, 1, \dots, 7$, we associate them with the points $\pi(2k + 1)/16$. Further, we symmetrically assign the same values to the points $-\pi(2k + 1)/16$. This gives a collection of data points that is symmetric with respect to the y-axis:

[Figure: the 16 data points, symmetric about the y-axis.]

If we calculate the discrete Fourier transform of such data, we find that the sine terms vanish (because sine is odd) and only the cosine terms remain (because cosine is even, like the data). This special form is called the discrete cosine transform of the data. For data such as we have given (8 points), the discrete cosine transform is

$$f(x) = \frac{A_0}{2} + \sum_{k=1}^{7} A_k\cos(kx),$$

where the Fourier coefficients $A_k$ are given by

$$A_k = \frac{1}{4}\sum_{n=0}^{7}\mathrm{Data}(n)\cos\frac{(2n+1)\pi k}{16}.$$

Note that we have numbered the data points 0, 1, …, 7. For our example, this formula produces the following values for the Fourier coefficients:

$A_0 = 279$ (so the constant term is $A_0/2 = 139.5$), $A_1 = -10.05$, $A_2 = 24.25$, $A_3 = -35.36$, $A_4 = 20.51$, $A_5 = -28.66$, $A_6 = 6.80$, $A_7 = -4.22$.
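The coefficient formula is easy to check numerically. The Python/NumPy sketch below evaluates it for the eight data values, and also takes ordinary Fourier sums over the 16 symmetric points $\pm\pi(2n+1)/16$, assuming the usual $2/16$ weighting for 16 equally spaced points (the normalisation that matches the $A_0/2$ convention). The cosine sums reproduce the same $A_k$, and the sine sums cancel in $\pm$ pairs.

```python
import numpy as np

data = np.array([123, 157, 142, 127, 131, 102, 99, 235], dtype=float)
n = np.arange(8)

# A_k = (1/4) * sum_n Data(n) * cos((2n+1) * pi * k / 16), for k = 0..7
A = np.array([np.sum(data * np.cos((2 * n + 1) * np.pi * k / 16)) / 4 for k in range(8)])

# Cross-check via the symmetric extension: values at +pi(2n+1)/16 and -pi(2n+1)/16,
# ordinary Fourier sums over all 16 points with the assumed 2/16 weighting.
x = np.concatenate([np.pi * (2 * n + 1) / 16, -np.pi * (2 * n + 1) / 16])
y = np.concatenate([data, data])
A_sym = np.array([np.sum(y * np.cos(k * x)) * 2 / 16 for k in range(8)])
B_sym = np.array([np.sum(y * np.sin(k * x)) * 2 / 16 for k in range(8)])

print(np.round(A, 3))
print("largest sine coefficient:", np.abs(B_sym).max())
```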


The corresponding cosine transform $f(x)$ passes exactly through the data points:

[Figure: $f(x)$ plotted together with the eight data points.]
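Evaluating $f(x)$ at the original points confirms the interpolation. This sketch recomputes the $A_k$ from the formula above and evaluates $f$ at $\pi(2n+1)/16$; because $f$ is even, it takes the same values at the mirrored points $-\pi(2n+1)/16$.

```python
import numpy as np

data = np.array([123, 157, 142, 127, 131, 102, 99, 235], dtype=float)
n = np.arange(8)
A = np.array([np.sum(data * np.cos((2 * n + 1) * np.pi * k / 16)) / 4 for k in range(8)])


def f(x):
    """Discrete cosine transform f(x) = A_0/2 + sum_{k=1}^{7} A_k cos(kx)."""
    x = np.asarray(x, dtype=float)
    return A[0] / 2 + sum(A[k] * np.cos(k * x) for k in range(1, 8))


points = np.pi * (2 * n + 1) / 16
print(np.round(f(points), 6))
```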

## 3 The Two-Dimensional Discrete Cosine Transform

A digital image consists of a rectangular array of closely packed pixels, each of which is assigned a colour value. These colour values are given by numbers, and various formats exist for mapping colours to numbers. For instance, in one format a colour is broken up into its red, green and blue (RGB) components, and a particular colour is chosen by assigning an intensity (a number) to each of the three components. For example, the background colour of the following box is obtained by setting R = G = B = 0.8:

[Figure: a box labelled "A Shaded Box", filled with this colour.]

Thus, for the mathematician, a digital image is just an array of numbers. To manipulate this array, we use a two-dimensional version of the discrete cosine transform. Consider the 8 × 8 array of data in the following table:

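| x \ y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 123 | 157 | 142 | 127 | 131 | 102 | 99 | 235 |
| 1 | 134 | 135 | 157 | 112 | 109 | 106 | 108 | 136 |
| 2 | 135 | 144 | 159 | 108 | 112 | 118 | 109 | 126 |
| 3 | 176 | 183 | 161 | 111 | 186 | 130 | 132 | 133 |
| 4 | 137 | 149 | 154 | 126 | 185 | 146 | 131 | 132 |
| 5 | 121 | 130 | 127 | 146 | 205 | 150 | 130 | 126 |
| 6 | 117 | 151 | 160 | 181 | 250 | 161 | 134 | 125 |
| 7 | 168 | 170 | 171 | 178 | 183 | 179 | 112 | 124 |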

We number the rows and columns 0, 1, 2, …, 7. Thus the (0, 0) entry is 123, the (3, 7) entry is 133, and so on. To this data we apply the two-dimensional discrete cosine transform, defined by

$$\mathrm{DCT}(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7}\mathrm{Data}(x,y)\cos\frac{\pi(2x+1)u}{16}\cos\frac{\pi(2y+1)v}{16}.$$

Here Data(x, y) refers to the (x, y) entry in the data table above. The coefficients C(u) and C(v) are defined by

$$C(h) = \begin{cases} 1/\sqrt{2}, & h = 0,\\ 1, & h \neq 0. \end{cases}$$

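The discrete cosine transform produces the following table, after rounding:

| u \ v | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1149 | 39 | -43 | -10 | 25 | -84 | 11 | 41 |
| 1 | -81 | -3 | 114 | -74 | -6 | -2 | 21 | -6 |
| 2 | 14 | -11 | 0 | -43 | 25 | -3 | 17 | -39 |
| 3 | 1 | -61 | -14 | -12 | 36 | -24 | -18 | 4 |
| 4 | 44 | 13 | 36 | -5 | 9 | -22 | 6 | -8 |
| 5 | 36 | -12 | -9 | -5 | 20 | -29 | -21 | 13 |
| 6 | -19 | -8 | 21 | -6 | 3 | 2 | 11 | -22 |
| 7 | -5 | -14 | -11 | -18 | -5 | -1 | 7 | -5 |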

We always round off the results of our calculations to integers, because (1) integers take less space than reals, (2) integer operations are faster, and (3) colour values are usually integers. One of the important strengths of the discrete cosine transform is that the errors introduced by this rounding are inconsequential.
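The double sum can be written as a matrix product. Writing T for the 8 × 8 matrix with entries $T[u, x] = \tfrac12 C(u)\cos(\pi(2x+1)u/16)$ (a name introduced here for illustration), the definition becomes DCT = T · Data · Tᵀ. The Python/NumPy sketch below uses this to reproduce the table, with `DATA[x, y]` holding row x, column y of the data table, and cross-checks one entry against the double sum written out directly.

```python
import numpy as np

# The 8x8 data table from the text: DATA[x, y] is row x, column y.
DATA = np.array([
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
], dtype=float)


def C(h):
    return 1 / np.sqrt(2) if h == 0 else 1.0


# T[u, x] = C(u)/2 * cos(pi (2x+1) u / 16); then DCT = T @ DATA @ T.T.
T = np.array([[C(u) / 2 * np.cos(np.pi * (2 * x + 1) * u / 16) for x in range(8)]
              for u in range(8)])
DCT = T @ DATA @ T.T
DCT_rounded = np.round(DCT).astype(int)
print(DCT_rounded)


def dct_entry(u, v):
    """DCT(u, v) straight from the double-sum definition, for cross-checking."""
    total = 0.0
    for x in range(8):
        for y in range(8):
            total += (DATA[x, y] * np.cos(np.pi * (2 * x + 1) * u / 16)
                      * np.cos(np.pi * (2 * y + 1) * v / 16))
    return C(u) * C(v) / 4 * total
```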


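The first thing is to establish that we can recover the data from its discrete cosine transform. For this purpose we define the inverse discrete cosine transform by

$$\mathrm{IDCT}(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,\mathrm{DCT}(u,v)\cos\frac{\pi(2x+1)u}{16}\cos\frac{\pi(2y+1)v}{16}.$$

If we apply the IDCT to the DCT table, we get (after rounding):

| x \ y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 123 | 157 | 142 | 127 | 131 | 102 | 99 | 235 |
| 1 | 134 | 135 | 157 | 112 | 109 | 106 | 108 | 135 |
| 2 | 135 | 144 | 159 | 108 | 112 | 118 | 109 | 127 |
| 3 | 176 | 183 | 161 | 111 | 186 | 130 | 132 | 133 |
| 4 | 137 | 149 | 154 | 126 | 185 | 146 | 131 | 132 |
| 5 | 121 | 130 | 127 | 145 | 205 | 150 | 130 | 126 |
| 6 | 117 | 151 | 160 | 181 | 250 | 161 | 134 | 125 |
| 7 | 168 | 170 | 171 | 178 | 183 | 179 | 112 | 124 |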

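Can you spot any difference between this and the original data?

Suppose we store the data via its DCT. We ask whether we can afford to lose some of the details of the DCT without significantly affecting the quality of the data. One way to reduce the space required by the DCT is to divide every entry by, say, 8 and round to an integer (saving about 3 bits per entry, since the numbers are stored in binary and 8 = 2³). Then the DCT becomes:

| u \ v | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 144 | 5 | -5 | -1 | 3 | -10 | 1 | 5 |
| 1 | -10 | 0 | 14 | -9 | -1 | 0 | 3 | -1 |
| 2 | 2 | -1 | 0 | -5 | 3 | 0 | 2 | -5 |
| 3 | 0 | -8 | -2 | -2 | 4 | -3 | -2 | 0 |
| 4 | 6 | 2 | 4 | -1 | 1 | -3 | 1 | -1 |
| 5 | 4 | -2 | -1 | -1 | 2 | -4 | -3 | 2 |
| 6 | -2 | -1 | 3 | -1 | 0 | 0 | 1 | -3 |
| 7 | -1 | -2 | -1 | -2 | -1 | 0 | 1 | -1 |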

Clearly, this "compressed DCT" occupies much less space. To recover the data, we uncompress by multiplying by 8 and then apply the IDCT.


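This time there is some loss:

| x \ y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 122 | 161 | 145 | 130 | 128 | 106 | 101 | 233 |
| 1 | 142 | 132 | 156 | 116 | 109 | 107 | 108 | 132 |
| 2 | 138 | 146 | 155 | 105 | 109 | 118 | 113 | 127 |
| 3 | 175 | 184 | 163 | 110 | 190 | 127 | 132 | 134 |
| 4 | 137 | 148 | 155 | 128 | 182 | 149 | 130 | 133 |
| 5 | 119 | 132 | 126 | 147 | 204 | 149 | 133 | 129 |
| 6 | 115 | 149 | 156 | 177 | 248 | 159 | 137 | 125 |
| 7 | 174 | 173 | 171 | 180 | 186 | 178 | 114 | 123 |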
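The round trip and the compression step can be checked the same way. In matrix form the IDCT is Tᵀ · DCT · T, with T the orthonormal matrix $T[u, x] = \tfrac12 C(u)\cos(\pi(2x+1)u/16)$ (a name introduced here for illustration). The Python/NumPy sketch rounds the DCT and inverts it, then repeats the experiment after dividing by 8. It relies on `np.round`, which rounds exact halves to the nearest even integer; this is consistent with the compressed table above (for example, 36/8 = 4.5 became 4).

```python
import numpy as np

DATA = np.array([
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
], dtype=float)


def C(h):
    return 1 / np.sqrt(2) if h == 0 else 1.0


# Orthonormal DCT matrix: T[u, x] = C(u)/2 * cos(pi (2x+1) u / 16).
T = np.array([[C(u) / 2 * np.cos(np.pi * (2 * x + 1) * u / 16) for x in range(8)]
              for u in range(8)])


def dct2(block):
    return T @ block @ T.T


def idct2(coeffs):
    # Matrix form of IDCT(x, y) = 1/4 sum_u sum_v C(u) C(v) DCT(u, v) cos(...) cos(...)
    return T.T @ coeffs @ T


dct_rounded = np.round(dct2(DATA))

# Invert the rounded DCT: rounding the coefficients barely changes the pixels.
round_trip = np.round(idct2(dct_rounded))

# Compressed DCT: divide by 8 and round (exact halves go to the even neighbour).
compressed = np.round(dct_rounded / 8)
recovered = np.round(idct2(compressed * 8))

print("pixels changed by the round trip:", np.count_nonzero(round_trip != DATA))
print("largest error after dividing by 8:", np.abs(recovered - DATA).max())
```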

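Another approach is to compress the entries at the top left less (as these are more significant). For example, we divide the entries in the top left 4 × 4 submatrix of the DCT by 2, and all the other entries by 8:

| u \ v | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 574 | 20 | -22 | -5 | 3 | -10 | 1 | 5 |
| 1 | -40 | -2 | 57 | -37 | -1 | 0 | 3 | -1 |
| 2 | 7 | -6 | 0 | -22 | 3 | 0 | 2 | -5 |
| 3 | 0 | -30 | -7 | -6 | 4 | -3 | -2 | 0 |
| 4 | 6 | 2 | 4 | -1 | 1 | -3 | 1 | -1 |
| 5 | 4 | -2 | -1 | -1 | 2 | -4 | -3 | 2 |
| 6 | -2 | -1 | 3 | -1 | 0 | 0 | 1 | -3 |
| 7 | -1 | -2 | -1 | -2 | -1 | 0 | 1 | -1 |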

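We uncompress the last table by multiplying by 2 and 8 in the appropriate places. Then we apply the IDCT, and we get:

| x \ y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 120 | 159 | 144 | 129 | 127 | 105 | 100 | 234 |
| 1 | 138 | 130 | 156 | 116 | 109 | 106 | 108 | 133 |
| 2 | 135 | 145 | 156 | 106 | 109 | 117 | 113 | 127 |
| 3 | 174 | 183 | 164 | 111 | 191 | 127 | 131 | 132 |
| 4 | 139 | 149 | 154 | 127 | 183 | 149 | 128 | 129 |
| 5 | 121 | 133 | 125 | 146 | 204 | 149 | 131 | 125 |
| 6 | 114 | 148 | 156 | 177 | 248 | 158 | 136 | 123 |
| 7 | 170 | 172 | 172 | 182 | 186 | 177 | 113 | 123 |

This hybrid approach offers almost as much compression as the previous one, with lower loss of quality.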
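The hybrid scheme only changes the divisor applied to each entry. In the self-contained Python/NumPy sketch below, a divisor array (a name introduced here) holds 2 in the top-left 4 × 4 block and 8 elsewhere; the code reuses the matrix form of the DCT and IDCT and compares the largest reconstruction error of the two schemes.

```python
import numpy as np

DATA = np.array([
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
], dtype=float)


def C(h):
    return 1 / np.sqrt(2) if h == 0 else 1.0


T = np.array([[C(u) / 2 * np.cos(np.pi * (2 * x + 1) * u / 16) for x in range(8)]
              for u in range(8)])
dct_rounded = np.round(T @ DATA @ T.T)


def reconstruct(divisor):
    """Divide the rounded DCT by `divisor`, round, multiply back, invert, round."""
    stored = np.round(dct_rounded / divisor)
    return stored, np.round(T.T @ (stored * divisor) @ T)


uniform = np.full((8, 8), 8.0)           # every entry divided by 8
hybrid = uniform.copy()
hybrid[:4, :4] = 2.0                     # top-left 4x4 block divided by 2 only

_, uniform_rec = reconstruct(uniform)
hybrid_stored, hybrid_rec = reconstruct(hybrid)
print(hybrid_stored.astype(int))
print("largest error, uniform:", np.abs(uniform_rec - DATA).max())
print("largest error, hybrid: ", np.abs(hybrid_rec - DATA).max())
```

The price of the hybrid scheme is that the sixteen top-left entries keep roughly two more bits each.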

## 4 Huffman Encoding

The discrete cosine transform produces the numbers used to store and transmit an image. However, these numbers are not stored directly as their values, but through a code that further reduces the required space. This code assigns bit strings to numbers according to their frequency: more frequent numbers are given shorter codes. Consider the compressed DCT we obtained in the last section:

| u \ v | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 144 | 5 | -5 | -1 | 3 | -10 | 1 | 5 |
| 1 | -10 | 0 | 14 | -9 | -1 | 0 | 3 | -1 |
| 2 | 2 | -1 | 0 | -5 | 3 | 0 | 2 | -5 |
| 3 | 0 | -8 | -2 | -2 | 4 | -3 | -2 | 0 |
| 4 | 6 | 2 | 4 | -1 | 1 | -3 | 1 | -1 |
| 5 | 4 | -2 | -1 | -1 | 2 | -4 | -3 | 2 |
| 6 | -2 | -1 | 3 | -1 | 0 | 0 | 1 | -3 |
| 7 | -1 | -2 | -1 | -2 | -1 | 0 | 1 | -1 |

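We will construct a binary tree out of the numbers in this grid.

Step 1. List all the numbers occurring in the table, along with their frequencies:

| Value | -10 | -9 | -8 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 14 | 144 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 1 | 1 | 3 | 1 | 4 | 7 | 14 | 9 | 5 | 5 | 4 | 3 | 2 | 1 | 1 | 1 |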

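Step 2. Arrange the numbers in increasing order of frequency:

| Value | -9 | -8 | -4 | 6 | 14 | 144 | -10 | 5 | -5 | 4 | -3 | 3 | 1 | 2 | -2 | 0 | -1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 7 | 9 | 14 |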

Each number will become a leaf of the binary tree, labelled by the number and its frequency. For instance, since 5 has frequency 2, its leaf is drawn as 5:2.

Step 3. The two leaves with the lowest frequencies are combined into one node, which is labelled by the sum of their frequencies. Thus, we get:

[Diagram: a node labelled 2, with the leaves -9:1 and -8:1 below it.]

We repeat this step, with the following modification: leaves and nodes already collected below a node are ignored when comparing frequencies; only the top nodes and the remaining leaves are taken into account.

Step 4. The starting situation is:

[Diagram: the leaves -4:1, 6:1, 14:1, 144:1; the node 2 over -9:1 and -8:1; and the leaves -10:2, 5:2, -5:3, ….]

On collecting the lowest-frequency leaves under a node, we get:

[Diagram: the leaves 14:1, 144:1; the node 2 over -4:1 and 6:1; the node 2 over -9:1 and -8:1; and the leaves -10:2, 5:2, -5:3, ….]

Step 5.

[Diagram: the node 2 over 14:1 and 144:1; the node 2 over -4:1 and 6:1; the node 2 over -9:1 and -8:1; and the leaves -10:2, 5:2, ….]

Step 6.

[Diagram: the node 2 over -9:1 and -8:1; the leaves -10:2, 5:2, -5:3, 4:3; a node 4 over the two nodes 2 (one over 14:1 and 144:1, the other over -4:1 and 6:1); and the leaf -3:4, ….]
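The merging in Steps 3 to 6 continues until a single node, with frequency 64, remains. The Python sketch below follows this merging with the standard library's `heapq` module, tracking only the frequencies. Different choices among equal frequencies give different trees but the same sequence of merged frequencies, and adding up those merged frequencies gives the total number of bits in the encoded table, since each value's frequency is counted once for every node above its leaf.

```python
import heapq
from collections import Counter

# Compressed DCT table (each DCT entry divided by 8 and rounded), row by row.
COMPRESSED = [
    [144,   5,  -5,  -1,   3, -10,   1,   5],
    [-10,   0,  14,  -9,  -1,   0,   3,  -1],
    [  2,  -1,   0,  -5,   3,   0,   2,  -5],
    [  0,  -8,  -2,  -2,   4,  -3,  -2,   0],
    [  6,   2,   4,  -1,   1,  -3,   1,  -1],
    [  4,  -2,  -1,  -1,   2,  -4,  -3,   2],
    [ -2,  -1,   3,  -1,   0,   0,   1,  -3],
    [ -1,  -2,  -1,  -2,  -1,   0,   1,  -1],
]

# Step 1: count how often each value occurs.
freq = Counter(value for row in COMPRESSED for value in row)

# Steps 2 onwards: repeatedly merge the two smallest top-level frequencies.
heap = list(freq.values())
heapq.heapify(heap)
merged = []
while len(heap) > 1:
    lowest = heapq.heappop(heap)
    second = heapq.heappop(heap)
    merged.append(lowest + second)          # label of the new node
    heapq.heappush(heap, lowest + second)

print(sorted(freq.items(), key=lambda item: item[1]))   # Step 2 ordering (ties in any order)
print(merged)
print("total bits:", sum(merged))
```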


[Figure 1: The binary tree for the Huffman code.]

By now, the general scheme should be clear. We have made certain choices at each step, namely the order in which we write nodes and leaves that have the same frequency. These choices do affect the final binary tree. However, once we have described the method of coding, it will be clear that they do not affect the efficiency of the encoding.

Figure 1 shows the final binary tree for our data. We have also labelled each branch of the tree: 0 for a left branch and 1 for a right branch.

The encoding proceeds as follows. To obtain the code for a value, start at the root (the node labelled 64) and move down to the value's leaf, writing down the 0 or 1 label of each branch as you cross it. For instance, moving to the leaf for the value -10 gives the sequence 00011; this is the code for -10.

[Figure 2: The sequence in which values are encoded. The figure shows the top-left 4 × 4 corner of the table (rows 144 5 -5 -1; -10 0 14 -9; 2 -1 0 -5; 0 -8 -2 -2) with arrows tracing the zigzag path.]

Note that the most frequent value (-1) has the shortest code (01), and less frequent values generally have longer codes. A value such as 144, with frequency 1, has one of the longest codes: 100001. The table is encoded by going through the values one by one in the zigzag order shown in Figure 2 and writing their codes with no separators. For instance, the starting sequence 144, 5, -10, … becomes 1000011011000011… (144 → 100001, 5 → 10110, -10 → 00011). To decode this string, one need only refer to the tree. We start at the root and follow the left or right branch according to whether we see a 0 or a 1. When we reach a leaf, we note the corresponding value and start again at the root. This works because no code is the beginning of another code.

Exercise. Show that our table of values can be described by 231 binary digits if we use Huffman encoding. If, on the other hand, we had worked with codes of fixed length, we would have needed 320 binary digits.
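The whole pipeline fits in a short Python sketch using only the standard library. It builds a Huffman code with `heapq`, reads the table in a zigzag order (assumed here to be the usual JPEG order, which starts 144, 5, -10 as in the text), concatenates the codes without separators and decodes the result. The helper names are hypothetical, and ties are broken differently from Figure 1, so individual codes may differ from those quoted above; the total length does not, because every Huffman code for these frequencies has the same total length.

```python
import heapq
from collections import Counter
from itertools import count

# Compressed DCT table from the text, row by row.
COMPRESSED = [
    [144,   5,  -5,  -1,   3, -10,   1,   5],
    [-10,   0,  14,  -9,  -1,   0,   3,  -1],
    [  2,  -1,   0,  -5,   3,   0,   2,  -5],
    [  0,  -8,  -2,  -2,   4,  -3,  -2,   0],
    [  6,   2,   4,  -1,   1,  -3,   1,  -1],
    [  4,  -2,  -1,  -1,   2,  -4,  -3,   2],
    [ -2,  -1,   3,  -1,   0,   0,   1,  -3],
    [ -1,  -2,  -1,  -2,  -1,   0,   1,  -1],
]


def huffman_codes(freq):
    """Build a Huffman tree from {value: frequency}; return {value: bit string}.
    Left branches are labelled '0' and right branches '1'."""
    order = count()                               # tie-breaker: the heap never compares trees
    heap = [(f, next(order), value) for value, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    codes, stack = {}, [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):               # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                                     # leaf: a value from the table
            codes[node] = prefix
    return codes


def zigzag(table):
    """Read a square table along anti-diagonals, alternating direction (JPEG-style)."""
    n = len(table)
    cells = [(i, j) for i in range(n) for j in range(n)]
    cells.sort(key=lambda ij: (ij[0] + ij[1], ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))
    return [table[i][j] for i, j in cells]


def decode(bits, codes):
    """Equivalent to walking the tree: emit a value as soon as the bits read form a code."""
    lookup = {code: value for value, code in codes.items()}
    values, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            values.append(lookup[current])
            current = ""
    return values


sequence = zigzag(COMPRESSED)
codes = huffman_codes(Counter(sequence))
bits = "".join(codes[value] for value in sequence)     # no separators
fixed_width = (len(codes) - 1).bit_length()            # bits per value for a fixed-length code
print(sequence[:3], len(bits), fixed_width * len(sequence))
```

With 17 distinct values a fixed-length code needs 5 bits per value, which is where the comparison figure in the exercise comes from.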