
(IJCSIS) International Journal of Computer Science and Information Security,

Vol. 8 No. 8, 2010

Retrieval of Bitmap Compression History

Salma Hamdy, Haytham El-Messiry, Mohamed Roushdy, Essam Khalifa


Faculty of Computer and Information Sciences
Ain Shams University
Cairo, Egypt
{s.hamdy, hmessiry, mroushdy, esskhalifa}@cis.asu.edu.eg

Abstract: The histogram of Discrete Cosine Transform (DCT) coefficients contains information on the compression parameters for JPEGs and previously JPEG-compressed bitmaps. In this paper we extend the work in [1] to identify previously compressed bitmaps and estimate the quantization table that was used for compression from the peaks of the histogram of DCT coefficients. This can help in establishing bitmap compression history, which is particularly useful in applications like image authentication, JPEG artifact removal, and JPEG recompression with less distortion. Furthermore, the estimated table is used to calculate distortion measures that classify the bitmap as genuine or forged. The method shows a good average estimation accuracy of around 92.88%, compared with MLE and autocorrelation methods. In addition, because bitmaps do not experience data loss, detecting inconsistencies becomes easier. Detection performance resulted in average false negative rates of 3.81% and 2.26% for the two distortion measures, respectively.

Keywords: Digital image forensics; forgery detection; compression history; quantization tables.

I. INTRODUCTION
Although JPEG is the most widely used image format, images are sometimes saved in an uncompressed raster form (bmp, tiff), and in most situations no knowledge of previous processing is available. Some applications are required to receive images as bitmaps with instructions for rendering at a particular size and without further information. The image may have been processed and perhaps compressed, and may contain severe compression artifacts. Hence, it is useful to determine the bitmap history: whether the image has ever been compressed using the JPEG standard, and what quantization tables were used. Most artifact removal algorithms [2-9] require knowledge of the quantization table to estimate the amount of distortion caused by quantization and to avoid over-blurring. In other applications, knowing the quantization table can help in avoiding further distortion when recompressing the image. Some methods try to identify bitmap compression history using Maximum Likelihood Estimation (MLE) [10-11], by modeling the distribution of quantized DCT coefficients, such as with Benford's law [12], or by modeling acquisition devices [13].

Furthermore, due to the nature of digital media and advanced digital image processing techniques, digital images may be altered and redistributed very easily, forming a rising threat in the public domain. Hence, ensuring that media content is credible and has not been altered is becoming an important issue for governmental security and commercial applications. As a result, research is being conducted on developing authentication methods and tamper detection techniques. JPEG compression usually introduces blocking artifacts, and hence one of the standard passive approaches is to use inconsistencies in these blocking fingerprints as a reliable indicator of possible tampering [14]. These can also be used to determine what method of forgery was used.

In this paper we are interested in the authenticity of the image. We extend the work in [1] to bitmaps and use the proposed method for identifying previously compressed bitmaps and estimating the quantization table that was used. The estimated table is then used to determine whether the image was forged by calculating distortion measures.

In section 2 we study the histogram of DCT AC coefficients of bitmaps and show how it differs for previously JPEG-compressed bitmaps. We then validate that, without modeling rounding errors or calculating prior probabilities, quantization steps of previously compressed bitmaps can still be determined in a straightforward way from the peaks of the approximated histograms of DCT coefficients. Results are discussed in section 3. Section 4 presents conclusions.

II. HISTOGRAM OF DCT COEFFICIENTS IN BITMAPS
We studied in [1] the histogram of quantized DCT coefficients and showed how it can be used to estimate quantization steps. Here, we study uncompressed images and validate that the approximated histogram of DCT coefficients can be used to determine compression history. A bitmap image involves no data loss, and hence everything required to build an informative histogram is expected to be present in the coefficient histograms.

The first step is to decide whether the test image was previously compressed, because if the image is an original uncompressed one there is no compression data to extract. When the image is decided to have a compression history, the next step is to estimate that history. For a grayscale image, compression history mainly means its quantization table, which is the focus of this paper. For a color image, this is extended to estimating color plane compression parameters, which include subsampling and the associated interpolation.
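The histogram of DCT coefficients that this section builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the image is a plain list of rows of grayscale values, positions are 0-based (so the paper's position (3,3) is `(2, 2)` here), coefficients are rounded to the nearest integer before counting, and the usual JPEG level shift of 128 is applied.

```python
import math
from collections import Counter

N = 8  # JPEG block size

def c(k):
    """DCT normalization factor: 1/sqrt(2) for k = 0, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

# COS[k][n] = cos((2n + 1) * k * pi / 16), precomputed once.
COS = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
       for k in range(N)]

def dct_coefficient(block, i, j):
    """2-D DCT-II coefficient at frequency (i, j) of one 8x8 block."""
    total = 0.0
    for u in range(N):
        for v in range(N):
            total += block[u][v] * COS[i][u] * COS[j][v]
    return 0.25 * c(i) * c(j) * total

def coefficient_histogram(image, i, j):
    """Count rounded DCT coefficients at (i, j) over all complete 8x8 blocks."""
    height, width = len(image), len(image[0])
    hist = Counter()
    for top in range(0, height - N + 1, N):
        for left in range(0, width - N + 1, N):
            # Assumed JPEG-style level shift; it only affects the DC term.
            block = [[image[top + u][left + v] - 128 for v in range(N)]
                     for u in range(N)]
            hist[round(dct_coefficient(block, i, j))] += 1
    return hist
```

Calling `coefficient_histogram(image, 2, 2)` on a decoded grayscale bitmap gives one of the 64 histograms; repeating it for every `(i, j)` gives the full set used for estimation.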

141 http://sites.google.com/site/ijcsis/
ISSN 1947-5500

Fig. 1. Histograms of X*(3,3). (a) Lena image. (b) Uncompressed. (c) JPEG compressed, Q(3,3)=6. (d) Previously compressed bmp.

Fig. 2. (a) |X*(3,3)| where Hmax occurs at Q(3,3)=6. (b) |X*(3,4)| where Hmax occurs at Q(3,4)=10. (c) |X*(5,4)| where Hmax occurs at Q(5,4)=22. (d) |X*(7,5)| where Hmax occurs at Q(7,5)=41.

Fig. 1(b) shows the approximated histogram H* of the DCT coefficient at position (3,3) of the luminance channel of an uncompressed Lena image, and Fig. 1(c) shows the histogram of the same image after JPEG compression with quality factor 80. It is clear that the latter contains periodic patterns that are not present in the uncompressed version. It was observed that the coefficient is very likely to have been quantized with a step equal to this period [15]. Now if that JPEG was stored in an uncompressed bitmap form, we expect the DCT coefficients to have the same behavior because nothing is lost during this format change. This is evident in Fig. 1(d), which shows a histogram identical to the one in Fig. 1(c). Hence, similar to the argument in [1], if we closely observe the histogram H*(i,j) outside the main lobe, we notice that the maximum peak occurs at a value equal to the quantization step used to quantize Xq(i,j). This observation applies to most low frequency AC coefficients. Fig. 2(a) and (b) show |H|, the absolute histograms of DCT coefficients for the Lena image of Fig. 1(a) at frequencies (3,3) and (3,4), respectively. As for high frequencies, the maximum occurred at a value matching Q(i,j) when |X*(i,j)| > B (Fig. 2(c) and (d)), where B is as follows:

$$\Gamma = \left| X^{*}(i,j) - X_q(i,j) \right| \le B(i,j) = \sum_{u,v} 0.5\, c(u)\, c(v) \left| \cos\frac{(2u+1)i\pi}{16} \cos\frac{(2v+1)j\pi}{16} \right| \tag{1}$$

where Xq(i,j) is the quantized coefficient, X*(i,j) is the approximated quantized coefficient, Γ is the round off error, and

$$c(\xi) = \begin{cases} 1/\sqrt{2} & \text{for } \xi = 0 \\ 1 & \text{otherwise} \end{cases}$$

See [1, 11].

Sometimes we do not have enough information to determine Q(i,j) for high frequencies (i,j). This happens when the histogram outside the main lobe decays rapidly to zero, showing no periodic structure. This reflects the small or zero value of the coefficient. In such cases, it can be useful to estimate as many of the low frequencies as possible and then search through lookup tables for a matching standard table.

Estimating the quantization table of a bitmap can help determine part of its compression history. If all (or most) of the low frequency steps are estimated to be ones, we can conclude that the image did not go through previous compression. High frequencies may bias this decision because they have very low contribution and do not provide a good estimate. Moreover, this method also works well for uncompressed or losslessly compressed tiff images. Fig. 3(c) and (d) show the Q table estimated with the above method, 96.7% correct, for a tiff image taken from UCID [16]. The X's mark the "undetermined" coefficients.

Now for verifying the authenticity of the image, we use the same distortion measures we used in [1]. The average distortion measure is calculated as a function of the remainders of DCT coefficients with respect to the original Q matrix:

$$B_1 = \sum_{i=1}^{8} \sum_{j=1}^{8} \operatorname{mod}\big(D(i,j),\, Q(i,j)\big) \tag{2}$$

where D(i,j) and Q(i,j) are the DCT coefficient and the corresponding quantization table entry at position (i,j), respectively. An image block having a large average distortion value is very different from what it should be and is likely to belong to a forged image. Averaged over the entire image, this measure can be used for making a decision about the authenticity of the image.

In addition, the JPEG 8×8 "blocking effect" is still present in the uncompressed version, and hence the blocking artifact measure, BAM [14], can be used to give an estimate of the distortion of the image. It is computed from the Q table as:

$$B_2(n) = \sum_{i=1}^{8} \sum_{j=1}^{8} \left| D(i,j) - Q(i,j)\, \operatorname{round}\!\left( \frac{D(i,j)}{Q(i,j)} \right) \right| \tag{3}$$

where B2(n) is the estimated blocking artifact for the nth block.
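The two distortion measures of equations (2) and (3) can be sketched per block as below. This is an illustrative reading of the formulas, not the authors' code: the function names are hypothetical, `D` and `Q` are 8×8 lists of integers, and the remainder in equation (2) is taken with Python's `%`, whose result follows the sign of the divisor. The source does not state which sign convention it uses, so that choice is an assumption.

```python
def average_distortion(D, Q):
    """Eq. (2): sum of the remainders of the DCT coefficients modulo Q."""
    # Assumption: Python's % convention (result has the sign of Q[i][j]).
    return sum(D[i][j] % Q[i][j] for i in range(8) for j in range(8))

def blocking_artifact(D, Q):
    """Eq. (3): sum of distances of each coefficient to its nearest multiple of Q."""
    return sum(
        abs(D[i][j] - Q[i][j] * round(D[i][j] / Q[i][j]))
        for i in range(8)
        for j in range(8)
    )

def image_score(blocks, Q, measure):
    """Average a per-block measure over all blocks of an image."""
    return sum(measure(D, Q) for D in blocks) / len(blocks)
```

A block whose coefficients are all exact multiples of the estimated steps scores zero on both measures; coefficients that drift away from the quantization grid, as happens around pasted or misaligned content, raise the score.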


(a) Test image.

(b) Estimated Q for uncompressed version (most low frequencies are ones).
 5  4  3  2  1  1  1  1
 4  1  1  1  1 10 10 10
 1  1  1  1  1 10 10 10
 1  1  1  1  1 10 10 10
 1  1  1  1 14 12 12 12
 1  1  1  1 12 13 11 11
 1  1  1  1 13 11 12 11
 1  1  1  1 13 12 12 12

(c) Estimated Q for previously compressed version with QF = 80.
 3  4  4  6 10 16 20 24
 5  5  6  8 10 23 24 22
 6  5  6 10 16 23 28 22
 6  7  9 12 20 35 32 25
 7  9 15 22 27 44 41 31
10 14 22 26 32 42 45 37
20 26 31 35 41 48 47  X
29 37 38 39 45 40  X  X

(d) Difference between (c) and original table for QF = 80.
 3  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  1  X
 0  0  0  0  0  0  X  X

Fig. 3. Estimating Q table for original and previously compressed tif image.
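The maximum-peak rule used to obtain tables like those of Fig. 3 can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the function names are hypothetical, the histogram is any mapping from rounded coefficient value to count, and the half-width of the main lobe is passed in as the parameter `lobe` (an assumption here; in the paper the bound for high frequencies comes from equation (1)). Ties between equally high peaks are resolved toward the smallest value, which is also an assumption.

```python
from collections import Counter

def absolute_histogram(hist):
    """Fold a signed histogram into a histogram of absolute values, |H|."""
    folded = Counter()
    for value, count in hist.items():
        folded[abs(value)] += count
    return folded

def estimate_step(hist, lobe=1):
    """Return the value of the highest peak outside the main lobe, or None.

    None corresponds to an undetermined entry (an X in Fig. 3), which occurs
    when the histogram has no mass outside the main lobe.
    """
    folded = absolute_histogram(hist)
    outside = {v: n for v, n in folded.items() if v > lobe and n > 0}
    if not outside:
        return None
    peak = max(outside.values())
    # Assumed tie-break: prefer the smallest value among equal peaks.
    return min(v for v, n in outside.items() if n == peak)
```

For example, a histogram with counts concentrated at 0, ±6 and ±12 yields an estimate of 6, while a histogram whose mass lies entirely inside the main lobe yields `None`.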

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Estimation Accuracy
Our testing image set consisted of 550 images collected from different sources (more than five camera models), in addition to some from the public-domain Uncompressed Color Image Database (UCID), which provides a benchmark for image processing analysis [16]. Each of these images was compressed with different quality factors (60, 70, 80, and 90). Each compressed image was then decompressed and resaved as a bitmap. This yielded 550×4 = 2,200 untouched images. For each quality factor group, an image's histogram of DCT coefficients at one certain frequency was generated and used to determine the corresponding quantization step at that frequency according to section 2. This was repeated for all 64 histograms of DCT coefficients. The resulting quantization table was compared to the quality factor's known table, and the percentage of correctly estimated coefficients was recorded. Also, the estimated table was used in equations (2) and (3) to determine the image's average distortion and blocking artifact measures, respectively. These values were recorded and used later to set a threshold value for distinguishing forgeries from untouched images.

Table 1 shows the accuracy of estimating all 64 entries using the proposed method for each quality factor, averaged over the whole set. It exhibits a behavior similar to JPEG images: as the quality factor increases, estimation accuracy increases steadily, with an expected drop for quality factors higher than 90 as the periodic structure becomes less prominent and the bumps are no longer separate enough. Overall, we can see that the estimation accuracy is higher than that of JPEG images [1]. This is expected, because lossy compression reduces the data available for making a good estimate. Average estimation time for all 64 entries of images of size 640×480 for different QFs was 52.7 seconds.

TABLE I. PERCENTAGE OF CORRECTLY ESTIMATED COEFFICIENTS FOR SEVERAL QFS

QF         60       70       80       90
BMP        82.07%   84.80%   87.44%   89.44%
JPEG [1]   72.03%   76.99%   82.36%   88.26%

Estimating Q using MLE methods [10-11] is based on searching over all possible Q(i,j) for each DCT coefficient over the whole image, which can be computationally exhaustive for large files. Another method [12] proposed a logarithmic law and argued that the distribution of the first digit of DCT coefficients follows that generalized Benford's law. The method is based on re-compressing the test image with several quality factors and fitting the distribution of DCT coefficients of each version to the proposed law. The QF of the version having the least fitting artifact is chosen, and its corresponding Q table is the desired one. Of course, the above methods can only estimate standard compression tables. Although the Benford-based method may be accurate, it is time consuming, and it fails when the re-compression quantization step is an integer multiple of the original compression step size. Another method [17] calculates the autocorrelation function of the histogram of DCT coefficients. The displacement corresponding to the peak closest to the peak at zero is the value of Q(i,j), given that the peak is higher than the mean value of the autocorrelation function. The method eventually uses a hybrid approach: the low frequency coefficients are determined directly from the autocorrelation function, while the higher-frequency ones are estimated by matching the estimated part to standard JPEG tables scaled by a factor s, which is determined from the known coefficients.

Table 2 shows the estimation accuracy and Table 3 shows the estimation time of the methods mentioned above against ours. Note that accuracy was calculated for directly estimating only the first nine AC coefficients without matching. This is because the methods fail to estimate high frequency coefficients, as most of them are quantized to zero. On the other hand, the listed time is for estimating the nine coefficients and then retrieving the whole matching table from JPEG standard lookup tables. Maximum peak is faster than statistical modeling and nearly as fast as the autocorrelation method. However, the average accuracy of our method is far higher. MLE is reliable with 83% accuracy but takes more than three times as long. The Benford's law based method has an accuracy of 75% but is the worst in time, because recompressing the image and calculating distributions for each compressed version may become time consuming for larger images. Images used in the experiments were of size 640×480.

TABLE II. ESTIMATION ACCURACY (%) FOR THE FIRST 3×3 AC COEFFICIENTS FOR SEVERAL QFS

Method     QF 50   QF 60   QF 70   QF 80   QF 90   QF 100   Avg. Acc.
MLE        75.31   83.10   90.31   96.34   93.83   59.5     83.06
Benford    99.08   87.59   80.82   93.81   59.47   31.53    75.38
Auto.      48.94   50.37   63.71   81.43   65.37   57.50    61.22
Max.Peak   97.93   97.07   99.01   97.67   89.57   76.04    92.88

TABLE III. ESTIMATION TIME IN SECONDS FOR THE FIRST 3×3 AC COEFFICIENTS FOR SEVERAL QFS

Method     QF 50   QF 60   QF 70   QF 80   QF 90   QF 100
MLE        38.73   37.33   37.44   37.36   37.32   34.14
Benford    59.95   58.67   58.70   58.72   58.38   80.04
Auto.      9.23    11.11   11.10   11.12   11.24   8.96
Max.Peak   11.27   11.29   11.30   11.30   11.30   11.56

B. Forgery Detection
From the untouched previously compressed bitmap image set, we selected 500 images for each quality factor, each of which was subjected to four common forgeries: cropping, rotation, composition, and brightness changes. Cropping forgeries were done by deleting some columns and rows from the original image. An image was rotated by 270° for rotation forgeries. Copy-paste forgeries were done by randomly copying a block of pixels from an arbitrary image and placing it in the original image. Random values were added to every pixel of the image to simulate brightness change. The resulting fake images were then stored in their uncompressed form, for a total of (500×4) × 4 = 8,000 images. Next, the quantization table for each of these images was estimated as above and used to calculate the image's average distortion measure (2) and blocking artifact measure (3).

Fig. 4. Distortion measures for untouched and tampered images. (a) Average distortion measure. (b) Blocking artifact measure.

Fig. 4(a) and (b) show values of the average distortion measure and blocking artifact measure, respectively. The scattered dots represent 500 untouched images (averaged over all quality factors for each image), while the cross marks represent 500 images from the forged dataset. As the figure shows, values from forged images tend to cluster higher than those from untampered images. We tested the distortion measure for untouched images against several threshold values and calculated the corresponding false positive rate, FPR (the number of untouched images declared as tampered). An ideal case would be a threshold giving zero false positives. However, we had to take into account the false negatives (the number of tampered images declared as untampered) that may occur when testing for forgeries. Hence, we require a threshold value keeping both the FPR and the FNR low. For the average distortion measure, we selected a value that gave an FPR of 10.8% and as low an FNR as possible for the different types of forgeries. The horizontal line marks this threshold, τ = 50. Similarly, we selected the BAM's threshold to be τ = 40, with a corresponding FPR of 5.6%. Table 4 shows the false negative rate (FNR) for the different forgeries at different quality factors for bitmaps and JPEGs. As expected, as QF increases, a better estimate of the quantization matrix of the original untampered image is obtained, and as a result the error percentage decreases. Notice that the values are lower than those for JPEG files. Notice also that detection of cropping is possible when the cropping process breaks the natural JPEG grid, that is, when the removed rows or columns do not fall in line with the 8×8 blocking. Similarly, when the pasted part fails to fit perfectly into the original JPEG compressed image, the distortion metric exceeds the detection threshold, and a possible composite is declared. Fig. 5 shows examples of composites. The resulting distortion measures for each composite are shown in Fig. 5(b). The dark parts denote low distortion, whereas brighter parts indicate high distortion values. Notice that the highest values correspond to the alien part and hence mark the forged area.
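The threshold selection described above can be sketched as a small helper that reports both error rates for a candidate threshold. This is only an illustration: the function name and the sample scores are hypothetical, and an image is declared tampered when its distortion measure exceeds the threshold, following the decision rule stated in the text.

```python
def error_rates(untouched_scores, forged_scores, threshold):
    """Return (FPR, FNR) as fractions for a given detection threshold."""
    # False positive: an untouched image whose score exceeds the threshold.
    false_positives = sum(1 for s in untouched_scores if s > threshold)
    # False negative: a forged image whose score stays at or below it.
    false_negatives = sum(1 for s in forged_scores if s <= threshold)
    fpr = false_positives / len(untouched_scores)
    fnr = false_negatives / len(forged_scores)
    return fpr, fnr

# Hypothetical scores, for illustration only (not data from the paper).
untouched = [10, 20, 60, 30]
forged = [70, 45, 90, 55]
for tau in (40, 50, 60):
    print(tau, error_rates(untouched, forged, tau))
```

Sweeping the threshold this way makes the trade-off visible: raising it lowers the FPR but lets more forgeries through, which is why a value keeping both rates low is chosen.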

TABLE IV. FORGERY DETECTION ERROR RATES FOR BITMAPS AND JPEGS

Distortion Measure   Format   Original   Cropping   Rotation   Compositing   Brightness
Average              JPEG     12.6%      9.2%       7.55%      8.6%          6.45%
Average              BMP      10.8%      3.9%       4.45%      2.0%          4.9%
BAM                  JPEG     6.8%       3.3%       5.95%      3.15%         5.0%
BAM                  BMP      5.6%       1.05%      3.05%      1.25%         3.7%
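As a quick arithmetic check, the average false negative rates quoted in the abstract (3.81% and 2.26%) can be reproduced from the BMP rows of Table IV. The sketch below assumes that the "Original" column holds the false positive rate and that the abstract's figures are plain means over the four forgery columns; the variable names are hypothetical.

```python
# False negative rates (%) from the BMP rows of Table IV, per forgery type.
fnr_bmp = {
    "Average": {"Cropping": 3.9, "Rotation": 4.45, "Compositing": 2.0, "Brightness": 4.9},
    "BAM": {"Cropping": 1.05, "Rotation": 3.05, "Compositing": 1.25, "Brightness": 3.7},
}

def mean_fnr(rates):
    """Plain mean of the per-forgery false negative rates."""
    return sum(rates.values()) / len(rates)

averages = {measure: mean_fnr(rates) for measure, rates in fnr_bmp.items()}
for measure, value in averages.items():
    print(f"{measure}: {value:.2f}%")
```

Under these assumptions the means come to 3.8125% and 2.2625%, which round to the values stated in the abstract.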

IV. CONCLUSIONS
The method discussed in this paper is based on using the approximated histogram of DCT coefficients of bitmaps for extracting the image's compression history, namely its quantization table. The extracted table is also used to expose image forgeries. The method proved to have high estimation accuracy in practice, compared to other statistical approaches, when tested on a large set of images from different sources. Moreover, estimation times proved to be faster than those of statistical methods while maintaining very good accuracy for lower frequencies. Experimental results also showed that performance for bitmaps surpasses that for JPEGs because of the lossy nature of JPEG; on the other hand, it takes more time to process a bitmap.

REFERENCES
[1] Hamdy S., El-Messiry H., Roushdy M. I., Khalifa M. E., "Forgery detection in JPEG compressed images", JAR-Unpublished, 2010.
[2] Rosenholtz R., Zakhor A., "Iterative procedures for reduction of blocking effects in transform image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 2, pp. 91–94, Mar. 1992.
[3] Fan Z., Eschbach R., "JPEG decompression with reduced artifacts," Proc. IS&T/SPIE Symp. Electronic Imaging: Image and Video Compression, San Jose, CA, Feb. 1994.
[4] Fan Z., Li F., "Reducing artifacts in JPEG decompression by segmentation and smoothing," Proc. IEEE Int. Conf. Image Processing, vol. II, 1996, pp. 17–20.
[5] Tan K. T., Ghanbari M., "Blockiness detection for MPEG-2-coded video," IEEE Signal Process. Lett., vol. 7, pp. 213–215, Aug. 2000.
[6] Minami S., Zakhor A., "An optimization approach for removing blocking effects in transform coding," IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 74–82, Apr. 1995.
[7] Yang Y., Galatsanos N. P., Katsaggelos A. K., "Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 421–432, Dec. 1993.
[8] Luo J., Chen C. W., Parker K. J., Huang T. S., "Artifact reduction in low bit rate dct-based image compression," IEEE Trans. Image Process., vol. 5, pp. 1363–1368, 1996.
[9] Chou J., Crouse M., Ramchandran K., "A simple algorithm for removing blocking artifacts in block-transform coded images," IEEE Signal Process. Lett., vol. 5, pp. 33–35, 1998.
[10] Fan Z., de Queiroz R. L., "Maximum likelihood estimation of jpeg quantization table in the identification of bitmap compression history", in Proc. Int. Conf. Image Process. '00, 10-13 Sept. 2000, 1: 948–951.
[11] Fan Z., de Queiroz R. L., "Identification of bitmap compression history: jpeg detection and quantizer estimation", in IEEE Trans. Image Process., 12(2): 230–235, February 2003.
[12] Fu D., Shi Y. Q., Su W., "A generalized benford's law for jpeg coefficients and its applications in image forensics", in Proc. SPIE Secur., Steganography, and Watermarking of Multimed. Contents IX, vol. 6505, pp. 1L1-1L11, 2007.
[13] Swaminathan A., Wu M., Ray Liu K. J., "Digital image forensics via intrinsic fingerprints", IEEE Trans. Inf. Forensics Secur., 3(1): 101-117, March 2008.
[14] Ye S., Sun Q., Chang E.-C., "Detecting digital image forgeries by measuring inconsistencies in blocking artifacts", in Proc. IEEE Int. Conf. Multimed. and Expo., July, 2007, pp. 12-15.
[15] Fridrich J., Goljan M., Du R., "Steganalysis based on JPEG compatibility", SPIE Multimedia Systems and Applications, vol. 4518, Denver, CO, pp. 275-280, Aug. 2001.
[16] Schaefer G., Stich M., "UCID – An Uncompressed Color Image Database", School of Computing and Mathematics, Technical Report, Nottingham Trent University, U.K., 2003.
[17] Petkov A., Cottier S., "Image quality estimation for jpeg-compressed images without the original image", EE398 Projects - Image and Video Compression, Stanford University, March 2008.


Fig. 5. Distortion measures for some composite bitmap images. (a) Three composite bitmap images. (b) Distortion measure for the three images in (a). The left panel represents the average distortion measure while the right panel represents the blocking artifact measure.
