
Volume 3, Issue 3, March 2013 ISSN: 2277 128X

International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com
Special Issue: Computing Terminologies and Research Development
Conference Held at SCAD College of Engineering and Technology, India
A Forensic Method for Detecting Image Forgery Using Codebook
E. Agnes (PG Scholar)1, S. Devi Mahalakshmi (Assistant Professor)2, Dr. K. Vijayalakshmi (Professor)3
1, 2 Department of Computer Science and Engineering
3 Department of Information Technology
Mepco Schlenk Engineering College, Sivakasi
Affiliated to Anna University, Chennai, India
Abstract—The extensive growth of software technologies available on the Internet has made image tampering easy. The main problem in the real world is to determine whether an image is real or forged. In this paper we propose a new method that makes use of a codebook, generated from a set of image features, to determine the geometric manipulations that have been applied to the received image. In the proposed method, an image hash based on a bag of visual words is attached as a signature to the image before transmission. At the destination, the forensic hash is compared to detect the geometric manipulations that have been applied to the received image. The spatial distribution of image features is encoded to deal with highly textured and contrasted tampering patterns. The proposed method detects all types of tampering.
Keywords—Bag of Words, Codebook, Forensic Hash, Geometric manipulations, Image tampering.

I. INTRODUCTION
THE widespread use of image processing software leads to forgery of images. An example of a digital forgery is shown in Figures 1 and 2.

Fig. 1 Original image Fig. 2 Tampered image


Figure 1 is the original image, which shows a natural yellow flower. This original image is tampered with, using one red flower, to obtain the tampered image shown in Figure 2. Most of the existing methods use watermarking-based approaches, in which the information is embedded in the least significant bits (LSBs) of the image. There are two types of watermarking: visible and invisible. Watermarking has one major disadvantage: it distorts the content. Some of the existing methods [1]-[14] use hash-signature-based approaches. In these methods, a hash code is attached to the image before transmission, and at the destination the hash code is used for verification. The image hash is a signature that represents the visual content, and it should be robust against malicious operations. This forensic method provides scientific evidence against malicious manipulations. The codebook [16] is generated from a set of training images. If the source image is not known, the geometric manipulations cannot be detected. The location of the forgery is detected by extracting gradient features. The main contribution of this paper is to detect and locate image forgery. To locate the forgery, the receiver should identify all the geometric manipulations (rotation, scaling, translation) that have been applied to the received image.
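As a small illustration of the LSB watermarking mentioned above, the following sketch writes payload bits into the least significant bit of each pixel value and reads them back. It is not taken from any of the cited methods; the function names `embed_lsb` and `extract_lsb` and the toy pixel values are assumptions made for this example. It also shows why such watermarking distorts the content: every marked pixel may change by one grey level.

```python
def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("payload is longer than the pixel sequence")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        marked[i] = (marked[i] & ~1) | (bit & 1)
    return marked


def extract_lsb(pixels, n_bits):
    """Read back the least significant bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


pixels = [200, 13, 77, 254, 0, 129]   # toy 8-bit grey levels (hypothetical)
payload = [1, 0, 0, 1]                # toy watermark bits (hypothetical)
marked = embed_lsb(pixels, payload)
recovered = extract_lsb(marked, len(payload))
# Largest per-pixel distortion introduced by the watermark.
max_change = max(abs(a - b) for a, b in zip(pixels, marked))
```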
There are three types of image tampering.
• Enhancing
• Compositing
• Copy Move
In enhancing [18], the weather conditions are changed or the image objects are blurred out. In compositing [19], two or more images are combined to produce a forged image. In copy move [20], one region of the original image is copied and pasted into another area of the same image to create a forged image.
© 2013, IJARCSSE All Rights Reserved Page | 1
Agnes et al., International Journal of Advanced Research in Computer Science and Software Engineering 3 (3),
March- 2013, pp. 1-5
The BOF [2] model is used to represent image features as words. Here the codebook is generated using the Bag of Features representation. Existing methods used the BOF model to reduce the size of the image hash. Many existing methods are based on image signatures. If the attacker knows the signature [15], it is relatively easy to forge the image. To avoid that, our proposed method does not use a signature. The proposed method extracts the SIFT features based on their spatial distribution and contrast properties.
The authors of [1] proposed a method to detect forgery. They detect the geometric manipulations in an image by comparing the hash values generated at the sender and receiver sides. The sender assumes that the image is transmitted over an untrusted network. The receiver generates the hash value for the received image and compares it with the sender's hash value. If both match, the image is not forged; otherwise the image is forged.
The authors of [3] proposed a multimedia forensic hash based on visual words. Here the forensic hash is generated from SIFT features and their visual words representation [17]. They first extract the most stable SIFT points from an image and encode their visual words representation as the alignment component of the forensic hash. In that method the image features are selected by considering only contrast properties. This selection strategy is not robust against some malicious attacks. It considers only a single matching in the first estimation and refines the result later by considering the remaining ones.
The authors of [4] proposed a classical registration approach. The error measure is minimized using a direct method, in which the image information is collected directly from all pixels in the image. The disadvantage of classical image registration is that it cannot be employed directly, because only a limited amount of information is used while generating the signature.
The authors of [5] proposed a fast and robust image alignment algorithm for video stabilization purposes. They use two components: a block-based local motion estimator and a robust alignment algorithm based on voting. They also use a transformation model to detect the geometric manipulations.
The author of [6] proposed a method to detect forgery using three techniques: lighting, cloning and retouching. This method is effective, but it requires more time to detect the forgery, and it requires complete knowledge of each light source in the scene.
The authors of [7] proposed a spatial pyramid method to recognize scene categories. In that method the image is first partitioned into increasingly fine sub-regions, and histograms are then computed inside each sub-region. The result is called a spatial pyramid. It is computationally efficient and is an efficient extension of the bag-of-features representation. The disadvantage is that it produces inconclusive results and is impractical for large training sets.
The authors of [8] proposed a signature-based approach for localizing tampering in images. The signature contains two things: the content and the alignment information. Together these should be short (<1 KB). In this method, tampering is detected using a small signature that is not embedded into the image. The method has two steps: hash generation and hash verification. Its disadvantages are that the search complexity is high and that, if the hash does not contain the relevant and necessary information, the false positive error is high.
The authors of [9] proposed a method that detects forgery by comparing a codebook generated from a set of SIFT features. The pre-computed codebook is shared between the sender and the receiver. The sender extracts the SIFT features and sorts them in descending order according to their contrast properties. The disadvantage is that a large number of SIFT features is used for codebook generation, so the space complexity is high.
The proposed method is more accurate than the existing method in most of the settings reported in Table I. After extracting the SIFT features, clustering is performed on the features of the set of images, and the centroid of each cluster is taken to generate the codebook. The security is based on the codebook. As in [1] and [3], the codebook is shared between the sender and the receiver. To avoid complexity, the codebook is built only once, and sharing it between the sender and the receiver reduces the communication overhead.

II. PROPOSED WORK


The proposed method uses a codebook to detect image forgery. The codebook is generated from a set of training images by extracting the SIFT features and clustering them. The cluster centroids are taken to generate the codebook, in order to improve the accuracy. The overall scheme of the proposed work is given in Figure 3.
The proposed work consists of five steps. They are,
• SIFT feature extraction
• Clustering
• Codebook generation
• Tampering detection
• Locating the image forgery
A. SIFT Feature Extraction :
Only stable points are extracted. Edge points and points that are sensitive to noise are eliminated. A region is chosen around each point. SIFT feature extraction itself has four steps. They are,
• Scale space extrema detection
• Key point localization
• Orientation assignment
• Key point descriptor
Scale space extrema detection is the first stage: it searches over all scales and image locations and constructs the scale space. To identify potential interest points that are invariant to scale and orientation, a difference of Gaussian (DoG) function is computed and its extrema are found using the two steps below.
• Local maxima/minima over DoG images
• Find sub-pixel maxima/minima
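The first of the two steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the Gaussian-blurred images have already been computed and are given as nested lists, and the function names (`difference_of_gaussians`, `is_local_extremum`, `find_extrema`) are hypothetical. A point is kept when its DoG value is strictly larger or strictly smaller than all 26 neighbours in the current and the two adjacent DoG layers. Sub-pixel refinement is not shown.

```python
def difference_of_gaussians(blurred_a, blurred_b):
    """Pixel-wise difference of two blurred versions of the same image."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(blurred_a, blurred_b)]


def is_local_extremum(dog_stack, s, y, x):
    """True if the value at (s, y, x) beats all 26 scale-space neighbours."""
    centre = dog_stack[s][y][x]
    neighbours = [dog_stack[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return centre > max(neighbours) or centre < min(neighbours)


def find_extrema(dog_stack):
    """Scan the interior of a stack of DoG layers for local maxima/minima."""
    found = []
    for s in range(1, len(dog_stack) - 1):
        for y in range(1, len(dog_stack[s]) - 1):
            for x in range(1, len(dog_stack[s][y]) - 1):
                if is_local_extremum(dog_stack, s, y, x):
                    found.append((s, y, x))
    return found


# Toy DoG stack (hypothetical values): a single peak in the middle layer.
flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
dog_stack = [flat, peak, flat]
extrema = find_extrema(dog_stack)
```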


Fig. 3 Overall Schema of Tampering Detection

The next steps are key point localization and orientation assignment. The previous step produces a large number of candidate key points, which has to be reduced by eliminating unwanted ones. Edge points and points that are sensitive to noise are eliminated.
The final step is the key point descriptor. After selecting the orientation, the feature descriptor is computed as a set of orientation histograms built from the gradient magnitude and orientation. The gradients are added to different bins according to their orientation in the range 0 to 360 degrees. Once the 128 values are obtained, the descriptor is normalized. The local image gradients around each key point are thus transformed into a representation that tolerates significant levels of shape distortion and illumination change.
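The descriptor step described above can be illustrated with a short sketch. It assumes the common SIFT layout of a 4 x 4 grid of cells with 8 orientation bins each (16 x 8 = 128 values), which is consistent with the 128 numbers mentioned in the text but is not spelled out there. The function names and toy gradients are hypothetical, and refinements such as Gaussian weighting and interpolation between bins are left out.

```python
import math


def orientation_histogram(gradients, n_bins=8):
    """Accumulate gradient magnitudes into bins covering 0 to 360 degrees."""
    hist = [0.0] * n_bins
    bin_width = 360.0 / n_bins
    for dx, dy in gradients:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def descriptor(cells):
    """Concatenate one histogram per cell and normalize the result."""
    values = []
    for cell in cells:
        values.extend(orientation_histogram(cell))
    return normalize(values)


# Toy gradients (hypothetical), reused for each of the 16 cells.
sample = [(2, 1), (-1, 2), (1, -3)]
hist = orientation_histogram(sample)
vec = descriptor([sample for _ in range(16)])
```

Normalizing the vector to unit length is what makes the descriptor insensitive to a uniform change in image contrast.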

B. Clustering:
K-means clustering is used to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean.
The steps in k-means clustering are,
1. Choose the value of k and the initial cluster centers.
2. Assign each observation to its closest cluster center.
3. Recompute the cluster centers as the centroids of the clusters.
4. Repeat Steps 2 and 3 until there is no change in the membership (the cluster centers then also remain the same).
The value of k is selected from the formula below,
𝑘 ≈ 𝑛/2 (1)
In that formula n denotes the number of key points.
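The clustering steps listed above can be sketched in plain Python as follows. This is a minimal version of Lloyd's algorithm and not the authors' code. The value of k is passed in as a parameter, and taking the first k points as initial centers is an assumption made here to keep the example deterministic (the text does not say how the centers are initialized).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to the given point."""
    return min(range(len(centres)),
               key=lambda i: squared_distance(point, centres[i]))


def mean(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def k_means(points, k, max_iter=100):
    """Return (centres, labels) after iterating assignment and update."""
    centres = [tuple(p) for p in points[:k]]  # assumption: naive initialization
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:      # membership unchanged: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:               # keep the old centre if a cluster is empty
                centres[i] = mean(members)
    return centres, labels


# Toy 2-D points (hypothetical); real SIFT descriptors have 128 dimensions.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centres, labels = k_means(points, k=2)
```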

C. Codebook Generation :
The codebook [3] is generated from the set of training images. The features are extracted from the training images using the SIFT algorithm and are then clustered. The centroids of the clusters are taken to form the codebook. First, the codebook is generated for the set of all training images. Then a codebook is generated for the test image using the same procedure.
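To illustrate how a codebook of centroids is typically used in a bag of visual words representation, the sketch below maps each descriptor to its nearest centroid (its visual word) and counts the occurrences. The text above only describes how the codebook is built, so the quantization step, the function names and the toy values are assumptions for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, codebook):
    """Index of the codebook entry (visual word) nearest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Histogram counting how many descriptors fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1
    return hist


# Toy 2-D codebook and descriptors (hypothetical values).
codebook = [(0.0, 0.5), (10.0, 10.5)]
descriptors = [(1, 1), (9, 9), (11, 12), (0, 0)]
words = [quantize(d, codebook) for d in descriptors]
histogram = bag_of_words(descriptors, codebook)
```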

D. Tampering Detection :
After the codebook of the test image is generated, it is compared with the codebook of the training images. From this comparison, the geometric manipulations are detected.
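The text does not specify how the comparison yields the geometric manipulation. As one possible illustration, and not the authors' procedure, the sketch below assumes that key points have already been matched between the sender side and the received image (for example, because they map to the same visual word) and estimates the rotation and scale from all pairs of matches, taking the median as a simple robust vote. The function names and the toy points are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def estimate_rotation_and_scale(src, dst):
    """Estimate rotation (degrees) and scale from matched point lists."""
    angles, scales = [], []
    for i, j in combinations(range(len(src)), 2):
        vx, vy = src[j][0] - src[i][0], src[j][1] - src[i][1]
        wx, wy = dst[j][0] - dst[i][0], dst[j][1] - dst[i][1]
        len_v, len_w = math.hypot(vx, vy), math.hypot(wx, wy)
        if len_v == 0 or len_w == 0:
            continue  # coincident points carry no direction information
        angle = math.degrees(math.atan2(wy, wx) - math.atan2(vy, vx))
        angles.append((angle + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
        scales.append(len_w / len_v)
    return median(angles), median(scales)


def similarity(point, angle_deg, scale, tx, ty):
    """Apply a rotation, uniform scaling and translation to a 2-D point."""
    a = math.radians(angle_deg)
    x, y = point
    return (scale * (x * math.cos(a) - y * math.sin(a)) + tx,
            scale * (x * math.sin(a) + y * math.cos(a)) + ty)


# Toy example: the received points are rotated by 30 degrees, scaled by 2
# and translated with respect to the sender-side points.
sender_points = [(0, 0), (4, 0), (0, 3), (5, 5)]
receiver_points = [similarity(p, 30.0, 2.0, 5.0, -3.0) for p in sender_points]
angle, scale = estimate_rotation_and_scale(sender_points, receiver_points)
```

The median of wrapped angles is unreliable for rotations close to 180 degrees; a voting scheme over angle bins would be a more careful choice there.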

E. Locating the image forgery :


The image forgery is located by extracting gradient features. The gradient of the image is one of the fundamental building blocks in image processing.
Gradient of a function F(x, y) is calculated as,
$$\nabla F = \frac{\partial F}{\partial x}\,\mathbf{i} + \frac{\partial F}{\partial y}\,\mathbf{j} \qquad (2)$$
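It can be thought of as a collection of vectors pointing in the direction of increasing values of F. For a function of N variables, F(x, y, z, ...), the gradient is calculated as,

$$\nabla F = \frac{\partial F}{\partial x}\,\mathbf{i} + \frac{\partial F}{\partial y}\,\mathbf{j} + \frac{\partial F}{\partial z}\,\mathbf{k} + \cdots \qquad (3)$$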
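For a digital image, the partial derivatives in the gradient are approximated by finite differences. The sketch below is one simple way to do this (central differences in the interior, one-sided differences at the borders) for an image stored as nested lists with at least two rows and two columns. The function names are hypothetical, and how the resulting gradient features are compared to localize tampering is not shown, since the text does not specify it.

```python
import math


def image_gradient(image):
    """Return (gx, gy): finite-difference estimates of dF/dx and dF/dy."""
    h, w = len(image), len(image[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbours at the borders (one-sided difference there).
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx[y][x] = (image[y][x1] - image[y][x0]) / (x1 - x0)
            gy[y][x] = (image[y1][x] - image[y0][x]) / (y1 - y0)
    return gx, gy


def gradient_magnitude(gx, gy):
    """Length of the gradient vector at every pixel."""
    return [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Toy image F(x, y) = 2x + 3y, whose exact gradient is (2, 3) everywhere.
image = [[2 * x + 3 * y for x in range(4)] for y in range(4)]
gx, gy = image_gradient(image)
magnitude = gradient_magnitude(gx, gy)
```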

III. DATASET AND EXPERIMENTS
The training images were taken from DBForgery 1.0 [10] and from [7] and [11]. The test set consists of 2500 images. The codebook is generated from the set of training images. A set of experimental results is shown in Figure 4. Figure 4(a) is the original image and Figure 4(b) is the forged image, in which the telephone is hidden. Figure 4(c) shows the extracted SIFT features.

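Fig. 4 Experimental results: (a) original image, (b) forged image, (c) extracted SIFT features, (d) clustering of the image features, (e) gradient features, (f) tampering location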
In Figure 4(d), clustering is performed on the set of image features. Figure 4(e) shows the gradient features extracted from the image, and Figure 4(f) shows the exact tampering location. Table I presents the performance analysis. TPR stands for True Positive Rate and FPR stands for False Positive Rate.

$$\mathrm{TPR} = \frac{\text{Number of images detected as forged being forged}}{\text{Number of forged images}} \qquad (4)$$

$$\mathrm{FPR} = \frac{\text{Number of images detected as forged being original}}{\text{Number of original images}} \qquad (5)$$
TABLE I.
PERFORMANCE ANALYSIS
Threshold, T    Existing Method         Proposed Method
                TPR(%)    FPR(%)        TPR(%)    FPR(%)
8               73.6      9.3           77.6      5.3
16              84.8      17.3          83.2      14.6
32              88.8      20.0          91.2      17.3
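The two rates in Equations (4) and (5) can be computed from per-image ground truth and detector decisions as in the sketch below. The function name `detection_rates` and the toy labels are hypothetical and unrelated to the figures reported in Table I.

```python
def detection_rates(is_forged, flagged):
    """Return (TPR, FPR) from ground-truth labels and detector decisions."""
    true_pos = sum(1 for t, p in zip(is_forged, flagged) if t and p)
    false_pos = sum(1 for t, p in zip(is_forged, flagged) if not t and p)
    n_forged = sum(1 for t in is_forged if t)
    n_original = len(is_forged) - n_forged
    # Equation (4): forged images correctly flagged / all forged images.
    # Equation (5): original images wrongly flagged / all original images.
    return true_pos / n_forged, false_pos / n_original


# Toy data: 4 forged and 4 original images (hypothetical).
is_forged = [True, True, True, True, False, False, False, False]
flagged = [True, True, True, False, True, False, False, False]
tpr, fpr = detection_rates(is_forged, flagged)
```

Here three of the four forged images are flagged (TPR = 0.75) and one of the four original images is flagged by mistake (FPR = 0.25).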

A training set is used to decide the threshold value for tampering detection. T = 32 is taken as the optimum value and is used as the threshold for the test.

IV. CONCLUSION AND FUTURE WORKS


The proposed method is used to determine whether an image is forged or not. This work deals with the detection of geometric manipulations that occur during transmission. The codebook is generated from the features of the set of training images, and a codebook is also generated for the test image. By comparing both codebooks, the geometric manipulations are detected. Gradient features are used to extract information from the images, and after extracting them the tampered regions are localized. The proposed work detects all types of tampering by comparing the generated codebooks. The codebook complexity is much lower than in the existing methods. This work has been experimentally tested on different datasets (DBForgery 1.0, UCID v2), and the proposed approach obtains good performance on them. The proposed work does not identify a splicing attack if the source image is not found. Future work is to identify such attacks even when the source is not found. Another direction is to reduce the codebook complexity further without decreasing the accuracy. This can be done by reducing the number of centroids that are taken for codebook generation.

REFERENCES
[1] S. Battiato, G. M. Farinella, E. Messina, and G. Puglisi, “Robust image alignment for tampering detection,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 4, August 2012.
[2] S. Battiato, G. M. Farinella, E. Messina, and G. Puglisi, “Understanding geometric manipulations of images through BOVW-based hashing,” in Proc. Int. Workshop Content Protection Forensics (CPAF 2011), 2011.
[3] W. Lu and M. Wu, “Multimedia forensic hash based on visual words,” in Proc. IEEE Computer Soc. Int. Conf. Image Processing, 2010, pp. 989–992.
[4] M. Irani and P. Anandan, “About direct methods,” in Proc. Int. Workshop Vision Algorithms, held during ICCV, Corfu, Greece, 1999, pp. 267–277.
[5] G. Puglisi and S. Battiato, “A robust image alignment algorithm for video stabilization purposes,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 10, pp. 1390–1400, 2011.
[6] H. Farid, “Digital doctoring: How to tell the real from the fake,” Significance, vol. 3, no. 4, pp. 162–166, 2006.
[7] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in Proc. IEEE Computer Soc. Conf. Computer Vision Pattern Recognition, 2006, pp. 2169–2178.
[8] S. Roy and Q. Sun, “Robust hash for detecting and localizing image tampering,” in Proc. IEEE Computer Soc. Int. Conf. Image Processing, 2007, pp. 117–120.
[9] S. Battiato, G. M. Farinella, E. Messina, and G. Puglisi, “Understanding geometric manipulations of images through BOVW-based hashing,” in Proc. Int. Workshop Content Protection Forensics (CPAF 2011), 2011.
[10] S. Battiato and G. Messina, “Digital forgery estimation into DCT domain - A critical analysis,” in Proc. ACM Conf. Multimedia 2009, Multimedia in Forensics (MiFor’09), 2009.
[11] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories,” in IEEE CVPR Workshop on Generative-Model Based Vision, 2004. http://www.vision.caltech.edu/Image_Datasets/Caltech101.
[12] M. Szummer and R. Picard, “Indoor-outdoor image classification,” in IEEE International Workshop on Content-Based Access of Image and Video Databases, 1998, pp. 42–51.
[13] C. Wallraven, B. Caputo, and A. Graf, “Recognition with local features: The kernel recipe,” in Proc. ICCV, vol. 1, 2003, pp. 257–264.
[14] R. Szeliski, “Image alignment and stitching: A tutorial,” Foundations Trends in Computer Graphics Computer Vision, vol. 2, no. 1, pp. 1–104, 2006.
[15] Y.-C. Lin, D. Varodayan, and B. Girod, “Image authentication based on distributed source coding,” in Proc. IEEE Computer Soc. Int. Conf. Image Processing, 2007, pp. 3–8.
[16] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, “Local features and kernels for classification of texture and object categories: An in-depth study,” Technical Report RR-5737, INRIA Rhône-Alpes, 2005.
[17] A. C. Popescu and H. Farid, “Exposing digital forgeries by detecting traces of re-sampling,” IEEE Trans. on Signal Processing, vol. 53, no. 2, pp. 758–767, 2005.
[18] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[19] L. Gruber, S. Zollmann, D. Wagner, D. Schmalstieg, and T. Höllerer, “Optimization of target objects for natural feature tracking,” in Proc. 20th Int. Conf. Pattern Recognition (ICPR 2010), Washington, DC, 2010, pp. 3607–3610.
[20] H. Farid, “Exposing digital forgeries from JPEG ghosts,” IEEE Trans. Information Forensics and Security, vol. 4, pp. 154–160, 2009.
