
A Fast Forgery Detection Algorithm Based on Exponential-Fourier Moments for Video Region Duplication

Lichao Su, Cuihua Li, Yuecong Lai, and Jianmei Yang

IEEE Transactions on Multimedia, vol. 20, no. 4, April 2018

Abstract—Region duplication is one of the most common methods of video forgery. Existing forgery detection algorithms generally suffer from inefficiency and are not effective for forged regions with mirroring. To address these problems, we present a fast forgery detection algorithm based on Exponential-Fourier moments (EFMs) for detecting region duplication in videos. The algorithm first extracts EFMs features from each block in the current frame and performs a fast match to find potential matching pairs. Then, a post-verification scheme is designed to eliminate falsely matched pairs and locate the altered regions in the current frame. Finally, an adaptive-parameter-based fast compressive tracking algorithm is used to track the tampered regions in the subsequent frames. The experimental results show that our proposed algorithm has higher detection accuracy and computational efficiency than those of previous algorithms.

Index Terms—Video forgery, region duplication, passive forensics, fast algorithm.

Fig. 1. An example of video intra-frame region duplication: authentic (top) and tampered video (bottom).

Manuscript received January 25, 2017; revised June 3, 2017 and July 25, 2017; accepted August 31, 2017. Date of publication October 6, 2017; date of current version March 15, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 61373077, in part by the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant 20110121110020, and in part by the National Defense Basic Scientific Research Program of China. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Balakrishnan Prabhakaran. (Corresponding author: Lichao Su.) The authors are with the School of Information Science and Engineering, Xiamen University, Xiamen 361005, China (e-mail: 651424071@qq.com; fisherslc@163.com; sa@1l.com; 786313129@qq.com). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2017.2760098
I. INTRODUCTION

WITH the rapid development of multimedia technology and user-friendly editing software (e.g., Photoshop and Premiere by Adobe, and Mokey by Imagineer Systems), manipulating videos and changing their content is becoming a trivial task. For example, significant information, such as an object, can be added to or deleted from a video without leaving any visible signs of such tampering. Sometimes, these manipulations are not innocent, involving, for example, tampering with videos acquired by surveillance systems and used as evidence. Consequently, there is increasing research interest in video forensics, which is used to authenticate the veracity and integrity of videos [1].

Video forensics primarily falls into two categories: active forensics and passive forensics. In active forensics, the validation information used for authentication, i.e., digital watermarks, digital signatures, and fingerprints [2], [3], is embedded during the generation of a video. In passive forensics, the veracity and integrity of a video are authenticated without any validation information, which is more practical in application than active forensics [4].

Passive forensics research has mainly focused on digital images in recent years [5], [6], but video passive forensics has gradually become a hot research topic. Recent works in passive forensics include the following studies. In [7], the authors propose a method for detecting forged video regions based on inconsistencies in the noise characteristics of the forged areas. In [8], the authors suggest a method for detecting video tampering using the noise correlation properties between spatially collocated blocks. Other forgery detection techniques are also proposed in [9] and [10]. These techniques use double quantization coefficients to detect forgeries; however, they only work within a limited range of compression rates. A new algorithm to detect frame duplication forgery based on Tamura texture features is proposed in [11]. The eigenvector matrix of the video is populated by extracting the Tamura texture features of each frame, and their distances to a threshold are compared to detect copy-move sequences. In [12], an effective similarity-analysis-based method for frame duplication detection is proposed, which has outstanding performance in terms of computational efficiency.

Using video-editing software to copy and paste specific existing content from one region to another disjoint region in the same frame is one of the most common methods of video forgery. Fig. 1 shows an example of region duplication in a video downloaded from the Internet. In the figure, the top and bottom rows show the authentic sequences and their forged version, respectively. In the forged sequences, the cars are selected and duplicated to the top left corner, concealing the real information of the video.
Relatively few studies have realized the detection of region duplication effectively. Wang et al. propose a divide-and-conquer approach to detect duplication wherein the entire video is split into subparts, and different types of correlation coefficients are computed to highlight their similarities [13]. However, the detection effectiveness is unacceptable if the forged region is small. In [14], a region duplication forgery detection algorithm is proposed based on Histogram of Oriented Gradients (HOG) feature matching and video compression properties. This HOG-based algorithm is effective and robust against various signal processing manipulations; however, it is not suitable for long videos because of its unacceptable computational complexity. In [15], a novel copy-move forgery detection scheme is proposed using adaptive over-segmentation and feature point matching. The algorithm combines a block-based method with a feature-point-based method and has good performance. However, to reduce computational complexity, the test video in [15] was down-sampled, which complicated the extraction of features and reduced the detection accuracy.

In video forensics, the issues of detection accuracy and computational cost are central to the design of the algorithms, because a video of even modest length may run into thousands of frames. Most of the previous algorithms could not achieve both satisfactory detection accuracy and computational efficiency, let alone detect forged regions with mirroring, as reported in [13]-[15].

To address this problem, in this paper we propose a fast forgery detection algorithm based on Exponential-Fourier moments for region duplication in videos. In our algorithm, the current frame is first split into overlapping blocks, from which the proposed algorithm extracts EFMs features and performs a fast match. Then, a post-verification scheme (PVS) is designed to eliminate falsely matched pairs and locate the altered regions in the current frame. Finally, an adaptive fast compressive tracking (AFCT) algorithm is used to track the forged regions in the subsequent frames.

The main contributions of this paper address the following elements:
1. EFMs are employed for the first time to detect region duplication in video, realizing the detection of forged regions with mirroring. Moreover, we optimize the process of EFMs extraction and block matching, which greatly improves the computational efficiency.
2. An object tracking technique is introduced into digital video forensics for the first time, avoiding the deficiency of detecting videos frame by frame. Furthermore, we improve the algorithm of [16] in two aspects, optimizing the learning rate and the measurement matrix, to increase the tracking accuracy.
3. Our algorithm achieves the detection of multiple-region forgery, which is more practical in applications.

II. PRELIMINARIES

A. Fast Compressive Tracking Algorithm

The Fast Compressive Tracking (FCT) algorithm is a real-time object tracking algorithm proposed by Kaihua Zhang in 2014 [16], [17]. The FCT algorithm runs in real time and performs favorably against state-of-the-art trackers in terms of efficiency, accuracy, and robustness. In this section, we briefly present the principle and basic steps of the FCT.

First, assume that the object location in the (τ−1)-th frame (i.e., I_{τ−1}) has been determined, and record the current frame as I_τ. After sampling a set of image patches around the tracking location in the (τ−1)-th frame [18], a very sparse random Gaussian matrix R ∈ R^{m×s} (m ≪ s) is adopted to extract a low-dimensional measurement vector v ∈ R^{m×1} from the high-dimensional Haar-like feature vector x ∈ R^{s×1} of these image patches. The extraction process can be defined as:

v = Rx    (1)

where each element R(i, j) of R is defined as:

R(i,j) = r_{ij} = \sqrt{\rho} \times \begin{cases} 1 & \text{with probability } \frac{1}{2\rho} \\ 0 & \text{with probability } 1 - \frac{1}{\rho} \\ -1 & \text{with probability } \frac{1}{2\rho} \end{cases}    (2)

where ρ = s/(g log₁₀(s)) with s = 10^6 ∼ 10^10 and g = 0.4.

Second, apply the classifier H(v) to each feature vector v and find the tracking location in the τ-th frame with the maximal classifier response [19]. H(v) can be calculated with a naive Bayes classifier, defined as

H(v) = \log\left(\frac{\prod_{i=1}^{s} p(v_i|y=1)\,p(y=1)}{\prod_{i=1}^{s} p(v_i|y=0)\,p(y=0)}\right) = \sum_{i=1}^{s}\log\left(\frac{p(v_i|y=1)}{p(v_i|y=0)}\right)    (3)

where p(y=1) = p(y=0) = 0.5, and y ∈ {0, 1} is a binary variable that represents the sample label.

The conditional distributions p(v_i|y=1) and p(v_i|y=0) in the classifier H(v) are assumed to be Gaussian [20] with four parameters (μ_i^1, σ_i^1, μ_i^0, σ_i^0), where:

p(v_i|y=1) \sim N(\mu_i^1, \sigma_i^1),\quad p(v_i|y=0) \sim N(\mu_i^0, \sigma_i^0)    (4)

Third, sample two sets of image patches from I_τ, extract the features of these two sets of samples, and update the classifier parameters according to:

\mu_i^1 \leftarrow \lambda\mu_i^1 + (1-\lambda)\mu^1    (5)

\sigma_i^1 \leftarrow \sqrt{\lambda(\sigma_i^1)^2 + (1-\lambda)(\sigma^1)^2 + \lambda(1-\lambda)(\mu_i^1 - \mu^1)^2}    (6)

where λ (λ > 0) is a learning parameter, which is set to 0.85 in the FCT, and

\sigma^1 = \sqrt{\frac{1}{s}\sum_{k=0|y=1}^{s-1}\left(v_i(k) - \mu^1\right)^2}    (7)

\mu^1 = \frac{1}{s}\sum_{k=0|y=1}^{s-1} v_i(k)    (8)
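To make the classifier step concrete, the following Python sketch illustrates one reading of (3), (5), and (6). It is not the authors' released code: the diagonal-Gaussian modeling and the names `mu1`, `sig1`, etc. are our assumptions.

```python
import numpy as np

def classifier_response(v, mu1, sig1, mu0, sig0, eps=1e-12):
    """Naive Bayes response H(v) of eq. (3): the sum of per-dimension
    log-likelihood ratios under diagonal Gaussian class models."""
    log_p1 = -0.5 * ((v - mu1) / (sig1 + eps)) ** 2 - np.log(sig1 + eps)
    log_p0 = -0.5 * ((v - mu0) / (sig0 + eps)) ** 2 - np.log(sig0 + eps)
    return np.sum(log_p1 - log_p0)

def update_params(mu_i, sig_i, samples, lam=0.85):
    """Refresh (mu_i, sig_i) with eqs. (5)-(6); `samples` holds the
    measurement vectors drawn from the newly tracked location."""
    mu = samples.mean(axis=0)        # mu^1 of eq. (8)
    sig = samples.std(axis=0)        # sigma^1 of eq. (7)
    new_sig = np.sqrt(lam * sig_i**2 + (1 - lam) * sig**2
                      + lam * (1 - lam) * (mu_i - mu)**2)
    new_mu = lam * mu_i + (1 - lam) * mu
    return new_mu, new_sig
```

With λ fixed at 0.85 as in the FCT, a larger λ keeps more of the existing model and learns the new samples more slowly; Section III-D revisits this trade-off.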
B. Exponential-Fourier Moments

Exponential-Fourier moments (EFMs), proposed by Hai-tao Hu in 2014, are a set of moments based on the exponent function that are suitable for image analysis and rotation-invariant pattern recognition [21], [22]. Exponential-Fourier moments have a strong ability to describe an image because the new radial functions have more zeros, and these zeros are evenly distributed. In this section, we briefly introduce the Exponential-Fourier moments as follows.

In the polar coordinate system, an orthogonal function set P_{nm}(r, θ) is defined:

P_{nm}(r,\theta) = Q_n(r)\exp(tm\theta)    (9)

where exp(tmθ) is the Fourier function, n and m are both integers, and Q_n(r) is the radial function, which is defined as:

Q_n(r) = \sqrt{\frac{1}{r}}\exp(t2n\pi r)    (10)

The function set P_{nm}(r, θ) is orthogonal within 0 ≤ r ≤ 1 and 0 ≤ θ ≤ 2π; that is,

\int_0^{2\pi}\!\!\int_0^1 P_{nm}(r,\theta)\,P^{*}_{kl}(r,\theta)\,r\,dr\,d\theta = 2\pi\delta_{nmkl}    (11)

where δ_{nmkl} is the Kronecker symbol. According to the theory of orthogonal functions, an image f(r, θ) in the polar coordinate system can be decomposed as:

f(r,\theta) = \sum_{n=-\infty}^{+\infty}\sum_{m=-\infty}^{+\infty} EFMs_{nm}\,Q_n(r)\exp(tm\theta)    (12)

Therefore, Exponential-Fourier moments are defined using the radial function Q_n(r) in the polar system as:

EFMs_{nm} = \frac{1}{2\pi}\int_0^{2\pi}\!\!\int_0^1 f(r,\theta)\,Q_n(r)\exp(-tm\theta)\,r\,dr\,d\theta = \frac{1}{2\pi}\int_0^{2\pi}\!\!\int_0^1 \sqrt{\frac{1}{r}}\,f(r,\theta)\exp\bigl(t(2n\pi r - m\theta)\bigr)\,r\,dr\,d\theta    (13)

In the proposed video region duplication detection algorithm, Exponential-Fourier moments are calculated in the Cartesian coordinate system. The formulas for converting any point on a two-dimensional coordinate plane between polar coordinates (r, θ) and Cartesian coordinates (x, y) are given in (14) and (15):

x = r\cos\theta,\quad y = r\sin\theta    (14)

r_{x,y} = \sqrt{x^2 + y^2},\quad \theta_{x,y} = \arctan\frac{y}{x}    (15)

The Exponential-Fourier moments expressed in the Cartesian coordinate system can be obtained from the above equations:

EFMs_{nm} = \frac{1}{2\pi}\iint_{x^2+y^2\le 1} f(x,y)\,Q_n(r)\exp(-tm\theta)\,dx\,dy    (16)

A digital image of size N × N is an array of pixels f(i, j), where i and j denote rows and columns, respectively. Since EFMs are computed over a unit disk, the center of the image is taken as the origin, and then the following coordinate transformation is performed:

r_{i,j} = \sqrt{(c_1 j - c_2)^2 + (c_2 - c_1 i)^2}    (17)

\theta_{i,j} = \arctan\frac{c_2 - c_1 i}{c_1 j - c_2}    (18)

where c_1 and c_2 can be calculated by (19) and (20):

c_1 = \frac{\sqrt{2}}{N}    (19)

c_2 = \frac{N+1}{\sqrt{2}N}    (20)

Finally, (13) becomes:

EFMs_{nm} = \frac{1}{2\pi N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} f(i,j)\,Q_n(r_{i,j})\exp(-tm\theta_{i,j})    (21)

where f(i, j) represents the pixel value of the image at (i, j).

The image moment invariants ||EFMs_{nm}|| keep good invariance under translation, scaling, rotation [22], [23], and mirroring, with detailed proofs in [23]. Low-order moments (i.e., ||EFMs(0,0)||, ||EFMs(1,0)||, ||EFMs(0,1)||, and ||EFMs(1,1)||) retain the low-frequency information of the image. Furthermore, the zero-order moment ||EFMs(0,0)|| is generally used to represent the "quality" of an image. In this paper, we use ||EFMs(0,0)||, ||EFMs(1,0)||, ||EFMs(0,1)||, and ||EFMs(1,1)|| as the feature vector of each image sub-block, which is defined as:

V = \{\,\|EFMs_{nm}\| : 0 \le n, m \le 1\,\}    (22)
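As an illustration of (17)-(22), the sketch below is our Python rendition with t taken as the imaginary unit; the function names are ours, not from the paper.

```python
import numpy as np

def efm(block, n, m):
    """|EFMs_{nm}| of an NxN block via eq. (21) with the unit-disk
    mapping of eqs. (17)-(20)."""
    N = block.shape[0]
    c1, c2 = np.sqrt(2) / N, (N + 1) / (np.sqrt(2) * N)
    i, j = np.meshgrid(np.arange(1, N + 1), np.arange(1, N + 1),
                       indexing="ij")
    r = np.sqrt((c1 * j - c2) ** 2 + (c2 - c1 * i) ** 2)
    theta = np.arctan2(c2 - c1 * i, c1 * j - c2)
    r = np.where(r == 0, 1e-12, r)     # avoid 1/sqrt(0) at the exact center
    q = np.sqrt(1.0 / r) * np.exp(2j * n * np.pi * r)   # radial function (10)
    kernel = q * np.exp(-1j * m * theta)
    return abs((block * kernel).sum() / (2 * np.pi * N ** 2))

def feature_vector(block):
    """Feature vector V of eq. (22): the four low-order moment moduli."""
    return np.array([efm(block, n, m) for n in (0, 1) for m in (0, 1)])
```

Note that with c_1 and c_2 as in (19)-(20), every pixel of the block maps inside the unit disk (the corners map to radius (N−1)/N), so no masking is needed.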
III. PROPOSED METHOD

Our algorithm primarily consists of the following four stages: 1) feature extraction, 2) block matching, 3) locating the tampered areas in the current frame, and 4) tracking the forged areas in the subsequent frames.

The first stage of the algorithm extracts EFMs from each block using an improved feature extraction method. The second stage applies a new block-matching method to search for potential matching blocks, which significantly reduces the computational cost. In the third stage, the PVS method is designed to eliminate falsely matched pairs and locate the tampered areas in the current frame. In the final stage, the AFCT algorithm is adapted to learn the tampered areas identified in the current frame and track the forged regions in subsequent frames.

A. Feature Extraction

In intra-frame region duplication forgery, transformations such as scaling, rotation, and mirroring (post-processing) are often conducted on the duplicated regions. The Scale Invariant Feature Transform (SIFT) [24], [25] and Histograms of Oriented Gradients (HOG) [14], [26], [27] are often used to detect duplication forgeries. However, those features have the following shortcomings:
a) They cannot detect tampered areas subjected to a mirror operation, because neither SIFT nor HOG has mirror invariance [28].
b) SIFT and its improved variants (such as Flip-Invariant SIFT [28]) normally lead to considerable computational complexity because of their high dimensions, especially in the analysis of high-resolution videos.

To greatly improve the efficiency of the whole algorithm, we employ EFMs, which have better performance under scaling, rotation, and mirroring.

After the suspicious video is input, the current frame is expressed as I_current. We transform I_current into a gray frame and divide it into overlapping blocks (by one pixel) of size 8 × 8. Then, the EFMs (i.e., ||EFMs(0,0)||, ||EFMs(0,1)||, ||EFMs(1,0)||, and ||EFMs(1,1)||) of I_current are extracted. In the conventional method, the moduli of the EFMs for each block must be calculated according to (21), which requires an enormous amount of computation, especially for high-resolution videos.

In this section, we propose an improved algorithm to extract EFMs that reduces the computational cost and improves the efficiency significantly. Expanding (21) into matrix form, we obtain:

EFMs(m,n) = \frac{1}{2\pi N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} m_{i,j}    (23)

where N is the size of each block and m_{i,j} is the entry in the i-th row and j-th column of matrix M, which can be expressed as:

M = \begin{bmatrix} f(1,1) & \dots & f(1,N) \\ \vdots & & \vdots \\ f(N,1) & \dots & f(N,N) \end{bmatrix} .\!*\, \begin{bmatrix} Q_n(r_{1,1})\exp(-tm\theta_{1,1}) & \dots & Q_n(r_{1,N})\exp(-tm\theta_{1,N}) \\ \vdots & & \vdots \\ Q_n(r_{N,1})\exp(-tm\theta_{N,1}) & \dots & Q_n(r_{N,N})\exp(-tm\theta_{N,N}) \end{bmatrix}    (24)

where ".∗" denotes the element-wise (dot) product. For convenience, we refer to the matrix on the right side of ".∗" as AR and the one on the left side as AL. Thus, (24) becomes:

M = AL .\!*\, AR    (25)

We observe from (24) that, for each block, if N is fixed then AR is fixed and only AL changes. That is, calculating AR repeatedly for each block is not necessary. In our improved algorithm, AR is calculated only once for I_current.

The times required for feature extraction by the proposed algorithm and the conventional method are compared in Table I, based on a 640 × 480 pixel image. The results indicate that the proposed improved algorithm significantly increases the computational efficiency. Information about the computer and software is given in Section IV.

TABLE I
TIME COMPARISON OF TWO METHODS FOR EXTRACTING EFMS

| Algorithm | 8 × 8 blocks | 24 × 24 blocks |
| Conventional method | 31.12 s | 217.55 s |
| Improved algorithm | 2.51 s | 5.13 s |

Additionally, the improved algorithm can be extended to the extraction of other image moments, such as the Zernike moment, the polar cosine transform (PCT), and the polar sine transform (PST). A detailed comparison of EFMs with these image moments is provided in Section IV-B.
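A sketch of this optimization (ours, reusing the pixel-grid construction from the `efm` helper above): `AR` is precomputed once per (n, m) pair and reused for every overlapping 8 × 8 block of the frame.

```python
import numpy as np

def precompute_AR(N=8):
    """Build the fixed kernel AR of eq. (24) once per (n, m) pair."""
    c1, c2 = np.sqrt(2) / N, (N + 1) / (np.sqrt(2) * N)
    i, j = np.meshgrid(np.arange(1, N + 1), np.arange(1, N + 1),
                       indexing="ij")
    r = np.sqrt((c1 * j - c2) ** 2 + (c2 - c1 * i) ** 2)
    theta = np.arctan2(c2 - c1 * i, c1 * j - c2)
    r = np.where(r == 0, 1e-12, r)
    return {(n, m): (np.sqrt(1.0 / r) * np.exp(2j * n * np.pi * r)
                     * np.exp(-1j * m * theta))
            for n in (0, 1) for m in (0, 1)}

def frame_features(gray, kernels, N=8):
    """EFMs feature vectors of all overlapping NxN blocks: only AL
    (the pixel block) changes per block; AR is reused, per eq. (25)."""
    H, W = gray.shape
    feats = {}
    for y in range(H - N + 1):
        for x in range(W - N + 1):
            AL = gray[y:y + N, x:x + N]
            feats[(y, x)] = np.array(
                [abs((AL * AR).sum()) / (2 * np.pi * N ** 2)
                 for AR in kernels.values()])
    return feats
```

The Python loop is written for clarity; in practice the per-block inner products would be vectorized (e.g., with strided views or FFT-based correlation) to approach the speedups reported in Table I.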
B. Block Matching

In this section, we briefly introduce the 2NN and gNN algorithms and then present the rationale of the proposed algorithm in detail.

The 2NN algorithm [24] first calculates the Euclidean distance for every pair of blocks and then sorts the distances in ascending order to obtain D = {d_1, d_2, ...}. Finally, the ratio of the distance to the nearest neighbor to that of the second nearest neighbor is calculated. This method requires a large amount of computation and can find only one matching pair. The gNN algorithm [25], which is an improvement upon 2NN, considers the ratios of d_i and d_{i+1} rather than only that of the nearest and second nearest neighbors. Although the gNN algorithm finds multiple matching pairs, the amount of computation it requires exceeds that of the 2NN algorithm.

In this section, we design a new block-matching algorithm that not only finds multiple matching pairs but also greatly increases the computational efficiency. Because of the high similarity between the source and duplicated regions, similarity analysis is a feasible method for detecting intra-frame region duplication forgery. In this paper, the Euclidean distance is used as the measure of similarity.

Assume that x_{ik} refers to the k-th component of the EFMs extracted from the i-th block, and similarly, x_{jk} refers to the k-th component of the EFMs extracted from the j-th block. The similarity between the i-th and j-th blocks is given by

SIM(i,j) = dis_{ij} = \left(\sum_{k=1}^{D}(x_{ik} - x_{jk})^2\right)^{\frac{1}{2}}    (26)

where D = 4 in this paper. Given a pre-defined threshold T_dis, any two blocks whose distance is below T_dis are considered candidates for region duplication.

After T_dis is set, reducing the complexity of the verification process as much as possible will greatly improve the efficiency. Therefore, a simple yet effective block-matching algorithm is proposed.

We denote B(i) as the i-th block in I_current, B(i).ft as the features (i.e., EFMs) of B(i), B(i).loc as the coordinate of the pixel at the upper-left corner of B(i), B(i).sum as the sum of the EFMs of B(i), and N_b as the total number of blocks in I_current.

Firstly, update B by sorting the sums in descending order; that is, B(i).sum > B(i+1).sum.

Secondly, for any value of i, let j run from i+1 to N_b. If |B(i).sum − B(j).sum| ≤ √D × T_dis holds, calculate the Euclidean distance between B(i) and B(j) to determine whether they are a potential matching pair; otherwise, let i = i + 1 and repeat this step.

Thirdly, output the potential matching sets MQ and MW.

The pseudo code of the block-matching algorithm is given in Table II.

TABLE II
PSEUDO CODE OF THE IMPROVED BLOCK-MATCHING ALGORITHM

Assume that B(1).ft and B(2).ft are the features of the i-th and j-th blocks, expressed as {H_1, H_2, ..., H_D} and {L_1, L_2, ..., L_D}, respectively. The key component of the block-matching algorithm is the judging criterion: if the condition |(H_1 + H_2 + ... + H_D) − (L_1 + L_2 + ... + L_D)| > √D × T_dis holds, then the conclusion SIM(B(1).ft, B(2).ft) > T_dis can be derived. The mathematical proof is given in detail in the Appendix.

The proposed block-matching algorithm can be extended to other feature-matching applications. Using a 320 × 240 pixel image as an example, the Speeded-Up Robust Features (SURF) of the image were extracted, yielding a total of 174 feature points. The threshold was set to T_dis = 0.006, and the numbers of Euclidean-distance calculations required by the two algorithms were compared. Table III shows that the proposed block-matching algorithm has higher computational efficiency than gNN when applied to other features (i.e., SURF).

TABLE III
COMPARISON OF THE CALCULATION TIMES USING TWO ALGORITHMS

| Algorithm | Calculation counts |
| gNN | 15052 |
| Proposed block-matching algorithm | 282 |
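A compact sketch of this matching strategy follows (our Python reading of the three steps; the list-of-dicts block layout is an assumption). The pruning is sound because, by the Cauchy-Schwarz inequality, |Σ(H_k − L_k)| ≤ √D · dis(H, L), so a sum gap larger than √D × T_dis guarantees the distance exceeds T_dis.

```python
import numpy as np

def match_blocks(blocks, t_dis=0.006):
    """Return the potential matching sets (MQ, MW).

    `blocks` is a list of dicts with keys 'ft' (length-D EFMs vector)
    and 'loc' ((row, col) of the upper-left pixel).  Sorting by the
    feature sum lets each block stop scanning as soon as the sum gap
    exceeds sqrt(D) * t_dis.
    """
    D = len(blocks[0]["ft"])
    bound = np.sqrt(D) * t_dis
    order = sorted(blocks, key=lambda b: b["ft"].sum(), reverse=True)
    MQ, MW = [], []
    for i, bi in enumerate(order):
        si = bi["ft"].sum()
        for bj in order[i + 1:]:
            if si - bj["ft"].sum() > bound:   # sums descend: safe to stop
                break
            if np.linalg.norm(bi["ft"] - bj["ft"]) <= t_dis:
                MQ.append(bi["loc"])
                MW.append(bj["loc"])
    return MQ, MW
```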

C. Locating the Tampered Area in the Current Frame

As its name implies, a potential pair is only likely, not certain, to be a real copy-move pair. In fact, such pairs can sometimes be detected even in videos without any copy-move forgery, since many natural video frames contain several pairs of highly similar patches. However, copy-move pairs usually exhibit some unique characteristics that distinguish them from falsely matched ones [29].

To eliminate falsely matched pairs, the PVS method is described in this section to locate the tampered regions in I_current. Assume that (MQ(k), MW(k)) is a potential matching pair, where MQ(k).loc and MW(k).loc represent the upper-left corner coordinates of the corresponding blocks, respectively.

Firstly, since I_current is divided into overlapping blocks, both the source and the duplicated regions should cover a number of neighboring patches [30]. Accordingly, (MQ(k), MW(k)) is retained only when there are at least two other matching pairs within a circle of radius T_r. A detailed schematic is shown in Fig. 2.

Fig. 2. Schematic of the first step of PVS.

Secondly, spatially adjacent blocks in an image are usually quite similar, so matched pairs formed from adjacent blocks should be treated as false matches. We therefore require that the straight-line distance of (MQ(k), MW(k)) be greater than T_L, which is formulated as:

Distance(MQ(k).loc,\,MW(k).loc) > T_L    (27)

To improve the experimental accuracy, we fix the parameters at T_r = 30 and T_L = 30 based on numerous experimental analyses.

Thirdly, owing to the duplicative nature of copy-move forgery, the synchronization between the blocks in the source and duplicated regions is preserved even when transformations are applied to the duplicated regions. Hence, the synchronization is evaluated by analyzing the geometrical relationships between neighboring patches [29]. For the potential matching pair (MQ(k), MW(k)), we assume that (MQ(i), MW(i)) (i = 1, 2, ..., K) are the K neighboring pairs within the circle of radius T_r, and that a_i (i = 1, 2, ...) are the angles between MQ(k) and MQ(i). Similarly, a′_i (i = 1, 2, ...) are the angles between MW(k) and MW(i). Both a_i and a′_i meet the criteria a_i ∈ [0, π/2] and a′_i ∈ [0, π/2]. Let ε_i = |a_i − a′_i| (i = 1, 2, ..., K) and let var be the variance of [ε_1, ε_2, ..., ε_i, ...]. (MQ(k), MW(k)) is retained when var meets the condition var ≤ π/4, which is illustrated in Fig. 3.

Fig. 3. Illustrations of source and tampered areas: (a) the source area, (b) the tampered area with rotation degree d_angle, and (c) the tampered area with mirroring.

Finally, the remaining matching pairs in MQ and MW, denoted I_match, are output as a binary image. A square structuring element of size 10 × 10 is constructed to dilate I_match. Then the regions with fewer than 2000 pixels in I_match are erased, and the remaining white patches represent the tampered areas. The pseudo code for this step is given in Table IV.

TABLE IV
PSEUDO CODE OF STEP 4

An example of using PVS to eliminate falsely matched pairs is shown in Fig. 4. We observe from Fig. 4(c) that some falsely matched pairs exist in the binary image. These pairs usually appear in authentic regions that are highly similar. In (d), the falsely matched pairs have been removed, which proves the effectiveness of the PVS method.

Fig. 4. The results of using the PVS method to eliminate falsely matched pairs: (a) the original untampered video clip; (b) the video clip tampered via intra-frame region duplication; (c) the binary image after using block matching; and (d) the binary image after using the PVS method to remove the falsely matched pairs.

Assume that num and F_loc represent the number of white patches in I_match and their locations, respectively. In general, a forgery occurs in successive video frames, and the tampered frames of a video cover more than one second. If num ≥ 2, then I_current is processed as described in Section III-D; otherwise, I_current is considered authentic. Accordingly, we hold that the next few frames are also unaltered with large probability. Meanwhile, an interval parameter (i.e., T_interval) is defined; let current = current + T_interval and then jump back to Section III-A to continue the detection.

In the real world, T_interval can be adjusted easily as needed. As the value of T_interval increases, the speed of the whole algorithm increases, but the precision decreases. To balance run time and precision, we set T_interval = 5 in this paper.
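As a concrete reading of the third PVS test above, the sketch below is our illustration only: the paper does not fix how the angles are measured, so we assume angles of the connecting lines taken from the horizontal and folded into [0, π/2], and the container format for the pairs is ours.

```python
import numpy as np

def angle(p, q):
    """Angle of the line through block locations p and q,
    folded into [0, pi/2] as the text requires."""
    a = abs(np.arctan2(q[0] - p[0], q[1] - p[1]))
    return min(a, np.pi - a)

def keep_pair(k, MQ, MW, neighbors):
    """Third PVS test: pair k survives if the angle differences to its
    K neighboring pairs have variance <= pi/4 (synchronization check)."""
    eps = [abs(angle(MQ[k], MQ[i]) - angle(MW[k], MW[i]))
           for i in neighbors]            # epsilon_i = |a_i - a'_i|
    return np.var(eps) <= np.pi / 4
```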

D. Tracking the Forged Area in the Subsequent Frames

In this section, we describe in detail how to locate the tampered areas in the subsequent frames. To avoid detecting the duplicated regions frame by frame, we introduce a target tracking technique into our algorithm. The AFCT algorithm proposed in this section improves on the FCT algorithm in two aspects, by optimizing the learning rate and the measurement matrix.

1) AFCT Algorithm: The FCT algorithm is a simple and highly efficient target tracking algorithm based on compressive sensing and has a significant advantage in terms of computational efficiency. However, it is very sensitive to appearance models, which significantly impact the tracking accuracy. When the target appearance model undergoes great variation, such as interference or drift, the target can easily be lost. One important reason for this phenomenon is that the learning rate λ (λ = 0.85) of the FCT algorithm is fixed. We conclude from (5) and (6) that λ determines the refresh rates of μ_i, σ_i, μ, and σ. Once classification errors have occurred, parameter updating that occurs too fast will impact the classification of the next frame.

To overcome these problems, we first make λ an adaptive parameter. When a classification error happens, the AFCT algorithm rapidly increases λ to improve the learning speed of the existing targets. Thus, λ is adaptively refreshed according to (28):

\lambda = \begin{cases} 1 - 5^{-\tan(\alpha x + \beta)} & 0 < \lambda \le 1 \\ 1 & \lambda > 1 \end{cases}    (28)

where

x = \begin{cases} |\mu_i - \mu| & \text{when refreshing } \mu \\ |\sigma_i^2 - \sigma^2| & \text{when refreshing } \sigma^2 \end{cases}

and where α = 1.4 and β = 2.3; μ_i and σ_i represent the mean and variance of the existing targets, while μ and σ represent those of the current targets, respectively. The current targets' extent of deviation from the existing targets is judged by the differences in mean and variance between the current and existing targets. The higher the extent of deviation, the less reliable the current classification result is. Meanwhile, the learning speed of μ_i (or σ_i) will be improved, or the learning speed of μ (or σ) will be increased.

The plot of λ is presented in Fig. 5. This figure shows that the latter half of the curve is steeper than the first half, which is more conducive to decreasing the impact of erroneous classification on the parameter refreshment.

Fig. 5. The plot of (28): the x-axis represents the variable x in (28), and the y-axis represents the learning rate λ.

The measurement matrix used in the FCT affects the computational complexity and the amount of original information carried by each measurement element. In the AFCT, a random Gaussian matrix is adopted as the measurement matrix, which is defined as:

R(i,j) = r_{ij} = \sqrt{\frac{\gamma}{\log\gamma}} \times \begin{cases} 1 & \text{with probability } \frac{\log\gamma}{2\gamma} \\ 0 & \text{with probability } 1 - \frac{\log\gamma}{\gamma} \\ -1 & \text{with probability } \frac{\log\gamma}{2\gamma} \end{cases}    (29)

where γ is the dimension of the Haar-like feature vector.

The measurement complexity and the carried original information represent a pair of conflicting requirements. The newly defined measurement matrix R in the AFCT algorithm differs from that in the FCT algorithm in three aspects. First, the FCT algorithm adopts a fixed measurement matrix for all tracked objects, while the AFCT algorithm adopts an adaptive measurement matrix for each tracked object. Second, the number of non-zero elements in each row of the newly defined measurement matrix in the AFCT algorithm is greater than that in the FCT algorithm; thus, more original information is preserved, and the accuracy and robustness are improved. Third, the number of rows of the measurement matrix in the FCT algorithm, m, is fixed at 100, whereas in the AFCT algorithm it varies with different tracked targets; a value of m less than 20 can result in more stable tracking in the AFCT algorithm than in the FCT algorithm.
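For illustration, a minimal sketch (ours, written against (29) as reconstructed above; the row count m is assumed to be supplied by the tracker) of sampling the AFCT measurement matrix:

```python
import numpy as np

def afct_measurement_matrix(m, gamma, rng=np.random.default_rng()):
    """Sample the sparse random matrix of eq. (29): entries are
    +/- sqrt(gamma / log(gamma)) with probability log(gamma)/(2*gamma)
    each, and 0 otherwise."""
    p = np.log(gamma) / gamma                  # total non-zero probability
    signs = rng.choice([1.0, 0.0, -1.0], size=(m, gamma),
                       p=[p / 2, 1.0 - p, p / 2])
    return np.sqrt(gamma / np.log(gamma)) * signs
```

Compared with (2), the expected number of non-zero entries per row here is log γ rather than a constant, which is the source of the extra preserved information noted above.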
To evaluate the performance of the AFCT and FCT algorithms, we use a success rate metric, defined as follows:

score = \frac{area(ROI_T \cap ROI_G)}{area(ROI_T \cup ROI_G)}    (30)

where ROI_T is the tracking bounding box and ROI_G is the ground-truth bounding box. If the score exceeds 0.5 in a frame, the tracking result is considered successful. Experiments comparing the AFCT and FCT algorithms were performed on the experimental dataset downloaded from [31]. The success rates are compared in Table V, and screenshots of two of the sampled tracking results (i.e., the results for Bolt and Yarn, as listed in Table V) for the FCT and AFCT algorithms are presented in Figs. 6 and 7, respectively.

TABLE V
COMPARISON OF THE SUCCESS RATES

| Video | FCT | AFCT |
| Bolt | 74 | 92 |
| Yarn | 65 | 96 |
| Biker | 80 | 88 |
| David indoor | 92 | 93 |
| Occluded face | 97 | 97 |
| Shaking | 76 | 87 |

Table V shows that the success rate of the AFCT algorithm is higher than that of the FCT algorithm. Furthermore, Fig. 6 indicates that, with the FCT algorithm, the deviation results in tracking failure. This problem is solved by the AFCT algorithm, as shown in Fig. 7. The results demonstrate that the AFCT algorithm has higher tracking accuracy than the FCT algorithm.
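The score in (30) is the standard intersection-over-union; a small sketch (ours) over axis-aligned boxes given as (x, y, w, h):

```python
def success_score(roi_t, roi_g):
    """Eq. (30): IoU of tracking box roi_t and ground-truth box roi_g,
    each given as (x, y, w, h)."""
    ax, ay, aw, ah = roi_t
    bx, by, bw, bh = roi_g
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# a frame counts as a tracking success when success_score(...) > 0.5
```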

2) Forged Region Location in the Subsequent Frames: As reported in Section III-C, num is the number of tampered regions in I_current, and F_loc reflects their locations (F_loc(i) refers to the location of the i-th forged area in I_current, i = 1, 2, ..., num). The AFCT algorithm stops when it encounters one of the following conditions and denotes the frame number as stop:

Condition #1: The tracking areas go beyond the frame.
Condition #2: The size of the tracking target is less than T_r × T_r.
Condition #3: The straight-line distance between two tracking targets is less than T_L.
Condition #4: The currently tracked frame is the last frame of the video, which is denoted as last.

If stop = last is true, the whole algorithm stops. Otherwise, let current = stop and return to Section III-A to continue the detection.

E. The Whole Algorithm

Input the suspicious video, T_interval, the value first of the first frame of the video, and the value last of the last frame. Fig. 8 shows the flow chart of the proposed algorithm.

Fig. 8. Flow chart of the proposed approach.

Step 1: Let current = first;
Step 2: If current is greater than last, the whole algorithm stops; otherwise, extract the EFMs of I_current as described in Section III-A;
Step 3: Obtain the potential matching sets MQ and MW by applying the block-matching algorithm described in Section III-B;
Step 4: Obtain num and F_loc after eliminating the falsely matched pairs as described in Section III-C. If I_current is authentic, let current = current + T_interval and return to Step 2; otherwise, continue;
Step 5: Locate the tampered areas in the subsequent frames by applying the AFCT algorithm described in Section III-D, and record the frame number stop at which the AFCT algorithm stops;
Step 6: Let current = stop and return to Step 2.
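Steps 1-6 amount to the skip-ahead driver loop sketched below (our paraphrase; `extract_features`, `match_blocks`, `post_verify`, and `afct_track` stand in for Sections III-A through III-D and are assumed helpers, not functions defined by the paper):

```python
def detect_video(frames, t_interval=5):
    """Top-level loop of Section III-E over the frame list `frames`."""
    tampered = {}                      # frame index -> list of forged regions
    current, last = 0, len(frames) - 1
    while current <= last:
        feats = extract_features(frames[current])        # Section III-A
        mq, mw = match_blocks(feats)                     # Section III-B
        regions = post_verify(mq, mw)                    # Section III-C (PVS)
        if len(regions) < 2:                             # num < 2: authentic
            current += t_interval                        # skip ahead
            continue
        tampered[current] = regions
        # Section III-D: track the regions until a stop condition fires
        stop, tracked = afct_track(frames, current, regions)
        tampered.update(tracked)
        if stop == last:
            break
        current = stop
    return tampered
```

The skip of T_interval frames on authentic frames, and the jump to stop after tracking, are what keep the algorithm from examining every frame.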
Fig. 6. Experimental results of the FCT algorithm.

Fig. 7. Experimental results of the AFCT algorithm.

IV. EXPERIMENTS AND ANALYSIS

To evaluate the performance of the proposed algorithm, we conducted a number of experiments. The videos used in our experiments fall into three classes: 1) videos recorded by fixed or hand-held cameras, 2) videos downloaded from the Internet, and 3) videos downloaded from the Surrey University Library for Forensic Analysis (SULFA) [32].

The computer used for all the experiments in this paper was configured as follows:

CPU: Intel(R) Core(TM) i7-4700 3.6 GHz.
Memory Size: 16 GB.
Video Card: NVIDIA GeForce GT 970M.
OS: Microsoft Windows 7.
Coding: MATLAB Version 7.12.0.635 (R2011a).

A. Parameter Discussion

As described previously, T_dis is an important parameter in our algorithm and primarily determines the number of potential matching pairs. A larger value of T_dis corresponds to a larger number of potential matching pairs and higher computational complexity. Conversely, a smaller value of T_dis corresponds to fewer potential matching pairs as well as fewer real forged ones. To determine T_dis, we conducted a number of experiments by selecting 30 original images and their corresponding forged versions (60 images in total) from CASIA 2.0.¹ We built and analyzed the distribution of the Euclidean distances between all matching pairs for each image.

¹Credit for the use of the CASIA Image Tampering Detection Evaluation Database (CASIA TIDE) V2.0 is given to the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Corel Image Database and the photographers. http://forensics.idealtest.org
In Fig. 9, we illustrate the distribution of the Euclidean distances between all matching pairs for both the original and forged images, with "Euclidean distance" on the y-axis and "serial number of matching pairs" on the x-axis. In this figure, (a) and (c) are the distributions of Euclidean distances from two different original images, while (b) and (d) are the distributions obtained from their corresponding forged versions. The forged region in (b) was not subjected to post-processing, while the one in (d) was mirrored. (Note: Euclidean distance values above 0.05 are not shown in Fig. 9.)

Fig. 9(a) shows that nearly every matching pair's Euclidean distance exceeds 0.008, while in Fig. 9(b) there exist matching pairs whose Euclidean distances are distributed in [0, 0.008]. A similar phenomenon is evident in Fig. 9(c) and (d). Accordingly, we infer that the matching pairs whose Euclidean distances are below 0.008 are potential matching pairs in the forged images. Finally, considering the computational complexity and precision, we fixed the parameter at T_dis = 0.006, which makes our algorithm convenient to use.

B. Performance Comparison of Different Features

1) Comparison of the Extraction Time: As described in Section III-A, we extended the improved feature extraction algorithm to the extraction of other image moments, as shown in Table VI. The EFMs are compared with the other image moments in terms of their feature dimensions and extraction time in Table VII. The comparison experiments were performed on 50 images at a resolution of 640 × 480 pixels, which were divided into blocks of 8 × 8 pixels each. Clearly, the EFMs are superior to the other moments in terms of both feature dimension and extraction time.

TABLE VI
COMMON IMAGE MOMENTS AND THEIR RADIAL BASIS FUNCTIONS

Zernike moment [33]: Q_{nl}(r) = \sum_{s=0}^{(n-|l|)/2} (-1)^s \frac{(n-s)!}{s!\,((n+|l|)/2-s)!\,((n-|l|)/2-s)!}\, r^{n-2s}

RHFMs [34]: Q_n(r) = \begin{cases} \frac{1}{\sqrt{r}} & n = 0 \\ \sqrt{\frac{2}{r}}\cos(\pi n r) & n \text{ even} \\ \sqrt{\frac{2}{r}}\sin(\pi(n+1)r) & n \text{ odd} \end{cases}

PCT [35]: Q_n(r) = \cos(\pi n r^2)

PST [35]: Q_n(r) = \sin(\pi n r^2)

TABLE VII
COMPARISON OF THE AVERAGE EXTRACTION TIME OF COMMON IMAGE MOMENTS

| Image moments | Feature dimension | Extraction time |
| Zernike | 12 | 6.32 s |
| RHFMs | 7 | 5.43 s |
| PCT | 8 | 5.97 s |
| PST | 9 | 7.21 s |
| EFMs | 4 | 2.51 s |

Similarly, we also compared the EFMs with SIFT and FISIFT in terms of the average extraction time. The comparison experiments were performed on images at resolutions of 320 × 240, 640 × 480, and 1024 × 768 pixels (50 images for each resolution). Table VIII shows that FISIFT requires the highest computational complexity at all resolutions, while the EFMs exhibit the best performance at low resolution. As the image resolution increases, the performance of SIFT becomes superior to that of the EFMs.

TABLE VIII
COMPARISON OF THE AVERAGE EXTRACTION TIME (IN SECONDS) OF EFMS, SIFT, AND FISIFT

| Algorithm | 320 × 240 | 640 × 480 | 1024 × 768 |
| EFMs | 0.71 | 2.51 | 8.43 |
| SIFT | 1.04 | 2.76 | 6.47 |
| FISIFT | 1.76 | 5.87 | 17.66 |

2) Comparison of the Accuracy: We also compared the accuracy of the EFMs with those of the other image moment features (shown in Table VI) by employing the methods proposed in Sections III-B and III-C. The experiments were conducted by selecting 100 forged images from CASIA V2.0. The forged images selected from CASIA V2.0 provide information about the duplicated areas only, not about the original areas.
Fig. 9. The distributions of the Euclidean distances between all matching pairs for both the original and forged images: (a) and (c) are the distributions of Euclidean distances from two different original images, and (b) and (d) are the distributions from their corresponding forged versions.

Therefore, to evaluate the accuracy of these image moment features, we consider the detection rate (TPR) as the performance index, defined as follows:

TPR = \frac{PR_{de}}{PR_{ap}}    (31)

where PR_de is the number of pixels detected in the forged areas and PR_ap is the actual number of pixels in the duplicated areas. Theoretically, a higher detection rate corresponds to better accuracy. Table IX shows the accuracy comparison of the EFMs and the other image moment features. The detection rate (TPR) of the EFMs is 0.93, which is higher than those of the other image moment features.

TABLE IX
ACCURACY COMPARISON OF THE EFMS AND OTHER IMAGE MOMENT FEATURES

| Image moment features | TPR |
| PST | 0.80 |
| PCT | 0.78 |
| Zernike | 0.83 |
| RHFMs | 0.87 |
| EFMs | 0.93 |

C. Detection of Tampered Areas Without Post-Processing

In the experiments in this section, all the duplicated areas were created without post-processing; that is, they were identical to the original areas. Fig. 10 presents screenshots of some detection results.

In Fig. 10, videos 1 and 3 were downloaded from the Internet, while video 2 was recorded by a fixed camera. (a) and (b) are screenshots from the original videos and their tampered versions, respectively. (c) shows the detection results for the current frames as binary images, in which the white patches represent the duplicated areas. (d) and (e) are the duplicated regions in the subsequent frames located by the AFCT algorithm, and they are marked by red boxes in the same frames in (f).

Fig. 10 shows that the proposed algorithm accurately located the tampered areas in the videos, demonstrating its effectiveness in detecting tampered areas without post-processing.

Detection of Tampered Areas With Post-Processing

Generally, transformations, or a combination of multiple geometric transformations, will be conducted on the duplicated regions. In this section, all the duplicated areas in the experiments were geometrically transformed. In Fig. 11, the duplicated areas in videos 4 and 5 were scaled down and scaled up, respectively, and those in video 6 were rotated 180°. In Fig. 12, the duplicated regions in videos 7-9 were mirrored, and video 7 was recorded by a hand-held camera with slight shaking. Figs. 11 and 12 show that our algorithm can accurately detect duplication forgery under geometrical transforms, demonstrating the EFMs' good invariance under translation, scaling, rotation, and mirroring.
Fig. 10. Test video snapshots: (a) the original untampered video clip, (b) the tampered video clip with intra-frame region duplication, (c) the binary images of the tampered areas, (d) and (e) the video clips showing the locations of the tampered areas determined by the AFCT algorithm, and (f) the tampered areas in subsequent frames.

Fig. 11. Test video snapshots: (a) the original untampered video clip, (b) the tampered video clip with intra-frame region duplication, (c) the binary images of the tampered areas, (d) and (e) the video clips showing the locations of the tampered areas determined by the AFCT algorithm, and (f) the tampered areas in subsequent frames.

Fig. 12. Test video snapshots: (a) the original untampered video clip, (b) the tampered video clip with intra-frame region duplication, (c) the binary images of the tampered areas, (d) and (e) the video clips showing the locations of the tampered areas determined by the AFCT algorithm, and (f) the tampered areas in subsequent frames.

D. Multiple Forgery Detection

Multiple forgeries are frequently conducted in videos. In this section, several forgeries were implemented in the test videos. In Fig. 13, an object, such as the ceramic pot in video 10 or the house in video 11, was selected and duplicated to different locations in the same frame and several subsequent frames. In video 12, the trash can was post-processed and duplicated several times. In Fig. 14, objects in different regions, such as the streetlights, were selected and pasted to different locations. The detection results indicate that the algorithm can detect multiple duplication forgeries effectively.

E. Moving Target Forgery Detection

In real life, the objects in most videos are in motion. In this section, duplicated areas consisting of moving targets in the test videos were considered. In Fig. 15, the peach, as a stationary target, was selected and duplicated to different locations in different frames, making it appear to be a moving target. In Fig. 16, the moving Scotch tape was selected and duplicated to different locations, leading to the appearance of two moving objects in the forged video. The detection results shown in Figs. 15 and 16 prove the effectiveness of our algorithm in detecting moving target forgery.
Fig. 13. Test video snapshots (a region in the video was copied and pasted to multiple locations): (a) the original video clip; (b) the tampered video clip with intra-frame region duplication; (c) the binary images of the tampered areas; (d)-(f) the video clips showing the locations of the tampered areas determined by the AFCT algorithm; and (g) the tampered areas in subsequent frames.

Fig. 14. Test video snapshots (multiple areas in the video are copied and pasted to different locations): (a) the original untampered video clip; (b) the tampered video clip with intra-frame region duplication; (c) the binary images of the tampered areas; (d)-(g) the video clips showing the locations of the tampered areas determined by the AFCT algorithm; and (h) the tampered areas in subsequent frames.

Fig. 15. Test video snapshots (a stationary target is duplicated to different areas): (a) the original video clip; (b), (c), and (d) the tampered video clips with intra-frame region duplication; (e) the binary images of the tampered areas; (f)-(i) the video clips showing the locations of the tampered areas determined by the AFCT algorithm; and (j)-(l) the tampered areas in subsequent frames.

F. Comparison of the Execution Time

In this section, we compare the execution time of our algorithm to those of the algorithms proposed in [14] (denoted as A.V.), [13] (denoted as Wang), and [15] (denoted as Pun). In addition, we replaced the EFMs in our algorithm with SIFT or FISIFT [28], which are characteristics typically used in image and video processing. Table X shows the details of some test videos used in this experiment; the execution times are provided in Table XI.

TABLE X
DETAILS OF SOME OF THE TEST VIDEOS USED IN SECTION IV-G

| Video | Length (frames) | Resolution |
| Video 1 | 100 | 320 × 240 |
| Video 2 | 100 | 640 × 480 |
| Video 3 | 150 | 1280 × 720 |
| Video 7 | 100 | 1024 × 768 |
| Video 8 | 150 | 1024 × 768 |
| Video 9 | 150 | 1024 × 768 |

TABLE XI
COMPARISON OF THE EXECUTION TIMES (IN SECONDS) OF DIFFERENT ALGORITHMS

| Video | Proposed | Wang | A.V. | Pun | SIFT | FISIFT |
| Video 1 | 42 | 68 | 206 | 711 | 52 | 80 |
| Video 2 | 96 | 222 | 785 | >1000 | 112 | 157 |
| Video 3 | 431 | 671 | >1000 | >1000 | 444 | 582 |
| Video 7 | 192 | 464 | >1000 | >1000 | 249 | 308 |
| Video 8 | 243 | 550 | >1000 | >1000 | 312 | 401 |
| Video 9 | 224 | 538 | >1000 | >1000 | 280 | 396 |

Table XI clearly shows that our proposed algorithm exhibits outstanding performance in terms of computational efficiency. First, our algorithm has higher computational efficiency than the frame-by-frame algorithms proposed in [13] (Wang), [14] (A.V.), and [15] (Pun). Second, because of the high feature dimensions of SIFT and FISIFT, the time they require for block matching is higher than that needed by the EFMs, which increases the total time.
Fig. 16. Test video snapshots (a moving target in different frames is duplicated to different locations): (a) the original video clip; (b)-(d) the tampered video clips with intra-frame region duplication; (e) the binary images; (f)-(i) the video clips showing the locations of the tampered areas determined by the AFCT algorithm; and (j)-(l) the tampered areas in subsequent frames.

G. Comparison of the Detection Accuracy

To evaluate the capability of the proposed algorithm, we consider three performance indices, i.e., the precision rate (PR), the recall rate (RR), and the detection accuracy (DA), which are defined as follows:

PR = \frac{TP}{TP + FP},\quad RR = \frac{TP}{TP + FN}    (32)

DA = \frac{TP + TN}{TP + TN + FP + FN}    (33)

where TP indicates that an authentic frame is detected as authentic, TN indicates that a forged frame is detected as a forgery, FP indicates that an authentic frame is detected as a forgery, and FN indicates that a forged frame is detected as authentic. Theoretically, if the algorithm used to detect frame duplication achieves higher precision and recall, its detection rate is considered better. (TP + TN) represents the total number of correct detections, and (TP + TN + FP + FN) represents the total number of frames in the experiments. Therefore, DA is the percentage of correct detections, and a higher DA corresponds to a better detection rate for the proposed algorithm. Fig. 17 and Table XII show the performance evaluation of the proposed algorithm.

Fig. 17. Test video snapshots: (a) the original untampered video clip, (b) the tampered video clip with intra-frame region duplication, (c) the binary images of the tampered areas, (d) and (e) the video clips showing the locations of the tampered areas determined by the AFCT algorithm, and (f) the tampered areas in subsequent frames.

TABLE XII
PERFORMANCE EVALUATION OF THE PROPOSED ALGORITHM

|  | Positive | Negative |
| True | 96.9% | 89.3% |
| False | 3.1% | 10.7% |
G. Comparison of the Detection Accuracy
rithm. Fig. 17 and Table XII show the performance evaluations
To evaluate the capability of the proposed algorithm, we con- of the proposed algorithm.
sider three performance indices, i.e., precision rate (P R), recall We compared the proposed algorithm results with those of
rate (RR), and detection accuracy (DA), which are defined as other methods in terms of the accuracy and mirror invariance.
follows: The comparison experiments were performed on the experi-
TP TP mental dataset of 50 videos downloaded from SULFA (23,000
PR = RR = (32) frames in total). Region duplication was simulated by selecting
TP + FP TP + FN
an area in a frame and duplicating it to another non-overlapping
TP + TN
DA = (33) position in the same frame and several subsequent frames.
TP + TN + FP + FN The experimental dataset was divided into two parts: the 35
where T P indicates that an authentic frame is detected as au- videos (13,000 frames in total) in which region duplication
thentic, T N indicates that a forged frame is detected as a forgery, was simulated without post-processing and the remaining 15
F P indicates that an authentic frame is detected as a forgery, and videos (10,000 frames in total) in which duplicated regions were
TABLE XIII
COMPARISON OF THE DETECTION ACCURACY AMONG ALGORITHMS

| Algorithm | Detection accuracy | Mirror invariant |
| Pun | 91.4% | NO |
| A.V. | 88.3% | NO |
| Wang | 69.7% | NO |
| SIFT | 87.9% | NO |
| FISIFT | 92.2% | YES |
| Proposed | 93.1% | YES |

TABLE XIV
COMPARISON OF THE DETECTION ACCURACY AMONG ALGORITHMS FOR DETECTING DUPLICATED REGIONS SUBJECTED TO GEOMETRICAL TRANSFORMATION

| Algorithm | Scale up | Scale down | Rotation |
| Pun | 84.3% | 87.6% | 92.5% |
| A.V. | 78.6% | 79.1% | 87.9% |
| Wang | 62.1% | 62.8% | 72.3% |
| SIFT | 73.7% | 77.2% | 87.7% |
| FISIFT | 81.2% | 86.1% | 91.8% |
| Proposed | 86.7% | 90.2% | 95.6% |

According to Table XIII, when detecting forged regions without post-processing, the DA of the proposed algorithm is 93.1%, which is higher than those of the other methods, and only the proposed algorithm and FISIFT maintain good invariance under mirroring. Similarly, Table XIV shows that the detection accuracies of the proposed algorithm for duplicated regions subjected to geometrical transformation are 86.7%, 90.2%, and 95.6% for scaling up, scaling down, and rotation, respectively. These values are higher than those of the other algorithms. As shown in Table XIV, the SIFT/FISIFT algorithms are sufficient for detecting videos without post-processing but are not suitable for those subjected to geometrical transformations.

Some typical experimental results obtained using the method proposed in [15] (denoted as Pun) are shown in Fig. 18. In Fig. 18, the first column shows the original video snapshots, the second column shows the forged versions, and the last column shows the corresponding detection results. Fig. 18 demonstrates that this method is not only ineffective for detecting videos with mirroring but also performs poorly in areas with similar textures.

Fig. 18. Examples of the experimental results of the method proposed in [15]: (a) the original video snapshots, (b) the forged video snapshots, and (c) the detection results for the different videos.

The experimental results obtained by replacing the EFMs in our algorithm with SIFT or FISIFT are shown in Fig. 19. This figure shows that when SIFT or FISIFT is adopted, only a few matching points remain after removing the falsely matched pairs, making it impossible to locate the duplicated areas accurately. However, FISIFT is superior to SIFT in area location.

Fig. 19. Examples of the experimental results obtained with SIFT and FISIFT: (a) the original untampered video clip, (b) the tampered video clip with intra-frame region duplication, (c) the detection result of SIFT, and (d) the detection result of FISIFT.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we present a fast forgery detection algorithm based on EFMs for the detection of intra-frame region duplication. Our algorithm first extracts EFMs features from each block in the current frame and performs a fast match to find potential matching pairs. Then, the PVS method is designed to eliminate falsely matched pairs and locate the altered regions in the current frame. Finally, the AFCT algorithm is used to track the tampered regions in the subsequent frames. The experimental results show that our proposed algorithm has higher detection accuracy and computational efficiency than those of previous algorithms.
In future research, we will extend our method to detect more challenging types of video forgery.

APPENDIX

The mathematical proof of the judging criterion described in Section III-B is given in Table XV.

TABLE XV
PROOF OF THE JUDGING CRITERION DESCRIBED IN SECTION III-B

REFERENCES

[1] H. Yin, W. Hui, H. Li, C. Lin, and W. Zhu, "A novel large-scale digital forensics service platform for internet videos," IEEE Trans. Multimedia, vol. 14, no. 1, pp. 178–186, Feb. 2012.
[2] T. Stütz, F. Autrusseau, and A. Uhl, "Non-blind structure-preserving substitution watermarking of H.264/CAVLC inter-frames," IEEE Trans. Multimedia, vol. 16, no. 5, pp. 1337–1349, Aug. 2014.
[3] M. Kobayashi, T. Okabe, and Y. Sato, "Detecting video forgeries based on noise characteristics," in Proc. 3rd Pacific Rim Symp. Adv. Image Video Technol., Tokyo, Japan, 2009, pp. 306–317.
[4] S. Milani et al., "An overview on video forensics," APSIPA Trans. Signal Inf. Process., vol. 1, pp. 1229–1233, 2012.
[5] X. Feng, I. J. Cox, and G. Doerr, "Normalized energy density-based forensic detection of resampled images," IEEE Trans. Multimedia, vol. 14, no. 3, pp. 536–545, Jun. 2012.
[6] S. A. H. Tabatabaei, O. Ur-Rehman, N. Zivic, and C. Ruland, "Secure and robust two-phase image authentication," IEEE Trans. Multimedia, vol. 17, no. 7, pp. 945–956, Jul. 2015.
[7] M. Kobayashi, T. Okabe, and Y. Sato, "Detecting video forgeries based on noise characteristics," in Advances in Image and Video Technology. New York, NY, USA: Springer, 2009, pp. 306–317.
[8] C.-C. Hsu, T.-Y. Hung, C.-W. Lin, and C.-T. Hsu, "Video forgery detection using correlation of noise residue," in Proc. IEEE 10th Workshop Multimedia Signal Process., 2008, pp. 170–174.
[9] W. Wang and H. Farid, "Exposing digital forgeries in video by detecting double quantization," in Proc. 11th ACM Workshop Multimedia Security, 2009, pp. 39–48.
[10] W. Chen and Y. Shi, "Detection of double MPEG compression based on first digit statistics," in International Workshop on Digital Watermarking (Lecture Notes in Computer Science), vol. 5450. New York, NY, USA: Springer, 2009, pp. 16–30.
[11] S.-Y. Liao and T.-Q. Huang, "Video copy-move forgery detection and localization based on Tamura texture features," in Proc. 2013 6th Int. Congr. Image Signal Process., 2013, pp. 864–868.
[12] J. Yang, T. Huang, and L. Su, "Using similarity analysis to detect frame duplication forgery in videos," Multimedia Tools Appl., vol. 75, pp. 1793–1811, 2014.
[13] W. Wang and H. Farid, "Exposing digital forgeries in video by detecting duplication," in Proc. 9th Workshop Multimedia Security, 2007, pp. 35–42.
[14] A. V. Subramanyam and S. Emmanuel, "Video forgery detection using HOG features and compression properties," in Proc. IEEE Int. Workshop Multimedia Signal Process., 2012, pp. 89–94.
[15] C.-M. Pun, X.-C. Yuan, and X.-L. Bi, "Image forgery detection using adaptive oversegmentation and feature point matching," IEEE Trans. Inf. Forensics Security, vol. 10, no. 8, pp. 1705–1716, Aug. 2015.
[16] K. Zhang, L. Zhang, and M.-H. Yang, "Fast compressive tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 10, pp. 2002–2015, Oct. 2014.
[17] K. Zhang, L. Zhang, and M.-H. Yang, "Real-time compressive tracking," in Proc. Eur. Conf. Comput. Vis., 2012, pp. 864–877.
[18] E. J. Candes and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[19] A. Y. Ng and M. I. Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes," Adv. Neural Inf. Process. Syst., vol. 14, pp. 841–842, 2002.
[20] P. Diaconis and D. Freedman, "Asymptotics of graphical projection pursuit," Ann. Statist., vol. 12, pp. 793–815, 1984.
[21] Z. Ping, "FFT algorithm of complex exponent moments and its application in image recognition," in Proc. SPIE, Int. Soc. Opt. Eng., 2014, vol. 9159, pp. 4177–4180.
[22] H.-T. Hu, Y.-D. Zhang, C. Shao, and Q. Ju, "Orthogonal moments based on exponent functions: Exponent-Fourier moments," Pattern Recognit., vol. 47, pp. 2596–2606, 2014.
[23] Y. Jiang, "Exponent moments and its application in pattern recognition," Ph.D. dissertation, Beijing Univ. Posts Telecommun., Beijing, China, 2011.
[24] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, pp. 91–110, 2004.
[25] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra, "A SIFT-based forensic method for copy–move attack detection and transformation recovery," IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1099–1110, Sep. 2011.
[26] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2005, pp. 886–893.
[27] J. C. Lee, C. P. Chang, and W. K. Chen, "Detection of copy–move image forgery using histogram of orientated gradients," Inf. Sci., vol. 321, pp. 250–262, 2015.
[28] W. L. Zhao and C. W. Ngo, "Flip-invariant SIFT for copy and object detection," IEEE Trans. Image Process., vol. 22, no. 3, pp. 980–991, Mar. 2013.
[29] Y. Li, "Image copy-move forgery detection based on polar cosine transform and approximate nearest neighbor searching," Forensic Sci. Int., vol. 224, pp. 59–67, 2013.
[30] Y. Lai and T. Huang, "Image region copy-move forgery detection based on exponential-Fourier moments," J. Image Graph., vol. 20, pp. 1212–1221, 2015.
[31] Fast Compressive Tracking, 2013. [Online]. Available: http://www4.comp.polyu.edu.hk/~cslzhang/FCT/FCT.htm
[32] G. Qadir, S. Yahaya, and A. T. Ho, "Surrey university library for forensic analysis (SULFA) of video content," in Proc. IET Conf. Image Process., 2012, pp. 1–6.
[33] S. J. Ryu, M. Kirchner, M. J. Lee, and H. K. Lee, "Rotation invariant localization of duplicated image regions based on Zernike moments," IEEE Trans. Inf. Forensics Security, vol. 8, no. 8, pp. 1355–1370, Aug. 2013.
[34] H. Ren, Z. Ping, W. Bo, W. Wu, and Y. Sheng, "Multidistortion-invariant image recognition with radial harmonic Fourier moments," J. Opt. Soc. Amer. A, vol. 20, pp. 631–637, 2003.
[35] P. T. Yap, X. Jiang, and A. Chichung Kot, "Two-dimensional polar harmonic transforms for invariant image representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 7, pp. 1259–1270, Jul. 2010.
Lichao Su received the B.E. and M.E. degrees in computer science and technology from Fujian Normal University, Fuzhou, China, in 2011 and 2014, respectively. He is currently working toward the Doctoral degree in computer science at Xiamen University, Xiamen, China. His research interests include video processing, multimedia forensics, and information security.

Cuihua Li received the B.S. degree in computational mathematics from Shandong University, Jinan, China, in 1983, and the M.S. degree in computational mathematics and the Ph.D. degree in automatic control theory and engineering from Xi'an Jiaotong University, Xi'an, China, in 1989 and 1999, respectively. Before 1999, he was an Associate Professor with the School of Science, Xi'an Jiaotong University. He is currently in the Department of Computer Science, Xiamen University, Xiamen, China. His research interests include computer vision, video and image processing, and superresolution image reconstruction algorithms. He is a member of the editorial boards of both the Chinese Science Bulletin and the Journal of Xiamen University Natural Science.

Yuecong Lai was born in Ganzhou, China, in 1991. He received the B.S. degree in information and computing sciences from Henan University of Engineering, Zhengzhou, China, in 2013, and the M.S. degree in software engineering from Fujian Normal University, Fuzhou, China, in 2016. His research interests include image processing and multimedia forensics.

Jianmei Yang received the B.E. degree in computer science and technology and the M.E. degree in software engineering from Fujian Normal University, Fuzhou, China, in 2012 and 2015, respectively. Her research interests include data mining, multimedia forensics, and information security.
