
Applied Sciences (Article)
DRRU-Net: DCT-Coefficient-Learning RRU-Net for Detecting
an Image-Splicing Forgery
Youngmin Seo and Joongjin Kook *

Department of Information Security Engineering, Sangmyung University, 31 Sangmyungdae-gil, Dongnam-gu, Cheonan-si 31066, Republic of Korea
* Correspondence: kook@smu.ac.kr

Abstract: In this paper, we propose a lightweight deep learning network (DRRU-Net) for image-
splicing forgery detection. DRRU-Net is an architecture that combines RRU-Net for learning the
visual content of images and image acquisition artifacts, and a JPEG artifact learning module for
learning compression artifacts in the discrete cosine transform (DCT) domain. The backbone model
of a network based on pre-training, such as CAT-Net, a representative network for image forgery detection, has a relatively large number of parameters, resulting in overfitting on small datasets, which hinders generalization performance. Therefore, in this paper, the learning module is designed to learn DCT-domain characteristics in real time without pre-training. In the experiments, the proposed network architecture and training method of DRRU-Net show that the network has fewer parameters than CAT-Net, better forgery detection performance than RRU-Net, and improved generalization performance across various datasets.

Keywords: image-splicing forgery; discrete cosine transform; DCT; RRU-Net; DRRU-Net

Citation: Seo, Y.; Kook, J. DRRU-Net: DCT-Coefficient-Learning RRU-Net for Detecting an Image-Splicing Forgery. Appl. Sci. 2023, 13, 2922. https://doi.org/10.3390/app13052922

Academic Editor: Yu-Dong Zhang

Received: 19 January 2023; Revised: 22 February 2023; Accepted: 23 February 2023; Published: 24 February 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

As the development of digital image-editing technology and image-editing software makes them accessible to the public, the detection of the increasing number of forged images has emerged as an important issue, and research in the field of digital forensics for detecting digital image forgery has been actively conducted as well [1–4]. Image forgery methods include splicing, copy-move, object removal, and morphing [5–7]. Typical image forgery methods and their characteristics are as follows:
• Splicing: copying parts of one image and pasting them into another image;
• Copy-Move: copying parts of one image and pasting them into the same image;
• Object Removal: removing objects within an image and filling the space to fit the surrounding background;
• Morphing: combining the features of two images to create a distorted and mixed image.

Image splicing pastes parts of a picture using image-editing software, given two or more prepared images. The different images contain information about the environment in which each picture was taken (illuminance, lens, noise, etc.), and that information may be impossible to recognize visually. The development of image-editing software makes more sophisticated splicing forgery possible, beyond simply pasting images, by adjusting the illuminance of the background image, adjusting the direction of shadows, and retouching boundaries. Figure 1 shows an example of splicing forgery.

The detection method of image-splicing forgery regions has evolved from the traditional method of detecting with a mathematical algorithm based on the discrete cosine transform (DCT) to a method of detecting by using a deep learning network, the Convolutional Neural Network (CNN) [8–10]. DCT is one of the digital orthogonal transform coding methods for image coding and is also used for JPEG compression. DCT decomposes a block of pixels into high-frequency and low-frequency components. Human vision is insensitive




to the high-frequency information of an image, resulting in compressing the image by


removing high-frequency components. When an edited image is compressed and saved again, a compression trace remains on the image. This enables single and double JPEG compression to be detected.

Figure 1. Example of an image-splicing forgery: (a) object image; (b) target background; (c) splicing forgery; (d) ground truth for forgery region.

It has already been proven that DCT can be used to find forensic clues in image forensics through the traditional splicing forgery detection method [11,12]. However, traditional methods have not achieved notable results on the localization of the forgery region. In recent years, CNNs have been confirmed to have excellent performance for image feature extraction [13–15], and research is being conducted to classify forgery images, to detect forgery regions in pixel units, and to localize them by using CNNs [16,17]. Segmentation methods for forgery region localization using CNNs have been studied by learning image acquisition artifacts that can be obtained from images, by learning compression artifacts from DCT domains [18–20], or by learning both domains together.

Mantranet [21] and RRU-Net [22], released in 2019, perform forgery detection and semantic segmentation by inputting only the RGB streams of images. The detection accuracy on the public dataset used in that study was 80%, which is not considered high performance today, but it was achieved using only the RGB stream. Because RRU-Net is excellent at feature extraction for the forgery regions, it is expected to be applied in various ways in further research.

CAT-Net [23], released in 2022, further searches for compression artifacts. It is a network that uses HRNet [24], which has proven its excellent performance in computer vision tasks such as object detection and human pose estimation, as a backbone, and it learns by inputting RGB and DCT streams. It showed significantly higher performance than other networks on eight out of the nine public datasets used in that study. However, this comparison is not considered appropriate. While the HRNet of CAT-Net uses weights pre-trained on the ImageNet [25] dataset, the JPEG artifact learning module (JALM) is pre-trained with a customized dataset. In that paper, CAT-Net w/o D.P, trained with randomly initialized weights, showed significantly lower localization performance than the pre-trained CAT-Net. Pre-training JALM on double JPEG compression is recommended to improve localization performance in further studies.

Since the HRNet of CAT-Net is a method of learning RGB streams, it does not learn meaningful features of the forgery regions. Considering the results of the performance experiment on whether or not JALM is pre-trained in the DCT domain, it can be seen that CAT-Net is highly dependent on the DCT domain.

In the study of [22], the effectiveness of the ringed residual architecture has been demonstrated, but there are limitations in using only RGB streams to detect forgery regions. To overcome this, the best network for image forgery detection was created in the study of [23], but there is no public dataset for the pre-training of JALM, making it difficult to compare between studies. A dataset needs to be prepared for pre-training, and a network must be built separately for pre-training JALM. In addition, since the performance is dependent on the pre-training, the pre-training is compulsory.

In this paper, we propose DRRU-Net, which combines the JALM of CAT-Net with RRU-Net as the backbone. DRRU-Net is an architecture that combines RRU-Net for learning

the visual contents of images and image acquisition artifacts, and JALM for learning
compression artifacts in the DCT domain. Since the down-sampling path is the same as
RRU-Net, the weights of the down-sampling path of DRRU-Net can be initialized after
pre-training with RRU-Net. An easy pre-training method, different from that of CAT-Net, is described for improved results. Pre-training also works even when using the same dataset
as the proposed network. Experiments verified high generalization performance for other
datasets even with learning a small dataset.
The main contributions of this paper are the following:
• A network of a universally available simple architecture was designed to compensate
for the above disadvantages of RRU-Net and CAT-Net.
• A new network, DRRU-Net, which combines the JALM of CAT-Net with RRU-Net as
the backbone was proposed and the architecture of DRRU-Net and an easy pre-training
method different from CAT-Net for improved results was described.
• Pre-training also works even when using the same dataset as the proposed network.
• Experiments verified high generalization performance for other datasets even with
learning a small dataset.
The remainder of this paper is structured as follows: Section 2 describes the dataset
used in DRRU-Net, and the underlying networks. In Section 3, the architecture and
algorithm of DRRU-Net are explained, and in Section 4, the performance evaluation
experiments of DRRU-Net and their results are explained. In the first experiment in
Section 4, a pre-training method to enhance the search for visual cues and image acquisition
artifacts in RGB streams is proposed. In the second experiment, a method to enhance
the search algorithm for the intrinsic property difference of DCT streams for the training
of DRRU-Net with randomly initialized weights without pre-training is proposed. The
domain of the training dataset used in the experiment is Splicing, and the performance
evaluation is performed using Splicing datasets, Copy-Move datasets, and datasets that
include both of them. In the experiments, the generalization performance differentiated
from the previously released network is verified by comparing the copy-move forgery
detection performance, not the training domain.

2. Related Works
DEFACTO [26] is a collection of forgery image datasets including the image forgery
domains Splicing, Copy-Move, Removal, and Morphing. Some of the splicing datasets
and copy-move datasets among DEFACTO are used in the experiment. CASIAv2 [27] is a collection of datasets that includes a set of authentic images before forgery and a set of forgery images in which splicing and copy-move images are mixed.
U-Net [24], an underlying model of RRU-Net, outperformed existing methods in the
International Symposium on Biomedical Imaging (ISBI) challenge for the segmentation
of neural structures in electron microscopy stacks in 2015. U-Net is divided into the
contracting path and expanding path, and the two paths are symmetrical. The output of
each contracting-path block is concatenated with the up-sampled output of the preceding expanding-path block and connected to the input of the symmetric expanding-path block. This allows high-resolution features to achieve accurate localization while reducing the loss of detailed information.
ResNet [28] is the network which won the ImageNet Classification Challenge in 2015.
Residual mapping was first proposed in ResNet and is defined as Equation (1).

y = F(x) + x (1)

In Equation (1), x is the input of the layer and y is the output. F(x) + x performs addition by connecting the input and its operation result with a shortcut (skip) connection. Residual mapping and shortcut connections have been actively used in CNN research to relieve the vanishing gradient problem.
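Equation (1) can be sketched as a minimal PyTorch module. This is an illustrative block, not the paper's exact layers; the 1 × 1 projection is only needed when the channel count changes:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x (Equation (1)), with F as two 3x3 convolutions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 shortcut projection when input/output channels differ
        self.proj = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.f(x) + self.proj(x)  # shortcut (skip) connection

y = ResidualBlock(3, 16)(torch.randn(1, 3, 32, 32))
```

The shortcut lets gradients flow past the convolutions unchanged, which is what makes very deep stacks of such blocks trainable.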

RRU-Net applies the ringed residual architecture, proposed in [22], to the U-Net architecture. The ringed residual architecture utilizes the residual architecture of ResNet. It is a building-block architecture in which two residual propagations (forward direction) and one residual feedback propagation (reverse direction) are performed, instead of one residual propagation, in a block where two convolution layers are connected. This can be
expressed as shown in Equations (2) and (3).

y_f = F(x, W_i) + W_s × x (2)

y_b = (s(G(y_f)) + 1) × x (3)


y_f is the output of the building block, W_i is the weight of layer i, and F(x, W_i) represents the residual mapping to be learned. The function F is W_2 σ(W_1 × x), where σ is the ReLU activation function. W_s × x is a linear projection with kernel size 1, executed element-wise in the shortcut connection, which matches the dimension to the next input. y_b becomes the enhanced input x of the next y_f through the residual feedback operation. In Equation (3), y_f is the output of the residual propagation, function s is a sigmoid activation function, and function G matches the dimension of y_f to the following input by linear projection. This method reduces vanishing gradient problems and amplifies the differences of the image's intrinsic properties between the normal regions and the forgery regions in the forgery image.
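Equations (2) and (3) can be sketched as a PyTorch module. This is a minimal illustration with hypothetical layer sizes; the exact block configuration in RRU-Net differs:

```python
import torch
import torch.nn as nn

class RingedResidualBlock(nn.Module):
    """Sketch of a ringed residual block: forward residual propagation
    y_f = F(x) + Ws*x (Equation (2)), then residual feedback
    y_b = (sigmoid(G(y_f)) + 1) * x (Equation (3)), which re-weights the
    input before a second forward residual propagation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.f = nn.Sequential(                  # F: two conv layers with ReLU
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.ws = nn.Conv2d(in_ch, out_ch, 1)    # Ws: 1x1 shortcut projection
        self.g = nn.Conv2d(out_ch, in_ch, 1)     # G: projects y_f back to input dims
        self.s = nn.Sigmoid()

    def forward(self, x):
        y_f = self.f(x) + self.ws(x)             # Equation (2)
        y_b = (self.s(self.g(y_f)) + 1) * x      # Equation (3): enhanced input
        return self.f(y_b) + self.ws(y_b)        # second forward pass on y_b

out = RingedResidualBlock(3, 16)(torch.randn(1, 3, 64, 64))
```

Because the sigmoid gate lies in (0, 1), the factor (s(G(y_f)) + 1) scales each input position by between 1 and 2, amplifying positions the block already finds discriminative.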
Although the validity of the ringed residual architecture in RRU-Net was proven, the detection accuracy was limited to about 80% when only RGB streams were used to detect forgery regions. Therefore, in DRRU-Net, the weights of the epoch just before the verification score begins to decrease in the training process of RRU-Net are saved, and a transfer learning model is designed that trains only the blocks of the down-sampling path and up-sampling path connected to JALM. By doing so, the performance could be improved.
CAT-Net is the first network to learn the distribution and spatial characteristics of DCT coefficients without losing them. The RGB stream passes through HRNet and learns image
acquisition artifacts from the images such as sensor pattern noise, EXIF metadata, and visual
content. DCT coefficients are collected from the Y channel representing the luminance
of the YCbCr color space and applied as non-overlapping 8 × 8 blocks. According to
that paper, the reason for considering the Y channel is that it is more useful for finding
forensic clues in image forensics. The DCT stream is used as the input to the JPEG artifact
learning module. In the JPEG artifact learning module, 1-channel DCT coefficients are
converted into 21-channel volume representations to learn the distribution. Then, after the
8 × 8 dilated convolution operation and the 1 × 1 convolution operation, the quantization
table, which is repeatedly arranged to have the same resolution as the output, is multiplied
element by element. This output is combined with the output before quantization.
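The 21-channel volume representation can be sketched as a binary one-hot encoding of the clipped absolute coefficient values, one channel per magnitude bin. This follows our reading of the CAT-Net paper's clipping scheme and is an illustrative assumption, not the verified implementation:

```python
import numpy as np

def dct_volume(dct_coef, T=20):
    """Convert a 1-channel map of quantized DCT coefficients into a
    (T+1)-channel binary volume: channel t is 1 where clip(|coef|, 0, T)
    equals t. With T=20 this yields the 21-channel representation."""
    c = np.clip(np.abs(dct_coef), 0, T)          # clip magnitudes to [0, T]
    volume = np.stack([(c == t).astype(np.float32) for t in range(T + 1)])
    return volume                                 # shape: (T+1, H, W)

coef = np.random.randint(-100, 100, size=(8, 8))  # stand-in 8x8 coefficient block
vol = dct_volume(coef)
```

Each spatial position activates exactly one channel, so the coefficient distribution is preserved channel-wise while the spatial layout of the 8 × 8 blocks is untouched.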
The RGB stream of CAT-Net maintains a high-resolution representation and is trained
by HRNet. The JPEG artifact learning module method based on the architecture of HRNet
maintains the same resolution as the RGB stream learning method to output a feature map.
Down sampling is performed with stride-2 convolution instead of pooling.
CAT-Net requires datasets for pre-training, and has the disadvantage of separately
building a network for pre-training JALM. In addition, since the performance variation is
wide depending on whether or not the pre-training is performed, the pre-training is needed.
This may reduce the flexibility of the forgery detection networks. Therefore, by applying
leaky ReLU in DRRU-Net, valid clues for detecting a forgery which are commonly used in
the DCT domain can be found, allowing us to show a good generalization performance on
other datasets even though training only a few datasets.
3. DCT-Coefficient-Based RRU-Net

In this chapter, the architecture of DRRU-Net, which combines JALM with RRU-Net as the backbone, is described. DRRU-Net modifies some of the ringed residual blocks connected to the learning module to learn JALM efficiently without pre-training.

3.1. DRRU-Net Architecture

DRRU-Net is an architecture that combines RRU-Net for learning the visual contents of images and image acquisition artifacts, and JALM for learning compression artifacts in the DCT domain. Since the down-sampling path is the same as RRU-Net, the weights of the down-sampling path of DRRU-Net can be initialized after pre-training with RRU-Net. In the DRRU-Net of Figure 2, the input consists of two streams: the RGB stream and the DCT stream. The RGB stream searches for visual contents and image acquisition artifacts through the down-sampling path of RRU-Net. The DCT stream is the input of JALM. The architecture and procedure of JALM are shown in Figure 3 [23].

Figure 2. DRRU-Net architecture.

Figure 3. JPEG artifact learning module architecture.
The output of each down-sampling block of the learning module is up-sampled and added to the input of the up-sampling block. The output passing through the down-sampling block of the RGB stream is subjected to identity mapping through a skip connection to the input of the corresponding up-sampling block. The out-convolution block

outputs one channel with the same resolution as the preprocessed image and calculates the
probability of the forgery regions in pixel units.

3.2. DRRU-Net Algorithm


The down-sampling block and up-sampling block of DRRU-Net are the same as RRU-
Net. When the outputs of JALM are connected to the first and second up-sampling blocks,
they are combined with the previous inputs. In this process, it is added to the input by
adjusting the resolution in the same way as the up-sampling of the last down-sampling
output. The down-sampling block connected to JALM is implemented with the equations
(Equations (1)–(3)) of RRU-Net shown in Section 2. However, this method is researched
for searching visual contents and image acquisition artifacts, so it requires the verification
of whether it is appropriate in the DCT domain. In this study, an activation function was
designed to find a stable learning method using JALM as a ringed residual block. The
function s in Equation (3) is sigmoid, the function F in Equation (2) is W_2 σ(W_1 × x), and σ is ReLU. In the ringed residual architecture, the sigmoid function is used to amplify the difference in image intrinsic properties, but it is necessary to verify whether it is an appropriate activation function in the DCT domain. Therefore, the two activations were compared by using leaky ReLU. Leaky ReLU has a slower computational speed than ReLU, but it
reflects negative output and is considered suitable for learning the expressions inherent
in DCT streams. In order to maintain negative output results, the functions s and σ of the
ringed residual architecture of the down-sampling block connected to JALM are redefined
as Function l, which means leaky ReLU. The modified equations are as follows.

l(x) = x, if x ≥ 0; 0.01x, otherwise (4)

F = W_2 l(W_1 × x) (5)

y_f = F(x, W_i) + W_s × x (6)

y_b = (l(G(y_f)) + 1) × x (7)
The JALM down-sampling blocks of DRRU-Net implemented with the above equa-
tions construct a ringed residual architecture in the same order as RRU-Net.
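Equations (4)–(7) amount to replacing both the inner activation σ and the feedback gate s of the JALM-connected blocks with leaky ReLU. A sketch with hypothetical layer sizes (the 21 input channels mirror the DCT volume representation):

```python
import torch
import torch.nn as nn

class LeakyRingedResidualBlock(nn.Module):
    """Ringed residual block for the JALM-connected path, with leaky ReLU
    (slope 0.01, Equation (4)) in place of ReLU and sigmoid, so negative
    DCT responses are retained through Equations (5)-(7)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.l = nn.LeakyReLU(0.01)                   # Equation (4)
        self.w1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.w2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.ws = nn.Conv2d(in_ch, out_ch, 1)         # shortcut projection
        self.g = nn.Conv2d(out_ch, in_ch, 1)          # back-projection for feedback

    def F(self, x):
        return self.w2(self.l(self.w1(x)))            # Equation (5)

    def forward(self, x):
        y_f = self.F(x) + self.ws(x)                  # Equation (6)
        y_b = (self.l(self.g(y_f)) + 1) * x           # Equation (7)
        return self.F(y_b) + self.ws(y_b)

y = LeakyRingedResidualBlock(21, 32)(torch.randn(1, 21, 64, 64))
```

Unlike the sigmoid gate, the leaky ReLU gate is unbounded above and can suppress the input (factor below 1) for negative responses, which is the behavior argued to suit DCT streams.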

3.3. Differences between RRU-Net and DRRU-Net


RRU-Net finds forensic clues for the detection of splicing forgery by searching for
visual artifacts. In addition, DRRU-Net can search for the compression artifacts and finds
additional forensic clues about forgery during compression. However, in DRRU-Net, the
horizontal and vertical lengths of the input resolution must be fixed in multiples of eight
to obtain the DCT coefficient. Even though images of different resolutions are used for
learning, the nature of CNN requires fixed-resolution images for the input to the network.
RRU-Net used the CASIA [27] and COLUMB [29] datasets for training and evaluation. To increase the learning efficiency for that dataset, the resolution of the RRU-Net input is fixed at 384 × 256. This is the average image resolution of CASIA and is not appropriate for evaluating other high-resolution datasets. For the evaluation of more diverse datasets, the input resolution of DRRU-Net is set to 512 × 512. For the fixed input of DRRU-Net,
the grid-aligned cropping scheme in [23] is used. The authors of RRU-Net used Resize
and Crop as the image preprocessing for the input, but Resize can damage the intrinsic
properties of an image in the process of reducing or increasing the image resolution. In this
experiment, the grid-aligned cropping method is also used for the RGB stream input of
RRU-Net in order to learn while keeping the intrinsic properties of the image.
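Grid-aligned cropping can be sketched as a crop whose top-left corner is snapped to the 8 × 8 JPEG block grid, so no DCT block is cut apart. This is a hypothetical helper (random offsets within the valid range), not the exact routine of [23]:

```python
import numpy as np

def grid_aligned_crop(img, size=512, rng=np.random):
    """Crop a size x size patch whose top-left corner lies on the 8-pixel
    JPEG grid, preserving DCT block boundaries (sketch)."""
    h, w = img.shape[:2]
    assert h >= size and w >= size, "image smaller than crop size"
    top = rng.randint(0, (h - size) // 8 + 1) * 8   # snap offsets to multiples of 8
    left = rng.randint(0, (w - size) // 8 + 1) * 8
    return img[top:top + size, left:left + size]

patch = grid_aligned_crop(np.zeros((600, 800, 3), dtype=np.uint8))
```

Unlike Resize, this never resamples pixel values, so compression artifacts inside each 8 × 8 block survive intact.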

4. Experiments and Results


Two experiments are conducted for the performance evaluation of DRRU-Net. First,
the down-sampling architectures for learning RGB streams of DRRU-Net and RRU-Net
are the same. Therefore, DRRU-Net can perform transfer learning by learning the RGB stream from RRU-Net. In the training of RRU-Net, the weights of the epoch just before the verification score stops increasing and begins to decrease are saved. After that, DRRU-Net initializes the blocks corresponding to its down-sampling path with those weights and fixes them. Afterwards, the semantic segmentation performance is compared with RRU-Net and with DRRU-Net without transfer learning, training only the blocks of the down-sampling path and up-sampling path connected to the JPEG learning module. This enables the effect of pre-training in the RGB domain to be explained.
In the second experiment, we compared the performance when leaky ReLU is used or
not used for efficient learning of DRRU-Net in a randomly initialized state. The performance
is compared with CAT-Net learned under the same conditions.

4.1. Experimental Environment


The GPU used in the experiment is an NVIDIA GTX 1660, and DRRU-Net is implemented with PyTorch. All models used for performance comparison are trained on 10,000 images of DEFACTO's Splicing set [30]; 10,000 further images of DEFACTO's Splicing set that are not used for training and validation, 10,000 images of DEFACTO's Copy-Move set, and 3000 forgery images of the CASIAv2 dataset [31] are used as the test datasets for evaluation.
DEFACTO’s datasets only contain images in TIFF format. CASIAv2 saves images in
TIFF and PNG formats, not JPEG files. To collect compression artifacts on the deep learning
network, JPEG compression is performed with quality of 100 for non-JPEG format images
of the training and test datasets.
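This re-encoding step can be sketched with Pillow. The in-memory round trip is illustrative; the actual pipeline would read and write dataset files:

```python
from io import BytesIO

import numpy as np
from PIL import Image

def to_jpeg_q100(img):
    """Re-encode a PIL image as JPEG with quality 100, so every image
    carries JPEG compression artifacts (sketch of the preprocessing
    applied to the non-JPEG TIFF/PNG images)."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=100)
    buf.seek(0)
    return Image.open(buf)

src = Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8))  # stand-in for a TIFF/PNG image
jpg = to_jpeg_q100(src)
```

Quality 100 keeps the visual change minimal while still quantizing DCT coefficients, which is exactly the trace JALM is meant to learn.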
Normalization of the image is required for the RGB stream input to pass through the network. The normalization method used for image X in [13] is X/255, and in [14], it is (X − 127.5)/127.5, giving pixel values of 0~1 and −1~1, respectively. In this experiment, we tried to compare the networks by normalizing both in the method of [14], but it was confirmed that training did not proceed at all when RRU-Net was normalized in the method of [14]. Therefore, RRU-Net keeps the existing normalization of [13], while DRRU-Net normalizes its input in the method of [14].
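The two normalization schemes can be stated in two lines; this trivial sketch just makes the value ranges explicit:

```python
import numpy as np

x = np.array([0, 127.5, 255], dtype=np.float32)  # example pixel values

norm_01 = x / 255.0                  # [13]-style: values in 0..1 (kept for RRU-Net)
norm_pm1 = (x - 127.5) / 127.5       # [14]-style: values in -1..1 (used for DRRU-Net)
```

The second form is zero-centered, which interacts differently with activations that treat negative inputs specially, such as the leaky ReLU used in DRRU-Net.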
As in the experiment in [13], the loss function of DRRU-Net uses binary cross entropy
to obtain the loss of images in pixel units. The optimizer uses Adam [32] with a learning
rate of 0.001. The batch size is 1, and 764 images that are not used in training are used as validation data. When the verification score does not increase for 6 epochs during the training loop, training is stopped, and the model of the epoch with the highest verification score is saved and used for evaluation. The verification score is calculated
with Dice coefficient. Dice coefficient is a set operation that calculates the degree of overlap
between the correct answer and the prediction region. The equation can be calculated with
a confusion matrix, but it can be obtained without a confusion matrix by using the library
of Pytorch and can be used as a loss function because it is differentiable.

Dice = 2|A ∩ B| / (|A| + |B|) = 2TP / (2TP + FP + FN) (8)
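Equation (8) can be sketched as a soft Dice coefficient over pixel probabilities. This minimal PyTorch version (a hypothetical helper, with a small epsilon for empty masks) is differentiable, so (1 − Dice) can also serve as a loss term:

```python
import torch

def dice_coeff(pred, target, eps=1e-7):
    """Soft Dice coefficient (Equation (8)) over pixel probabilities:
    2 * |A intersect B| / (|A| + |B|), computed without a confusion matrix."""
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = torch.tensor([[1.0, 1.0], [0.0, 0.0]])    # predicted forgery mask
target = torch.tensor([[1.0, 0.0], [0.0, 0.0]])  # ground truth
d = dice_coeff(pred, target)  # TP=1, FP=1, FN=0, so Dice = 2/3
```

On hard 0/1 masks the products and sums reduce exactly to the TP/FP/FN counts of Equation (8).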

Dice coefficient and F1-score have different meanings but can be obtained with the
same equation. Since the task of detecting image forgery regions can be binary-classified
into forgery (P) and normal (N) per pixel, accuracy can be calculated in pixel units. However,
since the ratio of the number of pixels in the forgery regions to the number of pixels in the
normal regions in each image of the dataset is not fixed and unbalanced, the performance
of forgery pixel detection may not be properly evaluated only with that accuracy. For
example, on the premise that only 30 pixels are forged in an image of 100 × 100 resolution,
if the training is not done properly and all the pixels in all images are predicted as negative
(normal), the accuracy would be 0.997, which seems like an excellent model. Therefore, in

order to evaluate reliable classification in the unbalanced distribution of positive pixels and
negative pixels, the F1-score obtained from the confusion matrix is used, denoted as Dice.

ACC_p = (TP + TN) / (TP + TN + FP + FN) (9)

PRE_p = TP / (TP + FP) (10)

REC_p = TP / (TP + FN) (11)

F1_p = 2(PRE_p × REC_p) / (PRE_p + REC_p) = 2TP / (2TP + FP + FN) = Dice (12)
In Equation (9), the subscript p means that the equations are applied based on pixels.
TP, TN, FP, and FN correspond to the correct detection of forgery pixels, the correct
detection of normal pixels, the false detection of normal pixels as forgery pixels, and no
detection of forgery pixels, respectively. The Dice coefficients obtained for the individual
images are averaged over each dataset. All results are rounded to three decimal places.
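The worked example above (30 forged pixels in a 100 × 100 image and an all-negative prediction) can be verified numerically. This sketch only illustrates the imbalance argument and is not the paper's evaluation code:

```python
import numpy as np

# 30 forged pixels in a 100x100 image; the model predicts every pixel as normal.
target = np.zeros((100, 100), dtype=bool)
target[:5, :6] = True                 # 30 positive pixels (positions arbitrary)
pred = np.zeros_like(target)          # all-negative prediction

tp = np.logical_and(pred, target).sum()
tn = np.logical_and(~pred, ~target).sum()
fp = np.logical_and(pred, ~target).sum()
fn = np.logical_and(~pred, target).sum()

acc = (tp + tn) / (tp + tn + fp + fn)    # 9970 / 10000 = 0.997
f1 = 2 * tp / (2 * tp + fp + fn + 1e-6)  # 0.0: the forgery is entirely missed
print(acc, f1)
```

Accuracy looks excellent while the F1-score (Dice) exposes that the model found nothing.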

4.2. RRU-Net-Transfer-Learning DRRU-Net


This section describes the pre-training method of DRRU-Net, which differs from the JPEG
learning module pre-training method of [14], and verifies the performance difference between
RRU-Net and DRRU-Net without pre-training. In [14], the JPEG learning module is pre-trained
using the DCT stream, because pre-training the RGB stream of HRNet on the forgery domain is
not meaningful for forgery region detection.
As in HRNet, a combination of a typical CNN and a JPEG learning module detects forgery
mainly through the DCT domain. When RRU-Net, which can detect differences in the properties
of forgery regions from image acquisition artifacts in the RGB stream, is combined with the
JPEG learning module, there is a concern that the weights become biased toward and optimized
for the DCT domain. If all weights of DRRU-Net are trained at once, learning comes to rely on
the DCT stream, in which clues to the forgery regions are relatively easy to find. To prevent
this, the RRU-Net that learns the RGB stream is trained first, and training is stopped at the
epoch where the verification score no longer increases. The weights of the down-sampling
section of this RRU-Net are transferred to DRRU-Net and fixed, i.e., not updated during
training; in DRRU-Net, only the layers of the up-sampling section and of the down-sampling
path connected to JALM are trained. This allows meaningful features to be extracted from the
RGB stream, and since the weights of the layers included in JALM are not fixed, it can show
higher feature extraction performance than when those weights are fixed as well.
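The freezing scheme can be sketched as choosing which parameters the optimizer updates. The parameter names below are hypothetical (the real names depend on the RRU-Net implementation), and the sketch is simplified: in the paper, the down-sampling layers connected to JALM also remain trainable. In PyTorch, the same effect is obtained by setting `requires_grad = False` on the transferred parameters.

```python
def trainable_params(param_names, frozen_prefixes=("down",)):
    """Return the parameter names that stay trainable after freezing the
    transferred down-sampling blocks (everything not under a frozen prefix)."""
    return [name for name in param_names
            if not name.startswith(frozen_prefixes)]

# Hypothetical parameter names, for illustration only.
names = ["down1.conv.weight", "down2.conv.weight",
         "up1.conv.weight", "jalm.block.weight"]
print(trainable_params(names))   # only the up-sampling and JALM parameters
```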
In this experiment, RRU-Net, DRRU-Net, and DRRU-Net TL are compared. One epoch of RRU-Net
takes 7335 s on average, including the training and verification tasks. Training progressed to
19 epochs, with the verification score not increasing during the final 6 epochs. The final
verification score is 0.681, and the final training loss is 0.308. One epoch of DRRU-Net takes
9780 s, and the verification score did not increase for 6 consecutive epochs after epoch 6.
The final verification score is 0.903, and the final training loss is 0.090. Compared to
RRU-Net, the time required per epoch is longer, but training terminates in fewer epochs with a
lower training loss and a higher verification score. DRRU-Net TL is the model in which the
weights of the down-sampling blocks of the RRU-Net trained above are transferred and fixed.
One epoch takes 8997 s on average. The verification score did not increase beyond 0.901 for
6 consecutive epochs after epoch 6, and the final training loss is 0.090. Although the
verification scores and losses of DRRU-Net and the TL model trained from scratch are similar,
in testing, DRRU-Net TL showed the highest detection rate and the best generalization
performance across the different datasets throughout all experiments.
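The stopping rule described here, halting once the verification score has not improved for 6 consecutive epochs, can be sketched as a small patience check (a sketch of the rule as stated, not the authors' training script):

```python
def should_stop(val_scores, patience=6):
    """Stop when the best validation score is `patience` or more epochs old."""
    if len(val_scores) <= patience:
        return False
    best_epoch = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return (len(val_scores) - 1) - best_epoch >= patience

scores = [0.50, 0.62, 0.681] + [0.68] * 6   # peak at epoch 2, then 6 flat epochs
print(should_stop(scores))                  # True: patience exhausted
```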

The Dice coefficient of RRU-Net is 0.306 in DEFACTO Splicing set, 0.062 in DEFACTO
Copy-Move, and 0.081 in CASIAv2. Pixel-by-pixel accuracies are 0.988, 0.963, and 0.914 for
each dataset. The reason for the lower score than expected is the difference in training and
evaluation methods in [13]. In the experiment of [13], which used CASIA [18] for training
and evaluation, the training datasets were increased with the compression and random
reversal techniques to better train the RRU-Net. CASIA and COLUMB contain originals of
forgery images. Since the images for training were randomly selected, it is highly likely
that the originals of forgery images were used for training or evaluation. In addition, it is
difficult to regard this as a clear performance indicator for splicing detection, because the
test dataset contains more randomly generated JPEG-compressed and noise-added images than
typical splicing forgery images. The pixel-level accuracy of RRU-Net is high because, failing
to find the forgery regions, it classifies most pixels in the image as normal, and most regions
of each image are indeed not forged. This can be seen from the Dice values in Table 1.

Table 1. Comparison of splicing and copy-move forgery detection performance in pixel units.

Method         Type   DEFACTO Splicing   DEFACTO Copy-Move   CASIAv2
RRU-Net        Dice   0.306              0.062               0.081
               Acc    0.988              0.963               0.914
DRRU-Net       Dice   0.830              0.652               0.547
               Acc    0.997              0.984               0.864
DRRU-Net TL    Dice   0.830              0.716               0.830
               Acc    0.997              0.992               0.967

When the proposed DRRU-Net is trained with weights initialized randomly, the Dice
coefficient is 0.830 in the DEFACTO Splicing set, 0.652 in the DEFACTO Copy-Move set,
and 0.547 in CASIAv2. The accuracies are 0.997, 0.984, and 0.864 for each dataset, which is
better than the results of RRU-Net, but not satisfactory.
Table 2 shows the predictions of the RRU-Net, DRRU-Net, and DRRU-Net TL models on one
Copy-Move forgery image from CASIAv2 (top) and one from DEFACTO Copy-Move (bottom). As shown
in Figure 2, RRU-Net fails to accurately identify the forgery regions in most images of the
test dataset. DRRU-Net shows unstable predictions on CASIAv2, which was processed differently
from the training dataset: it finds the forgery regions, but its predictions very often include
normal regions. Compared to the other networks, DRRU-Net TL detects the forgery regions with
relatively high accuracy.
In this experiment, RRU-Net, which finds only the clues present in the RGB stream, showed poor
performance, but DRRU-Net TL, which transferred the weights of the down-sampling section of
that poorly performing RRU-Net, showed the highest performance among the compared networks.
Compared to DRRU-Net TL, DRRU-Net can be seen as having optimized its weights to rely mainly
on the DCT stream rather than the RGB stream, as with CAT-Net. DRRU-Net TL, which
transfer-learns from RRU-Net, can therefore be interpreted as searching the RGB and DCT
streams in a relatively balanced way, without depending on the DCT stream as DRRU-Net does.
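For reference, the DCT domain discussed throughout this section is that of the blockwise type-II DCT that JPEG applies to 8 × 8 pixel blocks. A tiny orthonormal reference implementation, for illustration only and not the transform code used in the paper:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D type-II DCT of an n x n block (n = 8 for JPEG)."""
    n = block.shape[0]
    k = np.arange(n)
    # C[u, j] = s(u) * cos(pi * (2j + 1) * u / (2n)), s(0) scaled for orthonormality
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.sqrt(2.0 / n) * np.where(k == 0, np.sqrt(0.5), 1.0)[:, None]
    C = scale * basis
    return C @ block @ C.T

# A constant block puts all of its energy into the DC coefficient (n * mean).
print(dct2(np.ones((8, 8)))[0, 0])
```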

4.3. DRRU-Net Using Leaky ReLU


The performance of DRRU-Net LR, to which the proposed enhanced learning method for the JPEG
learning module was applied, was evaluated and compared with CAT-Net trained on the same
dataset from randomly initialized weights. One epoch of CAT-Net took 6590 s, which is shorter
than for RRU-Net. The verification score did not increase for 6 consecutive epochs after epoch
13; the final verification score was 0.927, and the final training loss was 0.010. CAT-Net had
the highest verification score, and its training terminated with a training loss about 9 times
lower than that of DRRU-Net.
Table 2. Comparison of the segmentation performance of RRU-Net, DRRU-Net, and DRRU-Net TL.
(Image table: columns Input, Ground Truth, RRU-Net, DRRU-Net, and DRRU-Net TL; one Copy-Move
forgery example each from CASIAv2 and DEFACTO Copy-Move.)

DRRU-Net LR took 9785 s for 1 epoch, showing a training speed similar to that of
DRRU-Net in Experiment 4.1. The verification score was 0.900, and the training loss
was 0.073. When trained under the same conditions without pre-training, DRRU-Net LR
showed higher performance than CAT-Net as shown in Table 3.

Table 3. Comparison among CAT-Net, DRRU-Net, and DRRU-Net LR without transfer learning.

Method         Type   DEFACTO Splicing   DEFACTO Copy-Move   CASIAv2
CAT-Net        Dice   0.802              0.657               0.586
               Acc    0.997              0.991               0.832
DRRU-Net       Dice   0.830              0.652               0.547
               Acc    0.997              0.984               0.864
DRRU-Net LR    Dice   0.840              0.666               0.693
               Acc    0.997              0.993               0.937

CAT-Net learns at both high and low resolutions. The benefit of CAT-Net is that it is not only
advantageous for pixel-by-pixel detection but also predicts smoother boundaries for the
detected areas. However, it sometimes identifies normal regions as forgery regions, and its
generalization performance in predicting forgery regions was lower than that of DRRU-Net LR.
Its Dice coefficient decreased on DEFACTO Copy-Move, a forgery domain it was not trained on,
and decreased further on CASIAv2, which contains other forgery techniques, including splicing
and copy-move, and was produced with a different forgery process. On the other hand, DRRU-Net
LR had a higher Dice coefficient on CASIAv2 than on DEFACTO Copy-Move. DRRU-Net LR, which uses
leaky ReLU, can find valid clues for forgery detection that are common in the DCT domain,
resulting in excellent generalization performance on other datasets even though it was trained
on fewer datasets.
Based on Experiment 4.2, when training from scratch, the JPEG learning module is configured
with a ringed residual block connected to it and uses leaky ReLU. The method of strengthening
learning in the DCT domain is therefore confirmed to be effective in improving performance on
other datasets.
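The effect of leaky ReLU on DCT-coefficient features, which can take large negative values, is visible in a one-line definition. The slope 0.01 is the common default (e.g., in PyTorch); the paper does not state the value it used:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    """Plain ReLU zeroes out negative inputs; leaky ReLU keeps a scaled copy,
    so negative DCT coefficients still carry signal through JALM's layers."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, slope * x)

coeffs = np.array([-120.0, -3.0, 0.0, 5.0])   # example DCT coefficients
print(leaky_relu(coeffs))                     # negatives scaled by 0.01, positives kept
```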
Table 4 compares the predictions of DRRU-Net LR on CASIAv2 images with those of the other
networks. These three images are splicing forgeries. CAT-Net sometimes indicates a possibility
of forgery even in the normal regions and does not properly indicate the probability of forgery
in the forgery regions. DRRU-Net recorded a value close to its average Dice coefficient, as
CAT-Net did, but its false detection rate for pixels was high in these pictures. DRRU-Net LR
showed accurate predictions in all pictures in Table 4.

Table 4. Comparison of CAT-Net, DRRU-Net, and DRRU-Net LR forgery regions detection results.
(Image table: columns Input, Ground Truth, CAT-Net, DRRU-Net, and DRRU-Net LR; three splicing
forgery examples from CASIAv2.)

Table 5 shows the results of comparing the prediction performance on the splicing forgery
regions for all networks: RRU-Net, DRRU-Net, DRRU-Net LR, CAT-Net, and DRRU-Net TL. In order
from the top, 4 images of CASIAv2, 2 images of DEFACTO Copy-Move, and 2 images of DEFACTO
Splicing were used.

Table 5. Comparison of the prediction performance of forgery regions among all networks.
(Image table: columns Input, Ground Truth, RRU-Net, DRRU-Net, DRRU-Net LR, CAT-Net, and
DRRU-Net TL. From the top: 4 images of CASIAv2, 2 images of DEFACTO Copy-Move, and 2 images of
DEFACTO Splicing.)

The combination of the results of Experiments 4.1 and 4.2 shows that RRU-Net has difficulty
detecting forgery regions by learning the RGB stream alone. DRRU-Net shows better results but
suffers from frequent false detections. DRRU-Net TL, trained with a relatively high dependency
on the RGB stream, can compensate for this problem. However, comparing the transfer-learning
model with a model trained from randomly initialized weights is not an appropriate comparison.
Therefore,
for a comparison under the same conditions, strengthening the learning of DRRU-Net's JALM with
randomly initialized weights was attempted. By tuning the ringed residual blocks connected to
JALM, DRRU-Net LR showed higher performance than the other networks under the same conditions.

5. Conclusions
In this paper, DRRU-Net for image-splicing forgery detection was proposed. DRRU-Net can be
trained with a simple pre-training method that effectively applies CAT-Net's JALM to RRU-Net's
ringed residual architecture. Experiment 4.1 demonstrates that performance can be improved
effectively, while using the same dataset images, by simply pre-training the RGB stream,
without the JALM pre-training method presented in [5], which requires a separate dataset. In
further studies, RRU-Net can be used as a module to improve the performance of DRRU-Net, which
has been expanded in the form of learning the RGB stream.

Compared to CAT-Net and RRU-Net, DRRU-Net showed the highest generalization performance on
datasets from another forgery domain (Copy-Move) when trained under the same conditions
without pre-training. It also showed the best performance on CASIAv2, which includes forgery
domains different from those of the training dataset and was produced with a different forgery
process. In this paper, the training did not use as many datasets as the experiment in [5].
However, if trained with a large amount of data, DRRU-Net is expected to show the best
performance among image-splicing forgery region segmentation models.

Author Contributions: Conceptualization, Y.S. and J.K.; methodology, Y.S. and J.K.; software, Y.S.;
validation, Y.S. and J.K.; formal analysis, Y.S.; investigation, Y.S.; resources, Y.S.; data curation, Y.S. and
J.K.; writing—original draft preparation, Y.S. and J.K.; writing—review and editing, J.K.; visualization,
Y.S.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: This research was funded by a 2021 research grant from Sangmyung University.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Sacchi, D.L.M.; Agnoli, F.; Loftus, E.F. Changing history: Doctored photographs affect memory for past public events. Appl. Cogn.
Psychol. 2007, 21, 1005–1022. Available online: https://api.istex.fr/ark:/67375/WNG-HHX7ZQ5M-K/fulltext.pdf (accessed on
20 January 2023). [CrossRef]
2. Mishra, M.; Adhikary, F. Digital Image Tamper Detection Techniques-A Comprehensive Study. arXiv 2013, arXiv:1306.6737.
3. Bharti, C.N.; Tandel, P. A Survey of Image Forgery Detection Techniques; IEEE: Piscataway, NJ, USA, 2016; pp. 877–881.
4. Nathalie Diane, W.N.; Xingming, S.; Moise, F.K. A Survey of Partition-Based Techniques for Copy-Move Forgery Detection. Sci.
World J. 2014, 2014, 975456. [CrossRef] [PubMed]
5. Ansari, M.D.; Ghrera, S.P.; Tyagi, V. Pixel-Based Image Forgery Detection: A Review. IETE J. Educ. 2014, 55, 40–46. [CrossRef]
6. Birajdar, G.K.; Mankar, V.H. Digital image forgery detection using passive techniques: A survey. Digit. Investig. 2013, 10, 226–245.
[CrossRef]
7. Qazi, T.; Hayat, K.; Khan, S.U.; Madani, S.A.; Khan, I.A.; Kołodziej, J.; Li, H.; Lin, W.; Yow, K.C.; Xu, C. Survey on blind image
forgery detection. IET Image Process. 2013, 7, 660–670. [CrossRef]
8. Liu, T.; Wang, J.; Yang, B.; Wang, X. NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation
and on-task behavior understanding in the classroom. Neurocomputing 2021, 436, 210–220. [CrossRef]
9. Liu, H.; Nie, H.; Zhang, Z.; Li, Y. Anisotropic angle distribution learning for head pose estimation and attention understanding in
human-computer interaction. Neurocomputing 2021, 433, 310–322. [CrossRef]
10. Liu, H.; Liu, T.; Chen, Y.; Zhang, Z.; Li, Y. EHPE: Skeleton Cues-based Gaussian Coordinate Encoding for Efficient Human Pose
Estimation. IEEE Trans. Multimed. 2022, 2, 1–12. [CrossRef]
11. Shi, Y.; Chen, C.; Chen, W. A Natural Image Model Approach to Splicing Detection; ACM: New York, NY, USA, 2007; pp. 51–62.
12. He, Z.; Lu, W.; Sun, W.; Huang, J. Digital image splicing detection based on Markov features in DCT and DWT domain. Pattern
Recognit. 2012, 45, 4292–4299. [CrossRef]
13. Liu, H.; An, Q.; Liu, T.; Huang, Z.; Deng, Q. An infrared image denoising model with unidirectional gradient and sparsity
constraint on biomedical images. Infrared Phys. Technol. 2022, 126, 104348. [CrossRef]
14. Liu, H.; Fang, S.; Zhang, Z.; Li, D.; Lin, K.; Wang, J. MFDNet: Collaborative Poses Perception and Matrix Fisher Distribution for
Head Pose Estimation. IEEE Trans. Multimed. 2022, 24, 2449–2460. [CrossRef]
15. Liu, T.; Yang, B.; Liu, H.; Ju, J.; Tang, J.; Subramanian, S.; Zhang, Z. GMDL: Toward precise head pose estimation via Gaussian
mixed distribution learning for students’ attention understanding. Infrared Phys. Technol. 2022, 122, 104099. [CrossRef]
16. Velliangiri, S.; Premalatha, J. A Novel Forgery Detection in Image Frames of the Videos Using Enhanced Convolutional Neural
Network in Face Images. Comput. Model. Eng. Sci. 2020, 125, 625–645. [CrossRef]
17. Mo, H.; Chen, B.; Luo, W. Fake Faces Identification via Convolutional Neural Network; ACM: New York, NY, USA, 2018; pp. 43–47.
18. Liu, T.; Liu, H.; Chen, Z.; Lesgold, A.M. Fast Blind Instrument Function Estimation Method for Industrial Infrared Spectrometers.
IEEE Trans. Ind. Inform. 2018, 14, 5268–5277. [CrossRef]

19. Liu, T.; Liu, H.; Li, Y.; Zhang, Z.; Liu, S. Efficient Blind Signal Reconstruction with Wavelet Transforms Regularization for
Educational Robot Infrared Vision Sensing. IEEE/ASME Trans. Mechatron. 2019, 24, 384–394. [CrossRef]
20. Liu, T.; Liu, H.; Li, Y.; Chen, Z.; Zhang, Z.; Liu, S. Flexible FTIR Spectral Imaging Enhancement for Industrial Robot Infrared
Vision Sensing. IEEE Trans. Ind. Inform. 2020, 16, 544–554. [CrossRef]
21. Wu, Y.; AbdAlmageed, W.; Natarajan, P. ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries
with Anomalous Features; IEEE: New York, NY, USA, 2019; pp. 9535–9544.
22. Bi, X.; Wei, Y.; Xiao, B.; Li, W. RRU-Net: The ringed residual U-Net for image splicing forgery detection. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–19 June 2019.
23. Kwon, M.; Nam, S.; Yu, I.; Lee, H.; Kim, C. Learning JPEG Compression Artifacts for Image Manipulation Detection and
Localization. Int. J. Comput. Vis. 2022, 130, 1875–1895. [CrossRef]
24. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution
Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3349–3364. Available online:
https://ieeexplore-ieee-org.libproxy.smu.ac.kr/document/9052469 (accessed on 20 January 2023). [CrossRef] [PubMed]
25. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al.
ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. Available online: https://link.
springer.com/article/10.1007/s11263-015-0816-y (accessed on 20 January 2023). [CrossRef]
26. Mahfoudi, G.; Tajini, B.; Retraint, F.; Morain-Nicolier, F.; Dugelay, J.L.; Pic, M. DEFACTO: Image and Face Manipulation Dataset;
EURASIP: Piscataway, NJ, USA, 2019; pp. 1–5.
27. Dong, J.; Wang, W.; Tan, T. CASIA Image Tampering Detection Evaluation Database; IEEE: Piscataway, NJ, USA, 2013; pp. 422–426.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
29. Hsu, Y.; Chang, S. Detecting Image Splicing Using Geometry Invariants and Camera Characteristics Consistency; IEEE: Piscataway, NJ,
USA, 2006; pp. 549–552.
30. DEFACTO Splicing Image Set. Available online: https://www.kaggle.com/datasets/defactodataset/defactosplicing (accessed
on 20 January 2023).
31. CASIA v2 Dataset. Available online: https://www.kaggle.com/datasets/divg07/casia-20-image-tampering-detection-dataset
(accessed on 20 January 2023).
32. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
