You are on page 1of 17

J. Vis. Commun. Image R.

24 (2013) 1276–1292

Contents lists available at ScienceDirect

J. Vis. Commun. Image R.


journal homepage: www.elsevier.com/locate/jvci

Computer generated images vs. digital photographs: A synergetic feature


and classifier combination approach
Eric Tokuda, Helio Pedrini ⇑, Anderson Rocha
Institute of Computing, University of Campinas (Unicamp), Av. Albert Einstein, 1251, Campinas, SP 13083-852, Brazil

a r t i c l e i n f o a b s t r a c t

Article history: The development of powerful and low-cost hardware devices allied with great advances on content edit-
Received 21 December 2012 ing and authoring tools have promoted the creation of computer generated images (CG) to a degree of
Accepted 23 August 2013 unrivaled realism. Differentiating a photo-realistic computer generated image from a real photograph
Available online 4 September 2013
(PG) can be a difficult task to naked eyes. Digital forensics techniques can play a significant role in this
task. As a matter of fact, important research has been made by the scientific community in this regard.
Keywords: Most of the approaches focus on single image features aiming at detecting differences between real
Digital forensics
and computer generated images. However, with the current technology advances, there is no universal
Feature fusion
Photorealism
image characterization technique that completely solves this problem. In our work, we (1) present a com-
Classifier combination plete study of several CG versus PG approaches; (2) create a large and heterogeneous dataset to be used
Image descriptors as a training and validation database; (3) implement representative methods of the literature; and (4)
Synthetic images devise automatic ways for combining the best approaches. We compared the implemented methods
Voting method using the same validation environment showing their pros and cons with a common benchmark protocol.
Feature extraction We collected approximately 4850 photographs and 4850 CGs with large diversity of image content and
quality. We implemented a total of 13 methods. Results show that this set of methods can achieve up to
93% of accuracy when used without any form of machine learning fusion. The same methods, when com-
bined through the implemented fusion schemes, can achieve an accuracy rate of 97%, representing a
reduction of 57% of the classification error over the best individual result.
Ó 2013 Elsevier Inc. All rights reserved.

1. Introduction emergence of models previously unthinkable, such as representa-


tions of real scenes with a high degree of realism (photorealism).
The cost reduction of digital imaging equipment and the con- Amid such progress in the quality of the images, identifying a
stant technology advancements have boosted the popularity of photo-realistic computer generated image (CG) may represent a
digital photographs allowing even non-specialized users to have complex task to naked eyes. Although the human visual system
access to high-definition cameras, camcorders and scanners. Cur- is unrivaled in aspects such as high precision and quick under-
rently, there is a considerable amount of Photographs (PG) online, standing of the scene, it is ineffective in tasks such as identifying
with various content sources and levels of professionalism. inconsistencies of lighting, perspective or colors [4]. Fig. 1 illus-
In addition, the technological advances in hardware and soft- trates two images that can offer difficulties in distinguishing be-
ware have allowed constant progress on Computer Graphics. Com- tween CG and PG.
panies such as Nvidia [1] manufactures graphics processing units In the legal scenario, a photograph can be considered an impor-
designed to deal with large amounts of parallel tasks. These com- tant criminal evidence. For example, the United States of America
panies report that their products already reach over 500 billion has a strict policy to combat child abuse. In 1996, it was estab-
Floating-Point Operations per Second (FLOPs). Graphical software lished the Child Pornography Prevention Act (CPPA) [5], which
tools such as Maya [2] and 3DS Max [3] are constantly being im- aimed at combating virtual child pornography. This amendment
proved to keep up with the high performance hardware. incriminated any multimedia containing child pornography related
The area of Computer Graphics has a variety of industrial and contents, even computer graphics based ones. Six years later, the
scientific applications, such as the creation of animations, com- U.S. Supreme Court reviewed the amendment and considered it
puter-aided design, simulation systems, crime scene reconstruc- as a constitutional violation. Since then, the creation and distribu-
tion, among others. The advances in the field prevented the tion of synthetic images with pornographic content was decrimi-
nalized. Nowadays, in the United States, images associated with
⇑ Corresponding author. child abuse are considered legally protected if it is proved the syn-
E-mail address: helio@ic.unicamp.br (H. Pedrini). thetic origin of the image. The problem is that some offenders

1047-3203/$ - see front matter Ó 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvcir.2013.08.009
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1277

 implementation of existing methods from the literature


and composition of a reference basis;
 formulation and implementation of new methods, through
the extension of existing image processing descriptors to
our scope;
 proposition and implementation of approaches to features
and classifiers combination, in order to reduce the dimen-
sionality of the feature space and increase the accuracy of
the existing methods.

Fig. 1. (a) A computer generated image (CG) and (b) a photograph (PG). We organize the remainder of this paper into five sections. In Sec-
tion 2, we analyze the literature methods which address the prob-
lem of PG versus CG and, additionally, expose image descriptors
started to acquire normal child pornography photographs and alter that may be used in the formulation of new methods focused on
them digitally in such a way they seem to be computer generated. the problem of this paper. In Section 3, we present our methodology
This is a scenario where digital forensics can be very helpful to dis- and implementation of each of them. In Section 4, we describe and
tinguish with confidence between CG and PG. Other typical sce- analyze the results of each approach. Finally, in Section 5, we pres-
nario in which computer generated images can play damaging ent some concluding remarks and suggestions for future work.
roles is image doctoring for political propaganda. In all cases, the
validation of authenticity of the images, therefore, is a major chal-
lenge in forensics [6]. 2. Related work
In this problem, we consider a photograph any image originated
from an acquisition device (e.g., camera, scanner) capturing a High quality computer graphics images emerged with the evo-
scene. In turn, a synthetic image is any scene partially or totally lution of the field of Computer Graphics. Methods for distinguish-
rendered by a computer software. This is a standard definition in ing PGs and CGs are of mid-2000s [7] and often use concepts of
the literature. Some authors, such as [7] (c.f., Section 2.2.4), have related areas [7,15–17].
also considered the actual content of the scene photographed/rep- Most of the existing proposals for distinguishing CGs and PGs in
resented. However, in forensics, we can classify such situations as a the literature contemplate two steps:
recapture attack. Recapture attacks might be approached by using
liveness detection, for instance, such as the work we have devel-  identification and extraction of features that reveal the dif-
oped in [8,9]. One aspect that needs to be clear is that, under the ferences between the two classes (CG vs. PG);
CPPA terms, we need to identify the situation in which one cap-  classification of images based on the set of obtained charac-
tures a photography involving child pornography and then alters teristics (features).
it in the computer in such a way it resembles a computer-gener-
ated one. The purpose of the content creator is to avoid someone In addition to the concepts already explored in the area of identifi-
to think the altered content is photography. cation of CG, many descriptors from related areas in image process-
For differentiating PGs and CGs, if each photograph had a reli- ing are promising for use in the problem but were not explored in
able mark that indicated its source camera, then the problem previous works in the forensic literature.
would be solved [10]. However, it would require the watermark The main difference between the various methods in the litera-
to be inserted in all existing photographs, clearly an impossibility ture consists in the choice of the characteristics to describe an im-
for existing images captured with older devices. Non-intrusive ap- age (descriptor). The effectiveness of this process is fundamental to
proaches, therefore, constitute the most suited option to solve the the good accuracy of a method. Here, we analyze existing descrip-
problem. tors on the applicability and effectiveness in the problem of CGs vs.
Researchers have approached this problem in various forms. PGs.
The human visual system is quite complex and uses many visual Areas such as Content-Based Image Retrieval (CBIR) have exten-
features to classify a scene. Using it as inspiration, we can try to sive research in the area of feature extraction and image character-
identify the visual features that distinguish between PG and CG ization [18] and thus present potential sources for our work. The
and use them in our approach. Edges, colors and shapes are exam- characterization of an image can be based on several criteria: color
ples of visual characteristics that could be used [11]. To improve histograms [19], texture [20,21], shape [22] edges [23], meshes
the current results, a first approach would be the use of new and (patches) [23], surface [24], among others.
relevant features. A second approach would be a novel way to We then describe the foundations extensively used in the area.
use the existing methods, such as an ensemble of them. In the following sections, we review the relevant work available in
Prior work reported classification accuracies superior than 95% the literature, describe concepts explored in other areas that we
[12,13] on this problem, however, Gloe et al. [14] created scenarios apply to our set of solutions. Relevant works for combining
in which the accuracy of some of these methods was significantly descriptors and classifiers which we also explore in this work are
lower than the reported in the papers. In a real scenario, containing directly explained in Section 3.
complex scenes, we believe that the methods would present lower
accuracy rates. In a scope in which the objects of study can serve as
criminal evidence, we need robust and scenario-invariant tech- 2.1. Fundamentals
niques. Thus, in the task of discerning between PG and CG, in this
paper our contributions are: A pattern recognition tool, or simply classifier, aims at estab-
lishing a classification criterion based on a set of reference data.
 collection of a complex and diverse dataset of CG versus PG Unlike the pattern matching, the classifier does not seek to identify
which will be freely available for benchmarking algorithms equality, but statistical similarities in the data. While the human
in the field; eye is effective in certain aspects [4], classifiers play an important
role in the automatic data classification.
1278 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

Two pattern recognition methods widely used in studies in the 2.2.3. Wang e Moulin [13]
area of identification of CGs are the Linear Discriminant Analysis Wang and Moulin [13] explored the image differences in the
(LDA) and Support Vector Machine (SVM) [25]. LDA creates a linear frequency domain and proposed a model based on histograms of
classification surface and is of relatively simple implementation. wavelet coefficients of the image. For performance improvement,
SVM uses n-dimensional classification surfaces and their imple- band-pass filters were applied to the standard histogram in order
mentation is more complex although normally more effective. to characterize it with a reduced amount of information. In the val-
idation, the authors used an LDA classifier and a database of per-
2.2. State-of-the-art CG vs. PG methods sonal images. The authors reported an accuracy of around 100%
for both types of images at a rate of 0.1% false negatives. According
In the following sections, we describe and discuss relevant to the analysis [28], this method is more precise than the approach
methods found in the literature related to CG versus PG described in [15] and obtained a significant reduction in the com-
discrimination. putation time.

2.2.4. Ng et al. [7]


2.2.1. Lyu e Farid [15] Ng et al. [7] proposed a model based on the geometric proper-
An approach to the problem of distinguishing between PG and ties of an image, which takes into account the physical PGs and
CG is the wavelet decomposition, such as proposed by Lyu and CGs formation processes. The paper defines two distinct and inde-
Farid [15]. This information is used to check if an image follows pendent concepts for the evaluation: (1) the authenticity of the
the pattern of a PG. According to [26], the wavelet coefficients of acquisition process and (2) the authenticity of the scene. The
natural images subbands have a generalized Laplacian distribution, authenticity of acquisition determines if the data was obtained
which is characterized by a peak at zero and long and symmetrical by a digital acquisition device or was generated by computer, inde-
tails. One interpretation of this distribution in an image can be pendently of the image content. The authenticity of the scene, in
understood by the presence of large areas of homogeneous regions turn, aims at assessing the image content, independently of the
and abrupt transitions (edges). The decomposition follows the source. The purpose of the definition of these concepts is the cate-
method of quadrature mirror filters, of Vaidyanathan [27] and pro- gorization of classes that may not properly fit in the traditional
poses the image decomposition in n scales, three orientations (ver- division PG versus CG. PGs with photo-realistic content (recap-
tical, horizontal and diagonal) and three color channels (red, green ture), which cannot be properly categorized as CG or PG, can be
and blue). isolated with the definitions proposed by the authors.
To characterize the distribution of coefficients, the authors ob- Based on these differences, the study proposed the analysis of
tained fourth-order statistics for each subband (mean, variance, the images in two different scales. The best-scale geometry of
skewness and kurtosis) of the original image. In addition to the tra- the scene can be described by a fractal dimension and by local
ditional mean and variance for characterizing a distribution, the patches. At the other scale, the geometry is best described by a lan-
use of kurtosis aims at measuring how much the peak values differ guage of differential geometry. The surface gradient, the second
from the rest of the distribution. The skewness measures the asym- fundamental form and the Beltrami flow vectors are calculated.
metry of the probability distribution. Such features are extracted by the moment of rigid bodies method,
These features, however, do not adequately capture the correla- which characterizes the joint distribution of various sizes. This
tion coefficients in scale, color bands and orientations. The correla- information is used by an SVM classifier during the training phase.
tion coefficients of the image through various scales and The same dataset was used to evaluate the implementation of
orientations reflects relevant characteristics such as edges. This [15] for validation and comparison of the results. The average accu-
correlation is provided by error predictors. Each coefficient is asso- racy was of 83.5%, above the results obtained with the baseline ap-
ciated with an error, again characterized by the fourth order statis- plied to the same data set (80.3%). According to the authors, the
tics: mean, variance, skew and kurtosis. The feature vector is a results indicate that the method is effective in the classification
factor of the number of color channels used, number of scales, of acquisition, although not in the classification of content.
number of orientations, and number of order statistics. From this
predictive model coefficients, the authors estimated the coefficient 2.2.5. Rocha and Goldenstein [29]
of each subband and the errors associated with each coefficient for Rocha and Goldenstein [30] have proposed an approach based
each color channel and obtained a set of 216 statistics. For a fixed on perturbations on the Least Significant Bits (LSB) of the images.
false negative rate of 1.2%, the authors reported an accuracy of The main assumption is that the two classes of images, when ‘‘dis-
54.6% and 66.8% with LDA and SVM classifier, respectively. turbed’’ in a particular form of change in the least significant bits,
show distinct patterns of behavior. The method progressively cre-
2.2.2. Chen et al. [17] ates perturbed images from the input image with different degrees
Chen et al. [17] followed the line of the wavelet decomposition. of disturbance (LSB rate to be changed). A 96-D feature vector is
The method, unlike [15], does not calculate the four statistical dis- formed and supplied to an SVM classifier. The results were ex-
tribution of wavelet coefficients and errors associated with each pressed as the frequency of hits per category: 98.7% accuracy in
estimate. To characterize the distribution, only the moments of the identification of PGs and 95.7% in the identification of CGs.
the characteristic functions of the two distributions are obtained. Two limitations of the method, according to the authors, are the
A total of 78 features are extracted for each color component im- requirement of many training samples (about 40,000 images)
age, resulting in a feature vector of 78  3 dimensions. One of and the inefficacy in cases of recapture attacks.
the proposed methods is to investigate the influence of system col-
ors in the image feature extraction, in particular, the HSV (Hue, Sat- 2.2.6. Dehnie et al. [31]
uration and Value) color space. Dehnie et al. [31] based on the assumption that PGs obtained by
The authors used an SVM classifier on the validation of methods the same process of acquisition [32] preserve common traits. These
and the data used were obtained from different sources. The residual traces can be used to distinguish a PG from a CG. To obtain
authors reported an accuracy of 82.1% considering the HSV color the noise of an image, the adaptive filter proposed by Lukas et al.
space and 76.9% for the RGB, showing the importance of color rep- [32] is applied and the original image subtracted by the filtered im-
resentation for descriptors. age. The filter is derived from a statistical model that takes into ac-
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1279

count the dependency between wavelet coefficients of adjacent be obtained directly from the source, i.e., not resulting from a post-
pixels present in the images. acquisition processing such as rescaling, since this could destroy
The average noise of all the images in the same class gives rise the characteristics associated with the color interpolation.
to the standard reference noise of that class. The classification is
done by the value of the correlation between the residual image 2.2.9. Li et al. [16]
and the standard reference. The photographs were acquired from Li et al. [16] proposed a method which uses the key ideas and
several camera models. The CGs were obtained from the Internet methods of [15,17]. In their implementation, however, the authors
and were divided into two groups: images generated by Maya [2] did not use the wavelet decomposition and worked in the HSV col-
and by 3DS Max [3]. The results indicate that PGs have low corre- or space. The authors consider a two-scale analysis of the image:
lation with the standard errors of both packages. The average per- the original and a reduced one. In each of the scales, an approxima-
formance of the method was 72% [28]. tion of the second order derivative is computed. This differentia-
tion is achieved in four different directions. Variance and kurtosis
2.2.7. Dirik et al. [12] statistics are calculated on these results. Despite this difference
Dirik et al. [12] proposed a method based on the color interpo- to assess the approximate correlation in a given direction, it does
lation process present in digital cameras. Even if further processing not capture the correlation between different directions. To do
is applied, the identification of the algorithm applied in the step of so, they use a linear predictor, similar to that proposed by [15].
demosaicing is still possible. The authors proposed that, to classify From each predicted value, the logarithm of the error is computed.
an image as PG, it is sufficient to identify a color interpolation The authors assume that if PGs follow the linear model pro-
process. posed, then the distribution of these errors serves as criteria for
The method supports the hypothesis that if one PG, interpolated categorization of images. The first fourth-order statistics are used
by a Bayer filter [33], is re-interpolated by the same filter, it will to characterize this distribution. For each scale, RGB color channel
suffer significantly less changes than if it was re-interpolated by and orientation, two statistics from the differentiation matrix and
another filter. four statistics from distribution of errors, are obtained. The method
Dirik et al. [12] also explored the interference of the lenses in provides a total of 144 characteristics.
the digital photograph acquisition process. The chromatic aberra-
tion is related to the difference between refractive indices for dif- 2.2.10. Peng et al. [35]
ferent wavelengths of light incident on the lens acquisition, Peng et al. [35] proposed a method for identification of natural
which causes a misalignment between color channels of the image. images and computer generated graphics based on hybrid features.
An alignment of the color channels means that there is dependence Initially, statistical features are extracted in the spatial and wavelet
between them. An image information with high mutual informa- domain, such as mean, variance, kurtosis, skewness, and median of
tion preserves the alignment of the color channels and, therefore, the histograms of grayscale images. Then, fractal dimensions of the
can be identified as having no traces of chromatic aberration. grayscale images and wavelet sub-bands are extracted as visual
A total of 1200  3 ¼ 3600 images were gathered. Half of the features. Finally, a pre-processing of Gaussian filter is applied to
data set was used to train an SVM classifier, and the other half the images prior to the computation of photo response non-unifor-
was used to test it. The reported results are 98.1%, 89.3% and mity noise (PRNU) through a wavelet-based denoising filter, and
99.6% using color interpolation characteristics, chromatic aberra- physical features are calculated from the enhanced PRNU.
tion and wavelet coefficients, respectively. Results considering fu- An SVM classifier is used in the identification process, achieving
sion of demosaicing and wavelet features resulted in an accuracy of a classification accuracy of 97.3% for computer generated graphics
99.9%. Further experiments with JPEG compression showed a clas- and 91.28% for natural images.
sification accuracy of 90% validating the hypothesis that the chro-
matic aberration features are relevant, even for images with high 2.2.11. Fan et al. [36]
compression ratios. Fan et al. [36] investigated factors that cause people to perceive
images as either real or computer-generated. A set of photographs
2.2.8. Gallagher and Chen [34] and computer-generated images is shown to computer-graphics
Gallagher and Chen [34] also used traces of Bayer interpolation experts and laypeople to judge their types, which were categorized
of colors to distinguish digital images and computer generated as original, modified to show only intrinsic reflectance compo-
images. The algorithm makes no assumption about the identity nents, and modified to show only intrinsic shading components.
or even about the linearity of the algorithm of demosaicing. The From the experiments, results demonstrated that visual realism
only assumption is that all the interpolated pixels present a differ- depends not only on image properties, but also on viewer’s cogni-
ent set of variances of the original set of pixels. tive characteristics. Color and shading played important roles in vi-
To verify this hypothesis, the method seeks a periodicity in the sual realism. Although experts were able to outperform laypeople
variances of the diagonals of the image. Initially, a high-pass filter in the identification process, their ability was limited to grayscale
is applied to enhance periodicity, if present. In case of an image images.
that suffered interpolation, it was expected that the variances were
periodic among different diagonals and constant along them. Sub- 2.2.12. Nguyen et al. [37]
sequently, the variance of each diagonal is estimated by the max- The work described by Nguyen et al. [37] addressed the prob-
imum likelihood method. At the end, there is a vector with the lem of discriminating between computer generated and photo-
variances for each diagonal, whose signal is analyzed through a graphic human faces. The method is based on an estimation of
Fourier transform. face asymmetry, however, the approach only works with frontal
A set of 2400 images was used, of which 1600 were PGs and 800 faces and it is very sensitive to the stage of shape and illumination
were CGs. The approach achieved 98.4% of accuracy. Applying post- normalization. Face asymmetry was used as main information in
JPEG compression, the method had a result of approximately 82% the identification process.
classification accuracy. According to the authors, it was expected
that the method needed large images (larger diagonals), however, 2.2.13. Isenberg [38]
even with small images, the method achieved a classification accu- Isenberg [38] analyzed several evaluation methods for non-
racy of 66%. One restriction of the method is that the images must photorealistic and illustrative rendering (referred to in the work
1280 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

as NPR, non-photorealistic rendering), including qualitative and Fast Fourier Transform not equally spaced and via Fourier Wrap.
quantitative techniques. Some issues addressed in the work in- The methods differ mainly by the choice of the spatial grid used
clude aspects that make an NPR technique be able to successfully in each scale and angle. Both return a table of curvelet coefficients
replicate a traditional technique for creating photorealistic con- indexed by a scaling parameter, an orientation parameter and a
tent. The authors present a discussion on implications for the use spatial location. For an array of size n  n, both implementations
of non-photorealistic techniques, and people’s opinion about dif- have complexity Oðn2 log nÞ. The transforms are invertible, with
ferent non-photorealistic techniques as compared to traditional vi- inversion algorithms of similar complexity.
sual representations. The objective of the work is to evaluate how Complementarily, the contourlet transform [42] is also a multi-
good some techniques are in generating digital content. resolution and directional transform. Unlike curvelets, it was pro-
posed directly in the discrete domain. The decomposition by pyra-
2.2.14. Summary midal filter bank is obtained by combining the Laplacian pyramid
All the aforementioned methods that addressed the problem of and a directional filter bank.
CG vs. PG have been studied individually. Some studies have tested Finally, the shearlet transform [45] is newer and is similar in
a simple union of the methods with two or three other methods construction to the curvelets. Its mathematical basis is solid and,
[12]. A robust comparison of methods for distinguishing between given its construction, it also enables multi-resolution analyses of
CG and PG, in which the execution environment and the train- images. The shearlet transform uses a continuous dilation with
ing/testing data are the same, is not available yet according to two parameters. This dilation is the product of a scaling matrix
our knowledge and this is a subject we explore in this paper. dish by a shear. Therefore, the shearlet coefficients depend on scal-
ing, shear and translation parameters.
2.3. Methods from related fields
2.3.3. Histogram of Shearlet Coefficients [46]
There are several works which were not directly proposed for The Histogram of Shearlet Coefficients (HSC) [46] is an approach
the problem of distinguishing PGs and CGs, however, that can be that takes advantage of the directionality of the shearlet transform
extended to deal with such problem. In this sense, we analyzed to determine the distribution of the edges of the image.
some of them to be used as our basis. First, a multi-scale shearlet decomposition is applied in order to
capture the image information in different orientations and scales.
2.3.1. Fractality [39] Then, for each scale, a histogram with the same number of inter-
An important concept in the area of image segmentation is the vals as the number of orientations is computed. The value of each
fractality, proposed by Mandelbrot [40]. The fractal dimension of histogram interval is obtained by summing the absolute values of
an image can be calculated by the technique of box counting the shearlet coefficients. Finally, the histograms of all levels are
[39]. We cover a limited set E 2 Rn with disjoint boxes of side e. concatenated and normalized.
Let N e ðEÞ be the number of boxes. Suppose that E contains an infi- In their work, the authors report promising results for face iden-
nite number of points as a curve or a surface, and that N e ðEÞ tends tification and texture classification.
to þ1 when e tends to 0. The size of the box, D, characterizes the
rate of growth. 2.3.4. Cooccurrence Matrix [47]
Some authors state that the box counting dimension is only de- The Gray Level Cooccurrence Matrix (GLCM) [47] is a widely
fined when upper and lower boundaries coincide. The definition of used texture descriptor in image analysis. Unlike the measures cal-
the authors do not require the use of boxes as a measuring ele- culated directly from the values of the original image (first-order
ment. An alternative definition uses circles of radius , instead of moment), the GLCM regards the relationship of groups of pixels.
boxes. Descriptors based on the cooccurrence matrices are obtained in
two stages. First, directional operators (which define how the im-
2.3.2. X-lets [41–43] age should be covered) are computed from the image. If each pixel
In the field of image processing, there is a series of transforma- of the image can take n values for each color channel, then the
tions applicable to a set of data. Two-dimensional transforms can resulting array of the image will have size n  n. Then, for images
be analyzed from several aspects such as: multi-resolution that assume discrete values between 1 to 256, the related matrices
(decomposition into multiple resolutions), location (in space and have size 256  256 ¼ 65; 536.
frequency), redundancy (whether the coefficients have redundant In the second step, statistics derived from the matrices are used
information), directionality (vertical, horizontal and diagonal as descriptors. For instance, Haralick et al. [47] proposed 14 sepa-
directions can not effectively capture all the details of the image), rate measures. Clausi [48] analyzed the correlation between tex-
and anisotropy (the windows must have different formats to cap- tures proposed by Haralick et al. and concluded that, among the
ture every nuance of an image). reported measures, there are at most five non correlated: contrast,
The wavelet transform lacks the last two characteristics, which dissimilarity, inverse difference moment of the normalized and
are also responsible for the distinction between curvelets and con- standardized inverse difference.
tourlets. The curvelet transforms [43] were initially developed in
the continuous domain. Multi-scale filters are applied, followed 2.3.5. Histogram of Oriented Gradients [49]
by ridgelets [44] processed in each subband. Adjustments were The Histogram of Oriented Gradients (HOG) counts the occur-
proposed for the discrete domain. A new implementation of curv- rences of oriented gradients over regions of an image [49]. HOG
elets was subsequently proposed by the authors without using the differs from other techniques to be calculated on a grid of uni-
ridgelet transform. Shortly after the introduction of curvelets, some formly spaced cells and use of normalization with overlapping
researchers have developed numerical algorithms for their imple- regions.
mentation and reported a series of practical successes [43]. These In the HOG method, the first step is the gamma correction. A
implementations are based on the original construction, which gradient detector is applied and the direction of the gradient is
uses a pre-processing step involving a partitioning in phase and determined for each pixel. For each region of fixed size (cell), the
space followed by a ridgelet transform applied to data blocks. frequency interval of occurrences of each gradient direction is cal-
In [43], the authors of the curvelet transform redesigned it to culated and a histogram of gradient orientations is created. Cells
simplify its implementation. Two methods have been proposed: may have an arbitrary form. The authors implemented the method
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1281

with cells in circular and rectangular formats. Each cell has its Table 1
direction weighted by the intensity gradient. The cells are grouped Concepts used in the implemented methods. In the first column, we show the indices.
In the second, the identifier used in our work. In the third, the main concept used by
into larger regions (blocks), step in which the histograms are con- the method. In the last column, the related features.
catenated and normalized. The resulting feature vector is the con-
catenation of the block histograms. Index Method Basis Feature
1 Li Second order differences [16] Edges/Texture
2 LSB Camera noise [28] Acquisition
2.3.6. Local Binary Patterns [50] 3 LYU Wavelet transform [15] Edges/Texture
The human visual system is able to interpret nearly achromatic 4 POP Interpolator predictor [53] Acquisition
5 BOX Boxes counting [39] Auto-similarity
scenes, such as at low light levels. The color acts only as a sugges-
6 CON Contourlet transform [42] Edges/Texture
tion for richer interpretations. Even when the color information is 7 CUR Curvelet transform [56] Edges/Texture
distorted, for example, due to color blindness, the visual system 8 GLC Cooccurrence matrix [47] Texture
still operates satisfactorily [51]. Intuitively, this suggests that, at 9 HOG Histogram of oriented grads. [49] Shape
10 HSC Histogram of shearlet coeff. [46] Curves
least for our visual system, contrast and texture are distinct phe-
11 LBP Local binary patterns [50] Edges/Texture
nomena. However, the joint use of texture and contrast is popular 12 SHE Shearlet transform [45] Edges/Texture
in image analysis. 13 SOB Sobel operator [57] Edges
Local Binary Patterns (LBP) have been introduced as a supple- 14 FUS1 Concatenation Combination
mentary measure to the contrast of an image [50]. The value is ob- 15 FUS2 Simple voting Combination
16 FUS3 Weighted voting Combination
tained by summing the values thresholded by the central pixel,
17 FUS4 Meta-classification Combination
weighted by their position relative to the central pixel. The algo-
rithm originally employed radius equal to 1 and an 8-connected
neighborhood.
The major limitation of this approach is the support to only
small and fixed areas around the image. Images captured in an 8- technique to put all descriptor values in a common domain, we
connected neighborhood, for example, cannot capture structural can have promising results.
details of larger scales. Moreover, the operator is not robust to sub- In general, the formulation of a classification method that em-
tle changes such as in illumination direction [50]. Subsequently, ploys pattern recognition is usually composed of two stages: fea-
the authors of [52] proposed multi-scale binary pattern operators. ture extraction and classifier training.
Its implementation is simple and consists of applying the operator A descriptor represents each image as an ordered set of features
in multiple neighborhoods. and can be seen as a mapping of the image information to a m-
dimensional feature space in which m is the number of features
the descriptor represents. Considering the descriptors we employ
2.3.7. Demosaicing [53] in this work, we have the lowest m with the fractality method, with
Popescu and Farid [53] proposed a way to detect demosaicing m ¼ 3, and the largest with the curvelet coefficients, with
and distinguish different types of interpolation employed by differ- m ¼ 2328.
ent models of cameras. The authors used this method to identify The use of a large number of features can lead to the curse of
manipulations in an image. dimensionality [54]. In addition, the necessary data volume to per-
Suppose an image has been interpolated by a specific interpola- form a statistically significant analysis increases exponentially
tion algorithm. The pixels of one color channel will present a differ- with the number of data dimensions. A small number of parame-
ent correlation pattern than in the tampered region. Thus, the ters may result in a low and possibly false characterization. Here,
sample block can be used to identify regions that have suffered we also assess if the proposed descriptors incur in the curse of
changes. dimensionality (specially when combining them through fusion)
For simplicity, the authors assume that the interpolation algo- by means of Random Subspace Methods (RSM) [55].
rithm, albeit unknown, is linear. The authors use an Expectation/ In this work, we implemented and validated 17 approaches (see
Maximization (EM) algorithm for iteratively estimating the un- Table 1). Each of the implementations is explained below. The
known parameters (the pixel neighborhood size and the correla- remainder of this section explains how they were implemented:
tion parameters among them). In the E step, the authors the image acquisition (Section 3.1), the state-of-the-art methods
compute the probability that each sample belongs to the linear (Section 3.2), the approaches from related fields (Section 3.3)
model assumed. In the M step, they estimate a model interpola- which are tested for this particular problem for the first time,
tion. The authors report a 97% classification accuracy in the iden- and methods for data normalization and feature/classifier fusion
tification of interpolation models using an LDA classifier. (Section 3.4).

3. Methodology
3.1. Image collection
The CG vs. PG problem seems to be a problem in which there
will be no silver bullet image characterization process which, by it- For a more robust statistical analysis, we create a large and het-
self, completely solves the problem. Experience from several past erogeneous sample space, which translates into a large number of
work in the literature corroborates this fact. As the rich image pro- images and content diversity.
cessing and computer vision image descriptors available in the lit- Among the images, we searched for both indoor and outdoor
erature explore complementary properties of digital images, it is scenes and a variety of equipment sources, as of the personal data-
natural to expect that such properties will help to capture different sets available online and already used in previous publications
nuances and telltales related to the process of creating photoreal- [58]. Among the computer-generated ones, we looked only for
istic images. In this paper, we explore such different properties photo-realistic images. Additionally, we used images of high de-
and complementary descriptions for solving the problem and show gree of realism as the online challenge [59] to test our implemen-
that when combining them in a proper way considering not only a tations. Section 4.2 gives more details on the collection of such
smart combination policy but also an appropriate normalization dataset.
1282 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

3.2. Implementation of the state-of-the-art methods 3.3.3. CON


The contourlet transform [42] also has an implementation
We implemented two baselines methods from the literature: available in the Internet, the Contourlet Toolbox [61]. We used this
Lyu [15] and Li [16]. Both are described below. We decided for implementation as the basis of our approach. Initially, we calculate
these two methods because the first one was one of the first pro- the standard Laplacian decomposition of the image and apply a
posed to this problem and is still used with reasonable accuracy. high-pass directional filter bank decomposition to every scale. As
The second one was proposed recently and improves upon the first we can decompose each scale into an arbitrary number of direc-
one. tions, we use two different quantization levels for subbands. More
levels are devoted to high-detailed scales and less levels to the
remaining scales. The implementation has as parameters the pyra-
3.2.1. Lyu
midal filter, the directional filter and the number of directional
The first implementation was based on work described by Lyu
decompositions for each level of the pyramid.
and Farid [15], which presents a detailed description of the original
We used the directional filter pkva [62] in every approach. The
implementation.
pkva filter is biorthogonal and has quincunx form [42]. The imple-
In the first approach, we used the pyramid decomposition with
mentation of the contourlet transform optimizes the filter to ob-
five scales. In the second approach, we replaced the features of
tain an ideal frequency response.
each subband (higher-order statistics) by the features described
The output of the implementation is a three-dimensional array,
by Haralick et al. [47], applied to the coefficients in the frequency
indexed by position, scale and direction. Then, for each subband of
domain.
scale, orientation and color, we extract the order statistics to char-
acterize the distribution of coefficients.
3.2.2. Li
The work of Li et al. [16], as well as [15], presented a fairly de- 3.3.4. SHE
tailed characterization of the implementation, making it possible We also implemented a descriptor based on shearlet transform
to implement the same algorithm as the one published. through the Shearlab tool [63]. We calculated the shearlet coeffi-
We applied an HSV color system conversion to the input image cients and used four higher-order statistics to characterize the dis-
and calculated the approximations present in [16] to estimate the tribution of coefficients.
second-order differences. We calculated the skewness and kurtosis We tested the descriptor with several nuclei of shearlet trans-
distributions from these values. Then, we reduced the scale of the forms: Daubechies8, Daubechies16, Symmlet4 and Symmlet8.
image and calculated the second-order difference again. We calcu-
late a linear prediction coefficients and its associated error, as in 3.3.5. HSC
the previous work of Lyu and Farid [15]. Again, we extracted the We used the work of [46] as the basis for implementing a
skewness and kurtosis coefficients. descriptor based on shearlet transform. We calculated a multi-
scale shearlet decomposition into three levels and eight directions.
3.3. Implementation of methods based on related fields Then, we calculated the histogram of the coefficients at each scale
with eight intervals. Finally, we concatenated and normalized the
We proposed and implemented eleven methods from related histograms.
fields: BOX, CUR, CON, SHE, HSC, GLCM, LBP, HOG, SOB, LSB and We tested the method with different number of angles and sizes
POP. It is paramount to note this paper is probably the first work of the sliding block.
to implement such methods and evaluate them on this challenging
forensic problem of differentiating photographs from computer- 3.3.6. GLCM
generated images. Despite the cooccurrence matrix has expensive computation, it
is simple and effective. Thus, we use a descriptor for GLCM texture.
We set directions and calculate the amount of certain pixel pair
3.3.1. BOX
setups. The resulting cooccurrence matrix was 256  256 for each
The method BOX is based on the technique of box counting [39].
color channel. In a first approach, we applied the descriptor for
The first step is the application of a thresholded edge detector.
each color channel and extracted the characteristics of non-redun-
Then, we count the number of boxes (NðÞ) required to cover the
dant features [47] (homogeneity, energy, contrast, and correlation).
edges of the image for different box sizes. From pairs of values 
We obtained 12 values for each image. In a second approach, we
and Nð1Þ, we adjust a line and compute its angular coefficient. For
applied wavelet pyramid decomposition and we calculated the
each image, we obtained a vector of size three. The linear coeffi-
cooccurrence matrix and the characteristics of Haralick et al. [47]
cient obtained was used as a descriptor in order to estimate the
for each subband of scale, orientation and color obtaining a vector
fractality of the image.
with 144 features.

3.3.2. CUR 3.3.7. LBP


The curvelet transform, as well as the wavelets, have been used We used local binary patterns as the basis for another descrip-
in various fields. The authors [56] have provided a complete imple- tor. We implemented rotation-invariant LBP [64] for radius 1, 2
mentation of curvelet transforms (CurveLab [60]). We used the and 3. The binary pattern location of a pixel is obtained by thres-
curvelet transform based on the Fast Fourier transform version of holding the values of neighboring pixels by the value of central pix-
the tool. el. Then, we calculated the LBPs of the image and the frequency of
Initially, we used curvelet transform coefficients as an input of the intervals. From the histogram of values, we obtained the
the classifier. The curvelet decomposition, by its directionality and descriptor.
location in space and frequency nature, has a large number of coef-
ficients (2; 270; 020 in our implementation). Due to the high 3.3.8. HOG
dimensionality of the features, we computed the four higher-order For each region of fixed size of the image (cell), we calculated
statistics for each subband of each scale, orientation and color the frequency of occurrences of each gradient direction and built
channel. The resulting number of characteristics was 2328. a histogram of oriented gradients (HOG) [65]. The used cells were
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1283

rectangular, of size 3  3 pixels. The direction of each pixel of the and classifiers towards solving the problem we deal with in this
cell was weighted by the gradient intensity. Cells were grouped paper.
into blocks. Finally, the histograms of the blocks were concate- As already studied in several works in the literature [66,54], fea-
nated and normalized. We used blocks of size 3  3 and histograms ture combination normally brings together information from dif-
divided into 9 intervals. ferent characterization techniques and with different domains
(data range value for the dimensions, data types, etc.). In order to
3.3.9. SOB deal with this potential problem, before performing any fusion,
First, we applied the Sobel operator [57] without thresholding. we perform data normalization. Data normalization is a key step
We chose, among all the local maxima in the intensity map, the in dealing with large amounts of heterogeneous data [54] and in
50 largest in magnitude. For each of 50 points, we set a centered this paper we decided to evaluate this point for the PG vs. CG
Gaussian 2D point, in a square block of size 1 (parameter which problem.
we varied). We excluded those which had inadequate Gaussian fits The choice here can be challenging when there is not enough
and calculated the variances rx and ry of each Gaussian. We calcu- information on the data distribution. We theoretically studied sev-
r þr eral normalization techniques, such as t-norms, z-scores [54] and
lated the average variance ( x 2 y ) in order to eliminate the variation
to rotation. We obtained a total of 150 values as the resulting set of w-scores [54], then chose the simplest one: z-score normalization
features. [67]. The reason is that z-score remains one of the most popular
normalization techniques available, whose calculation is
3.3.10. LSB straightforward
Since acquisition equipments usually add noise to the original xl
image, we searched for traces from the internal processing of dig- z¼ ð1Þ
r
ital cameras. Our assumption is that noise can be correlated with
the least significant bits (LSBs) of the image. where x is a feature vector in m-dimensional space, l is the feature
First, we obtained a map of the least significant bits of the image vector mean and r the feature vector standard deviation.
by dividing the value of each pixel by two. Then, we linearized the
data and calculated the cooccurrence features [47] of these
3.4.1. FUS1
coefficients.
Our first combination strategy was the feature concatenation.
Previous works such as [68,69] have explored this idea, however,
3.3.11. POP none has combined more than three different methods. Each fea-
Popescu and Farid [53] proposed a way to identify the demosa- ture extraction method individually provides a set of features,
icing process. It was used as a basis for the implementation of an- which are combined into a single feature vector. The resulting vec-
other descriptor. tor is the input for the classifier (see Fig. 2).
Let f ðx; yÞ be the intensity of the pixel at position ðx; yÞ in a par- One possible advantage is that it does not discard any data be-
ticular color channel of the image. We based on the assumption fore the final judgment and all data is supplied to the classifier,
that f ðx; yÞ belongs to one of two correlation models: M 1 , if it is lin- which can take a better decision based on this data. Another fea-
early correlated to its neighbors, or M 2 otherwise. Thus, if f ðx; yÞ is ture is that only a pattern recognition step is performed.
linearly correlated to its neighbors, and Since each image is represented by a set of features, the result-
a ¼ fau;v j  N 6 u; v 6 þNg is the parameter set, then ing matrix can become prohibitive (e.g., with each combined fea-
X
N ture vector with 4100 dimensions, in our case). The classifier has
f ðx; yÞ ¼ ðau;v f ðx þ u; y þ v Þ þ nðx; yÞÞ the task of finding a separation surface in an m-dimensional space,
u;v ¼N with large m. The dynamic handling of such data volume can be
limited by the processing power of a computer. While the advances
In the Expectation step, we then compute
 2 
1 r ðx; yÞ
Prðf ðx; yÞ 2 M 1 jf ðx; yÞÞ ¼ pffiffiffiffiffiffiffi exp 
r 2p 2r2
The error is such that in the Maximization step, we minimize the
e ¼ 0 and obtain
quadratic error @ a@x;y
!
X
þN X
wðx; yÞf ðx þ s; y þ tÞf ðx þ u; y þ v Þ
u;v ¼N x;y
X
¼ x;y
wðx; yÞf ðx þ s; y þ tÞf ðx; yÞ

It can be noted that these steps are iterative and, in the first itera-
tion, a is initialized, say with value 0:5. The Expectation and Maxi-
mization steps are repeated until the difference between the
parameters a of iteration (i) and (i  1) is smaller than 1. Analyzing
the periodicities in the p-Map in the frequency domain, we ex-
tracted the four higher-order statistics of such p-Maps in order to
feed a classifier. This step goes beyond the direct (an not automatic)
analysis described in [53].

3.4. Implementation of fusion methods


Fig. 2. Flowchart for the concatenation approach. (1) An image is described by k
different descriptors. (2) k Feature vectors are extracted. (3) The resulting vector is
From the knowledge of the peculiarities of the different descrip- formed by the union of k features vectors. (4) The resulting dataset is used as the
tors explored, we analyzed the possibility of combining descriptors input of a final classifier.
1284 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

Fig. 3. Simple voting flowchart. (1) k Descriptors are applied to the image. (2) Each
classifier elects a class (vote). (3) Sum of the values for each class (poll). Fig. 4. Weighted voting method flowchart. (1) k Descriptors are applied to the
image. (2) Each of the k classifiers elects a class (vote) and each classifier has a
weight wi from the training phase. (3) Sum of all values, weighted by wi .

in hardware enable such kind of processing, the other approaches


we will present require less computational power and achieve sim- accðiÞ
ilar results. wðiÞ ¼ X ð4Þ
accðjÞ
To assess whether the data incurred in the curse of dimension-
j2A
ality, we perform an analysis using Random Subspace Methods
(RSM) [55] and compared the obtained results and the original It can be noted that in the testing phase, we will have accðiÞ for each
results. method and therefore wðiÞ will be able to be computed.
Let v ot i ðImÞ be the vote of an image Im to be classified by the
3.4.2. FUS2 method i. Using the weighted voting, we perform the classification
The simple voting system (without weighting) was the simplest by each method i and obtain the weighted voting wi  v ot i ðImÞ.
implementation among the proposed combination strategies. Once The final result is obtained through Eq. 5.
the individual methods have been implemented, we performed the 8 X
classification of each method. >
> CG; if ðwi  v oti ðImÞÞ < 0
>
>
An image is classified as PG or CG for each of the k implemented < i2A
X
methods and each of these classifications is called a vote. The class H ¼ PG; if ðwi  v oti ðImÞÞ > 0 ð5Þ
>
>
with the highest number of votes is elected the class of our classi- >
> i2A
:
fier (see Fig. 3). In the case of a tie, we consider error, i.e., the clas- undefined; otherwise
sifier may not correctly identify the class. Its implementation is
simple and consists on counting the number of votes for each class. The flowchart of the weighted voting method, illustrated in Fig. 4, is
very similar to the simple voting method. What changes from one
A disadvantage is that a given method only selects a class without
defining the confidence level of their response. Another disadvan- to another is the decision rule (Step 3), which is given by Eq. 5.
tage is that methods with low accuracy have the same voting
power as a method that produces excellent results. 3.4.4. FUS4
Let v ot i ðImÞ be the vote of the classifier i in the image Im. We The motivation for formulating another voting method is that
define each method can provide more than a simple binary classification
 (vote). Instead of using a simple vote, we used a classification with
þ1; if the vote is for class CG
v oti ðImÞ ¼ ð2Þ confidence level, which we call opinion.
1; if the vote is for class PG In the case of SVM, for example, instead of providing a discrete
We define the Eq. 3 as the rule of our classifier value (1 or þ1) associated to one of the classes, it returns a con-
8 P tinuous value (fxjx 2 Rg). SVM provides a hyperplane separating
< CG;
> if v oti ðImÞ > 0 the classes and the returned value indicates which ‘‘side’’ of the
P
H ¼ PG;
>
if v oti ðImÞ < 0 ð3Þ hyperplane each datapoint is located at. The magnitude, in turn,
: indicates ‘‘how far’’ the hyperplane is from a particular datapoint.
undefined; otherwise
This information is used by the classifier to set the output class.
The opinion of the classifier thus has important semantics:
3.4.3. FUS3
The weighted voting system uses as weights the accuracies ob-  the sign indicates which of the classes was chosen by the
tained in the training phase of the classifier. Methods with higher classifier;
accuracy in the training phase will have larger weight in the count-  its magnitude indicates how certain (roughly speaking) the
ing of votes. Let accðiÞ be the mean accuracy of a method i, and classifier is that the result is correct.
A ¼ 1; 2; . . . ; n the index set of the methods. Then, the weight of
the method is given by For example, an opinion with value 1:34 means that the response
of the classifier is PG and that its ‘‘confidence level’’ is 1:34.
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1285

4.1. System configuration

The implementation of the descriptors was performed in Mat-


lab [70] due to the large number of libraries in the fields of pattern
recognition and computer vision [63,61,60], the extensive docu-
mentation available, and its familiarity by criminal experts, who
are potential users of the developed tool.
The classification approaches were implemented in R language
([71]) due to the large amount of work already developed in the
area. We used the library e1071, which is an implementation of
LIBSVM [72], in conjunction with the pattern recognition library
ipred.
All the experiments were conducted on a Pentium Dual Core
2.7 GHz with 2 GB of RAM.
In a first approach, we found that the results were drastically al-
tered by the choice of the parameters. For this reason, we per-
formed grid search on all tasks of classification.

4.2. Dataset collection

All CG and PG images have been collected from the Internet. An


initial phase of the study included the exploration of websites that
offered the intended content. While there is a large amount of PGs
available in the Internet, there are significantly fewer CGs avail-
able, which is justified by the difficulty in creating this type of
image.
The idea here was to collect as many computer generated
Fig. 5. Meta-classifier flowchart. (1) k Independent features extractions. (2) k images available as possible and combine them all in a more com-
Classifications and k levels of confidence (levels). The resulting k-sized vector. (3) plete dataset. We do not create such images. We collect all of them
The vector serves as input to the last classifier.
from artists and content creators from the web. All the images
(natural and photorealistic computer generated ones) will be avail-
As in the previous method, we conducted a process of feature able for free1 in a quest for standardizing algorithm benchmarking
extraction and classification for each method. We adjusted each in the literature. We emphasized more recent photorealistic images
of the classifiers to return their marginal distances to the decision as they represent a more challenging scenario for differentiating
hyperplane (opinion). Opinions are grouped into a feature vector of from natural images.
size k, which represents the number of methods. This will be the We evaluated several sources and we chose only a portion to be
feature vector used by another classifier in a meta-level (see Fig. 5). used as our data source. Graphical icons were all excluded. From
As the method previously presented, it weights the methods more than 7000 computer-generated images collected, we used
according to their accuracy in the training phase. A classifier is in 4850. To maintain an even number of images for each class, in a
charge of the final decision rule. We chose the SVM as our final range of 60,000 PGs, 4850 were randomly chosen to compose our
classifier, although any supervised learning approach could have data set. All collected images were compressed in JPEG format
been used. Our choice was motivated by the fact that SVM is well and the physical sizes were between 12 KB and 1.8 MB.
known, it is stable as the dimensionality of the input space in- Current cameras allow the capture of images with large dimen-
creases, uses minimization of structural risk and allows fast testing sions (popular cameras allow images with dimensions of 4608 x
(sublinear time) of an entry [69]. 3456 pixels). Our strategy for keeping a consistent data analysis
With the method of meta-classification, we expected a better consisted of standardizing the size of the input images by cropping
result than the method of weighted voting, since in this case the the central regions of them. The dataset we will make available
weighting is performed using a classifier. contains both sets of images (standardized and original).
A categorization of images by authenticity of content and acqui-
sition, as was done by [7], would be more appropriate since we
4. Experiments and validation could assess the cases of recapture (pictures of CG). However, to
establish a robust scenario, we would need to have a significant
The experimental work can be divided into seven subsections: number of images for each of these categories and, in practice, this
system configuration, dataset collection, protocol for validation, is a difficult task. The most frequent subjects of the CGs found in
validation of state-of-the-art methods, validation of methods from the Internet are characters and architecture, whereas CG categories
related fields, validation of combination methods, and comparative of landscapes are uncommon. Thus, we decided to keep 9700
analysis. In Section 4.1, we present details for the environment images, with the traditional categorization PG versus CG.
used to develop and test the algorithms. In Section 4.2, we show
how the dataset is used in the validation process. In Section 4.3, 4.3. Validation protocol, classification techniques and presentation of
we discuss the validation protocol, the used classification tech- results
niques and the form employed to present the results. In Sections
4.4 and 4.5, we present the results for the methods from the liter- In this work, we use pattern recognition tools, in particular, the
ature and from related fields, respectively. In Section 4.6, we use Support Vector Machine (SVM) classifier. To increase the robust-
the previous methods with best results in combination approaches.
Finally, in Section 4.7, we conduct a comparative analysis of the
seventeen implemented approaches. 1
http://www.ic.unicamp.br/rocha/pub/communications.
1286 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

Table 2 Table 3
Results for the method based on wavelet coefficients. Results for the method based on second order differences.

Wavelet Accuracy Variance Average Second Order Accuracy Variance Average


Coefficients accuracy Differences accuracy
CG PG CG PG CG PG CG PG
LYU 1 0.942 0.899 2.40E04 2.12E04 0.920 Li 0.948 0.911 1.07E04 5.84E05 0.930
LYU 2 0.938 0.901 5.21E04 3.24E04 0.919
LYU 3 0.912 0.853 5.10E04 2.64E04 0.882
Table 4
Results for the method based on box counting.

Box Counting Accuracy Variance Average accuracy


ness of the results, we applied a cross-validation protocol to each CG PG CG PG
method. We defined the number of images to be used and parti- BOX 1 0.541 0.568 8.98E04 2.00E03 0.554
tioned the dataset into five parts (folds). All methods used the
same partitions to not favor a specific approach. For validation, 4.5. Validation of methods from related fields
we have 9700 images collected from many sources and which will
be available for free for algorithm benchmarking, we created five Eleven methods were implemented from image processing re-
equal-sized partitions of size 1940. The partitions have slightly dif- lated problems: BOX, CON, CUR, GLCM, HOG, HSC, LBP, LSB, SHE,
ferent number of CGs and PGs, once the partitions were created by SOB and POP.
a random process.
In each cross validation stage, four partitions were used for 4.5.1. BOX
training and the remaining one was used for testing. In each stage The technique based on box counting did not produce satisfac-
of the cross-validation, the training–testing sets were used as input tory accuracy. The result was not far from expected, since we used
of the classifier. We employed SVM with an RBF kernel as our clas- only the fractal dimension as descriptor. The feature vector was
sifier. The kernel parameters are determined using the grid search composed of three features. Table 4 summarizes these
algorithm during the training stage. experiments.
There is a wide divergence among the analysis found in works
of the area. As performed by Ng et al. [7], we conducted the anal- 4.5.2. CON
ysis of the results with ROC curves [73] (instead through single The implementation of the contourlet transform has as param-
contingency tables). The ROC curve provides important visual eters the number of levels in the decomposition, the pyramid
information when trying to determine the behavior of the classifier decomposition filter, and the directional filter.
in limiting cases. In the identification of CGs and PGs, we need In all implementations, we applied a 5-scale pyramid decompo-
methods that are more robust in certain scenarios (extremes of sition with pkva filters. In a first implementation, we used levels
the ROC curves). For example, in the case of a criminal evidence, (0, 0, 0, 4, 5), i.e., the bottom three scales without decomposition
it is more severe to use CG as a PG, because it is considered a fraud. (0), the penultimate with 4 levels and the last one with 5 levels.
On the other hand, in the U.S. pornography case (Section 1), it is In a second implementation, we only changed the decomposition
more important to identify the PG impersonating CG. level of the third range, which uses levels 4 – (0, 0, 4, 4, 5). The
choice for the parameters was inspired by the examples of the
own toolbox.
4.4. Validation of state-of-the-art methods
In both cases, we used a cooccurrence matrix as descriptor. A
total of 232 and 254 features in the first and second approach,
The following subsections describe two implemented methods
respectively, were obtained for each color channel. The first ap-
from the literature.
proach used a smaller number of features and was the one that ob-
tained the highest accuracy (90:2%). Table 5 summarizes these
4.4.1. Lyu experiments.
The first implementation of the method described by Lyu et al.
[15] was identical to that reported in the original work. The result 4.5.3. CUR
reached an average classification accuracy of 92%. In a first step, we performed an extraction of the curvelet coef-
We searched for possible improvements through parameter ficients. We obtained three-dimensional arrays of 556; 023
variation. In a first attempt (second implementation), we used a
pyramid decomposition with five scales (instead of four scales of
Table 5
the original method). The result was altered by less than one per- Results for the method based on the contourlet transform.
cent with respect to the original data.
Contourlets Accuracy Variance Average accuracy
In a second modification (third implementation), instead of
using the four higher order statistics for each pyramid decomposi- CG PG CG PG
tion, we used the cooccurrence matrix as a descriptor. We ob- CON 1 0.918 0.887 3.09E04 2.97E04 0.902
tained, in this case, a feature vector of 144 dimensions. The CON 2 0.904 0.885 2.25E04 6.89E04 0.894
average classification accuracy was of 88%. Table 2 summarizes
these experiments.

Table 6
Results for the method based on curvelets.
4.4.2. Li
The method proposed by Li et al. [16] achieved a high accuracy Curvelets Accuracy Variance Average accuracy
in this group, an average of 93%. CG PG CG PG
We obtained a feature vector of size 48 for each color channel,
CUR 0.806 0.805 3.57E04 1.52E03 0.805
totalizing 144 features. Table 3 summarizes these experiments.
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1287

Table 7 Table 10
Results for the method based on the cooccurrence matrix. Results for the method based on the LBP descriptor.

Cooccurrence Accuracy Variance Average Local Binary Accuracy Variance Average


Matrix accuracy Patterns accuracy
CG PG CG PG CG PG CG PG
GLCM 0.640 0.630 1.14E03 8.88E04 0.630 LBP 1 0.885 0.824 1.01E04 8.72E04 0.855
LBP 2 0.904 0.838 5.64E04 1.71E04 0.871
LBP 3 0.894 0.831 8.94E04 4.30E04 0.863

coefficients as output. The resulting feature vector, after extraction Table 11


of the Haralick et al. [47] features, reached the size of 2328 pixels. Results for the method based on the LSB descriptor.
The classification resulted in a mean accuracy of approximately
Least Significant Accuracy Variance Average
80%. The major disadvantage of this approach is the large number Bits accuracy
of data, which is a performance bottleneck of the method. Table 6 CG PG CG PG

summarizes these experiments. LSB 0.672 0.651 5.79E04 9.26E04 0.662

4.5.4. GLCM
Initially, we computed the cooccurrence matrix and then ex-
tracted the characteristics of Haralick et al. [47] for each color Table 12
channel. We used a vector with 12 dimensions to represent each Result for the method based on the shearlet coefficients. The two first rows were
image. Comparing the size of the feature vector of the method with obtained using the Daubechies filter, whereas the third and fourth rows were
obtained using Symmlet filter.
the contourlet transform output (696-d), whose size is suboptimal
[42], the result was significantly worse than other methods, partic- Shearlets Accuracy Variance Average accuracy
ularly the contourlet. Table 7 summarizes these experiments. CG PG CG PG
SHE 1 0.748 0.677 6.75E04 1.73E04 0.713
SHE 2 0.748 0.676 1.57E03 5.49E04 0.712
4.5.5. HOG SHE 3 0.752 0.677 1.24E03 3.28E04 0.715
In the implementation based on the histogram of oriented gra- SHE 4 0.747 0.674 7.05E04 1.16E03 0.710
dients, we used from 9 to 16 orientations. We obtained an average
accuracy of 74% in the best implementation. Table 8 summarizes
these experiments.
4.5.7. LBP
In the method based on Local Binary Patterns, the key parame-
4.5.6. HSC ter is the definition of the radius of neighborhood. We defined
In the HSC [46] method, we varied the size of the block, the dis- three radii in our method, 1, 2 and 3, which resulted in 8, 16 and
placement of the sliding block, the number of levels, and the num- 24 neighbors, respectively. The best accuracy ( 87%) was ob-
ber of angles. We tested three sets of parameters. In all approaches, tained with radius equal to 2 (see Table 10). Table 10 summarizes
we used a displacement of 256 pixels and 8 levels. these experiments.
In a first approach, we used blocks of size 256  256 and 8 an-
gles. In a second approach, we used the same size and changed the
number of angles to 16. Finally, in a last approach, we used eight 4.5.8. LSB
angles, but with blocks of size 512  512. In the first and second We calculated the least significant bit plane for each color chan-
approach, we obtained four juxtaposed blocks, while in the last nel image, as well as [29]. From each LSB map, we extracted the
we obtained just one block. The method was applied indepen- cooccurrence matrix and the Haralick et al. [47] features. The fea-
dently in each color channel. Table 9 summarizes these ture extraction is fast and has four characteristics for each map of
experiments. least significant bits. The average accuracy was 66%. Table 11 sum-
marizes these experiments.

Table 8
Results for the method based on the histogram oriented gradients.
4.5.9. SHE
HOG Accuracy Variance Average accuracy We implemented a method based on the work of [74] using dif-
CG PG CG PG ferent wavelet filters. We used Daubechies and Symmlet filter with
HOG 1 0.735 0.700 1.02E03 5.33E04 0.718 different sizes of support.
HOG 2 0.754 0.725 1.64E04 8.77E04 0.740 After the extraction of shearlet coefficients for each color level,
scale and direction, we used mean, variance, skewness and kurtosis
of the coefficients as descriptors. The accuracies obtained from
each of the implementations differed slightly and were around
Table 9 71%. Table 12 summarizes these experiments.
Results for the method based on the histogram of shearlet coefficients. In each of the
methods, we decomposed the image into three levels. The coefficients were obtained
for each block, with center shifted by 256 pixels.
4.5.10. SOB
Shearlet Accuracy Variance Average
Histogram accuracy The Sobel operator is commonly used as edge detector. In our
CG PG CG PG
implementation, we used two variations: Gaussian window with
HSC 1 0.818 0.787 2.24E04 4.17E04 0.802 side 7 and Gaussian window with side 9. The results differed
HSC 2 0.830 0.785 5.09E04 7.37E04 0.808 slightly and were around 55%. Table 13 summarizes these
HSC 3 0.815 0.783 7.97E04 2.89E04 0.799
experiments.
1288 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

Table 13  Concatenation (FUS1): juxtaposition of the feature arrays;


Results for the method based on edge detection.  Simple Voting (FUS2): each method gives a vote and the
SOB Accuracy Variance Average accuracy final classification is performed by counting the votes;
CG PG CG PG  Weighted Voting (FUS3): each method casts a vote and the
final decision is performed according to a decision rule;
SOB 1 0.554 0.552 8.20E04 1.20E03 0.553
SOB 2 0.564 0.561 1.35E03 7.50E04 0.562
 Meta-classification (FUS4): each method casts more than a
vote (an opinion) and the final classifier takes it into
account to make the decision.
Table 14
Results for the method based on interpolation. 4.6.1. FUS1
Interpolation Accuracy Variance Average
The concatenation method achieved an accuracy of  93%,
Process accuracy which is a result very close to the best individual method (Li).
CG PG CG PG
The major disadvantage of using the concatenation method is the
0.570 0.575 3.15E04 8.74E04 0.573 difficulty encountered by the classifier in handling high-dimen-
sional data. We applied Random Subspace Methods (RSM) [55] to
verify whether the curse of dimensionality was a hindrance. The
average accuracy obtained with RSM was  91:7%, which lead us
Table 15 to believe that there was no serious difficulties related to the
Results for the method based on entropy. dimensionality of the feature vector.
Color Channel Entropy Variance
CG PG CG PG 4.6.2. FUS2
Red 7.1332 7.2690 0.4734 0.3041
In the simple voting method, unlike the concatenation, every
Green 6.9514 7.3222 0.6908 0.2711 single method performs independently a classification process
Blue 7.3158 7.3678 0.2162 0.2622 and there is only one count of votes in the end. The accuracy ob-
tained with the method was two percentage points above the con-
catenation method (FUS1), i.e., 95%.
4.5.11. POP
To identify traces of the PG acquisition process, we used the 4.6.3. FUS3
technique proposed by Popescu and Farid [53] as basis. As the The weighted voting method showed a boost upon the outcome
method has a large data set as output, we used the four higher-or- of simple voting (FUS2), i.e.,  95:5%. The preference of the simple
der statistics to extract information. Our implementation has as voting method over it can just be justified if the classifications had
output a feature vector with four dimensions for each color chan- already been made and we do not have access to the weighting of
nel. The method achieved an average accuracy of 57%. Table 14 each method.
summarizes these experiments.
4.6.4. FUS4
4.5.12. ENT The meta-classification method provided the best result among
In a step of pre-extraction of features, we validated the effec- the combination approaches, an average accuracy of four percent-
tiveness of the entropy as a descriptor. We calculated the entropy age points above the best individual method, i.e.,  97%þ average
for all images and calculated the mean and variance of the entropy classification accuracy. In terms of complexity, it is very close to
values for each class (PG and CG) and for each color channel the results of simple voting (FUS2) and weighted voting (FUS3).
(Table 15). The resulting feature vector was composed of 13 dimensions,
It is possible to observe that, in the green channel, the entropy which is the number of methods used in the voting.
can provide a proper distance between classes, but the high vari-
ance makes it impossible to establish a separation surface. We con- 4.7. Comparative analysis
cluded that the entropy do not distinguish the classes individually,
although it can be effective when used in conjunction with other Feature normalization is a key step in dealing with high disper-
descriptors. sion data. In our experiments, We applied z-score normalization
for all data. We compared the results with and without z-score nor-
4.6. Validation of combination methods malization and we found that there is no significant difference be-
tween the normalization (difference of 0:5 percentage points). We
Once we have implemented several individual approaches, we believed this occurred because the data values are defined within a
implemented fusion methods. Our hypothesis is that individual re- certain interval (for instance, in the case of the meta-classifier, all
sults can be improved with a coalition of different and complemen- the values are in the range of [-10, +10]).
tary descriptors. We implemented four ways of combining the No picture was incorrectly classified by all individual methods
individual methods (summarized in Table 16): and correctly classified by a combination approach. Fusion

Table 16
Comparison of the results for the combination methods.

Method of Combination Probability Number of features Accuracy Variance Average accuracy


CG PG CG PG
FUS1 no 4011 0.934 0.922 1.75E01 0.17E01 0.928
FUS2 no 10 0.934 0.960 2.22E04 3.43E04 0.947
FUS3 yes 10 0.949 0.965 2.29E04 5.42E04 0.957
FUS4 yes 10 0.966 0.980 7.70E04 4.42E04 0.973
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1289

Fig. 6. Some images correctly identified by, at least, 9 out of 13 individual methods. CGs and PGs are shown in the first and second rows, respectively.

Table 17

1.0
Comparison among the implemented approaches for distinguishing CGs and PGs. For
each of the seventeen methods, it is shown the number of dimensions of the feature
space, the accuracies for each class, and its average accuracy.

0.8
Index Method m CG PG Average accuracy
1 BOX 3 0.541 0.568 0.554
2 CON 696 0.918 0.887 0.902 False positive rate
3 CUR 2328 0.806 0.805 0.805 0.6
4 GLC 12 0.640 0.630 0.635 BOX
5 HOG 256 0.754 0.720 0.740 CON
CUR
0.4

6 HSC 96 0.818 0.787 0.802 GLC


7 LBP 78 0.904 0.838 0.871 HOG
8 Li 144 0.948 0.911 0.930 HSC
LBP
9 LSB 12 0.672 0.651 0.662 LI
0.2

10 LYU 216 0.942 0.899 0.920 LSB


11 POP 12 0.570 0.575 0.573 LYU
POP
12 SHE 60 0.748 0.677 0.713 SHE
0.0

13 SOB 150 0.554 0.552 0.553 SOB


14 FUS1 4011 0.934 0.922 0.928
0.0 0.2 0.4 0.6 0.8 1.0
15 FUS2 13 0.934 0.960 0.947
16 FUS3 13 0.949 0.965 0.957 True positive rate
17 FUS4 13 0.966 0.980 0.973
Fig. 7. ROC curves for each tested method.

schemes did not take advantage in such cases. This is justified by


1.0

the fact that combination methods use only the results of the indi-
vidual methods and make their rules in agreement with the con-
sensus of the methods. Furthermore, the combination methods
0.8

do not aim at altering the results of images correctly classified by


each method individually.
Fig. 6 depicts images incorrectly classified by at least 9 out of 13
True positive rate
0.6

methods and correctly classified by the method of Meta-


classification.
Table 17 shows, for each tested method, the best results among
0.4

the implementations. Each line represents an approach and each


column represents a description of that approach. From the table,
we see that the accuracies have a large range of values. The highest
0.2

accuracy among the individual methods ( 93%) was obtained by


the method Li and the worst results were obtained by SOB (55%).
0.0

The highest accuracy of the combination methods was about 97%.


Regarding the size of the feature spaces, there is also divergence 0.0 0.2 0.4 0.6 0.8 1.0
among the methods: the lowest was of size m ¼ 3 for BOX and False positive rate
the largest was of size with m ¼ 2328 features for CUR.
Fig. 8. ROC curve for the meta-classifier (AUC = 0:98).
ROC curves, shown in Fig. 7, were constructed as follows: we
defined a grid of values between the lowest and the highest mar-
ginal values. Each grid point was used as cut-off value. For each
point of this lattice, therefore, we considered all points above it 4.7.1. Fake or photo
as the positive class and all points below it as the negative class. The website Fake or Foto [59] challenges every visitor to iden-
Fig. 8 presents the ROC curve for the best result among all imple- tify, among 12 images, which ones are PGs or CGs. We applied each
mented in our work. The area under the curve (AUC) is 0:98. method to isolated images shown in Table 9. The single method
1290 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

Fig. 9. Samples of images from the Fake or Foto website. Images with green borders are PGs and with brown borders are CGs. (For interpretation of the references to colour in
this figure caption, the reader is referred to the web version of this article.)

with the highest number of hits was Li with 8 hits out of 12. Con- methods varied in the number of features used (3–2, 328) and in
catenation, simple voting, weighted voting and meta-classification the computational performance. Among the methods evaluated,
methods reached 9, 8, 9 and 10 hits, respectively. The results show the methods of [15,16] showed the best results individually. We
that, in a complex scenario of distinguishing between CGs and PGs, emphasize that the dataset collected for validation o the methods
methods of fusion can improve the overall accuracy of the results will be available for free allowing fair comparisons in the future.
(see Fig. 9). There are a number of approaches addressing the problem PG
versus CG. So far, however, there was little effort to combine the
5. Conclusions and future work existing techniques and assess their complementarity. In this pa-
per, we discussed four methods for combining descriptors. The first
Technology has significantly changed the way we capture, cre- method performed a simple junction of each feature array into a
ate and store images. Current digital cameras allow instant images single set. Simple voting methods perform the extraction and clas-
with the support of tools for acquiring and processing images in- sification steps individually for each method and the class with the
serted into the device itself. Editing and distributing a photo also majority of votes is elected. The weighted voting method uses
became a simple task, due to powerful image editors and the information from the training phase to consider the final choice.
Internet. The fourth method uses meta-classification distances of the SVM
In this context, images are consolidated as an important means scores as characteristics. The best results were obtained with the
of communication. In parallel, advances in computer graphics have meta-classifier (97% of accuracy), with more than four percentage
allowed the imaging in levels of complexity never achieved before. points above the best single method.
Modern techniques and more powerful machines allow the crea- For a real scenario of classification, if the user already has the
tion of complex scenes, visually very close to reality. Images of this image descriptors, then the implementation of combination meth-
nature can confuse a naïve user. The combination of the fact that ods is simple. The joint use of descriptors is usually done by con-
the PGs have their role as digital document and the fact that CGs catenation, however, our results indicated that a simple voting
can have a complexity enough to confuse the viewer can create can generate better results.
problems in a court of law. The proof of the authenticity of an im- Future work opportunities include the extension of the methods
age by a jury can validate a criminal evidence which, in turn, may to local regions of images (blocks), application of more sophisti-
implicate/acquit an individual. cated normalization techniques such as w-score [54] and the inclu-
Most existing studies that seek to distinguish PGs and CGs in sion of others, among many descriptors available. Another
the literature to date do not perform a complete and robust com- approach could be, instead of croppings, the use of resized images,
parison with previous methods. The works often make compari- since this operation is common in the Internet. Counter forensic
sons with other methods only through the reported accuracies. techniques are also targets of study because we can use descriptors
Using the values reported by other methods lead to inconsistent that are more robust to certain types of attacks. The case of images
comparisons. The environments of tests may vary, for instance, obtained at recapture, as discussed in [7], represents a problem of
the set of training data test. The methods take several hypotheses relevance that could be explored in a future work and, in this case,
to the point that, in a real scenario, we expect that the results are the separation of the scene contents and concepts would be
significantly worse [14]. In a context where the objects of study are fundamental.
potential criminal evidence, we need robust methods in every type
of scenario.
This work aimed at creating a common test scenario, imple- Acknowledgements
menting some state-of-the-art methods and, more importantly,
proposing and implementing several new methods from related This work was partially supported by São Paulo Research Foun-
fields, and performing a consistent comparison of them, discussing dation – FAPESP (Grants 2010/05647-4, 2010/13745-6, and 2011/
their complementarity when used in conjunction with data fusion 22749-8), National Counsel of Technological and Scientific Devel-
techniques. Our tests used a common data set, training/testing opment – CNPq (Grants 304352/2012-8 and 307113/2012-4) and
parameters, and same performance conditions. The implemented Microsoft.
E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292 1291

References [32] J. Lukas, J. Fridrich, M. Goljan, Digital camera identification from sensor noise
sensor, IEEE Transactions on Information Forensics and Security (TIFS) 1 (2)
(2006) 205–214.
[1] Nvidia, 2012. <http://www.nvidia.com/>.
[33] B.E. Bayer, Color imaging array, U.S. Patent 3971065, 1976.
[2] Maya, 2012. <http://usa.autodesk.com/maya/>.
[34] A.C. Gallagher, T. Chen, Image authentication by detecting traces of
[3] 3ds Max, 2012. <http://usa.autodesk.com/3ds-max/>.
demosaicing, in: IEEE International Conference on Computer Vision and
[4] H. Farid, M.J. Bravo, Image forensic analyses that elude the human visual
Pattern Recognition (CVPR), USA, 2008, pp. 1–8.
system, in: SPIE Symposium on Electronic Imaging (SEI), CA, 2010, pp. 754106–
[35] F. Peng, J. Liu, M. Long, Identification of natural images and computer
754106–10.
generated graphics based on hybrid features, International Journal of Digital
[5] H. Farid, Creating and detecting doctored and virtual images: implications to
Crime and Forensics (IJDCF) 4 (2013) 1–16.
the child pornography prevention act, Tech. Rep. 2004–518, Dartmouth
[36] S. Fan, T.-T. Ng, J.S. Herberg, B.L. Koenig, S. Xin, Real or fake?: human
College, USA, 2004.
judgments about photographs and computer-generated images of faces, in:
[6] A. Rocha, W. Scheirer, T. Boult, S. Goldenstein, Vision of the unseen: current
SIGGRAPH Asia, ACM, Singapore, 2012, pp. 17:1–17:4.
trends and challenges in digital image and video forensics, ACM Computing
[37] Dang-Nguyen, G. Boato, F.G.B. DeNatale, Discrimination between computer
Surveys (CSUR) 43 (2011) 26:1–26:42.
generated and natural human faces based on asymmetry information, in: 20th
[7] T.-T. Ng, S.-F. Chang, J. Hsu, L. Xie, M.-P. Tsui, Physics-motivated features for
European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 2012,
distinguishing photographic images and computer graphics, in: ACM
pp. 1234–1238.
Multimedia (ACMMM), Singapore, 2005, pp. 239–248.
[38] T. Isenberg, Evaluating and validating non-photorealistic and illustrative
[8] A. da Silva Pinto, H. Pedrini, W.R. Schwartz, A. Rocha, Video-based face
rendering, in: Image and Video-Based Artistic Stylisation, Computational
spoofing detection through visual rhythm analysis, in: 25th Conference on
Imaging and Vision, vol. 42, Springer-Verlag, London, 2013, pp. 311–331.
Graphics, Patterns and Images (SIBGRAPI), Ouro Preto, Brazil, 2012, pp. 221–
[39] L.S. Liebovitch, T. Toth, A fast algorithm to determine fractal dimensions by
228.
box counting, Physics Letters A 141 (1989) 386–390.
[9] W.R. Schwartz, A. Rocha, H. Pedrini, Face spoofing detection through partial
[40] B. Mandelbrot, How long is the coast of Britain? Statistical self-similarity and
least squares and low-level descriptors, in: International Joint Conference on
fractional dimension, Science 156 (3775) (1967) 636–638.
Biometrics (IJCB), 2011, pp. 1–8.
[41] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
[10] M.K. Johnson, H. Farid, Detecting photographic composites of people, in:
[42] M.N. Do, M. Vetterli, Contourlets: a directional multiresolution image
International Workshop on Digital Watermarking (IWDW), China, 2007, pp.
representation, in: IEEE International Conference on Image Processing (ICIP),
19–33.
USA, vol. 1, 2002, pp. I–357–360.
[11] T. Pouli, E. Reinhard, Image statistics and their applications in computer
[43] E. Candes, L. Demanet, D. Donoho, L. Ying, Fast discrete curvelet transforms,
graphics, Tech. rep., Eurographics State of the Art Report (STAR), 2010.
Multiscale Modeling Simulation 5 (3) (2006) 861–899.
[12] E. Dirik, H. Sencar, N. Memon, Source camera identification based on sensor
[44] M. Do, M. Vetterli, The finite Ridgelet transform for image representation, IEEE
dust characteristics, in: IEEE Signal Processing Applications for Public Security
Transactions on Image Processing (TIP) 12 (1) (2003) 16–28.
and Forensics (SAFE), USA, 2007, pp. 1–6.
[45] G. Kutyniok, W.-Q. Lim, Compactly supported shearlets are optimally sparse,
[13] Y. Wang, P. Moulin, On discrimination between photorealistic and
Journal of Approximation Theory (2011) 1564–1589.
photographic images, in: IEEE International Conference on Acoustics, Speech,
[46] W.R. Schwartz, R.D. da Silva, L.S. Davis, H. Pedrini, A novel feature descriptor
and Signal Processing (ICASSP), France, vol. 2, 2006, p. II.
based on the shearlet transform, in: IEEE International Conference on Image
[14] T. Gloe, M. Kirchner, A. Winkler, R. Bohme, Can we trust digital image
Processing (ICIP), Belgium, 2011, pp. 1053–1056.
forensics?, in: ACM Multimedia (ACMMM), Germany, 2007, pp. 78–86.
[47] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image
[15] S. Lyu, H. Farid, How realistic is photorealistic?, IEEE Transactions on Signal
classification, IEEE Transactions on Systems, Man, and Cybernetics (SMC) 3
Processing (TSP) 53 (2) (2005) 845–850
(6) (1973) 610–621.
[16] W. Li, T. Zhang, E. Zheng, X. Ping, Identifying photorealistic computer graphics
[48] D.A. Clausi, An analysis of co-occurrence texture statistics as a function of grey
using second-order difference statistics, in: International Conference on Fuzzy
level quantization, Canadian Journal of Remote Sensing (CJRS) 28 (1) (2002)
Systems and Knowledge Discovery (FSKD), China, vol. 5, 2010, pp. 2316–2319.
45–62.
[17] W. Chen, Y. Shi, G. Xuan, Identifying computer graphics using hsv color model
[49] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in:
and statistical moments of characteristic functions, in: IEEE International
IEEE International Conference on Computer Vision and Pattern Recognition
Conference on Multimedia and Expo (ICME), China, 2007, pp. 1123–1126.
(CVPR), USA, 2005, pp. 886–893.
[18] Y. Rui, T.S. Huang, S.-F. Chang, Image retrieval: current techniques, promising
[50] T. Ojala, M. Pietikäinen, T. Mäenpää, A generalized local binary pattern
directions, and open issues, Journal of Visual Communication and Image
operator for multiresolution gray scale and rotation invariant texture
Representation 10 (1) (1999) 39–62.
classification, in: International Conference on Advances in Pattern
[19] R. Chakravarti, X. Meng, A study of color histogram based image retrieval, in:
Recognition (ICAPR), Brazil, 2001, pp. 399–408.
IEEE International Conference on Information Technology (CIT), USA, 2009, pp.
[51] H. Farid, A 3-D photo forensic analysis of the Lee Harvey Oswald backyard
1323–1328.
photo, Tech. Report TR2010-669, Dartmouth College, USA, 2010.
[20] W. Equitz, W. Niblack, Retrieving images from a database using texture-
[52] T. Mäenpää, M. Pietikäinen, Multi-scale binary patterns for texture analysis, in:
algorithms from the qbic system, Tech. Rep., RJ 9805, IBM Research,1994.
Scandinavian Conference on Image Analysis (SCIA), Sweden, 2003, pp. 885–
[21] G. Elkharraz, S. Thumfart, D. Akay, C. Eitzinger, B. Henson, Texture features
892.
corresponding to human touch feeling, in: IEEE International Conference on
[53] A.C. Popescu, H. Farid, Exposing digital forgeries in color filter array
Image Processing (ICIP), Egypt, 2009, pp. 1341–1344.
interpolated images, IEEE Transactions on Signal Processing (TSP) 53 (10)
[22] S. Loncaric, A survey of shape analysis techniques, Pattern Recognition 31 (8)
(2005) 3948–3959.
(1998) 983–1001.
[54] W. Scheirer, A. Rocha, R. Micheals, T. Boult, Robust fusion: extreme value
[23] A.B. Lee, K.S. Pedersen, D. Mumford, The nonlinear statistics of high-contrast
theory for recognition score normalization, in: European Conference on
patches in natural images, International Journal of Computer Vision (IJCV) 54
Computer Vision (ECCV), USA, vol. 6313, 2010, pp. 481–495.
(2003) 83–103.
[55] T.K. Ho, The random subspace method for constructing decision forests, IEEE
[24] N. Sochen, R. Kimmel, R. Malladi, A general framework for low level vision,
Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 20 (8)
IEEE Transactions on Image Processing (TIP) 7 (3) (1998) 310–318.
(1998) 832–844.
[25] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, second edition.,
[56] E.J. Candes, D.L. Donoho, Curvelets –A Surprisingly Effective Nonadaptive
John Wiley & Sons Inc., 1973.
Representation for Objects with Edges, Vanderbilt University Press, 2000.
[26] R.W. Buccigrossi, E. Simoncelli, Image compression via joint statistical
[57] R. Gonzalez, R. Woods, Digital Image Processing, third ed., Prentice-Hall, 2007.
characterization in the wavelet domain, IEEE Transactions on Image
[58] G.C. de Silva, T. Yamasaki, K. Aizawa, Sketch-based spatial queries for
Processing (TIP) 8 (12) (1999) 1688–1701.
retrieving human locomotion patterns from continuously archived gps data,
[27] P.P. Vaidyanathan, Quadrature mirror filter banks, m-band extensions and
IEEE Transactions on Multimedia (TMM) 11 (7) (2009) 1240–1253.
perfect-reconstruction techniques, in: IEEE International Conference on
[59] Fake or Foto, 2012. <http://www.fakeorfoto.com/>.
Acoustics, Speech, and Signal Processing (ICASSP), USA, vol. 4, 1987, pp. 4–20.
[60] Curvelab, 2012. <http://www.curvelet.org>.
[28] T.-T. Ng, S.-F. Chang, Identifying and prefiltering images – distinguishing
[61] C. toolbox, 2012. <http://www.ifp.illinois.edu/minhdo/software/>.
between natural photography and photorealistic computer graphics, IEEE
[62] S.-M. Phoong, C. Kim, P. Vaidyanathan, R. Ansari, A new class of two-channel
Signal Processing Magazine (SPM) 26 (2) (2009) 49–58.
biorthogonal filter banks and wavelet bases, IEEE Transactions on Signal
[29] A. Rocha, S. Goldenstein, Progressive randomization process and equipment
Processing (TSP) 43 (3) (1995) 649–665.
for multimedia analysis and reasoning, Patent PCT/BR2007/000156, World
[63] Shearlab, 2012. <http://www.shearlab.org/indexsoftware.html>.
Intellectual Property Org. (WIPO), 2008.
[64] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation
[30] A. Rocha, S. Goldenstein, Progressive randomization: seeing the unseen,
invariant texture classification with local binary patterns, in: IEEE
Elsevier Computer Vison and Image Understanding (CVIU) 114 (3) (2010) 349–
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 24,
362.
2002, pp. 971–987.
[31] S. Dehnie, T. Sencar, N. Memon, Identification of computer generated and
[65] O. Ludwig, D. Delgado, V. Goncalves, U. Nunes, Trainable classifier-fusion
digital camera images for digital image forensics, in: IEEE International
schemes: an application to pedestrian detection, in: IEEE International
Conference on Image Processing (ICIP), USA, 2006, pp. 2313–2316.
Conference on Intelligent Transportation Systems (ITSC), 2009, pp. 1–6.
1292 E. Tokuda et al. / J. Vis. Commun. Image R. 24 (2013) 1276–1292

[66] A. Rocha, J.P. Papa, L.A.A. Meira, How far do we get using machine learning [70] Matlab, 2012. <http://www.mathworks.com/>.
black-boxes?, International Journal of Pattern Recognition and Artificial [71] R, 2012. <http://www.r-project.org/>.
Intelligence (IJPRAI) 26 (2) (2012) 1261001-1–1261001-23 [72] C. Chang, C. Lin, LIBSVM: a library for support vector machines, ACM
[67] R.J. Larsen, M.L. Marx, An Introduction to Mathematical Statistics and Its Transactions on Intelligent Systems and Technology (TIST) 2 (2011) 1–27.
Applications, third ed., Pearson, 2000. [73] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (8)
[68] A. Dirik, S. Bayram, H. Sencar, N. Memon, New features to identify computer (2006) 861–874.
generated images, in: IEEE International Conference on Image Processing [74] J. Ma, G. Plonka, The curvelet transform, a review of recent applications, IEEE
(ICIP), USA, vol. 4, 2007, pp. 433–436. Signal Processing Magazine (SPM) 27 (2) (2010) 118–133.
[69] C.M. Bishop, Pattern Recognition and Machine Learning, first ed., Springer,
2006.

You might also like