1.1 History:
Many of the techniques of digital image processing, or digital picture processing as it was often called, were developed in the 1960s at the Jet Propulsion Laboratory, MIT, Bell Labs, the University of Maryland, and a few other places, with applications to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photo enhancement. But the cost of processing was fairly high with the computing equipment of that era. In the 1970s, digital image processing proliferated as cheaper computers became available. Imaging means creating a film or electronic image of any picture or paper form. It is accomplished by scanning or photographing an object and turning it into a matrix of dots (bitmap), the meaning of which is unknown to the computer, only to the human
viewer. Scanned images of text may be encoded into computer data (ASCII or EBCDIC) with page recognition software (OCR).
A signal is a function depending on some variable with physical meaning. Signals can be
o One-dimensional (e.g., dependent on time),
o Two-dimensional (e.g., images dependent on two co-ordinates in a plane),
o Three-dimensional (e.g., describing an object in space),
o Or higher-dimensional.
The image can be modeled by a continuous function of two or three variables; the arguments are co-ordinates x, y in a plane, and if images change in time a third variable t might be added.
The image function values correspond to the brightness at image points. The function value can express other physical quantities as well (temperature, pressure distribution, distance from the observer, etc.).
The brightness integrates different optical quantities - using brightness as a basic quantity allows us to avoid the description of the very complicated process of image formation.
The image on the human eye retina or on a TV camera sensor is intrinsically 2D. We shall call such a 2D image bearing information about brightness points an intensity image.
The real world, which surrounds us, is intrinsically 3D. The 2D intensity image is the result of a perspective projection of the 3D scene. When 3D objects are mapped into the camera plane by perspective projection a lot of information disappears as such a transformation is not one-to-one.
Recognizing or reconstructing objects in a 3D scene from one image is an ill-posed problem. Recovering information lost by perspective projection is only one, mainly geometric, problem of computer vision.
The second problem is how to understand image brightness. The only information available in an intensity image is brightness of the appropriate pixel, which is dependent on a number of independent factors such as
o Object surface reflectance properties (given by the surface material, microstructure and marking),
o Illumination properties,
o And object surface orientation with respect to the viewer and light source.
A surface directly facing the light, for example, will be brighter than a surface that is turned away from the light. Once expressed in this form, standard techniques can be used to determine the direction to the light source for any object or person in an image. Any inconsistencies in lighting can then be used as evidence of tampering.
Duplication or cloning is a simple and powerful form of manipulation used to remove objects or people from an image. This form of tampering can be detected by first partitioning the image into small blocks. The blocks are then re-ordered so that blocks are placed at a distance from each other that is proportional to the differences in their pixel colors. With identical and highly similar blocks neighboring each other in the re-ordered sequence, a region-growing algorithm combines any significant number of neighboring blocks that are consistent with the cloning of an image region. Since it is statistically unlikely to find identical and spatially coherent regions in an authentic image, their presence can be used as evidence of tampering.
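A minimal sketch of this block-matching idea is given below. It flags exactly duplicated blocks by hashing them, whereas practical detectors order blocks by colour similarity and then grow regions of coherent matches; the image, block size, and helper name here are illustrative assumptions, not part of any published detector.

```python
import hashlib

def find_duplicate_blocks(pixels, width, height, block=8):
    """Toy clone detection: report pairs of identical, non-overlapping
    blocks. Real detectors rank blocks by similarity and require the
    matches to form a spatially coherent region; exact hashing is a
    simplification (e.g., flat sky would also produce matches)."""
    seen = {}          # block fingerprint -> first (x, y) where it appeared
    matches = []
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            rows = [tuple(pixels[(y + dy) * width + x:(y + dy) * width + x + block])
                    for dy in range(block)]
            key = hashlib.sha256(repr(rows).encode()).hexdigest()
            if key in seen:
                matches.append((seen[key], (x, y)))
            else:
                seen[key] = (x, y)
    return matches

# A 16x16 grayscale "image" whose left and right halves are identical copies:
w, h = 16, 16
img = [((x % 8) * 17 + y * 3) % 256 for y in range(h) for x in range(w)]
print(find_duplicate_blocks(img, w, h))
```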
Both steganography and digital watermarking employ steganographic techniques to embed data covertly in noisy signals. But whereas steganography aims for imperceptibility to human senses, digital watermarking treats robustness as the top priority. Since a digital copy of data is identical to the original, digital watermarking is a passive protection tool: it merely marks the data, but neither degrades it nor controls access to it.
One application of digital watermarking is source tracking. A watermark is embedded into a digital signal at each point of distribution. If a copy of the work is found later, the watermark can be retrieved from the copy and the source of the distribution is known. This technique has reportedly been used to detect the source of illegally copied movies.
Digital watermarking is the process of inserting a digital signal or pattern (indicative of the owner of the content) into digital content. The signal, known as a watermark, can be used later to identify the owner of the work, to authenticate the content, and to trace illegal copies of the work. Watermarks of varying degrees of obtrusiveness are added to presentation media as a guarantee of authenticity, quality, ownership, and source.
To be effective, a watermark should satisfy a few requirements. In particular, it should be robust and transparent. Robustness requires that the watermark survive any alterations or distortions that the watermarked content may undergo, including intentional attacks to remove it and common signal-processing operations used to make the data more efficient to store and transmit, so that the owner can still be identified afterwards. Transparency requires that the watermark be imperceptible, so that it does not affect the quality of the content and makes detection, and therefore removal, by pirates more difficult. The media of focus in this paper is the still image.
There are a variety of image watermarking techniques, falling into two main categories depending on the domain in which the watermark is constructed: the spatial domain (producing spatial watermarks) and the frequency domain (producing spectral watermarks). The effectiveness of a watermark is improved when the technique exploits known properties of the human visual system. These are known as perceptually based watermarking techniques. Within this category, the class of image-adaptive watermarks proves most effective.
2.1.1 Principle of digital watermarks
A watermark on a bank note has a different transparency than the rest of the note when a light is shone on it. However, this method is useless in the digital world.
Currently there are various techniques for embedding digital watermarks. Basically, they all digitally write the desired information directly onto images or audio data in such a manner that the images or audio data are not damaged. Embedding a watermark should not result in any significant change to the original data. Digital watermarks are added to images or audio data in such a way that they are invisible or inaudible, unidentifiable by the human eye or ear. Furthermore, they can be embedded in content with a variety of file formats. Digital watermarking is the content protection method for the multimedia era.
Watermarking allows owners to incorporate identifying information into their work. That is, watermarks are used in the protection of ownership. The presence of a watermark in a work suspected of having been copied can prove that it has been copied. By indicating the owner of the work, watermarks demonstrate the quality and assure the authenticity of the work. With a tracking service, owners are able to find illegal copies of their work on the Internet. In addition, because each purchaser of the data has a unique watermark embedded in his/her copy, any unauthorized copies that s/he has distributed can be traced back to him/her. Watermarks can also be used to identify any changes that have been made to the watermarked data, and some more recent techniques are able to correct the alteration as well.
Digitally signed messages may be anything representable as a bitstring: examples include electronic mail, contracts, or a message sent via some other cryptographic protocol. A digital signature scheme typically consists of three algorithms:
A key generation algorithm that selects a private key uniformly at random from a set of possible private keys. The algorithm outputs the private key and a corresponding public key.
A signing algorithm that, given a message and a private key, produces a signature.
A signature verifying algorithm that, given a message, public key and a signature, either accepts or rejects the message's claim to authenticity.
Two main properties are required. First, a signature generated from a fixed message and fixed private key should verify the authenticity of that message by using the corresponding public key. Secondly, it should be computationally infeasible for a party who does not possess the private key to generate a valid signature.
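The three algorithms can be illustrated with textbook RSA. The fixed toy primes, the use of SHA-256, and the absence of padding are simplifying assumptions for exposition only; real signature schemes use large random primes and padded hashes.

```python
import hashlib

# Textbook-RSA sketch of the three algorithms. Key generation is shown
# with fixed toy primes (insecure, purely illustrative).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+ modular inverse)

def sign(message: bytes) -> int:
    """Signing: hash the message, then exponentiate with the private key."""
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)

def verify(message: bytes, signature: int) -> bool:
    """Verifying: undo the signature with the public key, compare hashes."""
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == h

s = sign(b"transfer 100 to account 42")
print(verify(b"transfer 100 to account 42", s))   # True
print(verify(b"transfer 999 to account 42", s))   # almost surely False
```

With this tiny modulus a forged message could collide by chance; with real key sizes and collision-resistant hashing, forging a valid signature without the private key is computationally infeasible, which is exactly the second property above.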
2.2.1.1 Authentication
Although messages may often include information about the entity sending a message, that information may not be accurate. Digital signatures can be used to authenticate the source of messages. When ownership of a digital signature secret key is bound to a specific user, a valid signature shows that the message was sent by that user. The importance of high confidence in sender authenticity is especially obvious in a financial context. For example, suppose a bank's branch office
sends instructions to the central office requesting a change in the balance of an account. If the central office is not convinced that such a message is truly sent from an authorized source, acting on such a request could be a grave mistake.
2.2.1.2 Integrity
In many scenarios, the sender and receiver of a message may need confidence that the message has not been altered during transmission. Although encryption hides the contents of a message, it may be possible to change an encrypted message without understanding it. (Some encryption algorithms, known as nonmalleable ones, prevent this, but others do not.) However, if a message is digitally signed, any change in the message after signature will invalidate the signature. Furthermore, there is no efficient way to modify a message and its signature so as to produce a new message with a valid signature, because such a forgery is still considered computationally infeasible for most cryptographic hash functions (see collision resistance).
2.2.1.3 Non-repudiation
Non-repudiation, or more specifically non-repudiation of origin, is an important aspect of digital signatures. By this property, an entity that has signed some information cannot at a later time deny having signed it. Similarly, access to the public key alone does not enable a fraudulent party to fake a valid signature.
The device signature may be in the form of:
o Sensor pattern noise (SPN)
o Camera response function
o Resampling artifacts
o Color filter array interpolation artifacts
o JPEG compression artifacts
o Lens aberration
o Sensor dust
The Bayer color filter mosaic: each two-by-two submosaic contains 2 green, 1 blue and 1 red filter, each covering one pixel sensor.
In photography, a color filter array (CFA), or color filter mosaic (CFM), is a mosaic of tiny color filters placed over the pixel sensors of an image sensor to capture color information. Color filters are needed because typical photosensors detect light intensity with little or no wavelength specificity, and therefore cannot separate color information. Since sensors are made of semiconductors, they obey solid-state physics. The color filters filter the light by wavelength range, such that the separate filtered intensities include information about the color of light. For example, the Bayer filter gives information about the intensity of light in the red, green, and blue (RGB) wavelength regions. The raw image data captured by the image sensor is then converted to a full-color image (with intensities of all three primary colors represented at each pixel) by a demosaicing algorithm which is tailored for each type of color filter. The spectral transmittance of the CFA elements along with the demosaicing algorithm jointly determine the color rendition. The sensor's passband of quantum efficiency and the span of the CFA's spectral responses are typically wider than the visible spectrum, so that all visible colors can be distinguished. The responses of the filters do not generally correspond to the CIE color matching functions, so a color translation is required to convert the tristimulus values into a common, absolute color space.
The Foveon X3 sensor uses a different structure such that a pixel utilizes properties of multijunctions to stack blue, green, and red sensors on top of each other. This arrangement does not require a demosaicing algorithm because each pixel has information about each color. Dick Merrill of Foveon distinguishes the approaches as "vertical color filter" for the Foveon X3 versus "lateral color filter" for the CFA.
Image name        Pattern size   Description
Bayer filter      2 x 2          Very common RGB filter: one blue, one red, and two green.
RGBE filter       2 x 2          Bayer-like, with one of the green filters modified to "emerald"; used in a few Sony cameras.
CYYM filter       2 x 2          One cyan, two yellow, and one magenta; used in a few Kodak cameras.
CYGM filter       2 x 2          One cyan, one yellow, one green, and one magenta; used in a few cameras.
RGBW Bayer        2 x 2          Traditional RGBW, similar to the Bayer and RGBE patterns.
RGBW #1, #2, #3   4 x 4, 2 x 4   Three example RGBW filters from Kodak, with 50% white (see Bayer filter alternatives).
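The filtering step itself can be sketched as follows, assuming an RGGB layout for the 2 x 2 Bayer pattern; the image values and function name are illustrative.

```python
# Sketch: sampling a full-colour image through a 2x2 Bayer pattern
# (RGGB layout assumed). Each sensor pixel keeps only the channel
# its colour filter passes; the rest must later be demosaiced.
BAYER = [["R", "G"],   # row 0: red, green
         ["G", "B"]]   # row 1: green, blue

def mosaic(rgb, width, height):
    """rgb[y][x] = (r, g, b); returns the single-channel raw values."""
    chan = {"R": 0, "G": 1, "B": 2}
    return [[rgb[y][x][chan[BAYER[y % 2][x % 2]]]
             for x in range(width)] for y in range(height)]

# Uniform orange test image: R=255, G=128, B=0 everywhere.
img = [[(255, 128, 0) for x in range(4)] for y in range(4)]
raw = mosaic(img, 4, 4)
print(raw[0])   # [255, 128, 255, 128]  (R, G, R, G row)
print(raw[1])   # [128, 0, 128, 0]      (G, B, G, B row)
```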
Diazonaphthoquinone (DNQ)-novolac photoresist is one material used as the carrier for making color filters from color dyes. There is some interference between the dyes and the ultraviolet light needed to properly expose the polymer, though solutions have been found for this problem. Color photoresists sometimes used include those with chemical monikers CMCR101R, CMCR101G, CMCR101B, CMCR106R, CMCR106G, and CMCR106B.
A few sources discuss other specific chemical substances, attendant optical properties, and optimal manufacturing processes of color filter arrays. For instance, Nakamura said that materials for on-chip color filter arrays fall into two categories: pigment and dye. Pigment-based CFAs have become the dominant option because they offer higher heat resistance and light resistance compared to dye-based CFAs. In either case, thicknesses ranging up to 1 micrometre are readily available.
Theuwissen says "Previously, the color filter was fabricated on a separate glass plate and glued to the CCD (Ishikawa 1981), but nowadays, all single-chip color cameras are provided with an imager which has a color filter on-chip processed (Dillon, 1978) and not as a hybrid." He provides a bibliography focusing on the number, types, aliasing effects, moire patterns, and spatial frequencies of the absorptive filters. Some sources indicate that the CFA can be manufactured separately and affixed after the sensor has been manufactured, while other sensors have the CFA manufactured directly on the surface of the imager. Theuwissen makes no mention of the materials utilized in CFA manufacture.
At least one early example of an on-chip design utilized gelatin filters (Aoki et al., 1982).[15] The gelatin is sectionalized, via photolithography, and subsequently dyed. Aoki reveals that a CYWG arrangement was used, with the G filter being an overlap of the Y and C filters. Filter materials are manufacturer specific. Adams et al. state "Several factors influence the CFA's design.
First, the individual CFA filters are usually layers of transmissive (absorptive) organic or pigment dyes. Ensuring that the dyes have the right mechanical properties, such as ease of application, durability, and resistance to humidity and other atmospheric stresses, is a challenging task. This makes it difficult, at best, to fine-tune the spectral responsivities."
Given that the CFAs are deposited on the image sensor surface at the BEOL (back end of line, the later stages of the integrated circuit manufacturing line), where a low-temperature regime must be rigidly observed (due to the low melting temperature of the aluminum metalized "wires" and the substrate mobility of the dopants implanted within the bulk silicon), organics would be preferred over glass. On the other hand, some CVD silicon oxide processes are low-temperature processes. Ocean Optics has indicated that their patented dichroic filter CFA process (alternating thin films of ZnS and cryolite) can be applied to spectroscopic CCDs. Gersteltec sells photoresists that possess color filter properties.
Noise is clearly visible in an image from a digital camera.
The original meaning of "noise" was, and remains, "unwanted signal": unwanted electrical fluctuations in signals received by AM radios caused audible acoustic noise ("static"). By analogy, unwanted electrical fluctuations themselves came to be known as "noise." Image noise is, of course, inaudible. The magnitude of image noise can range from almost imperceptible specks on a digital photograph taken in good light, to optical and radio-astronomical images that are almost entirely noise, from which a small amount of information can be derived by sophisticated processing. (Such a noise level would be totally unacceptable in a photograph, since it would be impossible to determine even what the subject was.)
3.4 Types
o Amplifier noise (Gaussian noise)
o Salt-and-pepper noise
o Shot noise
o Dark current noise
o Quantization noise (uniform noise)
o Read noise
o Anisotropic noise
Image with salt-and-pepper noise.
Fat-tail distributed or "impulsive" noise is sometimes called salt-and-pepper noise or spike noise. An image containing salt-and-pepper noise will have dark pixels in bright regions and bright pixels in dark regions. This type of noise can be caused by analog-to-digital converter errors, bit errors in transmission, etc. It can be mostly eliminated by using dark frame subtraction and by interpolating around dark/bright pixels. Dead pixels in an LCD monitor produce a similar, but non-random, display.
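The interpolation-based clean-up is commonly realised with a median filter; the following sketch (3 x 3 window, illustrative pixel values) shows how isolated impulsive outliers are suppressed while flat regions are preserved.

```python
def median_filter(img, width, height):
    """3x3 median filter: replace each interior pixel with the median of
    its neighbourhood, which suppresses isolated salt-and-pepper outliers.
    Border pixels are left unchanged for simplicity."""
    out = [row[:] for row in img]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            neigh = sorted(img[y + dy][x + dx]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = neigh[4]          # median of the 9 values
    return out

# Flat grey patch with one "salt" (255) and one "pepper" (0) pixel:
img = [[100] * 5 for _ in range(5)]
img[2][2] = 255
img[1][3] = 0
clean = median_filter(img, 5, 5)
print(clean[2][2], clean[1][3])   # 100 100
```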
Image on the left has exposure time of >10 seconds in low light. The image on the right has adequate lighting and 0.1 second exposure.
In low light, correct exposure requires the use of long shutter speeds, higher gain (ISO sensitivity),
or both. On most cameras, longer shutter speeds lead to increased salt-and-pepper noise due to photodiode leakage currents. At the cost of a doubling of read-noise variance (a 41% increase in read-noise standard deviation), this salt-and-pepper noise can be mostly eliminated by dark frame subtraction. Banding noise, similar to shadow noise, can be introduced through brightening shadows or through color-balance processing. The relative effect of both read noise and shot noise increases as the exposure is reduced, corresponding to increased ISO sensitivity, since fewer photons are counted (shot noise) and more amplification of the signal is necessary.
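Dark frame subtraction itself is a per-pixel difference between the photo and a frame captured with the shutter closed at the same exposure and temperature; a minimal sketch, with hypothetical pixel values, follows.

```python
# Dark-frame subtraction sketch: the dark frame records hot pixels and
# leakage accumulated during the exposure; subtracting it removes this
# fixed pattern from the photo (clamped at zero).
def subtract_dark_frame(image, dark):
    return [[max(0, p - d) for p, d in zip(irow, drow)]
            for irow, drow in zip(image, dark)]

photo = [[120, 118, 250],     # 250: hot pixel inflated by leakage current
         [119, 121, 117]]
dark  = [[  2,   1, 130],     # the same pixel is bright in the dark frame
         [  1,   3,   0]]
print(subtract_dark_frame(photo, dark))
```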
For instance, the noise level produced by a Four Thirds sensor at ISO 800 is roughly equivalent to that produced by a full frame sensor (with roughly four times the area) at ISO 3200, and that produced by a 1/2.5" compact camera sensor (with roughly 1/16 the area) at ISO 100. This ability to produce acceptable images at higher sensitivities is a major factor driving the adoption of DSLR cameras, which tend to use larger sensors than compacts. An example shows a DSLR sensor at ISO 400 creating less noise than a point-and-shoot sensor at ISO 100.
Viewers generally find luminance noise less objectionable to the eye, since its textured appearance mimics the appearance of film grain. The high-sensitivity image quality of a given camera (or RAW development workflow) may depend greatly on the quality of the algorithm used for noise reduction. Since noise levels increase as ISO sensitivity is increased, most camera manufacturers increase the noise reduction aggressiveness automatically at higher sensitivities. This leads to a breakdown of image quality at higher sensitivities in two ways: noise levels increase and fine detail is smoothed out by the more aggressive noise reduction. In cases of extreme noise, such as astronomical images of very distant objects, it is not so much a matter of noise reduction as of extracting a little information buried in a lot of noise; the techniques are different, seeking small regularities in massively random data.
One of the few engineering definitions for PRNU, or "photo response non-uniformity", is in the photonics dictionary, and it applies to CCDs only.
o Subtract the DSNU image from this average image to eliminate the contribution from the DSNU.
o Obtain the spatial variance of the pixel values over the entire CCD.
o Divide the spatial variance by the average image from (ii) to obtain the PRNU as a percentage of the actual pixel values.
o Repeat the calculations for the different exposure times to compare the PRNU.
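The steps above can be sketched as follows. The frame values and DSNU image are hypothetical, and the standard deviation (square root of the spatial variance) is divided by the mean signal so that the result is a dimensionless percentage.

```python
from statistics import mean, pvariance

def prnu_percent(flat_frames, dsnu):
    """PRNU measurement sketch: average the flat-field frames, subtract
    the DSNU image, then express the spatial standard deviation as a
    percentage of the mean corrected signal."""
    avg = [mean(f[i] for f in flat_frames) for i in range(len(dsnu))]
    corrected = [a - d for a, d in zip(avg, dsnu)]
    return 100 * pvariance(corrected) ** 0.5 / mean(corrected)

# Two hypothetical flat-field frames (4 pixels each) and a DSNU image:
frames = [[100, 102, 98, 101],
          [100, 102, 98, 99]]
dsnu = [1, 1, 1, 1]
print(round(prnu_percent(frames, dsnu), 2))   # 1.43
```

Repeating the call for frame sets taken at different exposure times allows the PRNU values to be compared, as in the last step above.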
We expect the PRNU to increase with increasing illumination, since increasing the illumination level will enhance the difference in the photo-response of the pixels across the image and lead to a higher PRNU. In our measurements, since the maximum value of the Opt linear device is around 4 candelas, and increasing the illumination level increases the non-uniformity of the illumination
produced by the Opt linear, we chose to increase the exposure times to mimic the effect of increasing illumination levels.
The dominating component of sensor pattern noise is photo response non-uniformity (PRNU). However, the PRNU can be contaminated by various types of noise introduced at different stages of the image acquisition process. Figure 1 demonstrates the image acquisition process. A colour photo is represented in three colour components (i.e., R, G, and B). For most digital cameras, during the image acquisition process the lenses let through the rays of the three colour components of the scene, but for every pixel the rays of only one colour component are passed through the CFA and subsequently converted into electronic signals by the sensor. This colour filtering is determined by the CFA. After the conversion, a colour interpolation function generates the electronic signals of the other two colour components for every pixel according to the colour intensities of the neighboring pixels. This colour interpolation process is commonly known as demosaicking. The signals then undergo additional signal processing such as white balance, gamma correction and image enhancement. Finally, these signals are stored in the camera's memory in a customized format, primarily the JPEG format. In acquiring an image, the signal will inevitably be distorted when passing through each process, and these distortions result in slight differences between the scene and the camera-captured image. As formulated in [11], a camera output model can be expressed as

I = g^γ · [(1 + K)Y + Λ + Θ_s + Θ_r]^γ + Θ_q    (1)

where I is the output image, Y is the input signal of the scene, g is the colour channel gain, γ (= 0.455) is the gamma correction factor, K is the zero-mean multiplicative factor responsible for the PRNU, and Λ, Θ_s, Θ_r and Θ_q stand for dark current, shot noise, read-out noise and quantization (lossy compression) noise, respectively. In Eq.
(1), Θ_s and Θ_r are random noise and Λ is the fixed pattern noise (FPN) that is associated with every camera and can be removed by subtracting a dark frame from the image taken by the same camera. Since gY is the dominating term in Eq. (1), applying a Taylor expansion to Eq. (1) and keeping the first two terms of the expansion gives

I = I^(0) + I^(0)·K + Θ    (2)

where I^(0) = (gY)^γ is the sensor output in the absence of noise and Θ is the composite of the remaining independent noise terms.
where

W_I = I − F(I)    (4)

is the noise residual obtained by applying a denoising filter F on image I. Although various denoising filters can be used, the wavelet-based denoising process (i.e., the discrete wavelet transform followed by a Wiener filtering operation) has been reported as effective in producing good results.
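The residual extraction W = I − F(I) can be sketched as follows. A 3 x 3 mean filter stands in for the wavelet/Wiener denoiser F, which keeps the example self-contained with the standard library but is only a crude approximation of the filters used in the literature.

```python
# Noise-residual sketch: W = I - F(I), with a simple mean filter as a
# stand-in denoiser F (real systems use DWT + Wiener filtering).
def denoise_mean3(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # borders kept unfiltered
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
    return out

def noise_residual(img):
    f = denoise_mean3(img)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(img, f)]

# Flat 10-valued patch with one noisy centre pixel:
img = [[10, 10, 10], [10, 19, 10], [10, 10, 10]]
print(noise_residual(img)[1][1])   # 8.0  (19 - mean 11.0)
```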
where S is the number of images involved in the calculation, γ is the gamma correction factor, I_s is the s-th image taken by device d, and W_s is the noise residual extracted from I_s. Note that the multiplication operation in Eq. (5) is element-wise.
2) Secondly, the noise residual W_I of the image I under investigation is extracted using Eq. (4) and compared against the reference PRNU K_d of each device d available to the investigator, in the hope that it will match one of the reference fingerprints, thus identifying the source device that has taken the image under investigation. The comparison uses the normalised cross-correlation ρ(W_I, I·K_d); the multiplication operation in Eq. (6) is element-wise.
Given the PRNU-based approach's potential in resolving the device identification problem to the accuracy of the individual device level, it is important that the extracted PRNU is as close as possible to the genuine pattern noise due to the sensor. Since, for most cameras, only one of the three colours of each pixel is physically captured by the sensor while the other two are artificially interpolated by the demosaicking process, this inevitably introduces noise with power stronger than that of the genuine
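The normalised cross-correlation used for the fingerprint comparison can be sketched as follows; the fingerprint values are invented for illustration, and real residuals are full-image arrays rather than four-element vectors.

```python
from math import sqrt

def ncc(a, b):
    """Normalised cross-correlation between two flattened noise patterns:
    subtract the means, then divide the inner product by the norms."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da)) * sqrt(sum(x * x for x in db))
    return num / den

w  = [0.2, -0.1, 0.4, -0.3]        # residual from the image under test
k1 = [0.19, -0.12, 0.41, -0.29]    # reference fingerprint, same camera
k2 = [-0.3, 0.2, -0.1, 0.4]        # fingerprint of a different camera
print(ncc(w, k1) > ncc(w, k2))     # True: w matches the first fingerprint
```

In practice, as the text notes, the winning correlation must also exceed a decision threshold before the camera is declared the source.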
PRNU. We can see from Eq. (2), (3) and (4) that the accuracy of both the PRNU K and the noise residual W depends on the denoising operation applied to I. However, as mentioned earlier, the most common method is to apply the discrete wavelet transform followed by a Wiener filtering operation directly to the entire image I, without differentiating physical components from artificial components, and, as a result, allowing the interpolation noise in the artificial components to contaminate the real PRNU in the physical components. Addressing this shortcoming is the motivation of this work. In this work, we will look at the impact of demosaicking on PRNU fidelity in Section II and propose an improved formula for extracting the PRNU in Section III. In Section IV, we present some experiments on device identification and image content integrity verification to validate the proposed PRNU extraction formula. Section V concludes this work. Because the PRNU is formulated in Eq. (3) and (5) as a function of the noise residual W (i.e., Eq. (4)), in the rest of the work we will use the two terms, PRNU and noise residual, interchangeably whenever there is no need to differentiate them.
4.3 DEMOSAICING
A demosaicing (also de-mosaicing or demosaicking) algorithm is a digital image process used to reconstruct a full color image from the incomplete color samples output from an image sensor overlaid with a color filter array (CFA). It is also known as CFA interpolation or color reconstruction. Most modern digital cameras acquire images using a single image sensor overlaid with a CFA, so demosaicing is part of the processing pipeline required to render these images into a viewable format. Many modern digital cameras can save images in a raw format, allowing the user to demosaic them using software rather than the camera's built-in firmware. The aim of a demosaicing algorithm is to reconstruct a full color image (i.e. a full set of color triples) from the spatially undersampled color channels output from the CFA. The algorithm should have the following traits:
o Avoidance of the introduction of false color artifacts, such as chromatic aliases, zippering (abrupt unnatural changes of intensity over a number of neighboring pixels) and purple fringing
o Low computational complexity for fast processing or efficient in-camera hardware implementation
To reconstruct a full color image from the data collected by the color filtering array, a form of interpolation is needed to fill in the blanks. The mathematics here is subject to individual implementation, and is called demosaicing.
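A minimal form of this interpolation, bilinear averaging of the green channel for an assumed RGGB layout, can be sketched as follows; the raw values are illustrative, and a full demosaicer would interpolate red and blue in the same spirit.

```python
# Minimal bilinear interpolation sketch for the green channel of an
# RGGB Bayer mosaic: missing green values are the average of the
# in-bounds 4-neighbours, which are always green sites.
def interp_green(raw, width, height):
    green = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (x + y) % 2 == 1:             # green sites in an RGGB layout
                green[y][x] = raw[y][x]
    for y in range(height):
        for x in range(width):
            if green[y][x] is None:          # red or blue site: interpolate
                neigh = [raw[y + dy][x + dx]
                         for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= y + dy < height and 0 <= x + dx < width]
                green[y][x] = sum(neigh) / len(neigh)
    return green

raw = [[10, 80, 12, 80],      # R G R G   (all green samples happen to be 80)
       [80, 30, 80, 30],      # G B G B
       [14, 80, 16, 80],
       [80, 30, 80, 30]]
print(interp_green(raw, 4, 4)[0][0])   # 80.0: average of its green neighbours
```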
The goal is to prevent the artificial components from contaminating, with interpolation noise, the reliable PRNU residing in the physical components.
CHAPTER 5: CD-PRNU (Color Decoupled Photo Response Non-Uniformity)
5.1 FORMULATION OF COLOUR DECOUPLED PRNU (CD-PRNU)
In this section, we will discuss the formulation and extraction of the CD-PRNU. First, a mathematical model for the CD-PRNU is derived, and then an extraction algorithm is proposed to extract the noise residual that is to be used for estimating the final CD-PRNU, without prior knowledge about the CFA.
For each pixel, only one colour component is physically captured by the sensor, and this colour is determined by the colour configuration of the CFA pattern F. The other two colour components are to be determined by the demosaicking process. Each colour component c of a pixel can be determined according to Eq. (7).
The first part of Eq. (7) means that if the colour component c is the same as the colour that the CFA pattern F allows to pass, then no demosaicking is needed because c has been physically captured by the sensor. Otherwise, the second part of Eq. (7) is applied to calculate the colour artificially. According to Eq. (7), the image output model of Eq. (1) can be reformulated as
Eq. (9) suggests that in the artificial components the PRNU is actually the interpolation noise P while, in the physical components, the PRNU remains unaffected by the interpolation noise. It can also be seen from Eq. (9) that the physical components and the artificial components have similar mathematical expressions. Hence, if the physical and artificial colour components can be separated/decoupled, P can be extracted in the same way as the sensor pattern noise K is extracted (i.e., Eq. (3)). That is
where P is, for the artificial components, the counterpart of the sensor pattern noise, and is actually the interpolation noise. We can also use the same ML estimate as in Eq. (5) to extract the reference interpolation noise P_d from S low-variation images taken by a particular device d, such that
where the artificial colour components of the s-th low-contrast image taken by device d are used. We will discuss how the physical and artificial colour components can be decoupled in a simple manner, without a priori knowledge about the CFA pattern, in Section III.B.
According to Eq. (10) and (11), we can extract the sensor pattern noise and the interpolation noise, respectively, from the physical and artificial components if the CFA is known. However, manufacturers usually do not provide information about the CFA used by their cameras. Therefore, several methods have been proposed to estimate the CFA. Unfortunately, these methods have to exhaust all of the possible CFA patterns in order to infer/estimate the real/optimal CFA, and exhaustive search is by no means acceptable. In this work, to extract the CD-PRNU, we first separate the three colour channels of the colour image I. Most CFA patterns are of 2 x 2 elements and are periodically mapped to the sensors. We know that, for each pixel of I, only one of the three colour components is physical and the other two are artificial, so the second step is, for each colour channel, to perform a 2:1 down-sampling along both the horizontal and vertical directions, such that
we do not know (actually, we do not have to know) which pixels carry the colour captured physically by the hardware and which do not. But by decomposing each channel into four sub-images, we know that each
of the four sub-images either contains only the physical colour or only the artificial colours. By decoupling the physical and artificial colour components in this fashion before extracting the noise residual, we can prevent the artificial components from contaminating the physical components during the DWT process. Eq. (4) is then used to obtain a noise residual from each sub-image. Finally, the CD-PRNU W_c of each colour channel c is formed by combining the four sub-noise residuals. The colour-decoupled noise residual extraction process is shown in Figure 2 and the procedures are listed in Algorithm 1. Note that Algorithm 1 is for extracting the noise residual pattern W from an image I. To estimate the CD-PRNU P_d of a particular device d and use it as the reference signature of d, Eq. (11) is applied.
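The decomposition and recomposition bookkeeping of this algorithm can be sketched as follows (pure 2:1 down-/up-sampling of one colour channel; the per-sub-image residual extraction itself is omitted). The function names and test values are illustrative.

```python
# Colour-decoupling sketch: 2:1 down-sampling of one colour channel into
# four sub-images, so that for a 2x2 CFA each sub-image holds either only
# physically captured values or only interpolated values.
def decompose(channel):
    h, w = len(channel), len(channel[0])
    subs = {}
    for oy in (0, 1):                      # row offset within the 2x2 cell
        for ox in (0, 1):                  # column offset within the cell
            subs[(oy, ox)] = [[channel[y][x]
                               for x in range(ox, w, 2)]
                              for y in range(oy, h, 2)]
    return subs

def recompose(subs, h, w):
    """Interleave the four (possibly filtered) sub-images back together."""
    out = [[0] * w for _ in range(h)]
    for (oy, ox), sub in subs.items():
        for i, row in enumerate(sub):
            for j, v in enumerate(row):
                out[oy + 2 * i][ox + 2 * j] = v
    return out

chan = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
subs = decompose(chan)
print(subs[(0, 0)])                    # [[1, 3], [9, 11]]
print(recompose(subs, 4, 4) == chan)   # True
```

In the full algorithm, the noise residual of Eq. (4) would be extracted from each sub-image between these two steps, before the residuals are interleaved into the per-channel CD-PRNU.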
The reference PRNU (or CD-PRNU) of each device is the average of the PRNUs extracted from 30 photos of blue sky according to Eq. (11). For device identification purposes, we need clean PRNUs (which appear as high-frequency bands of images) as device fingerprints for comparison against the PRNU extracted from individual images under investigation. Blue-sky images are chosen in this work because blue sky contains few scene details (high-frequency signal), thus giving a better chance of extracting a clean PRNU. Other images with low-variation scenes (i.e., scenes without significant details) can be used instead. Taking the average of the PRNUs from 30 blue-sky images further reduces variation; our empirical experience suggests that an average of 20 blue-sky images is accurate enough.
Source camera identification requires similarity comparisons among PRNUs (CD-PRNUs), and therefore the feasibility of the chosen similarity metric is important. Fridrich suggested the use of the Peak to Correlation Energy (PCE) measure in [15], which has been proved to be a more stable detection statistic than normalised cross-correlation when applied to scenarios in which the images of interest may have undergone geometrical manipulations, such as rotation or scaling. The purpose of this experiment is to demonstrate the capability of the proposed CD-PRNU in dealing with colour interpolation noise, so geometrical transformations will not be applied, in order to prevent a biased evaluation. Therefore, in the following experiments, the normalised cross-correlation formulated in Eq. (6) will be used to measure the similarity between PRNUs (CD-PRNUs).
In practice, the normalised cross-correlation has to be greater than a specified threshold for a camera to be identified as the source camera. However, in this experiment, the key point is to demonstrate the different performance of the traditional PRNU and the proposed CD-PRNU. Therefore, a camera is identified as the source camera if, out of the six reference PRNUs (or CD-PRNUs), its reference PRNU (or CD-PRNU) is most similar to the PRNU (or CD-PRNU), W_I, of the image I under investigation.
Because PRNU is often used in content integrity verification, where smaller image blocks have to be analysed, we also compare the performance of the proposed CD-PRNU against that of the traditional PRNU [11] when they are applied to blocks of 5 different sizes cropped from the centre of the full-sized PRNU (CD-PRNU). Table 2 lists the identification rates. Individually, C1, C3, C4, C5 and C6 perform significantly better when CD-PRNU is used in all cases, except for a few cases where the images are of full size (1536 × 2048 pixels) and the identification rates are close or equal to 100% (1.0000). For C2, PRNU performs as well as CD-PRNU when the image size is 192 × 256 pixels and slightly outperforms CD-PRNU when the block size is 48 × 64 pixels. We suspect that C2 does not perform as expected because its CFA pattern is not a 2 × 2 square array, as we have assumed. Another reason is that the smaller the images, the less data is available, so the identification results become less reliable. Generally speaking,
when the statistics of the six cameras are pooled together, as listed in the Total column of Table 2, we can see that CD-PRNU still outperforms PRNU significantly. This has been graphically presented in Figure 3(a).
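The block-wise comparison above crops each test block from the centre of the full-sized PRNU. A minimal sketch of that cropping step (the helper name is illustrative, not from the paper):

```python
import numpy as np

def crop_centre(prnu, h, w):
    """Crop an h-by-w block from the centre of a full-sized
    PRNU array, as done for the 5 block sizes in Table 2."""
    H, W = prnu.shape[:2]
    top = (H - h) // 2
    left = (W - w) // 2
    return prnu[top:top + h, left:left + w]
```

The same centre crop is applied to both the reference fingerprint and the test residual, so the two blocks cover the same sensor region.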
Figure 3. Performance comparison of source camera identification. a) Overall identification rates when CD-PRNU and PRNU are used as fingerprint. In Figure 3(b), ROC curves of the performance of PRNU and CD-PRNU are shown. We can see that the CD-PRNU outperforms the PRNU because at every fixed false positive rate, the CD-PRNU's true positive rate is always higher than that of the PRNU.
Figure 3. Performance comparison of source camera identification. b) Overall ROC curve when CD-PRNU and PRNU are used as fingerprint. For a system with a Pentium Core II 1.3 GHz CPU and 3 GB RAM, it takes 0.526 seconds to compute the similarity between the PRNUs of two images of 2048 × 1536 pixels and 0.567 seconds to calculate the similarity between a pair of CD-PRNUs of the same size. The amount of data processed during the extraction of PRNU and CD-PRNU is the same. Although extracting the CD-PRNU requires down-sampling and up-sampling, these two operations are trivial and incur only a negligible increase in time complexity.
Table 2. Source camera identification rates using traditional PRNU and proposed CD-PRNU.
Figure 4. The original image, source image and forged images for the content verification experiments. (a) Original Image I.1 (b) Original Image I.2 (c) Forged Image I.3. In the second experiment, we cropped an 80 × 100-pixel area covering the face of the person from Image II.1 in Figure 5(a) and pasted it over the area where the face of another person is in Image II.2 in Figure 5(b) to create the forged Image II.3 in Figure 5(c). The images in Figure 5(a) and (b) were also taken by the same camera.
Figure 5. The original image, source image and forged images for the content verification experiments. (a) Original Image II.1 (b) Original Image II.2 (c) Forged Image II.3. In the third experiment, we cropped a 60 × 80-pixel area covering the face of the person from Image III.1 in Figure 6(a), taken by a Canon PowerShot A400, and pasted it over the area where the face of another person is in Image III.2 in Figure 6(b), taken by an Olympus C730, to create the forged Image III.3 in Figure 6(c).
Figure 6. The original image, source image and forged images for the content verification experiments. (a) Original Image III.1 (b) Original Image III.2 (c) Forged Image III.3. To detect the manipulated areas, we slid a 128 × 128-pixel window across the PRNU extracted from the image under investigation and another window of the same size across the reference PRNU of the cameras that took images I.2, II.2 and III.2. In Chen's method [11], the windows are moved one pixel at a time, which incurs a high computational load. Moreover, this method is not accurate at the pixel level [11]. Therefore, in our experiment, the sliding step/displacement is set to 5 pixels in order to reduce the computational load without sacrificing the accuracy of the integrity verification. Table 3 lists the number of manipulated and non-manipulated blocks of 5 × 5 pixels in the forged images.
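The window-sliding scheme above can be sketched as a generator of window positions; the window size and step follow the setup described in the text, and the function name is illustrative:

```python
def sliding_windows(shape, win=128, step=5):
    """Yield top-left (y, x) coordinates of win-by-win windows
    slid across an image of the given (height, width) in steps
    of `step` pixels, never extending past the image border."""
    H, W = shape
    for y in range(0, H - win + 1, step):
        for x in range(0, W - win + 1, step):
            yield y, x
```

Using a 5-pixel step instead of a 1-pixel step reduces the number of window comparisons by roughly a factor of 25.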
Table 3. Number of manipulated and non-manipulated areas in each image (unit: block).
To decide whether a block centred at the window superposed on the image has been manipulated or not, the cross-correlation of the PRNU patterns inside the two windows at the same location was calculated according to Eq. (6). If the cross-correlation is lower than a predetermined threshold T(t), the block in the centre of the window is deemed manipulated. As discussed in [11], the cross-correlation follows the Generalized Gaussian (GG) distribution; we therefore use a family of thresholds defined as T(t) = μ − t·σ to analyse the performance of PRNU and CD-PRNU, where μ and σ are the mean and standard deviation of the correlation distribution, respectively, and T(t) is the threshold. By varying the value of t, we can evaluate the integrity verification performance across a wide range of correlation thresholds T(t). In the following experiments we allow t to vary independently in the range from 0.0 to 3.0 and use the four metrics true positive (TP), false positive (FP), true negative (TN) and false negative (FN) to measure the performance of integrity verification based on PRNU and CD-PRNU. As t grows, we obtain lower TP and FP, and higher TN and FN. Let B be an arbitrary block, and let M(B) = 1 if B has actually been manipulated (0 otherwise) and Md(B) = 1 if B is detected as manipulated (0 otherwise).
TP, FP, TN and FN are defined as TP = |{B | M(B) = 1 and Md(B) = 1}|, TN = |{B | M(B) = 0 and Md(B) = 0}|, FP = |{B | M(B) = 0 and Md(B) = 1}| and FN = |{B | M(B) = 1 and Md(B) = 0}|. Higher TP and TN, and lower FP and FN indicate better performance.
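The decision rule and the four metrics can be sketched as follows. The threshold form T(t) = mean − t·std is a reconstruction consistent with the surrounding text (a larger t flags fewer blocks, giving lower TP and FP), and the function name is illustrative:

```python
import numpy as np

def verify_blocks(correlations, ground_truth, t):
    """Flag blocks as manipulated and count TP/TN/FP/FN.

    correlations: per-block cross-correlation between the image
    PRNU window and the reference PRNU window (Eq. (6)).
    ground_truth: 1 if the block was actually manipulated, M(B).
    The threshold T(t) = mean - t * std is assumed from context.
    """
    corr = np.asarray(correlations, dtype=float)
    gt = np.asarray(ground_truth, dtype=bool)
    threshold = corr.mean() - t * corr.std()
    detected = corr < threshold  # low correlation => Md(B) = 1
    tp = int(np.sum(gt & detected))
    tn = int(np.sum(~gt & ~detected))
    fp = int(np.sum(~gt & detected))
    fn = int(np.sum(gt & ~detected))
    return tp, tn, fp, fn
```

Sweeping t from 0.0 to 3.0 and recording (TP, TN, FP, FN) at each value yields the performance curves reported in the figures below.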
According to Chen's prediction, the block dimensions impose a lower bound on the size of tampered regions that the algorithm can identify. Thus, we remove from Z all simply connected tampered regions that contain fewer than 64 × 64 pixels (one quarter of the number of pixels in the block). Chen applies erosion and dilation operations with a square kernel in order to filter out small areas falsely identified as tampered. The final authentication result is an image with the dilated areas highlighted as the tampered areas. However, the performance of the filtering/dilation operation strongly depends on the parameter setting, and hence many experiments must be run to obtain the best parameters for filtering. In order to simplify the comparison and to obtain a fair result, we use the raw data without any filtering to calculate the TP, TN, FP and FN. As a result, the experiments on III.3 demonstrate that the CD-PRNU-based method significantly outperforms the PRNU-based method when the tampered area is about one quarter of the sliding window.
An algorithm with better performance will have a higher true positive rate, which is plotted on the vertical axis. The ROC curves for the integrity verification experiments on image I.3 are illustrated in Figure 8. It is clear that the ROC curve of the PRNU-based scheme mostly overlaps with that of random guessing, which means the authentication result is generally as unreliable as a random guess. This is because the area we copied from the source image I.1 is at approximately the same location as the original area in image I.2; therefore the PRNU pattern noises in the two areas are almost the same. As a result, the scheme cannot detect the manipulated area based on the PRNU. By contrast, the CD-PRNU-based scheme yields a curve much higher than that of the PRNU-based method, which means that by using the CD-PRNU manipulated blocks can be detected more reliably.
Figure 7. Authentication results on image I.3: Integrity verification performance of the PRNU and CD-PRNU in terms of a) TP, b) TN, across a range of correlation threshold T(t), with t varying from 0.0 to 3.0.
Figure 8. The ROC curve of True Positive Rate with respect to False Positive Rate of PRNU and CD-PRNU when authentication is performed on image I.3.
When verifying the integrity of image II.3, the CD-PRNU's consistently higher TP and lower FN, as shown in Figure 9(a) and 9(d), again indicate its superiority to the PRNU. However, mixed performance in terms of TN and FP can be seen in Figure 9(b) and 9(c). Despite this mixed performance in terms of TN and FP, both PRNU and CD-PRNU can effectively detect the manipulated blocks, as their ROC curves in Figure 10 suggest. Figure 10 also shows that the ROC curve of the CD-PRNU is still slightly higher than that of the PRNU, indicating a slightly better performance of the CD-PRNU.
Figure 9. Authentication results on image II.3: Integrity verification performance of the PRNU and CD-PRNU in terms of a) TP, b) TN, c) FP and d) FN across a range of correlation threshold T(t), with t varying from 0.0 to 3.0.
Figure 10. The ROC curve of True Positive Rate with respect to False Positive Rate of PRNU and CD-PRNU when authentication is performed on image II.3.
This is reflected in the PRNU's ROC curve in Figure 12 and is due to the fact that the manipulated area is too small (60 × 80 pixels), only about one quarter of the sliding window (128 × 128 pixels). Chen predicted that one quarter of the sliding window is the lower bound on the size of tampered regions that the algorithm can identify, and therefore areas smaller than this should be filtered out in order to remove falsely identified noise. The experimental result on III.3 conforms to Chen's observation. Since the tampered area is 60 × 80 pixels, approximately one quarter of the window, the method based on the PRNU can perform no better than a random guess. By contrast, the manipulated blocks can be effectively detected by the CD-PRNU-based scheme because the areas in question come from two images taken by different cameras and thus contain different interpolation noise. As a result, the CD-PRNU-based method can identify smaller areas.
Figure 11. Authentication results on image III.3: Integrity verification performance of the PRNU and CD-PRNU in terms of a) TP, b) TN, c) FP and d) FN across a range of correlation threshold T(t), with t varying from 0.0 to 3.0.