You are on page 1of 4

DISTRIBUTED SOURCE CODING AUTHENTICATION OF IMAGES

WITH CONTRAST AND BRIGHTNESS ADJUSTMENT AND AFFINE WARPING

Yao-Chung Lin, David Varodayan, and Bernd Girod

Information Systems Laboratory, Stanford University, Stanford, CA 94305


{yao-chung.lin, varodayan, bgirod}@stanford.edu

ABSTRACT 2. REVIEW OF IMAGE AUTHENTICATION WITH DSC

Media authentication is important in content delivery via untrusted Fig. 1 is the block diagram for our earlier image authentication
intermediaries, such as peer-to-peer (P2P) file sharing. Many dif- scheme [1] as well as the current work. We denote the source image
ferently encoded versions of a media file might exist. Our previous as x. We model the target image y by way of the space-varying
work applied distributed source coding not only to distinguish the two-state lossy channel in Fig. 2. The legitimate state of the channel
legitimate diversity of encoded images from tampering but also to performs lossy JPEG2000 or JPEG compression and reconstruction
localize tampered regions in an image already deemed to be inau- with peak signal-to-noise ratio (PSNR) of 30 dB or better. The tam-
thentic. In both cases, authentication requires a Slepian-Wolf en- pered state additionally includes malicious tampering. The channel
coded image projection that is supplied to the decoder. state variable Si is defined per nonoverlapping 16x16 block of im-
We extend our scheme to authenticate images that have under- age y. If any pixel in block Bi has been tampered with, Si = 1;
gone contrast, brightness, and affine warping adjustment. Our ap- otherwise, Si = 0.
proach incorporates an Expectation Maximization algorithm into the We now review the authentication system. The left-hand side of
Slepian-Wolf decoder. Experimental results demonstrate that the Fig. 1 shows that a pseudorandom projection (based on a randomly
proposed algorithm can distinguish legitimate encodings of authen- drawn seed Ks ) is applied to the original image x to produce projec-
tic images from illegitimately modified versions, despite arbitrary tion coefficients X, which are quantized to Xq . The authentication
contrast, brightness, and affine warping adjustment, using authenti- data comprise two parts, both derived from Xq . The Slepian-Wolf
cation data of less than 250 bytes per image. bitstream S(Xq ) is the output of a Slepian-Wolf encoder based on
Index Terms— Image authentication, distributed source coding, rate-adaptive low-density parity-check (LDPC) codes [6]. The much
Expectation Maximization smaller digital signature D(Xq , Ks ) consists of the seed Ks and
a cryptographic hash value of Xq signed with a private key. The
authentication data are generated by a server upon request. Each re-
1. INTRODUCTION sponse uses a different random seed Ks , which is provided to the
decoder as part of the authentication data. This prevents an attack
Media authentication is important in content delivery via untrusted which simply confines the tampering to the nullspace of the projec-
intermediaries, such as peer-to-peer (P2P) file sharing or P2P mul- tion. Based on the random seed, for each 16x16 nonoverlapping
ticast streaming. In these applications, many differently encoded block Bi , we generate a 16x16 pseudorandom matrix Pi by drawing
versions of the original file might exist. Moreover, transcoding and its elements independently from a Gaussian distribution N (1, σ2 )
bitstream truncation at intermediate nodes might give rise to further and normalizing so that ||Pi ||2 = 1. We choose σ = 0.2 empiri-
diversity. Intermediaries might also tamper with the media for a vari- cally. The inner product Bi , Pi  is an element of X, quantized to
ety of reasons, such as interfering with the distribution of a particular an element of Xq .
file, piggybacking unauthentic content, or generally discrediting a The authentication decoder, on the right-hand side of Fig. 1,
distribution system. In previous work, we applied distributed source seeks to authenticate the image y with authentication data S(Xq )
coding (DSC) to image authentication to distinguish the diversity of and D(Xq , Ks ). It first projects y to Y in the same way as during
legitimate encodings from malicious manipulation [1] and demon- authentication data generation. A Slepian-Wolf decoder reconstructs
strated that the same framework can localize tampering in images Xq  from the Slepian-Wolf bitstream S(Xq ) using Y as side infor-
deemed to be inauthentic [2]. We extended the system to be robust mation. Decoding is via joint bitplane LDPC belief propagation [7]
to contrast and brightness adjustment [3] and affine warping [4]. In initialized according to the known statistics of the legitimate chan-
this paper, we handle simultaneous contrast, brightness and affine nel state at the worst permissible quality for the given original image.
warping adjustment. Our approach lets the authentication decoder Then the image digest of Xq  is computed and compared to the im-
learn adjustment parameters using an Expectation Maximization [5] age digest, decrypted from the digital signature D(Xq , Ks ) using
(EM) algorithm that generalizes the solutions in [3, 4]. a public key. If these two image digests are not identical, the re-
Section 2 reviews our image authentication system using dis- ceiver declares image y to be inauthentic. If they match, then Xq
tributed source codes [1]. In Section 3, we formalize the authentica- has been recovered. To confirm the authenticity of y, the receiver
tion problem and introduce our extension for image authentication verifies that the likelihood of the legitimate channel is greater than a
with parameter learning. The EM algorithmic details are given in certain threshold.
Section 4. Simulation results in Section 5 show that the proposed Since this second-pass comparison uses all available infor-
scheme can distinguish between authentic editings of images and il- mation, the threshold for the likelihood specifies how statistically
legitimately modified versions. similar the target image must be to the original to be declared
Original Image Target Image
x y
Two-state Channel

Random Random Side Information Y Random


Projection Seed Ks Projection
Slepian-Wolf
Image Projection X Reconstructed Image
Bitstream S(Xq)
Slepian-Wolf Slepian-Wolf Projection Xq' Cryptographic
Quantization
Encoder Decoder Hash Function
Random
Quantized Image Seed Ks
Projection Xq Cryptographic Asymmetric Asymmetric
Comparison
Hash Function Image Encryption Decryption Image
Digital
Digest Signature Digest
D(Xq,Ks)

Private Key Public Key

Fig. 1. Image authentication system based on distributed source coding.

authentic. But the rate of the Slepian-Wolf bitstream S(Xq ) deter- Original
Image
mines whether the quantized image projection Xq is recovered at x Contrast and
Affine Warping JPEG2000/ Target
Brightness Image
all [8]. Accordingly, at the encoder, we select a Slepian-Wolf bit- Adjustment
and Cropping JPEG Si=0
y
rate just sufficient to successfully decode with both legitimate 30 dB
Malicious
JPEG2000 and JPEG reconstructed versions of x. At the decoder, Tampering Si=1
we choose a threshold for the likelihood for the second-pass com-
parison to distinguish between the different joint statistics induced
in the images by the legitimate and tampered channel states. Fig. 3. Space-varying two-state lossy channel with contrast and
brightness adjustment and affine warping .
Original
Image x Target Image
JPEG2000/JPEG Si=0
y
realigned to the original, with channel states Si labeled red if tam-
pered and blue if cropped out in y. The rest are legitimate cropped-in
Malicious
Si=1
states.
Tampering
The image authentication system described in Section 2 cannot
authenticate legitimate images that have undergone contrast, bright-
Fig. 2. Space-varying two-state lossy channel. ness, and affine warping adjustment, because the side information is
neither aligned with the corresponding authentication data nor com-
pensated for the contrast and brightness changes. Approaches sug-
3. CONTRAST, BRIGHTNESS, AND AFFINE WARPING gested in the prior art to overcome this problem involve generating
ADJUSTMENT MODEL affine-invariant features that serve as authentication data [9, 10, 11].
These features are invariant to some transformations, but would be
In this paper, we replace the two-state lossy channel in Fig. 2 with sensitive to others. We instead propose that the authentication de-
the one in Fig. 3. Now both the legitimate and tampered states of coder jointly estimate the adjustment parameters directly from the
the channel are affected by contrast and brightness adjustment and Slepian-Wolf bitstream S(Xq ) and the target image y using an EM
affine warping. In the legitimate state, we model the channel as algorithm.

y(m) = αx(n) + β + z(m), where m, n ∈ R2 , and α, β ∈ R,


with n = Am + b, where A ∈ R2×2 , b ∈ R2 . 4. EXPECTATION MAXIMIZATION

A and b are transformation and translation parameters in affine The introduction of learning to the system in Fig. 1 requires a mod-
warping, respectively, α and β are contrast and brightness ad- ification of the Slepian-Wolf decoder block from a joint-bitplane
justment parameters, and z is noise introduced by compression LDPC decoder [7] to the parameter-learning Slepian-Wolf decoder
and reconstruction. To keep y the same size as x, it is padded shown in Fig. 5. As before, it takes the Slepian-Wolf bitstream
with black pixels (arbitrarily) and cropped. Fig. 4(a) shows a S(Xq ) and the target image y and yields the reconstructed image
source image “Lena” at 512x512 original resolution. The legiti- projection Xq  . But it now also estimates the parameters via an
mate y in Fig. 4(b) is first contrast and brightness adjusted with EM algorithm. The E-step updates the a posteriori probability mass
(α, β) = (1.2, −20), rotated by 5 degrees around the image center, functions (pmf) Papp (Xq ) using the joint bitplane decoder and also
then cropped to 512x512, and finally JPEG2000 compressed and estimates corresponding coordinates for a subset of reliably-decoded
0.996 −0.087 projection pixels. The M-step updates the parameters based on the
reconstructed at 32 dB PSNR. Here, A =
0.087 0.996 corresponding coordinate pmfs, denoted Q(m) in Fig. 5. This loop
and b = [ 23 −21 ]T . The tampered y in Fig. 4(c) addition- of EM iterations terminates when hard decisions on Papp (Xq ) satisfy
ally includes malicious tampering. Fig. 4(d) shows the tampered y the constraints imposed by S(Xq ).
(a) (b) (c) (d)

Fig. 4. Test image “Lena” (a) x original, (b) y in legitimate state, (c) y in tampered state, (d) channel states Si (red: tampered, blue:
cropped-out) associated with the 16x16 blocks of realigned output in (c).

In the E-step, we fix the parameters A, b, α, and β at their cur- Target Image
y
rent hard estimates. The inverse adjustment is applied to the image
y to obtain a compensated image ycomp . If the parameters are accu- Contrast, Brightness,
and Affine Wapring
A,b, D , E Parameter
rate, ycomp would be closely aligned to the original image x in the Adjustment
Estimation
Q(m)
cropped-in region. We derive intrinsic pmfs for the image projec-
tion pixels Xq as follows. In the cropped-in region, we use Gaussian ycomp Corresponding
Coordinate Estimation
Reconstructed
distributions centered at the random projection values of ycomp , and Slepian-Wolf
Bitstream S(Xq) Joint Bitplane Papp(Xq)
Image Projection
Xq'
in the cropped-out region, we use uniform distributions. Then, we Decoder
run three iterations of joint bitplane LDPC decoding on the intrinsic
pmfs with the Slepian-Wolf bitstream S(Xq ) to produce extrinsic
Fig. 5. Slepian-Wolf decoder with parameter learning.
pmfs Papp ([Xq ]i = xq ).
We estimate the corresponding coordinates m(i) for those pro-
jection pixels for which maxxq Papp ([Xq ]i = xq ) > T = 0.995, the lower bound separately over these two sets of parameters. We
denoting this set of reliably-decoded projection indices as C. We model P (n(i) |m(i) , [xmax
q ]i , y; A, b) as a Gaussian distribution,
also denote the maximizing reconstruction value xq to be [xmax q ]i . 1 i.e., (n(i) − Am(i) − b) ∼ N (0, σ 2 I). Taking partial derivatives
(i)
The latent variable update can be written as Qi (m) := P (m = with respect to A and b, and setting to zero, we obtain the optimal
m|[xmax
q ]i , y, n(i) ; A, b, α, β), where n(i) is the set of top-left co- updates:
ordinates of the 16x16 projection blocks Bi in the original image ⎡ ⎤
x, and m(i) represents the corresponding set of coordinates in the ⎡ ⎤ .. ..
A11 A21 ⎢ . . ⎥
overcomplete random projection Y . ⎣ A12 A22 ⎦ := E(GT G)−1 E[GT ] ⎢ (i) ⎥
⎢ n(i) n2 ⎥ ,
In the M-step, we re-estimate the parameters A, b, α, and β by ⎣ 1

b1 b2 .. ..
holding the corresponding coordinate pmfs Qi (m) fixed and maxi- . .
mizing a lower bound of the log-likelihood function:
 where
L(A, b, α, β) ≡ log P ([xmax
q ]i , n(i) , y; A, b, α, β) ⎡ (i) ⎤T
i∈C
··· m1 ···
⎛ ⎞ G = ⎣ ··· (i)
m2 ··· ⎦ .
  ··· 1 ···
= log ⎝ P ([xmax
q ]i , n(i) , y|m(i) ; A, b, α, β)P (m(i) )⎠
i∈C m(i) Similarly, we model P ([Xqmax ]i |y, m; α, β) as a quantized
 Gaussian with mean at (Y (m) − β)/α), and variance σz2 /α2 . Set-
≥ Qi (m) log P ([xmax
q ]i , n(i) , y|m; A, b, α, β)
i∈C m
ting partial derivatives with respect to α and β to zero, we obtain the
 optimality updates:
= Qi (m)   
i∈C m |C| i∈C μiXY − i∈C j∈C μiX μjY

α :=   
log P (n(i) |m, [xmax
q ]i , y; A, b) + log P ([xmax
q ]i , y|m; α, β) . |C| i∈C μiX 2 − i∈C j∈C μiX μjX
1  i
β := μY − αμiX ,
The lower bound is due to Jensen’s inequality and concavity of |C| i∈C
log(.). Note also that P ([xmax
q ]i , y|m(i) ; α, β) does not depend
on the parameters A and b, and P (n(i) |m, [xmax q ]i , y; A, b) does
not depend on the parameters α and β. Thus, we can maximize where μiX = Em∼Qi [E[X|Y (m), [xmax
q ]i ]],
μiY = Em∼Qi [Y (m)],
that C is nonempty, we make sure to encode a small portion
1 To guarantee
of the quantized image projection Xq with degree-1 syndrome bits. The μiX 2 = Em∼Qi [E[X 2 |Y (m), [xmax
q ]i ]],
decoder knows those values with probability 1 and includes their indices in
C. μiXY = Em∼Qi [Y (m)E[X|Y (m), [xmax
q ]i ]].
5. SIMULATION RESULTS −1
10

False Acceptance Rate


Our experiments use “Barbara”,“Lena”,“Mandrill,” and “Peppers” −2
of size 512x512 at 8-bit gray resolution. The two-state channel in 10

Fig. 3 changes contrast and brightness, applies affine warping to the


images, and crops them to 512x512. Then JPEG2000 or JPEG com-
−3
pression and reconstruction is applied at 30 dB reconstruction PSNR. 10
In the tampered state, the malicious attack overlays a 20x122 pixel EM Decoder
text banner, white or black whichever is more visible, randomly on Oracle Decoder
Fixed Decoder
the image. The image projection X is quantized to 4 bits, and the −4
10 −3 −2 −1 0
Slepian-Wolf encoder uses a 4096-bit LDPC code with 740 degree-1 10 10 10 10
False Rejection Rate
syndrome nodes.
Fig. 6 compares the minimum rates for decoding S(Xq ) with Fig. 7. Receiver operating characteristic curves.
legitimate test images using three different decoding schemes: the
proposed EM decoder that learns all the parameters in contrast and
brightness adjustment and affine warping, an oracle decoder that 6. CONCLUSIONS
knows the parameters, and a fixed decoder that always assumes no
change applied in the images except compression. Fig. 6 (a) and (b) We have extended our image authentication system to handle im-
show the results when the affine warpings are rotation around the ages that have undergone both contrast and brightness adjustment
image center and horizontal shearing, respectively. In both cases, and affine warping. Our authentication decoder learns the contrast
contrast is increased by 20%, and brightness is decreased by 20 out and brightness adjustment and affine warping parameters via an un-
of 255. The EM decoder requires minimum rates only slightly higher supervised EM algorithm. We demonstrate that an authentication
than the oracle decoder, while the fixed decoder requires higher rates Slepian-Wolf bitstream of 245 bytes is sufficient to distinguish be-
as the geometric distortion increases and quickly reaches a regime, tween legitimate editings of images and illegitimately modified ver-
where the side information does no longer reduce the bit-rate of the sions. The work can be extended to other editing models using an
quantized random projections (0.015625 bits per pixel of the original appropriate M-Step.
image).
7. REFERENCES

[1] Y.-C. Lin, D. Varodayan, and B. Girod, “Image authentication based on


Rate (bits per pixel of original image)
Rate (bits per pixel of original image)

distributed source coding,” in IEEE International Conference on Image


0.015 0.015 Processing, San Antonio, TX, Sep. 2007.
[2] Y.-C. Lin, D. Varodayan, and B. Girod, “Image authentication and
0.01 EM Decoder 0.01 EM Decoder tampering localization using distributed source coding,” in IEEE Mul-
Oracle Decoder Oracle Decoder timedia Signal Processing Workshop, Crete, Greece, Oct. 2007.
Fixed Decoder Fixed Decoder
0.005 0.005 [3] Y.-C. Lin, D. Varodayan, T. Fink, E. Bellers, and B. Girod, “Authenti-
cating contrast and brightness adjusted images using distributed source
0 0 coding and expactation maximization,” in International Conference on
−12 −8 −4 0 4 8 12 0 2 4 6 8 10
Degree of Rotation (°) Percentage of Shearing (%) Multimedia and Expo, Hannover, Germany, June 2008.
[4] Y.-C. Lin, D. Varodayan, and B. Girod, “Distributed source coding
(a) Rotation (b) Horizontal shearing
authentication of images with affine warping,” in IEEE International
Conference on Acoustics, Speech, and Singal Processing, Taipei, Tai-
wan, April 2009.
Fig. 6. Minimum rate for decoding the legitimate test image, “Bar- [5] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization tech-
bara,” using different decoders. nique occurring in the statistical analysis of probabilistic functions of
markov chains,” Annals of Mathematical Statistics, vol. 41, no. 1, pp.
164–171, Oct. 1970.
For the next experiment, we set the authentication data size to [6] D. Varodayan, A. Aaron, and B. Girod, “Rate-adaptive codes for dis-
245 bytes and measure false acceptance and rejection rates. The ac- tributed source coding,” EURASIP Signal Processing Journal, Special
Section on Distributed Source Coding, vol. 86, no. 11, pp. 3123–3130,
ceptance decision is made based on the likelihood of Xq and y with Nov. 2006.
estimated parameters within the estimated cropped-in blocks. The [7] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod, “Distributed
channel settings remain the same except that parameters α is ran- grayscale stereo image coding with unsupervised learning of disparity,”
domly drawn from [0.9,1.1], β from [−10,10], A11 and A22 from in IEEE Data Compression Conference, Snowbird, UT, March 2007.
[0.95, 1.05], A21 and A12 from [−0.05, 0.05], and b1 and b2 from [8] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information
[−10, 10]. The JPEG2000/JPEG reconstruction PSNR is selected sources,” IEEE Transactions on Information Theory, vol. IT-19, no. 4,
from 30 to 42 dB. With 2000 trials each on “Barbara”, “Lena”, pp. 471–480, July 1973.
[9] F. Lefebvre, J. Czyz, and B. Macq, “A robust soft hash algorithm for
“Mandrill”, and “Peppers,” Fig.7 shows the receiver operating char- digital image signature,” in International Conference on Multimedia
acteristic curves created by sweeping the decision threshold of the and Expo, Baltimore, Maryland, 2003.
legitimate likelihood. The EM decoder performs very closely to the [10] C. D. Roover, C. D. Vleeschouwer, F. Lefebvre, and B. Macq, “Robust
oracle decoder, while the fixed decoder rejects authentic test images image hashing based on radial variance of pixels,” in IEEE Interna-
with high probability. In the legitimate case, the EM decoder esti- tional Conference on Image Processing, Genova, Italy, Sep. 2005.
mates the transform parameters A11 , A21 , A12 , A22 , b1 , b2 , α, and [11] S. Roy and Q. Sun, “Robust hash for detecting and localizing image
tampering,” in IEEE International Conference on Image Processing,
β with mean squared error 4.5 × 10−7 , 2.6 × 10−6 , 3.4 × 10−7 , San Antonio, TX, Sep. 2007.
1.6 × 10−6 , 0.05, 0.54, 2 × 10−5 , 0.34 and respectively.

You might also like