You are on page 1of 5

NOVEL TECHNIQUES FOR WATERMARK EXTRACTION IN THE DCT DOMAIN

J. R. Hernández , M. Amado and F. Pérez-González

Dept. Tecnologı́as de las Comunicaciones, ETSI Telecom., Universidad de Vigo, 36200 Vigo, Spain
email: jhernan@tsc.uvigo.es, fperez@tsc.uvigo.es

ABSTRACT 
of applying a DCT transform to x[n] in a 8 8 pixels block ba-
sis. For copyright protection purposes, a watermark W [k] car-
A novel DCT-domain watermark extraction procedure for still im-
rying some hidden information (owner and image identification
ages that does not require the original image is presented. This
number, transaction date, etc.) is added to the original image in
method is based on the generalized Gaussian model, which in-
the DCT domain, obtaining as a result the watermarked version
cludes as a special case the cross-correlation-based watermark de-
Y [k] =4 X [k] + W [k].
tector structures, used so far in the literature. Optimal maximum
likelihood (ML) structures are given, which allow to analytically In the watermarking technique we analyze in this paper, the
assess the performance of watermarking methods in the DCT do- watermark W [k] is generated in the DCT domain employing a 2-
main within a statistical framework. These original theoretical re- dimensional multipulse amplitude modulation scheme [2, 3]. In
other words, W [k] can be expressed as the sum of N orthogonal
sults are validated with experiments that show a considerable im-
provement over the existing watermark extraction techniques. The f g
pulses Pi [k] Ni=1 N X
perceptual model used in the tests is also described. W [k ] = bi Pi [k]; (1)
i=1
1. INTRODUCTION 4
where the coefficients b = (b1 ; : : : ; bN ) are used to encode the
In recent years we have witnessed a striking proliferation of tech- hidden message. The modulation pulses Pi [k] N f g
i=1 are gener-
niques for representation, storage and distribution of digital mul- ated as a function of a secret key K , only known by the copyright
timedia information. Unfortunately, these developments have also owner. They are expressed as
[k]s[k]; k 2 Si

opened the gate to unathorized copying, distribution and manip-
ulation of data, mostly images. Specialized and costly hardware Pi [k] = 0; otherwise;
(2)
may alleviate the problem of images duplication, at the price of a
dramatic reduction in marketing possibilities –this is the crypto- where s[k] is a key-dependent pseudorandom sequence such that
graphic approach taken by pay TV channels, not foreseeable for 4 fS gN are also
s[k] 2 f 1; 1g; 8k, and the sets of indexes T = i i=1
scenarios such as Internet–. Watermarking techniques can at least
key-dependent and determine the spatial shape of the pulses. The
sequence [k ] is called the perceptual mask and indicates the max-
ensure that ownership information is invisibly embedded into the
image, thus preventing or deterring users from illegal uses.
imum allowable magnitude of the alteration that the coefficient
Although many watermarking methods have sprouted over the
X [k ] may suffer without achieving noticeable distortions. The sets
few past years, even with commercial products available, the re-
fS g N S \S
i i=1 are assumed to be non-overlapping, i.e. i j = ; i = ;8 6
sults up to date are quite discouraging, since there are freely avail-
able programs (e.g., unZign, Stirmark) that have succeeded in wip-
j , and sparsely spread over the whole image in a pseudorandom
fashion to provide security and robustness against cropping [2, 3].
Given a watermarked image Y [k] and the secret key K , first
ing the watermark away with little impact on the quality of the re-
sulting image. Parallel to this, the lack of theoretical analyses in
the presence of a watermark for that key is tried to be detected
most of the available literature makes it difficult to know the actual
in the so-called watermark detection test. If the result is positive,
limits in the performance of the various methods and to provide
then the watermark decoding procedure obtains an estimate of the
well-founded solutions which are the only way to eventually turn
message b. We will assume in this paper that no attacks aimed at
digital copyright protection into a mature discipline. In this paper
desynchronizing the watermark are performed. However, both the
we make a contribution in this direction by showing how water-
synchronization and watermark detection problems can be tackled
marking in the DCT domain (the most commonly used) can be
within the statistical framework presented in sections 2 to 4. The
dramatically improved by carefully modeling the problem and de-
perceptual model used in our particular watermarking scheme is
signing the proper watermark detector. We will assume throughout
given in Section 5. Section 6 is devoted to experimental results,
the paper that the original image is not known. While knowledge
while Section 7 presents our conclusions and future lines of re-
of the original image greatly simplifies the extraction procedure
search.
[1] it also narrows the range of possible applications.
Let x[n] be a two-dimensional sequence representing the lu-
minance of the original image, where n = (n1 ; n2 ). For the sake 2. STATISTICAL MODEL
of readability, we will use in the sequel this vector notation to rep-
resent two-dimensional discrete indexes. Let X [k] be the result Detector structures usually proposed for hidden information de-
coding in DCT-domain spread spectrum data hiding techniques
Work partially funded by CICYT under contract TIC-96-0500-C10-10 are based on the crosscorrelation between the watermarked image
Y [k] and the pseudorandom sequence s[k]. This scheme would When a binary antipodal constellation is used to encode M = 2N
be appropriate if noise –in watermarking, the original image– fol- possible messages, the ML detector structure is equivalent to a bit-
lowed a Gaussian distribution. However, the Gaussian assumption by-bit hard decisor, so the outputs of the decoder are
is inaccurate for DCT coefficients of common images. Some au-
thors have proposed the generalized Gaussian probability density ^bi = sgn(ri ); i 2 f1; : : : ; N g:
function (pdf)
Now let us analyze the performance of the watermark decod-
fx (x) = A e j xj :
c
(3) ing process in terms of the probability of bit error Pb . Obviously,
performance results strongly depend on image characteristics, so
as an alternative leading to improved statistical models [4]. Note we will obtain Pb conditioned to a given original image X [k] or, in
that the Gaussian and the Laplacian pdf’s are just special cases of other words, the probability of getting a bit error when a secret key
this expression, given by c = 2 and c = 1, respectively. Pre- is taken at random and is applied in both the watermarking and de-
coding processes. In this context, X [k] will be regarded as a deter-
T fS g
vious works in this field show that DCT coefficients at low fre-
quencies are reasonably well-modeled by a generalized Gaussian ministic signal while the sequence s[k] and the sets = i N i=1
distribution with c = 1=2. Coefficients at high frequencies are will be modeled statistically.
better approximated by a Gaussian distribution and sometimes by If the pseudorandom sequence s[n] is modeled as an i.i.d. two-
dimensional random process with marginal pdf fs (s), then, each
jS j
a Laplacian distribution.
The parameters A and in Eq. (3) can be expressed as sufficient statistic ri is the sum of i statistically independent
jS j f
contributions ( i is the cardinality of the set k; Pi [k] = 0 ). 6 g

(3=c)
1=2 4
Hence, by central limit theorem arguments, r = (r1 ; : : : ; rN )
= 1 ; A= c ; (4)
 (1=c) 2 (1=c) can be accurately approximated as the output of a vector Gaussian
channel. Therefore, the probability of error conditioned to X [k ]
where  is the standard deviation. Hence, the pdf is completely can be expressed as a function of the first and second order mo-
specified by c and  . Let us define the sequence ments of r1 ; : : : ; rN . Let us define the two-dimensional sequence
4 X [8k + i; 8k + j ]; i; j 2 f0; : : : ; 7g;
Ci;j [k1 ; k2 ] = 4 jY [k] + [k]s[k]jc[k]
r [k ] = jY [k] [k]s[k]jc[k] ;
1 2

which results if we take the (i; j )-th DCT coefficient of every extracted from Eq. (3).
block. We will model each of these 64 sequences as the output 2
If the tiling generation process is such that each index k N2
of a two-dimensional i.i.d. random process whose marginal distri- S
belongs to i with probability 1=N for all i 2f g
1; : : : ; N and
bution follows Eq. (3), with parameters c(i; j ) and  (i; j ). Let us assignments of indices to sets are performed independently, i.e.
also define the sequences c[k] as P rfk 2 Si ; m 2 Sj g = P rfk 2 Si gP rfm 2 Sj g; 8k 6=
m; i; j 2 f1; : : : ; N g, then after some algebraic manipulations it
4 c(k mod 8; k mod 8)
c[k] = can be proven [10] that
1 2

and  [k] in a similar fashion. Thus, these two sequences indicate E [ri ] = 1
X E [r[k]] : (5)
N  c[k]
the parameters c and  associated with each sample X [k ]. k [k]

3. WATERMARK DECODER
V ar(ri ) = 1 V ar(r[k]) + N 1 X E 2 [r[k]] : (6)
X

N 2c[k ] N2 2c[k ]
Let us assume that M possible different messages can be encoded k  [k] k  [k]
with the vector b = (b1 ;  2 f 
; bN ) and let bl ; l 1; ; M g Assume that bi = 1. Then, Y [k] = X [k] + [k ]s[k]; 8k 2 Si ,
denote the codeword associated to one of those messages. Also,
let Wl [k]; l2f g
1; : : : ; M be the watermark obtained from bl = and, as a consequence,
(bl;1 ; : : : ; bl;N ) using Eq. (1). Then, assuming the i.i.d. general-
ized Gaussian model for X [k], it can be easily shown that the op- r[k] = jX [k ] + 2 [k]s[k]jc[k] jX [k]jc[k]:
timum decoder in the ML sense is the one that chooses the index s[k] follows a discrete uniform two-level distribution, s[k] 2
2f g
l 1; : : : ; M verifying
If
f 1; 1g, it can be easily shown that the mean and variance of r[k]
jY [k] Wm [k]jc[k] jY [k] Wl[k]jc[k] > 0; 8 m 6= l:
are
X
c[k]
E [r[k]] = 1 jX [k]j + 2 [k]
h
c[k]
 [k ]
k 2
Assuming that bl;i 2f g 8 2f g 2f
1; 1 ; l 1; : : : ; M ; i 1; : : : ; N , g
+ jX [k]j 2 [k]
c[k] i
jX [k]jc[k]
this optimization problem is equivalent
P
to finding the codeword bl
which maximizes the expression N i=1 bl;i ri ; where the coeffi-
c[k]
V ar(r[k]) = 1 jX [k]j + 2 [k]
h

cients ri are sufficient statistics for the detection problem and are 4
defined as c[k] i2

jX [k]j 2 [k] :
4
ri =
X jY [k] + [k]s[k]jc[k] jY [k] [k]s[k]jc[k] :
k2Si
 [k]c[k] These expressions can be applied in equations (5) and (6) to com-
pute the moments of ri . When bi = 1, it can be verified that
V ar(ri ) is given by Eq. (6) and E [ri ] is negative and its absolute which is a sum of statistically independent terms. Hence, applying
value is given by Eq. (5). When a binary antipodal constelation the central limit theorem we can infer that l(Y ) is approximately
with M = 2N is used to encode the hidden message, the probabil- Gaussian. Assuming that s[k] is an i.i.d. two-dimensional random
ity of bit error Pb of the ML decoder (a bit-by-bit hard decisor) is sequence with a discrete marginal distribution with two equiprob-
Pb = Q (SNR), where Q(x) = 4 p1 R 1 e t2 =2 dt and the signal able levels, s[k]2f g 8
1; 1 ; k, then we can easily prove that the
2 x
to noise ratio SNR is defined as mean and variance of l(Y ) conditioned to H0 are [10]

4
SNR = p
E [ri ] : (7) E [l(Y ) j H0 ] =
X
[k]c[k] jX [k ]jc[k]
V ar(ri ) k

1X
[k]c[k] jX [k] + [k ]jc[k]
4. WATERMARK DETECTOR 2
k

Now let us analyze the watermark detection test, in which we have
to decide whether a given image contains a watermark generated + jX [k ] [k]jc[k] (12)
with a certain key. The watermark detection problem can be math-
ematically formulated as the binary hypothesis test 

V ar(l(Y ) j H0 ) = 1 [k]2c[k] jX [k] + [k ]jc[k]


X
H1 : Y [k] = X [k] + W [k] 4
H0 : Y [k] = X [k] (8)
k
2
where X [k] is the original image, not available during the test, jX [k] [k]jc[k] : (13)
and W [k] is a watermark generated from the secret key K that is
tested. If the watermark carries hidden information, it is not the
goal of the watermark detection test to estimate the hidden mes- Similarly, we can prove that l(Y ) conditioned to H1 is approxi-
sage; this task is left to the decoding process. Therefore, we must mately Gaussian with mean and variance
take into account the uncertainty about the value of the codeword
vector b when designing the detector. The optimum ML (Maxi-
E [l(Y ) j H1 ] = E [l(Y ) j H0 ] (14)
mum Likelihood) decision rule for the test formulated above is V ar(l(Y ) j H1 ) = V ar(l(Y ) j H0 ) (15)

H1 4 4
Let us define m1 = E [l(Y ) j H1 ] and 12 = V ar(l(Y ) j H1 ).
(Y ) >< ; (9) If H1 is decided in the detection test when l(Y ) >  , then the
H0 probabilities of false alarm (PF ) and detection (PD ) are
where  is the decision threshold and (Y ) is the likelihood func-    

PF = Q  + m1 ; PD = Q  m1 : (16)
tion
1 1
(Y ) =
1 M
X f (Y j H1 ; bl ) Let us define the following “signal to noise ratio”
f (Y j H0 )
(10)
M l=1
4 m21 :
SNR1 =
If we assume that the coefficients of the original image X [k] 12
(17)
follow the generalized Gaussian model studied in Sect. 2, and that
the watermark does not carry hidden information, in other words, 2
If we denote by Q 1 (PF ) the value x R such that Q(x) = PF ,
that there is only one pulse (N = 1) and it is modulated by a then it can be easily proved, by examining the expressions in (16),
known value b1 = 1, then the log-likelihood function l(Y ) =
4 that
ln (Y ) has the form  p 

X   PD = Q Q 1 (PF ) 2 SNR1 : (18)


l(Y ) = [k]c[k] jY [k]jc[k] jY [k] [k]s[k]jc[k] :
k Hence, the ROC (Receiver Operating Characteristic) of the water-
(11) mark detector depends exclusively on the value of SNR1 . Obvi-
ously, the larger the value of SNR1 , the larger the PD associated
where [k] is the parameter in the generalized Gaussian pdf for with a certain PF and the better, as a consequence, the perfor-
the coefficient X [k ] (it can be obtained from c[k] and  [k] using mance of the detector.
Eq. (4)).
Let us now analyze the performance of the watermark detec-
5. PERCEPTUAL MODEL
tion test conditioned to a certain original image. For this purpose,
we will characterize statistically l(Y ) for each of the two hypoth- In Eqs. (1,2) the watermark W [k] depends on a perceptual mask
esis when we assume that s[k] is the only random element in the [k ] that multiplies the pseudorandom sequence s[k]. This per-
watermarking system. When H0 is true, we have that Y [k] =
8
X [k]; k. Therefore,
ceptual mask determines the maximum amplitude distortion that
each coefficient of the original image may suffer while satisfying
X  
[k]c[k] jX [k]jc[k] jX [k] [k]s[k]jc[k] ; the invisibility constraint. A good psychovisual model in the DCT-
l (Y ) = domain (with 8x8 blocks) is capital to render the sequence [k].
k For our work we have followed the model proposed in [5, 6] that
has been also applied to derive adaptive quantization matrices for
the JPEG algorithm [7]. This model has been here simplified by
disregarding the so-called contrast-masking effect for which the
perceptual mask at a certain coefficient depends on the amplitude
of the coefficient itself. Consideration of this effect constitutes a
future line of research. On the other hand, the background inten-
sity effect, for which the mask depends on the magnitude of the
DC coefficient (i.e., the background), has been taken into account.
The so-called visibility threshold T (i; j ), i 2 f  g
0; ; 7 ,
j 2 f  g
0; ; 7 , determines the maximum allowable magnitude
of an invisible alteration of the (i; j )-th DCT coefficient and can
be approximated in logarithmic units by the following quadratic
function with parameter K Figure 1: One of the watermarks used in the tests.
 
Tmin (fi;0 + f )
2 2 2
0;j
log T (i; j ) = log
(fi;2 0 + f02;j )2 4(1 r)fi;2 0 f02;j
 q 2 c = 1=2 Laplace Gaussian
+ K log fi;2 0 + f02;j log fmin ; Empirical 29.38 29.07 21.39
Theoretical 29.34 28.71 20.78
where fi;0 and f0;j are respectively the vertical and horizontal
spatial frequencies (in cycles/degree) of the DCT-basis functions,
Tmin is the minimum value of T (i; j ), associated to the spatial Table 1: Empirical and theoretical signal to noise ratio SNR1 (in
frequency fmin , and r is taken as 0.7 following [5]. The threshold dB) in the watermark detection test.
T (i; j ) can be corrected for each block by considering the DC co-
efficient X0;0 and the average luminance of the screen X 0;0 (1024
for an 8-bit image) in the following way
 a different values of the generalized Gaussian parameter c are plot-
T
T 0 (i; j ) = T (i; j ) X0;0 : ted. Note that the parameter in Eq. (19) has been set to 1/5
X 0;0 –so the watermark is well below the visibility level– in order to
produce statistically significant results. The actual performance
Note that the actual dependence of X0;0 on the block indices has is substantially better, but the qualitative conclusions remain the
been dropped in the notation for conciseness. Following [5], the same. As can be inferred from Figure 2 and also from Figure
parameters used in our scheme have been set to aT = 0:649, 3, where the SNR in Eq. (7) is plotted for different values of c,
fmin = 3:68 cycles/degree, Tmin = 1:1548 and K = 1:728.
0 has been obtained, the per-
 
good results are obtained in the range 1=2 c 1. Interestingly
Once the corrected threshold value Ti;j enough, the performance for c = 2, corresponding to the cross-
ceptual mask is calculated as correlation-based detector used so far in the literature [9], suffers
   a severe deterioration, corresponding to a drop of more than 6 dB
[k1 ; k2 ] = 4 1 + pÆ (l1 ) Æ (l )  T 0 (l ; l )
1+ p 2 1 2 (19) in the SNR (cf. Fig. 3).
2+1 2+1
Although not directly discussed here, our analysis can be some-
where l1 = k1 mod 8, l2 = k2 mod 8 and < 1 is a scaling what straightforwardly extended to the case of JPEG compression.
factor that allows to introduce a certain degree of conservativeness Figure 4 shows the theoretical BER obtained when image ‘Lena’
in the watermark due to those effects that have been overlooked is watermarked (with a 100 bits hidden message) and later com-
(e.g., spatial masking in the frequency domain [8]). The remaining pressed with JPEG to a percentage of its original quality. As it
factors in (19) allow to express the corrected threshold in terms of can be seen, in this case, the Laplacian detector (c = 1) per-
DCT coefficients instead of luminances. forms slightly better than the one with c = 1=2. The Gaussian
(cross-correlation) detector, not shown in the figure, leads to a
much higher BER. The curves labeled as ‘Optimum’ correspond
6. EXPERIMENTAL RESULTS to a detector specifically designed for the JPEG compression at-
tack.
In order to validate the theoretical analysis presented in previous
sections, we have watermarked the well-known image Lena (256 x To measure the performance of the watermark detection test,
256 pixels) following the method described in Sect. 1, modifying we have watermarked the image Lena with 1000 different keys us-
only 22 coefficients in the mid-frequency range (low frequency co- ing “pure” watermarks carrying no additional information. In Ta-
efficients have very low capacity, i.e., slight modifications become ble 1 we show both empirical and theoretical values of the “signal
quite visible; high frequency coefficients can be easily erased by to noise ratio” SNR1 for the image Lena and detector structures
compression algorithms). based on three values of c. As we have already discussed, SNR1
To analyze the performance of the watermark decoding pro- completely determines the shape of the ROC. We can see that the
cess, we watermarked the image Lena with 100 different keys for theoretical approximations accurately fit the empirical data. Em-
different bit rates, measured in terms of the number of coefficients pirical measures of PF and PD , not shown here, clearly validate
altered by each information bit, and computed the resulting bit er- the accurateness of approximations in Eq. (16) [10]. Besides this,
ror rate (BER). Figure 1 shows one of the watermarks used in the it is clear that substantial gains in performance are obtained by
experiment. In Figure 2 both empirical and theoretical results for abandonning the Gaussian statistical assumption.
Figure 2: BER as a function of the pulse size for Lena (256  256). Figure 4: BER as a function of JPEG final quality for Lena.

[2] J. R. Hernández, F. Pérez-González, J. M. Rodrı́guez, and


G. Nieto, “Performance analysis of a 2d-multipulse ampli-
tude modulation scheme for data hiding and watermarking
of still images,” IEEE J. Select. Areas Commun., vol. 16,
pp. 510–524, May 1998.
[3] J. R. Hernández, F. Pérez-González, and J. M. Rodrı́guez,
“The impact of channel coding on the performance of spatial
watermarking for copyright protection,” in Proc. ICASSP’98,
vol. 5, (Seattle, Washington (USA)), pp. 2973–2976, May
1998.
[4] R. J. Clarke, Transform Coding of Images. Academic Press,
1985.

Figure 3: SNR as a function of c for Lena (256  256). [5] A. J. Ahumada and H. A. Peterson, “Luminance-model-
based DCT quantization for color image compression,” in
Human Vision, Visual Processing, and Digital Display III
(Proc. of the SPIE) (B. E. Rogowitz, ed.), 1992.
7. CONCLUSIONS AND FURTHER WORK
[6] J. A. Solomon, A. B. Watson, and A. J. Ahumada, “Visi-
Novel structures based on the use of generalized Gaussian models bility of DCT basis functions: Effects of contrast masking,”
have been proposed for the ML detection of DCT-domain water- in Proc. Data Compression Conference, (Snowbird, Utah
marks embedded in still images. By considering these models, (USA)), pp. 361–370, IEEE Computer Society Press, 1994.
we have been able to dramatically improve the performance of the [7] A. B. Watson, “Visual optimization of DCT quantization ma-
cross-correlation-based detectors that have been used up to date. trices for individual images,” in Proc., AIAA Computing in
In any case, we also have presented a theoretical analysis that al- Aerospace 9, (San Diego, California (USA)), pp. 286–291,
lows to assess the performance of DCT-based methods, measured American Institute of Aeronautics and Astronautics, 1993.
in terms of the bit error rate and the probabilities of false alarm
and detection, for a given image. The Gaussian detector is simply [8] A. N. Netravali and B. G. Haskell, Digital Pictures. Repre-
a particular case of the generalized model, so the analytical results sentation, Compression and Standards. New York: Plenum
given here are directly applicable. One immediate extension of Press, 1995.
our analysis is the consideration of channel codes which have been [9] M. Barni, F. Bartolini, V. Cappellini, and A. Piva, “A DCT-
already shown to considerably improve on spatial-domain water- domain system for robust image watermarking,” Signal Pro-
marking methods [11]. Another future line of research consists cesing, vol. 66, pp. 357–372, May 1998.
in fitting a generalized Gaussian model (from a discrete set of pa- [10] J. R. Hernández, Técnicas de Sellado Invisible para la Pro-
rameters c and  ) to the DCT coefficients histogram so as to decide tección de los Derechos de Autor en Imágenes Digitales.
upon and apply the optimal detector structure. PhD thesis, Universidade de Vigo, 1998.
[11] J. R. Hernández, J. M. Rodrı́guez, and F. Pérez-González,
8. REFERENCES “Improving the performance of spatial watermarking of im-
ages using channel coding.” Accepted for publication in Sig-
[1] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, “Se-
nal Procesing, Elsevier.
cure spread spectrum watermarking for multimedia,” IEEE
Transactions on Image Processing, vol. 6, pp. 1673–1687,
December 1997.

You might also like