
AUDIO WATERMARKING IN THE FFT DOMAIN USING

PERCEPTUAL MASKING
M. V. Rama Krishna
ECE Department
IIT Guwahati - 781039
ramakrishna mv1@yahoo.com
D. Ghosh
ECE Department
IIT Guwahati - 781039
ghosh@iitg.ernet.in
ABSTRACT
This paper presents a new oblivious technique for em-
bedding a watermark into digital audio signals, based
on the patchwork algorithm in the FFT domain. The
proposed watermarking scheme exploits a psychoacoustic
model of MPEG audio coding to ensure that the watermark
does not affect the subjective quality of the original
audio. The audio is watermarked by modifying selected
FFT coefficients of an audio frame under a constraint
specified by the psychoacoustic model. Experimental
results show that our scheme introduces no audible
distortion and is robust against some common signal
processing attacks.
1. INTRODUCTION
The outstanding progress in digital technology has not
only led to easy reproduction and retransmission of
digital data but has also helped unauthorized data ma-
nipulation. Consequently, the necessity arises for copy-
right protection of digital data (audio, image and video)
against unauthorized recording attempts. Recently, there
has been large interest in audio watermarking techniques,
largely stimulated by the rapid progress in audio
compression algorithms and the wide use of the Internet
for distributing compressed music across the globe. A
few of the earliest audio watermarking techniques are
reported in [1]. They include approaches such as
phase coding, echo coding and the spread spectrum
technique. In the phase coding technique, the watermark
is embedded by modifying the phase values of the Fourier
transform coefficients of audio segments. In the other
two approaches, the watermark is embedded by modifying
the cepstrum at a known location using multiple decaying
echoes or spread spectrum noise. Another audio
watermarking technique is proposed in [2], where
Fourier transform coefficients over the middle frequency
bands are replaced with spectral components of the
watermark data. In [3], the watermark for an audio signal
is generated by modifying the least significant bit of
each sample. Reports in [4, 5] discuss audio watermarking
methods that exploit the characteristics of the human
auditory system so as to guarantee that the embedded
watermark is imperceptible. However, the disadvantage
of these schemes is that the original audio signal is
required in the watermark detection process, i.e., the
algorithms are non-oblivious. Audio watermarking using
the patchwork algorithm is developed in [6, 7]. This
algorithm is based on statistical methods in a transform
domain, e.g., DCT or FFT. The patchwork algorithm has
the advantage that it satisfies the security constraint,
but the watermark is not guaranteed to be inaudible.
Furthermore, robustness is not maximized, because the
amount of modification made to embed the watermark is
estimated and is not necessarily the maximum amount
possible.
In this paper, we present an oblivious audio water-
marking algorithm wherein the earlier proposed patchwork
algorithm in the FFT domain [6] is modified by
incorporating the concept of perceptual masking proposed
in [4]. Perceptual masking is based on the characteristics
of the human auditory system; hence, it is guaranteed
that the watermark embedded by our proposed algorithm
will be inaudible. As the perceptual characteristics of
individual audio signals vary, the magnitude of the
modification made to each coefficient adapts to, and is
highly dependent on, the audio being watermarked. This
is described in the sections that follow.
2. THE PATCHWORK AND MASKING
MODELS
2.1. Patchwork algorithm
The patchwork algorithm artificially modifies the
difference (patch value) between the means of samples in
two randomly chosen subsets called patches. The
modification incorporated depends on the watermark bit
to be embedded and is detected with high probability by
comparing the observed patch value with the expected
one.
The two major steps in the patchwork algorithm
are: (1) choose two patches A and B pseudo-randomly,
and (2) for a watermark bit equal to 1, add a small
constant value d to the samples of patch A and subtract
the same value d from the samples of patch B, i.e.,

a'_i = a_i + d,    b'_i = b_i - d    (1)

where a_i, b_i are sample values of the patches A and B,
respectively, and a'_i, b'_i are the modified samples.
The mean difference of the watermarked patches is hence
given as

e = (1/n) \sum_{i=1}^{n} (a'_i - b'_i) = \bar{a}' - \bar{b}' = (\bar{a} - \bar{b}) + 2d    (2)

where n is the patch size, \bar{a}, \bar{b} are the
original sample means of the patches and \bar{a}',
\bar{b}' are the modified sample means. For a watermark
bit equal to 0, d is subtracted in patch A and added in
patch B, resulting in e = (\bar{a} - \bar{b}) - 2d.
Thus, the mean difference (patch value) of a watermarked
frame differs from that of the original frame by ±2d.
The detection process starts with the subtraction of
sample values between the two patches and then compares
the observed mean difference with the expected one.
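The patch-value idea above can be sketched in a few lines of Python. This is a time-domain toy with a fixed step d, not the paper's FFT-domain scheme; the function names and the use of the key as an RNG seed are our own illustration:

```python
import random

def patchwork_embed(samples, key, n, d, bit):
    # Pseudo-randomly choose two disjoint patches A and B of size n each.
    rng = random.Random(key)
    idx = rng.sample(range(len(samples)), 2 * n)
    A, B = idx[:n], idx[n:]
    out = list(samples)
    sign = 1 if bit == 1 else -1  # bit 1: A up / B down; bit 0: reversed
    for i in A:
        out[i] += sign * d
    for i in B:
        out[i] -= sign * d
    return out

def patchwork_detect(samples, key, n):
    # Regenerate the same patches from the key, then compare the observed
    # patch value e with the expected value (zero) of an unmarked signal.
    rng = random.Random(key)
    idx = rng.sample(range(len(samples)), 2 * n)
    A, B = idx[:n], idx[n:]
    e = sum(samples[i] for i in A) / n - sum(samples[i] for i in B) / n
    return 1 if e > 0 else 0
```

On an unmarked frame the patch means agree on average, so e is near zero; embedding shifts e by ±2d, which the detector reads off the sign.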
2.2. Masking model
The performance of the patchwork algorithm depends
on the amount of modification d, which in turn affects
inaudibility. Increasing the value of d reduces the
probability of false detection, but a high value of d
adversely affects the inaudibility requirement. Hence,
the value of d is to be chosen so as to trade off between
inaudibility and the probability of false detection. In
our work, the optimum value of d is selected by
considering the psychoacoustic model of MPEG audio
coding [8]. This model gives the maximum amount of
modification that is possible for every sample without
any significant degradation in the subjective audio
quality. The underlying concept is audio masking, by
which a faint but audible signal becomes inaudible in
the presence of another, stronger signal. The masking
effect depends on the spectral and temporal
characteristics of both the masked signal and the
masker. Our procedure uses frequency masking, which
refers to masking between frequency components of the
audio signal. If two signals that occur simultaneously
are close together in frequency, the stronger masking
signal will make the weaker signal inaudible; the
low-level signal will not be audible if it is below the
masking threshold. The masking threshold may be measured
using the MPEG audio psychoacoustic model and is the
limit for the maximum modification that keeps the
perceptual audio quality high.
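As an illustration only (this is not the model of [8]), frequency masking can be caricatured in a few lines: each bin is masked by the strongest component nearby, lowered by a fixed offset. The 3-bin spread and 16 dB offset are arbitrary assumptions; the real MPEG model works on critical bands and distinguishes tonal from noise-like maskers:

```python
def simple_frequency_mask(power_db, spread=3, offset_db=16.0):
    # Toy stand-in for a psychoacoustic model: the masking threshold at
    # bin k is the strongest spectral component within `spread` bins of k,
    # lowered by a fixed masking offset in dB.
    N = len(power_db)
    return [max(power_db[max(0, k - spread):k + spread + 1]) - offset_db
            for k in range(N)]
```

A strong component raises the threshold of its neighbours, so larger watermark modifications can hide near spectral peaks.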
3. PROPOSED WATERMARK EMBEDDING
AND DETECTION ALGORITHM
3.1. Watermark embedding
In the embedding process, we repeatedly apply an
embedding operation on short segments of the audio
signal. Each of these segments is called a frame. Let
the size of each frame be N. In this work, we use binary
watermarks w_j, i.e., w_j = 0 or 1. The audio
watermark-embedding scheme is shown in Figure 1 and is
described below.
Step 1. Map the secret key to the seed of a random
number generator. Generate an index set
I = {I_1, ..., I_{2n}}, 2n < N/2, whose elements are
pseudo-randomly selected index values in the range
[2, N/2]. The choice of N/2 is due to the Hermitian
symmetry of the FFT for real-valued signals.
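Step 1 can be sketched as follows; the function name is ours, and Python's `random` module stands in for whatever keyed generator an implementation would actually use:

```python
import random

def make_index_set(secret_key, n, N):
    # Map the secret key to the RNG seed, then draw 2n distinct indices
    # pseudo-randomly from [2, N/2] (Hermitian symmetry makes the upper
    # half of the spectrum redundant for a real-valued frame).
    assert 2 * n < N // 2
    rng = random.Random(secret_key)
    return rng.sample(range(2, N // 2 + 1), 2 * n)
```

Because the draw is seeded by the key, the detector can regenerate exactly the same index set without access to the original audio.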
Step 2. Generate the binary watermark sequence {w_j}
of length equal to the number of frames to be
watermarked; j denotes the frame number.
Step 3. Let, for an input audio frame, the set
S = {S_1, S_2, ..., S_N} be the FFT coefficients, whose
subscripts denote frequencies from the lowest to the
highest.

Figure 1: Watermark embedding procedure. (Block diagram:
the input audio frame is transformed by the FFT into S;
the psychoacoustic model yields the mask M, which weights
the FFT Y of a PN(-1, 1) sequence to give P; the patchwork
algorithm modifies S using P; the IFFT yields the
watermarked audio frame.)

Define patches A and B as

A = {a_i | a_i = S_{I_{2i-1}}, i = 1, ..., n}
B = {b_i | b_i = S_{I_{2i}}, i = 1, ..., n}    (3)
Step 4. Calculate the masking threshold of the current
audio frame from the set S using the psychoacoustic
model. This gives the frequency mask M = {M_k},
k = 1, 2, ..., N.
Step 5. Generate a pseudo-random noise sequence (-1 or 1)
of length N and then apply the FFT to it. Let this be
Y = {Y_k}.
Step 6. Use the mask M to weight the FFT coefficients
of the noise-like sequence, that is,

P = {P_k | P_k = M_k Y_k}    (4)
Step 7. Define C and D as two subsets of P, where
c_i = P_{I_{2i-1}} and d_i = P_{I_{2i}}, i = 1, ..., n.
Step 8. Embedding of a watermark bit w_j is done as
follows:

If w_j = 1,
a'_i = a_i + α |Re(c_i)|
b'_i = b_i - α |Re(d_i)|    (5)

If w_j = 0,
a'_i = a_i - α |Re(c_i)|
b'_i = b_i + α |Re(d_i)|    (6)

where α is the watermark strength parameter.
Step 9. Define new patches A and B with
a_i = S_{N/2+I_{2i-1}} and b_i = S_{N/2+I_{2i}},
i = 1, ..., n. Also, define C and D as
c_i = P_{N/2+I_{2i-1}} and d_i = P_{N/2+I_{2i}},
i = 1, ..., n. Apply the watermark embedding process
following the rule given in Step 8.
Step 10. Finally, replace the selected elements a_i and
b_i by a'_i and b'_i, respectively, and then apply the
IFFT. The output is the watermarked audio frame.

We repeat the above steps on successive frames until
no watermark bits are left for embedding. To have safe
communication between the embedding and detection of
watermarks, each watermark bit may be embedded repeatedly
and consecutively several times. Repeated embedding of
the same information, with detection based on majority
voting, provides error-correcting functionality.
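Steps 5-9 can be sketched as below. This is a simplified sketch under stated assumptions: the masking threshold M is taken as given rather than computed from [8], a naive DFT stands in for the FFT, indexing is 0-based rather than the paper's 1-based, and Step 9's mirroring is realized by enforcing conjugate symmetry so the frame stays real after the IFFT. All names are ours:

```python
import cmath
import random

def embed_bit(S, I, M, bit, alpha, key):
    """Embed one watermark bit into the FFT coefficients S of one frame.
    S: list of N complex FFT coefficients with conjugate symmetry;
    I: 2n pseudo-randomly selected bin indices in [2, N/2);
    M: per-bin masking threshold, supplied by the caller."""
    N, n = len(S), len(I) // 2
    # Step 5: pseudo-random +/-1 sequence and its spectrum (naive DFT,
    # adequate for a sketch; a real implementation would use an FFT).
    rng = random.Random(key)
    pn = [rng.choice((-1, 1)) for _ in range(N)]
    Y = [sum(pn[t] * cmath.exp(-2j * cmath.pi * k * t / N) for t in range(N))
         for k in range(N)]
    # Step 6: weight the noise spectrum by the masking threshold.
    P = [M[k] * Y[k] for k in range(N)]
    out = list(S)
    s = 1 if bit == 1 else -1  # Step 8: bit 1 raises patch A, lowers patch B
    for i in range(n):
        a, b = I[2 * i], I[2 * i + 1]
        out[a] = out[a] + s * alpha * abs(P[a].real)
        out[b] = out[b] - s * alpha * abs(P[b].real)
        # Step 9 (intent): mirror the change into the other half of the
        # spectrum; enforcing conjugate symmetry keeps the IFFT real-valued.
        out[N - a] = out[a].conjugate()
        out[N - b] = out[b].conjugate()
    return out
```

Because the modification at each bin is scaled by M_k, the embedding strength adapts per frame to what the masking threshold allows.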
3.2. Watermark detection
Step 1. Map the secret key to the seed of a random
number generator and then generate the index set I, the
same as in the embedding process.
Step 2. For a watermarked audio frame, obtain the
subsets A and B from the FFT coefficients and then
compute the means of the two patches as

\bar{a}' = (1/n) \sum_{i=1}^{n} Re(a'_i),   \bar{b}' = (1/n) \sum_{i=1}^{n} Re(b'_i)    (7)
Step 3. Calculate the mean difference (patch value)
e = \bar{a}' - \bar{b}'.
Step 4. Compare e with a predefined threshold T and
then decide the embedded watermark bit for that
particular frame. The threshold T is the expected value
E[\bar{a} - \bar{b}] over all the frames in the audio
signal, which may generally be assumed to be zero.
Therefore, the watermark detection rule may be stated as

IF e > 0 THEN 1 is detected,
ELSE IF e < 0 THEN 0 is detected.

We repeat the above steps until all watermark bits
are detected.
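The detection steps, together with the majority voting mentioned at the end of Section 3.1, can be sketched as follows (function names are ours; the threshold T is taken as zero, as the text assumes):

```python
def detect_bit(S_marked, I):
    # Steps 2-4: patch means over the real parts of the selected FFT bins;
    # the sign of the observed patch value e decides the bit.
    n = len(I) // 2
    a_mean = sum(S_marked[I[2 * i]].real for i in range(n)) / n
    b_mean = sum(S_marked[I[2 * i + 1]].real for i in range(n)) / n
    e = a_mean - b_mean
    return 1 if e > 0 else 0

def vote(bits):
    # Majority vote over repeated embeddings of the same watermark bit,
    # providing the error-correcting functionality described above.
    return 1 if sum(bits) > len(bits) / 2 else 0
```

Detection needs only the key (to rebuild I) and the watermarked frame, which is what makes the scheme oblivious.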
4. EXPERIMENTAL RESULTS
A total of ten audio sequences are used in our
experiment. Each audio sequence has a duration of 10
seconds and is sampled at 44.1 kHz with 16 bits per
sample. The audio sequences are first watermarked using
our technique, and then we attempt to extract the
watermark bits from the watermarked signals. The frame
size and the patch size are taken as 512 and 30,
respectively.
In order to demonstrate the subjective quality of
the watermarked signals, we perform an informal
subjective listening test according to the hidden
reference listening test [9]. Ten listeners participated
in the test. The subjective quality of a watermarked
audio signal is measured in terms of the Diffgrade,
which is equal to the subjective rating given to the
watermarked test item minus the rating given to the
hidden reference. The Diffgrade scale is partitioned
into five ranges: watermark imperceptible (0.00),
perceptible but not annoying (0.00 to -1.00), slightly
annoying (-1.00 to -2.00), annoying (-2.00 to -3.00),
and very annoying (-3.00 to -4.00). The test results,
along with the corresponding SNR of the watermarked
audio signals, are tabulated in Table 1.
Table 1: SNR and Subjective Quality of Watermarked
audio signals
Test audio SNR (dB) Diffgrade
Blues1 21.14 0.00
Blues2 25.28 -0.20
Country1 20.42 0.00
Country2 18.87 -0.20
Classic1 27.14 -0.60
Classic2 29.61 0.00
Folk1 18.61 0.00
Folk2 17.59 0.00
Pop1 18.86 0.00
Pop2 19.05 0.00
Since the proposed algorithm is only statistically
optimal, a measure of its performance may be the
probability of detecting a watermark bit correctly.
Equivalently, we may use the bit-error rate (BER) as
the performance measure; the lower the BER, the better
the performance. Table 2 gives the BER in the detection
process when the watermarked signals are not disturbed
(no attack) and when various signal processing operations
are applied to them. The signal processing attacks used
in our experiment are down-sampling by 2, MPEG
compression (MPEG-1 Layer 3 at bit rates of 64 kbps and
128 kbps), band-pass filtering (using a 2nd-order
Butterworth filter with cut-off frequencies of 100 Hz
and 6 kHz), echo addition (an echo signal with a delay
of 100 ms and a decay of 50%) and equalization (using a
10-band equalizer with +6 dB and -6 dB gains).
Table 2: Error probabilities for various attacks
Type of attack Bit errors BER (%)
No attack 0/861 0.00
Down sampling 0/861 0.00
MPEG-layer 3 (128 Kbps) 4/861 0.46
MPEG-layer 3 (64 Kbps) 10/861 1.16
Band-pass Filtering 10/861 1.16
Echo Addition 2/861 0.23
Equalization 0/861 0.00
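The BER figures in Table 2 follow directly from the error counts; for instance, 10 errors out of 861 embedded bits give 1.16%. As a trivial sketch (the helper name is ours):

```python
def bit_error_rate(errors, total):
    # BER in percent: fraction of wrongly detected watermark bits.
    return 100.0 * errors / total
```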
It is observed that although our method introduces
large distortion (low SNR) due to watermarking, the
subjective quality of the watermarked audio signal is
acceptable (Diffgrade above -1.00) for all the test
signals. Also, the watermark bits are detected correctly
with high probability under various signal processing
attacks (the maximum BER in our experiment is only
1.16%). Therefore, in our opinion, the proposed technique
is robust as well as perceptually transparent.
5. CONCLUSION
In this paper, we describe a new algorithm for digital
audio watermarking. The algorithm is better than that
in [4] in the sense that it is oblivious, while the
latter is a non-oblivious technique. Compared to the
algorithm in [6], the proposed watermark embedding
scheme accomplishes perceptual transparency by exploiting
the masking effect of the human auditory system. This
embedding scheme adapts the watermark so that the energy
of the watermark is maximized under the constraint of
keeping the auditory artifact as low as possible. Also,
the proposed scheme is robust to some of the common
signal processing attacks. These claims are corroborated
by the experimental results given in the previous
section.
Despite the success of the proposed method, it also
has a drawback: a synchronization problem. The use of
a PN sequence to generate the index set is vulnerable
to time-scale modification attacks. Also, our masking
model can still be improved by considering the temporal
masking effect. Further research will focus on
overcoming these problems.
REFERENCES
[1] W. Bender, D. Gruhl and N. Morimoto, "Techniques
for data hiding," Tech. Rep., MIT Media Lab, 1994.
[2] J. F. Tilki and A. A. Beex, "Encoding a hidden
digital signature onto an audio signal using
psychoacoustic masking," Proc. 7th Intl. Conf. Signal
Processing and Technology, pp. 476-480, 1996.
[3] P. Bassia and I. Pitas, "Robust audio watermarking
in the time domain," Proc. 9th European Signal
Processing Conf. (EUSIPCO-98), pp. 25-28, 1998.
[4] M. D. Swanson, B. Zhu, A. H. Tewfik and L. Boney,
"Robust audio watermarking using perceptual masking,"
Signal Processing, vol. 66, no. 3, pp. 337-356, 1998.
[5] L. Boney, A. H. Tewfik and K. N. Hamdy, "Digital
watermarking for audio signals," Proc. 3rd Intl. Conf.
Multimedia Computing and Systems, pp. 437-480, 1996.
[6] M. Arnold, "Audio watermarking: Features,
applications and algorithms," Proc. IEEE Intl. Conf.
Multimedia, vol. 2, pp. 1013-1016, 2000.
[7] I. K. Yeo and H. J. Kim, "Modified patchwork
algorithm: A novel audio watermarking scheme," Proc.
Intl. Conf. Information Technology, Coding and
Computing, pp. 237-242, 2001.
[8] ISO/IEC IS 11172, Information Technology: Coding of
Moving Pictures and Associated Audio for Digital Storage
at up to about 1.5 Mbit/s.
[9] T. Painter and A. Spanias, "Perceptual coding of
digital audio," Proc. IEEE, vol. 88, no. 4, 2000.