
Robust Audio Watermarking Based on Low-Order Zernike Moments

Shijun Xiang1,2, Jiwu Huang1,2, Rui Yang1,2, Chuntao Wang1,2, Hongmei Liu1,2
1 School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510275, China
2 Guangdong Key Laboratory of Information Security Technology, Guangzhou 510275, China
xiangshijun@gmail.com, isshjw@mail.sysu.edu.cn

Abstract. Extensive testing shows that the low-order Zernike moments of audio are very robust to common signal processing operations such as MP3 compression and low-pass filtering. Based on this observation, this paper proposes a robust watermarking scheme that embeds the bits into the low-order moments. By analyzing and deriving the linear relationship between the audio amplitude and the moments, the low-order moments are watermarked in the time domain by scaling sample values directly. Thus, the degradation caused by reconstructing audio from a limited number of watermarked Zernike moments is avoided. Experimental results show that the proposed algorithm achieves strong robustness owing to the properties of the low-order moments.

Introduction

Owing to the RST (rotation, scale, translation) invariance of image Zernike moments [1], the Zernike transform is widely applied in image processing fields such as image recognition [2], robust image watermarking [3][4][5][6], and image authentication [7]. In [2], the authors introduced the RST invariance of image Zernike moments for image recognition and pointed out that the low-order moments represent the image shape while the higher-order ones fill in the high-frequency details. In [8], the authors analyzed the reconstruction power of image Zernike moments and the reasons for the reconstruction degradation that arises when only a limited number of Zernike moments is used. In applications using Zernike moments, how to regenerate the signal from the moments is an important issue. In [3], the watermarked image was degraded in the reconstruction procedure because the high-frequency details carried by the higher-order moments were lost. By converting the watermark, a vector composed of some selected moments, into a spatial-domain signal in [4], or by retaining and adding the higher-order information before the watermark is embedded in [5], the watermarked image avoided the degradation caused by the reconstruction. The above methods share the idea that image Zernike moments are robust to geometric attacks.

This naturally raises the question of whether applying Zernike moments to audio watermarking is beneficial. Digital audio, a one-dimensional (1-D) discrete signal, may be mapped into two-dimensional (2-D) form in order to perform the Zernike transform. In this way, it is possible to discover the characteristics of Zernike moments in audio signal processing. To the best of our knowledge, there has been no report on the Zernike transform in audio applications. Possible reasons are: 1) audio is a 1-D signal; 2) the audio Zernike transform may introduce a series of unknown problems, such as synchronization and reconstruction; 3) compared with image Zernike moments, the characteristics of audio Zernike moments have not yet been explored.

In this paper, the audio Zernike transform is performed by mapping audio into 2-D form. Furthermore, the features of audio moments are investigated through extensive experiments. It is noted that 1) the reconstruction degradation from moments is unavoidable and the quality of the regenerated audio is unacceptable; 2) the low-order moments capture the basic shape of the audio signal and represent its low-frequency components. As a result, the low-order moments are very robust to common signal processing operations such as MP3 compression and low-pass filtering.

By exploiting the advantages of the low-order moments, a robust multi-bit audio watermarking algorithm is proposed. In order to avoid the degradation in the reconstruction procedure, we analyze and derive the linear relationship between the audio amplitude and its moments. Watermarking the low-order audio Zernike moments is achieved by scaling the sample values, and thus the degradation in the reconstruction procedure is avoided. The watermarked audio is imperceptible. Simulation results show that the proposed algorithm is very robust to common signal processing operations and to the attacks in StirMark Benchmark for Audio.

In the next section, we introduce the theory of the Zernike transform. We then investigate the characteristics of audio Zernike moments via extensive testing. This is followed by a description of the general framework of the proposed watermark embedding and detection strategy. We then analyze the watermark performance and test the watermark robustness against common signal processing operations and most attacks in StirMark Benchmark for Audio. Finally, we draw the conclusions.

Zernike Moments

In this section, we describe Zernike moments and their properties. Some of the material below is based on [1][2][4]. Zernike introduced a set of complex polynomials which form a complete orthogonal set over the interior of the unit circle, i.e., $x^2 + y^2 \le 1$. Let the set of these polynomials be denoted by $V_{nm}(x, y)$. The form of these polynomials is:

$$V_{nm}(x, y) = V_{nm}(\rho, \theta) = R_{nm}(\rho)\exp(jm\theta) \tag{1}$$

where $n$ is a positive integer or zero, $m$ is an integer subject to the constraint that $n - |m|$ is non-negative and even, $\rho$ is the length of the vector from the origin to the pixel $(x, y)$, and $\theta$ is the angle between that vector and the $x$ axis in the counterclockwise direction. $R_{nm}(\rho)$ is the radial polynomial, defined as:

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{\,n-2s} \tag{2}$$

Note that $R_{n,-m}(\rho) = R_{n,m}(\rho)$, so $V_{n,-m}(\rho, \theta) = V^{*}_{n,m}(\rho, \theta)$. These polynomials are orthogonal and satisfy

$$\iint_{x^2+y^2\le 1} V^{*}_{nm}(x, y)\,V_{pq}(x, y)\,dx\,dy = \frac{\pi}{n+1}\,\delta_{np}\,\delta_{mq}, \qquad \text{with } \delta_{np} = \begin{cases} 1 & n = p \\ 0 & n \ne p \end{cases}$$

Zernike moments are the projection of the function onto these orthogonal basis functions. The Zernike moment of order $n$ with repetition $m$ for a continuous 2-D function $f(x, y)$ that vanishes outside the unit circle is

$$A_{nm} = \frac{n+1}{\pi}\iint_{x^2+y^2\le 1} f(x, y)\,V^{*}_{nm}(x, y)\,dx\,dy \tag{3}$$

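For a 2-D digital signal, such as a digital image, the integrals are replaced by summations over the samples inside the unit circle:

$$A_{nm} = \frac{n+1}{\pi}\sum_{x}\sum_{y} f(x, y)\,V^{*}_{nm}(x, y), \qquad x^2 + y^2 \le 1 \tag{4}$$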

Suppose that one knows all moments $A_{nm}$ of $f(x, y)$ up to a given order $N_{\max}$. It is desired to reconstruct a discrete function $\hat{f}(x, y)$ by the following formula:

$$\hat{f}(x, y) = \sum_{n=0}^{N_{\max}}\sum_{m} A_{nm}\,V_{nm}(\rho, \theta) \tag{5}$$

Theoretically, as $N_{\max}$ increases, $\hat{f}(x, y)$ approaches $f(x, y)$.
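As an illustration of Equations (1) to (5), the following is a minimal Python sketch, not the authors' implementation. It assumes a square array whose pixel centres are scaled into the unit circle, and it weights the sum of Equation (4) by the pixel area so that the partial sum of Equation (5) approximates the input. Both choices, and all function names, are assumptions made for this example.

```python
import cmath
import math

def radial_poly(n, m, rho):
    """Radial polynomial R_nm(rho) of Equation (2)."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * math.factorial(n - s)
                / (math.factorial(s)
                   * math.factorial((n + m) // 2 - s)
                   * math.factorial((n - m) // 2 - s)))
        total += coef * rho ** (n - 2 * s)
    return total

def basis(n, m, x, y):
    """V_nm(x, y) = R_nm(rho) * exp(j*m*theta), Equation (1)."""
    rho, theta = math.hypot(x, y), math.atan2(y, x)
    return radial_poly(n, m, rho) * cmath.exp(1j * m * theta)

def disk_pixels(size):
    """Yield (row, col, x, y) for pixel centres lying inside the unit circle."""
    for r in range(size):
        for c in range(size):
            x = (2 * r + 1 - size) / size
            y = (2 * c + 1 - size) / size
            if x * x + y * y <= 1.0:
                yield r, c, x, y

def zernike_moment(f, n, m):
    """Discrete Zernike moment A_nm of a square array f, Equation (4)."""
    size = len(f)
    area = (2.0 / size) ** 2  # pixel area: an assumed normalisation
    acc = sum(f[r][c] * basis(n, m, x, y).conjugate()
              for r, c, x, y in disk_pixels(size))
    return (n + 1) / math.pi * acc * area

def reconstruct(f, n_max):
    """Partial-sum reconstruction of Equation (5) up to order n_max."""
    size = len(f)
    # m runs over -n, -n+2, ..., n so that n - |m| is even
    moments = {(n, m): zernike_moment(f, n, m)
               for n in range(n_max + 1) for m in range(-n, n + 1, 2)}
    out = [[0.0] * size for _ in range(size)]
    for r, c, x, y in disk_pixels(size):
        out[r][c] = sum(a * basis(n, m, x, y)
                        for (n, m), a in moments.items()).real
    return out
```

Samples outside the unit circle are ignored by this sketch, so they are left at zero in the reconstruction.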

Audio Zernike Moments

In this section, the audio Zernike transform is achieved by mapping the 1-D audio signal into 2-D form. Then, the characteristics of audio Zernike moments are investigated through extensive testing. It is found that the low-order moments are robust to common audio signal processing operations, and that reconstruction from the moments unavoidably introduces severe distortion.

Fig. 1. The original audio and the reconstructed audio under different orders $N_{\max}$

3.1 Mapping

A 1-D digital audio signal $g$ may be mapped into 2-D form by using the following projection:

$$L = R \times R + M, \qquad f(x, y) = g(x \cdot R + y) \tag{6}$$

where $f(x, y)$ is the 2-D version of the audio after projection, $L$ is the length of the audio, $M$ is the number of remaining audio samples, and $R$ is the width (and height) of $f(x, y)$, chosen as large as possible under the constraint of Equation (6).

3.2 Reconstruction Degradation

After mapping, Zernike decomposition and reconstruction are performed on the audio signal according to Equations (4) and (5). For testing, we choose a clip from our test data set, a piece of flute music denoted flute.wav (a 16-bit signed mono audio file sampled at 44.1 kHz with a length of 5 s). The maximum order $N_{\max}$ is set to 4, 10, 20, 30, 40, 45, 50, 60 and 70, respectively. The waveforms of the original and the reconstructed audio are aligned in Fig. 1. For other kinds of audio, such as pop music, piano music and speech, the simulation results are similar.

In Fig. 1, Origin.wav is the original audio while AudioWithOrder*.wav denotes the reconstructed versions, in which * is the value assigned to $N_{\max}$. It is noted that the low-order moments capture the basic shape of the audio signal while the higher-order ones fill in the high-frequency details. This observation is similar to that for images [2]. In detail, when $N_{\max}$ is less than 50, the larger $N_{\max}$ is, the closer the reconstructed audio is to the original. When $N_{\max}$ is greater than 50, the reconstructed audio is seriously distorted. The degradation in the reconstruction procedure arises because the high-frequency information is discarded when $N_{\max}$ is low, while cumulative computation errors occur in the reconstruction when $N_{\max}$ is high [8]. From Fig. 1, it is evident that the reconstruction degradation from a limited number of moments is unavoidable.

3.3 Selection of Robust Features

In order to apply Zernike moments to audio watermarking, it is necessary to investigate the robustness of audio Zernike moments to common signal processing manipulations such as MP3 compression and low-pass filtering. In the following experiments, we use the following expressions to measure the change in the moments before and after the audio is processed,

$$E_{bn} = \sum_{m}|A_{nm}|, \qquad E_{an} = \sum_{m}|A'_{nm}| \tag{7}$$

Fig. 2. The effects of MP3 compression with bit rates of 32, 40, 48, 64, 80, 96 and 128 kbps. [Plots of $E_{an}/E_{bn}$ against the order $n$: one panel for 32, 40 and 48 kbps, and one panel for 64, 80, 96 and 128 kbps.]

Fig. 3. The effects of low-pass filtering with cut-off frequencies of 0.4, 0.8, 1, 2, 4, 8, 10, 16 and 22 kHz. [Plots of $E_{an}/E_{bn}$ against the order $n$: one panel for 22, 16 and 10 kHz, one for 8, 4 and 2 kHz, and one for 1, 0.8 and 0.4 kHz.]

where $A'_{nm}$ is the version of $A_{nm}$ after the audio has undergone some signal processing operation. $E_{bn}$ and $E_{an}$ denote the total amplitude of all moments of a given order $n$ before and after processing, respectively, with $0 \le n \le N_{\max}$. We select flute.wav as the example clip to test the effects of MP3 compression and low-pass filtering. Fig. 2 and Fig. 3 share the same axes: the horizontal axis is the order $n$ and the vertical axis is $E_{an}/E_{bn}$. For other kinds of audio, such as pop music, piano music and speech, the simulation results are similar. In the above experiments, MP3 compression and low-pass filtering are performed with the software CoolEditPro v2.1. Based on extensive testing with different audio signals, we have the following observations:

i. The Zernike transform of a 1-D signal may be achieved by mapping the signal into 2-D form. The low-order moments capture the basic shape of the signal, but the reconstruction degradation from Zernike moments is large and unavoidable; see Fig. 1.

ii. The low-order moments are robust to common signal processing operations. The moments below order 10 are very robust to MP3 compression, even at the lowest bit rate of 32 kbps; see Fig. 2. The moments below order 20 are robust to low-pass filtering down to a cut-off frequency of 2 kHz; see Fig. 3.

In conclusion, if we embed the watermark into the moments below order 10 and avoid the degradation in the reconstruction procedure, the watermark is expected to be very robust to these common signal processing manipulations and to some hostile attacks.
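The mapping of Equation (6) and the per-order ratio plotted in Fig. 2 and Fig. 3 can be sketched as follows. This is a hedged illustration only: the function names and the dictionary layout `{(n, m): A_nm}` are assumptions, and the moments themselves are taken as given rather than computed here.

```python
import math

def map_to_2d(g):
    """Map 1-D samples g into an R x R array with f[x][y] = g[x*R + y] (Equation (6)).

    Returns the array and the M leftover samples, where L = R*R + M.
    """
    R = math.isqrt(len(g))  # largest R with R*R <= L
    f = [list(g[x * R:(x + 1) * R]) for x in range(R)]
    return f, list(g[R * R:])

def map_to_1d(f, rest):
    """Inverse of map_to_2d: flatten the rows and append the leftover samples."""
    return [v for row in f for v in row] + list(rest)

def order_amplitude(moments, n):
    """Total amplitude of all moments of order n, as in Equation (7)."""
    return sum(abs(a) for (order, _), a in moments.items() if order == n)

def order_ratio(before, after, n):
    """E_an / E_bn for order n; values near 1 indicate a robust order."""
    return order_amplitude(after, n) / order_amplitude(before, n)
```

For instance, a segment of 225 samples maps to a 15 x 15 array with no leftover samples.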

Proposed Watermark Algorithm

In this section, a robust audio watermarking algorithm is presented. The watermark bits are embedded into the low-order moments to achieve good robustness. We derive the linear relation between the audio amplitude and its moments. By applying this linear relation, the proposed scheme watermarks the low-order moments by scaling the audio amplitude directly in the time domain, so the generated watermarked audio avoids the reconstruction distortion. To resist amplitude scaling attacks, three successive segments are used as a group, and one bit of information is embedded by modifying the low-order moments of the three segments in each group.

4.1 Watermark Embedding

Embedding Scheme: The basic idea of the embedding algorithm is to split the original audio into many segments, with three segments forming a group. The segments are mapped into 2-D form and Zernike-transformed. Then one watermark bit is embedded into the low-order Zernike moments. From the difference between the moments before and after watermarking, a corresponding scaling factor is computed. Finally, the watermarked audio is generated by scaling the original one. The embedding model is shown in Fig. 4. An adaptive embedding strategy is introduced to control the embedding strength, making it as large as possible under the imperceptibility constraint, as follows. Suppose that $SNR_1$ is the SNR of the watermarked audio versus the original one and $SNR_0$ is a predefined value. If $SNR_1 < SNR_0$, the embedding strength factor $d$ is automatically modified until $SNR_1 \ge SNR_0$. The watermarked audio becomes more similar to the original as $d$ decreases. Note that utilizing the relationships among different audio sample sections to embed data was proposed in [9]; this strategy is a type of modified patchwork scheme [10]. However, what is proposed in this paper differs from [9]: instead of embedding in the time domain, we embed the watermark signal in the low-order Zernike moments in order to achieve better robustness.
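The adaptive control of the embedding strength described above can be sketched as a simple loop. The paper only states that $d$ is modified until the SNR target is met; the geometric shrink factor, the iteration cap, and the `embed` callback (a hypothetical stand-in for the actual embedder) are assumptions of this example.

```python
import math

def snr_db(original, watermarked):
    """SNR in dB of the watermarked signal versus the original."""
    signal = sum(s * s for s in original)
    noise = sum((s - w) ** 2 for s, w in zip(original, watermarked))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)

def embed_adaptively(original, embed, d=0.25, snr0=20.0, shrink=0.9, max_iter=100):
    """Lower d until SNR1 >= SNR0.

    `embed(original, d)` is a hypothetical embedder returning watermarked samples;
    `shrink` and `max_iter` are illustrative choices, not taken from the paper.
    """
    for _ in range(max_iter):
        marked = embed(original, d)
        if snr_db(original, marked) >= snr0:
            return marked, d
        d *= shrink  # a smaller d gives a watermarked audio closer to the original
    raise ValueError("SNR target not reached")
```

Starting from the largest admissible $d$ and shrinking it keeps the strength as large as the imperceptibility constraint allows, up to the granularity of the shrink step.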

Fig. 4. The watermark embedding scheme. [Block diagram: the original audio $g(i)$ is segmented into segments of length $L$ and preprocessed; each segment $f_k(x, y)$ undergoes the Zernike transformation up to order $N_{\max}$, giving the moments $A^{k}_{nm}$; a watermark bit $w(j)$ is embedded with the embedding strength factor $d$; the amplitude scaling factor is measured; the audio segments are reconstructed, preprocessed and fitted together into the watermarked audio; if the watermark is audible, $d$ is modified and the embedding is repeated.]


Fig. 5. Three consecutive sample segments. [Diagram: the signal $g(i)$ divided into Seg_1 (samples 0 to L), Seg_2 (L to 2L) and Seg_3 (2L to 3L).]

Embedding Strategy: The original audio, $g(x)$, is split into segments. Suppose each segment includes $L$ samples, as shown in Fig. 5. Generally, $L$ is chosen according to the embedding capacity and the SNR of the watermarked audio. After each segment is mapped into 2-D form, $f_k(x, y)$, by using Equation (6), the Zernike transform is performed on it with a given order $n$. The total modulus of the order-$n$ moments of the $k$-th segment is denoted by $E_k$, as shown in Equation (8). An order $n$ lower than 10 is suggested to achieve good robustness.

$$E_k = \sum_{m}|A^{k}_{nm}|, \qquad n < N_{\max} \tag{8}$$

Denote the total modulus of the order-$n$ moments in three consecutive segments as $E_{k-1}$, $E_k$ and $E_{k+1}$, respectively. Their relations are described by

$$A = E_{\max} - E_{\mathrm{med}}, \qquad B = E_{\mathrm{med}} - E_{\min} \tag{9}$$

where $A$ and $B$ are the differences, $E_{\max} = \mathrm{maximum}(E_{k-1}, E_k, E_{k+1})$, $E_{\mathrm{med}} = \mathrm{median}(E_{k-1}, E_k, E_{k+1})$ and $E_{\min} = \mathrm{minimum}(E_{k-1}, E_k, E_{k+1})$. We use Equation (10) to embed one watermark bit $w(i)$,

$$\begin{cases} A - B \ge S & \text{if } w(i) = 1 \\ B - A \ge S & \text{if } w(i) = 0 \end{cases} \tag{10}$$

where $S = d \cdot (E_{k-1} + E_k + E_{k+1})$ is the embedding strength.

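Assume that after embedding one watermark bit, $E_{k-1}$, $E_k$ and $E_{k+1}$ become $\tilde{E}_{k-1}$, $\tilde{E}_k$ and $\tilde{E}_{k+1}$, respectively. This is equivalent to scaling $E_{k-1}$, $E_k$ and $E_{k+1}$ by the corresponding factors $\alpha_{k-1}$, $\alpha_k$ and $\alpha_{k+1}$, which may be computed by the following expressions,

$$\alpha_{k-1} = \frac{\tilde{E}_{k-1}}{E_{k-1}}, \qquad \alpha_k = \frac{\tilde{E}_k}{E_k}, \qquad \alpha_{k+1} = \frac{\tilde{E}_{k+1}}{E_{k+1}} \tag{11}$$

According to Equation (8), $E_k$ is linear in $A^{k}_{nm}$. This means that the corresponding moments after watermarking, $\tilde{A}^{k}_{nm}$, may be expressed as

$$\tilde{A}^{k}_{nm} = \alpha_k \cdot A^{k}_{nm} \tag{12}$$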
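The embedding condition of Equations (9) and (10) and the scaling factors of Equation (11) can be illustrated with the sketch below. The paper does not spell out how the three energies are adjusted when the condition is not yet met, so the adjustment used here (raise the largest energy for bit 1; raise the largest and the median together for bit 0) is a hypothetical rule chosen only because it always satisfies Equation (10) and preserves the ordering. It should not be read as the authors' method.

```python
def target_energies(energies, bit, d):
    """Return adjusted energies for three consecutive segments so that Equation (10) holds.

    energies: [E_{k-1}, E_k, E_{k+1}]; bit: 0 or 1; d: embedding strength factor.
    """
    i_min, i_med, i_max = sorted(range(3), key=lambda i: energies[i])
    e = list(energies)
    s = d * sum(e)                    # embedding strength S
    a = e[i_max] - e[i_med]           # A of Equation (9)
    b = e[i_med] - e[i_min]           # B of Equation (9)
    gap = (a - b) if bit == 1 else (b - a)
    deficit = max(0.0, s - gap)       # how far Equation (10) is from being met
    # Hypothetical adjustment rule (an assumption of this sketch):
    e[i_max] += deficit               # bit 1: enlarges A only
    if bit == 0:
        e[i_med] += deficit           # bit 0: max and median move together, enlarging B only
    return e

def scaling_factors(energies, targets):
    """Amplitude scaling factors of Equation (11), one per segment."""
    return [t / e for e, t in zip(energies, targets)]
```

Segments whose energy is left unchanged receive a factor of exactly 1, so only the modified segments are rescaled.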

According to the analysis in Section 3.2, serious reconstruction degradation would be caused if the watermarked signal were regenerated from the modified moments $\tilde{A}^{k}_{nm}$. Thus, a new strategy is required to reconstruct the watermarked audio.

4.2 The Reconstruction Strategy

We now focus on resolving the reconstruction degradation. Consider linear amplitude scaling of the signal $f(x, y)$ by a factor $\alpha$, and denote the scaled signal and its moments by $\tilde{f}(x, y)$ and $\tilde{A}_{nm}$, respectively. We have the following expression,

$$\tilde{A}_{nm} = \frac{n+1}{\pi}\sum_{x}\sum_{y}\tilde{f}(x, y)\,V^{*}_{nm}(x, y) = \frac{n+1}{\pi}\sum_{x}\sum_{y}\alpha\,f(x, y)\,V^{*}_{nm}(x, y) = \alpha\,A_{nm} \tag{13}$$

From Equation (13), the relation between the audio sample values and the moments is mathematically linear. This linear relation has been verified by extensive testing. The conclusion is very useful: it means that the modification of the Zernike moments may be realized as the operation of scaling the audio amplitude. Using this conclusion, we generate the watermarked audio by scaling the sample values in each segment, as in Equation (14):

$$\tilde{f}_k(x, y) = \alpha_k \cdot f_k(x, y) \tag{14}$$

where $\alpha_k$ is the amplitude scaling factor of the $k$-th audio segment, computed by using Equation (11), and $f_k(x, y)$ and $\tilde{f}_k(x, y)$ denote the $k$-th segment of the original 2-D signal and of the watermarked 2-D signal, respectively. The process is repeated to embed all the watermark bits. Finally, by using Equation (6) we obtain the reconstructed watermarked audio, $\tilde{g}(x)$.
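The linearity expressed by Equations (13) and (14) is easy to check numerically. The sketch below is an illustration under stated assumptions: the moment routine maps pixel centres into the unit circle and follows the plain sum of Equation (4) without any further normalisation, and the function names are invented for this example.

```python
import cmath
import math

def zernike_moment(f, n, m):
    """Discrete Zernike moment A_nm of a square array f (pixel centres scaled into the unit circle)."""
    size, acc, am = len(f), 0j, abs(m)
    for r in range(size):
        for c in range(size):
            x, y = (2 * r + 1 - size) / size, (2 * c + 1 - size) / size
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # samples outside the unit circle do not contribute
            radial = sum((-1) ** s * math.factorial(n - s)
                         / (math.factorial(s)
                            * math.factorial((n + am) // 2 - s)
                            * math.factorial((n - am) // 2 - s))
                         * rho ** (n - 2 * s)
                         for s in range((n - am) // 2 + 1))
            # multiply by the complex conjugate of V_nm
            acc += f[r][c] * radial * cmath.exp(-1j * m * math.atan2(y, x))
    return (n + 1) / math.pi * acc

def order_energy(f, n):
    """Total modulus of the order-n moments of one segment, Equation (8)."""
    return sum(abs(zernike_moment(f, n, m)) for m in range(-n, n + 1, 2))

def scale_segment(f, alpha):
    """Watermark one segment in the time domain by amplitude scaling, Equation (14)."""
    return [[alpha * v for v in row] for row in f]
```

Scaling a 15 x 15 segment by, say, 1.2 multiplies every moment, and therefore the order energy, by 1.2, so no reconstruction from the moments is needed to realise the modified energy.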

How to reduce the reconstruction degradation from Zernike moments is an important issue in watermarking applications [4]. In the proposed strategy, the degradation is avoided by using the linear relation between the audio and its moments. The watermark is embedded into the moments below order 10 to achieve strong robustness. Additionally, since the watermarked audio is reconstructed by scaling sample values, the computational cost is low.

4.3 Watermark Extraction

In detection, the watermarked audio, which may have undergone signal processing operations such as MP3 compression or low-pass filtering, is Zernike-transformed as in the watermark embedding. To cope with synchronization attacks such as cropping, the hidden bits are organized in the data structure shown in Fig. 6. Synchronization codes [9][11] are introduced to locate the embedding region of the watermark bits. Depending on the requirements, one or many synchronization codes may be embedded.

Fig. 6. Data structure of the hidden bit stream [11]. [Diagram: a synchronization code {Syn(i)} followed by the hidden multi-bit information {Wmk(i)}.]

As in Equation (9), we compute $E'_{k-1}$, $E'_k$ and $E'_{k+1}$ from the received audio, sort them to obtain $E'_{\min}$, $E'_{\mathrm{med}}$ and $E'_{\max}$, and form the differences

$$A' = E'_{\max} - E'_{\mathrm{med}}, \qquad B' = E'_{\mathrm{med}} - E'_{\min} \tag{15}$$

Analogously to Equation (10), the hidden bit is obtained by comparing $A'$ and $B'$ with the following rule,

$$w'(i) = \begin{cases} 1 & \text{if } A' - B' \ge 0 \\ 0 & \text{if } A' - B' < 0 \end{cases} \tag{16}$$

The process is repeated until all hidden bits are extracted. The parameters, namely the segment length $L$, the given order $n$ and the synchronization sequence Syn(i), are known beforehand, so the detection process is blind.
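The extraction rule of Equations (15) and (16) and the search for a synchronization code can be sketched as follows. The energies are assumed to have been computed already from the received segments. Declaring a match when at most `threshold` bits differ is an assumption about how the synchronization threshold is applied, and the function names are illustrative.

```python
def extract_bit(e_prev, e_cur, e_next):
    """Recover one bit from the energies of three consecutive segments."""
    e_min, e_med, e_max = sorted((e_prev, e_cur, e_next))
    a = e_max - e_med                 # A' of Equation (15)
    b = e_med - e_min                 # B' of Equation (15)
    return 1 if a - b >= 0 else 0     # Equation (16)

def extract_bits(energies):
    """One bit per non-overlapping group of three segment energies."""
    return [extract_bit(*energies[i:i + 3])
            for i in range(0, len(energies) - 2, 3)]

def find_sync(bits, code, threshold):
    """Return start indices where `code` matches `bits` with at most `threshold` mismatches."""
    hits = []
    for start in range(len(bits) - len(code) + 1):
        window = bits[start:start + len(code)]
        mismatches = sum(b != c for b, c in zip(window, code))
        if mismatches <= threshold:
            hits.append(start)
    return hits
```

Because the rule depends only on the sign of $A' - B'$, multiplying all three energies by the same positive factor leaves the extracted bit unchanged.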

Performance Analysis

In this section, we evaluate the performance of the proposed algorithm in terms of data embedding capacity, resistance to amplitude modification attacks, and the error probabilities of synchronization codes and watermarks. The embedding capacity, denoted by $C$, is the number of bits embedded into the audio signal per unit of time. Suppose that the sampling rate of the audio is $R$ (Hz). In our algorithm, $C = R/(3L)$ (bps).
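As a worked example of the capacity formula, the sketch below plugs in the 44.1 kHz sampling rate and the segment length $L = 225$ used in the experimental section of the paper; the function names are illustrative.

```python
def capacity_bps(sample_rate_hz, segment_len):
    """Embedding capacity C = R / (3 L): one bit per group of three segments."""
    return sample_rate_hz / (3 * segment_len)

def duration_s(num_bits, sample_rate_hz, segment_len):
    """Audio duration in seconds needed to carry num_bits."""
    return num_bits / capacity_bps(sample_rate_hz, segment_len)

c = capacity_bps(44100, 225)      # about 65.3 bps
t = duration_s(1300, 44100, 225)  # about 19.9 s for 13 repetitions of 100 bits
```

The second figure is consistent with the 20 s clip used in the experiments.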

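Fig. 7. The watermark bit error probability in the channel and detector. [Block diagram: the audio signal and the watermarks enter the encoder, noise is added in the channel, and the detector follows; $P_w$ is the bit error probability in the channel and $P_d$ that in the detector.]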

In this paper, we embed one watermark bit into the relative relation among each group of three audio segments. Our goal is to resist amplitude scaling. According to Equations (8) and (13), whether or not an amplitude scaling attack occurs, the magnitude relation between $A$ and $B$ remains unchanged, which means that the algorithm is immune to such an attack.

5.1 Error Analysis on Synchronization Code Searching

There are two types of errors in searching for synchronization codes: false positive errors and false negative errors. A false positive error occurs when a synchronization code is detected at a location where no synchronization code is embedded, while a false negative error occurs when an existing synchronization code is missed. Once a false positive error occurs, the bits following the false synchronization code will be regarded as watermark bits. When a false negative error takes place, some watermark bits will be lost. The false positive error probability of the synchronization code, $P_1$, can be calculated as follows,

$$P_1 = \frac{1}{2^{N_1}}\sum_{k=0}^{T} C_{N_1}^{k} \tag{17}$$

where $N_1$ is the length of a synchronization code and $T$ is the threshold used to judge the existence of a synchronization code. The false negative error probability of the synchronization code, $P_2$, is evaluated from the bit error probability in the detector, denoted $P_d$:

$$P_2 = \sum_{k=T+1}^{N_1} C_{N_1}^{k}\,(P_d)^{k}\,(1 - P_d)^{N_1 - k} \tag{18}$$
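Equations (17) and (18) can be evaluated directly. The following sketch uses only the Python standard library; the function names are illustrative.

```python
import math

def false_positive_prob(n1, t):
    """P1 of Equation (17): random bits differ from the code in at most t of n1 positions."""
    return sum(math.comb(n1, k) for k in range(t + 1)) / 2 ** n1

def false_negative_prob(n1, t, p_d):
    """P2 of Equation (18): more than t of the n1 code bits are in error."""
    return sum(math.comb(n1, k) * p_d ** k * (1 - p_d) ** (n1 - k)
               for k in range(t + 1, n1 + 1))
```

With the 31-bit synchronization code and the threshold $T = 3$ used in the experiments, `false_positive_prob(31, 3)` equals $4992 / 2^{31}$, which is about $2.3 \times 10^{-6}$.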

5.2 Error Analysis on Watermark Extraction

Note that the introduction of synchronization codes may cause the bit error probability of the watermark in the detector, $P_d$, to differ from that in the channel, $P_w$, as illustrated in Fig. 7. Suppose that $x$ synchronization codes are embedded and that the numbers of false positive and false negative synchronization codes detected are $y$ and $z$, respectively. The false positive error probability $P_1$ can then be expressed as $y/(x + y - z)$, and the error probability $P_w$ may be expressed as follows,

$$P_w = \frac{(x - z)\,N_2\,P_{sw} + y\,N_2\,P_{aw}}{(x + y - z)\,N_2} = (1 - P_1)\,P_{sw} + P_1\,P_{aw} \tag{19}$$

where $N_2$ is the number of watermark bits that follow each synchronization code, and $P_{sw}$ and $P_{aw}$ denote the bit error probabilities of the watermarks that follow a correctly detected synchronization code and a false positive synchronization code, respectively. From the viewpoint of probability theory, $P_{sw}$ and $P_{aw}$ are approximately $P_d$ and 50%. Accordingly, we have

$$P_w = (1 - P_1)\,P_{sw} + P_1\,P_{aw} \approx (1 - P_1)\,P_d + P_1 \cdot 50\% \tag{20}$$

From Equation (20), the bit error probability of the watermark in the channel differs from that in the detector once synchronization codes are introduced, and the difference mainly depends on the number of false positive synchronization codes. A false negative synchronization code leads to the loss of some hidden information bits, but its effect on the error probability of the watermark may be ignored. When $y$ goes to zero, $P_1$ goes to zero, and thus $P_w$ goes to $P_d$.
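A small numerical sketch of Equations (19) and (20) follows; the function names and the sample counts in the usage note are illustrative only.

```python
def channel_ber_counts(x, y, z, n2, p_sw, p_aw):
    """P_w from the counts, the left-hand form of Equation (19)."""
    return ((x - z) * n2 * p_sw + y * n2 * p_aw) / ((x + y - z) * n2)

def channel_ber(x, y, z, p_d, p_aw=0.5):
    """P_w from Equation (20), with P_sw approximated by P_d and P_aw by 50%."""
    p1 = y / (x + y - z)  # false positive probability of the synchronization code
    return (1 - p1) * p_d + p1 * p_aw
```

For example, with 13 embedded codes, one false positive, one false negative and $P_d = 0.02$, both forms give about 0.057; with no false positives the result reduces to $P_d$.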

Experimental Results

The proposed algorithm is applied to a set of audio signals including pop, light, rock, piano, drum and electronic organ music. $N_{\max} = 10$, and the moments of order 8 are watermarked to achieve good robustness. Each segment of length $L = 225$ is mapped into a 15 × 15 2-D form. As an example, a clip (20 s, mono, 16 bits/sample, 44.1 kHz, WAVE format) from the light music piece titled Danube is watermarked with 13 repetitions of a 100-bit binary sequence composed of a 31-bit synchronization code and a 69-bit watermark, with the embedding strength factor $d = 0.25$. The SNR is 25.63 dB, above the 20 dB requested by the IFPI, and the ODG (Objective Difference Grade), measured with EAQUAL 0.1.3 alpha [12][13][14], which takes the HAS (Human Auditory System) into account, is -3.60. The subjective listening test shows that the watermarked audio is perceptually very similar to the original. This is evidence that the proposed watermarking strategy has removed the reconstruction degradation caused by using a limited number of moments, because the watermarked signal is generated by amplitude scaling in the time domain.

We test the robustness of the proposed algorithm in terms of the BER (Bit Error Rate). The audio editing and attacking tools adopted in our experiments are CoolEditPro v2.1, GoldWave v4.25 and StirMark Benchmark for Audio v0.2 [15][16]. The test results under common audio signal processing, cropping, and the attacks in StirMark Benchmark for Audio are listed in Tables 1-3.

Table 1. Robustness performance to common attacks

| Attack Type | BER (%) | Attack Type | BER (%) |
|---|---|---|---|
| Requantization 16 → 32 → 16 (bit) | 0 | Resample 44.1 → 16 → 44.1 (kHz) | 0 |
| MP3 (32 kbps) | 1.15 | MP3 (40-128 kbps) | 0 |
| Low pass (9 kHz) | 0 | Low pass (8 kHz) | 3.61 |
| Low pass (6 kHz) | 7.46 | Low pass (4 kHz) | 8.46 |
| Low pass (3 kHz) | 9.31 | Volume (50-150%) | 0 |

Table 2. Robustness performance to cropping attacks

| Attack Type | BER (%) | Attack Type | BER (%) |
|---|---|---|---|
| Cropping (1 s) | 0 | Cropping (2 s) | 0 |
| Cropping (3 s) | 0 | Cropping (4 s) | 0 |
| Cropping (5 s) | 0 | Cropping (6 s) | 0 |

Table 3. Robustness performance to the attacks in StirMark Benchmark for Audio v0.2

| Attack Type | BER (%) | Attack Type | BER (%) |
|---|---|---|---|
| AddBrumm 100 | 0 | AddNoise 500 | 0 |
| AddBrumm 1100 | 6.23 | AddNoise 700 | 1.23 |
| AddBrumm 2100 | 18.15 | AddNoise 900 | 2.22 |
| Compressor | 0 | ExtraStereo 30 | 0 |
| Amplify | 0 | ExtraStereo 50 | 0 |
| Exchange | 0 | ExtraStereo 70 | 0 |
| ZeroCross | 5 | Normalize | 0 |
| Stat1 | 0 | Stat2 | 0 |
| Nothing | 0 | Smooth | 0 |
| Original | 0 | Smooth2 | 0 |
| Invert | 0 | RC LowPass | 0 |
| ZeroLength | 0 | Lsbzero | 0 |
| AddSinus | 15.07 | ZeroCross | 0 |
| AddDynNoise | 0 | ZeroRemove | 0 |
| FFT Invert | 0 | FFT RealReverse | 0 |
| Echo | 6.84 | FlippSample | 5.61 |
| FFT HLPass | 6.69 | RC HighPass | 6.23 |
| CutSample | Failed | CopySample | Failed |
| FFT Stat1 | Failed | AddFFTNoise | Failed |
| FFT Test | Failed | VoiceRemove | Failed |

From Table 1 we can see that our algorithm is robust to common audio signal processing manipulations such as MP3 compression at 32 kbps and low-pass filtering at 3 kHz. This is because the watermark bits are embedded into the low-order Zernike moments, which have been verified to be robust to common signal processing.

Table 2 shows the strong robustness to cropping with the threshold $T = 3$; see Equation (17). In our experiments, when one portion of the audio is randomly cropped, even with a length of 6 s, a portion of the watermark (4 frames out of 13) is lost, but the remaining watermark (9 frames) is still extracted at a low bit error rate. The reason is that the displacement of sample positions between embedding and extraction is tracked by resynchronization via the synchronization codes.

StirMark Benchmark for Audio is a common robustness evaluation tool for audio watermarking techniques. All listed operations are performed with the default parameters implemented in the system. From Table 3, it is found that the watermark shows strong resistance to these common attacks. In the cases of failure ("Failed" means that the BER is over 20%), the audio quality is largely distorted.

Conclusions

In this paper, we propose a multi-bit audio watermarking method based on the robustness of the low-order Zernike moments. Through extensive experiments, we show the merits of the low-order moments as watermarking features: they are very robust to common signal processing such as MP3 compression. Accordingly, by combining this feature with a synchronization matching technique, a robust audio watermarking scheme is designed. By using the linear relation between the audio and its moments, the low-order moments are watermarked by scaling the audio sample values directly. As a result, the generated watermarked audio avoids the reconstruction distortion, and the watermark is imperceptible. Finally, the performance of the proposed algorithm is investigated. Extensive experiments have shown that the proposed watermarking strategy has strong robustness to common signal processing and to most attacks in StirMark Benchmark for Audio. The watermark also achieves good robustness against cropping. DA/AD conversion (a common signal processing operation) [17] and TSM (time-scale modification) attacks [18] are still challenging issues in the audio watermarking community. One direction for further work is to improve the robustness of the watermark to these two attacks according to their distortion models [17][18]. Additionally, a more detailed evaluation based on the actual benchmark [19] will be reported in future research.

Acknowledgments
The authors appreciate the support of NSFC (60325208, 90604008), the 973 Program (2006CB303104), and the NSF of Guangdong (04205407). We also thank the anonymous reviewers for their constructive suggestions.

References
1. Cho-Huak Teh and Roland T. Chin: On Image Analysis by the Methods of Moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.10 (1988) 496-513
2. A. Khotanzad and Y. H. Hong: Invariant Image Recognition by Zernike Moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.12 (1990) 489-497
3. M. Farzam and S. Shahram Shirani: A Robust Multimedia Watermarking Technique Using Zernike Transform. Proc. of IEEE International Workshop Multimedia Signal Processing, (2001) 529-534
4. H. S. Kim and H. K. Lee: Invariant Image Watermark Using Zernike Moments. IEEE Transactions on Circuits and Systems for Video Technology, Vol.13, No.8 (2003) 766-775
5. Y. Q. Xin, Simon Liao and Miroslaw Pawlak: A Multibit Geometrically Robust Image Watermark Based on Zernike Moments. Proc. of the 17th International Conference on Pattern Recognition, (2004) 861-864
6. J. Chen, H. X. Yao, W. G. and S. H. Liu: A Robust Watermarking Method Based on Wavelet and Zernike Transform. Proc. of the 2004 International Symposium on Circuits and Systems, Vol.2 (2004) 23-26
7. H. M. Liu, J. L. Lin and J. W. Huang: Image Authentication Using Content Based Watermark. Proc. of the 2004 International Symposium on Circuits and Systems, (2005) 4014-4017
8. S. X. Liao and M. Pawlak: On the Accuracy of Zernike Moments for Image Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.20 (1998) 1358-1364
9. W. N. Lie and L. C. Chang: Robust and High-Quality Time-Domain Audio Watermarking Subject to Psychoacoustic Masking. Proc. of IEEE International Symposium on Circuits and Systems, Vol.2 (2002) 45-48
10. I. K. Yeo and H. J. Kim: Modified Patchwork Algorithm: A Novel Audio Watermarking Scheme. IEEE Transactions on Speech and Audio Processing, Vol.11 (2003) 381-386
11. S. Q. Wu, J. W. Huang, D. R. Huang and Y. Q. Shi: Efficiently Self-Synchronized Audio Watermarking for Assured Audio Data Transmission. IEEE Transactions on Broadcasting, Vol.51 (2005) 69-76
12. http://www.mp3-tech.org/programmer/sources/eaqual.tgz
13. International Telecommunication Union: Method for Objective Measurements of Perceived Audio Quality (PEAQ). ITU-R BS 1387, (1998)
14. M. Arnold: Subjective and Objective Quality Evaluation of Watermarked Audio Tracks. Web Delivering of Music, (2002) 161-167
15. M. Steinebach, F.A.P. Petitcolas, F. Raynal, J. Dittmann, C. Fontaine, S. Seibel, N. Fates and L.C. Ferri: StirMark Benchmark: Audio Watermarking Attacks. Proc. of International Conference on Information Technology: Coding and Computing, (2001) 49-54
16. http://www.petitcolas.net/fabien/watermarking/stirmark/
17. S. J. Xiang and J. W. Huang: Analysis of D/A and A/D Conversions in Quantization-Based Audio Watermarking. International Journal of Network Security, Vol.3 (2006) 230-238
18. S. J. Xiang, J. W. Huang and R. Yang: Time-scale Invariant Audio Watermarking Based on the Statistical Features in Time Domain. Proc. of the 8th Information Hiding Workshop, (2006)
19. A. Lang, J. Dittmann: Profiles for Evaluation - the Usage of Audio WET. Proc. of SPIE Symposium on Electronic Imaging, Vol.6072, 60721J, (2006)
