
Sixth International Conference on Advanced Language Processing and Web Information Technology

A Novel Steganalysis Algorithm of Phase Coding in Audio Signal

Wei Zeng, Haojun Ai, Ruimin Hu
National Engineering Research Center for Multimedia Software, 430072 Wuhan, China
Audio steganalysis has attracted increasing attention recently, and phase steganalysis is one of its most challenging research fields. In this paper, a novel algorithm to detect phase coding steganography in audio signals is proposed. It is based on analysis of the phase discontinuities and can be described as follows. First, it takes the FFT of a selected segment of audio, unwraps the phase of each sample, and extracts the phase difference between neighboring samples. Second, in order to monitor the change of the phase difference, it calculates five statistical features of the phase difference for steganalysis. Third, an SVM classifier is used for classification. A total of 800 varied audio files are used for training and testing in our experiments. With various embedding parameters for the training and testing audios, the proposed algorithm achieves good classification, with a correct detection rate of up to 95%.

1. Introduction
Recently, digital watermarking and data hiding have become a vibrant research area. Various kinds of multimedia files can be downloaded freely from the Internet, and terrorists might see this as an opportunity to communicate secretly with each other. Thus, various steganalysis methods have emerged as a means to deter covert communication by terrorists. Steganalysis is the technique of deciding whether a medium carries hidden messages and, if possible, determining what the hidden messages are. In addition to preventing secret communication among terrorists, steganalysis serves as a way to judge the security performance of steganography techniques.
Audio steganography is a useful means for transmitting covert battlefield information via an innocuous cover audio signal. Phase coding is a coding scheme that introduces the least perceptible noise to the host. The offset of the phase of a sound is irrelevant to recognition by the Human Auditory System (HAS). The HAS has very low sensitivity to changes in the phase components of an audio signal. By using the phase components of the sound segment as a data space, a fairly large amount of data can be coded into the host signal. The embedded data is fairly transparent to the HAS if the relative relations between the phase components of preceding segments are well preserved. Modifying the offset of all the phase components results in no distortion to the sound signal.
Unlike LSB coding, phase coding is robust to small amounts of additive noise, since the noise will not distort the phase in most of the frequency slots. In phase coding, the waveform of the signal is more important than the absolute value of each data point.
Some steganography algorithms based on phase coding [1]~[4] have emerged recently. But compared with steganalysis of other audio hiding algorithms, effective steganalysis methods for phase coding are relatively unexplored. In this paper, we therefore propose a potent steganalysis algorithm for the typical phase coding algorithm proposed by Bender [1]. In the next section, we explain the phase steganography algorithm. In Section 3, we introduce the feature extraction and the steganalysis method. The experimental results are given in Section 4. Section 5 concludes the paper.

0-7695-2930-5/07 $25.00 2007 IEEE

DOI 10.1109/ALPIT.2007.41

2. Phase coding algorithm

The phase coding algorithm embeds data into an audio signal by taking advantage of the HAS response to phase information. Bender et al. [1] proposed phase coding algorithms based on the HAS sensitivity to phase. In their approach, the host audio sequence is divided into a set of equal-length segments and the DFT is computed for each segment, which is equivalent to computing the STFT. As described in [1], the first step of the phase coding algorithm is to compute the STFT of the current block in the host signal $x_m(n)$. This is
performed by dividing xm ( n) into a set of L equal-


Paraskevas [5] proposed two phase discontinuities as

Extrinsic discontinuities
Extrinsic discontinuities are the result of the computation of the inverse tangent function, which gives values of the phase modulo $2\pi$. The arctangent function calculates phase angles limited to between $-\pi$ and $\pi$ rad, although the true phase angles are not limited to this range. Consequently, any angle outside this range is wrapped around zero, which can be detected in practice by identifying phase jumps that can be up to $2\pi$ rad. An empirical way of unwrapping the phase is to detect where these jumps occur and add or subtract $2\pi$ accordingly. The disadvantage of this method is that it cannot discriminate whether the phase jumps are due to rapidly changing angles or due to the wrapping ambiguity. The literature reports a number of methods used to unwrap phase to give a smooth phase spectrum. There are also techniques for avoiding this source of discontinuity, including the differential of phase and the calculation of phase from a geometric analysis of the z-plane.
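The empirical unwrapping rule described above (detect jumps and add or subtract $2\pi$) can be illustrated with a minimal sketch. The function name `unwrap_phase`, the jump threshold of $\pi$, and the synthetic linear phase are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def unwrap_phase(wrapped):
    """Empirical unwrapping: compensate every jump larger than pi by 2*pi."""
    wrapped = np.asarray(wrapped, dtype=float)
    unwrapped = wrapped.copy()
    offset = 0.0
    for i in range(1, len(wrapped)):
        jump = wrapped[i] - wrapped[i - 1]
        if jump > np.pi:        # jump up of more than pi: treat as a wrap
            offset -= 2 * np.pi
        elif jump < -np.pi:     # jump down of more than pi: treat as a wrap
            offset += 2 * np.pi
        unwrapped[i] = wrapped[i] + offset
    return unwrapped

# Synthetic test phase (assumption): a slow linear ramp that exceeds pi,
# wrapped into (-pi, pi] the way an arctangent-based computation would.
true_phase = np.linspace(0.0, 6 * np.pi, 50)
wrapped = np.angle(np.exp(1j * true_phase))
recovered = unwrap_phase(wrapped)
```

Because the rule only looks at the size of each jump, a genuinely fast phase change of more than $\pi$ between neighboring points would be mistaken for a wrap, which is the ambiguity noted in the text. NumPy's `np.unwrap` applies the same kind of rule.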
Intrinsic discontinuities
The intrinsic discontinuities found in phase spectra arise from properties of the physical system that is responsible for generating the data under analysis. They appear when both the real and imaginary parts of the Fourier spectrum of the signal cross zero simultaneously. This second type of discontinuity is due to the intrinsic nature of the signal itself and not to computational artifacts. There are two methods of identifying the occurrence of intrinsic discontinuities. First, from the complex Fourier spectrum of the data, any simultaneous zero-crossing of the real and imaginary components indicates the presence of a discontinuity. Second, a z-plane analysis of the data will give rise to zeros on (or very close to) the unit circle, which also reveals the existence of intrinsic discontinuities.
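The z-plane test for intrinsic discontinuities can be tried on a toy sequence. The sketch below is illustrative only: the helper `unit_circle_zeros`, the tolerance `tol`, and the three-sample example are assumptions, not part of the original method.

```python
import numpy as np

def unit_circle_zeros(x, tol=1e-6):
    """Return the z-plane zeros of a finite sequence x that lie on
    (or within tol of) the unit circle."""
    # X(z) = x[0] + x[1] z^-1 + ... + x[M] z^-M has the same nonzero zeros
    # as the polynomial x[0] z^M + x[1] z^(M-1) + ... + x[M].
    zeros = np.roots(x)
    return zeros[np.abs(np.abs(zeros) - 1.0) < tol]

x = np.array([1.0, 0.0, 1.0])              # X(z) = 1 + z^-2, zeros at z = +/- j
zeros = unit_circle_zeros(x)
angles = np.sort(np.abs(np.angle(zeros)))  # both zeros sit at |omega| = pi/2

# At omega = pi/2 the real and imaginary parts of the spectrum vanish together,
# which is the first identification method applied to the same sequence.
omega = np.pi / 2
X = np.sum(x * np.exp(-1j * omega * np.arange(len(x))))
```

Both identification methods agree here: the zeros at $z = \pm j$ lie on the unit circle, and the spectrum is zero at $\omega = \pi/2$.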
In this paper, we employ a conventional phase unwrapping algorithm to overcome the extrinsic discontinuities. Typical phase coding modifies the phase spectrum without regard to the phase discontinuities. So, through the method described in the next section, we can detect the change of phase

length subblocks, and for each of these subblocks the DFT is used to obtain the magnitude spectrum $M_i(k)$ and the phase spectrum $\phi_i(k)$, for $0 \le i \le L-1$. The next step is to determine the phase difference between subblocks on a frequency-by-frequency basis:

$$\Delta\phi_i(k) = \phi_i(k) - \phi_{i-1}(k)$$
To embed a bit of information into the current block, the phase of the first subblock $\phi_0(k)$ is replaced with a unique phase signature corresponding to the desired data bit:

$$\hat{\phi}_0(k) = \begin{cases} \phi_{-1}(k), & \text{if } b(m) = -1 \\ \phi_{+1}(k), & \text{if } b(m) = +1 \end{cases}$$


The phase of each subsequent subblock is replaced with this phase signature plus the sum of phase differences up to the given subblock. In this manner, the relative phase of each subblock is preserved, and the long-term phase of the block itself is maintained:

$$\hat{\phi}_i(k) = \hat{\phi}_0(k) + \sum_{j=1}^{i} \Delta\phi_j(k)$$

As a final step, the watermarked block $\hat{x}_m(n)$ is obtained by computing the inverse DFT of each subblock using the original magnitude response $M_i(k)$ and the modified phase response $\hat{\phi}_i(k)$.

Compared with the encoding algorithm, the decoding is much simpler to implement. The length of the block and subblock, the number of DFT points, and the data interval must be known at the receiver. The embedded bit is extracted by computing the DFT of the first subblock, extracting its phase response $\hat{\phi}_0(k)$, and comparing this response to $\phi_{-1}(k)$ and $\phi_{+1}(k)$; with an appropriate threshold, the embedded data can then be detected as 0 or 1.
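The embedding and decoding steps of this section can be sketched for a single block and a single bit. Several choices below are assumptions made for illustration and are not taken from [1] or from the authors' code: the function names `embed_bit` and `extract_bit`, the signature phases $\pm\pi/2$ (with a 1-bit mapped to $+\pi/2$), the modification of every bin except DC and Nyquist, and the decoding threshold at zero.

```python
import numpy as np

def embed_bit(block, n, bit):
    """Embed one bit into a block of L*n samples by phase coding (sketch)."""
    sub = block.reshape(-1, n)                  # L subblocks of length n
    spec = np.fft.rfft(sub, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    delta = np.diff(phase, axis=0)              # phase differences between subblocks
    new_phase = phase.copy()
    k = slice(1, n // 2)                        # leave DC and Nyquist bins untouched
    new_phase[0, k] = np.pi / 2 if bit else -np.pi / 2   # assumed phase signature
    # Each later subblock: signature plus the accumulated phase differences.
    new_phase[1:, k] = new_phase[0, k] + np.cumsum(delta[:, k], axis=0)
    # Inverse DFT with the original magnitudes and the modified phases.
    stego = np.fft.irfft(mag * np.exp(1j * new_phase), n=n, axis=1)
    return stego.reshape(-1)

def extract_bit(block, n):
    """Read the bit back from the phase of the first subblock."""
    phase0 = np.angle(np.fft.rfft(block[:n]))[1:n // 2]
    return int(np.mean(phase0) > 0.0)           # assumed threshold at zero

rng = np.random.default_rng(0)
block = rng.standard_normal(512)                # stand-in for one host block, N = 512
stego = embed_bit(block, n=128, bit=1)
```

In this sketch the magnitude spectrum of every subblock is left unchanged and only the phases move, so `extract_bit(stego, 128)` recovers the bit from the first subblock alone.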

3. Phase steganalysis algorithms

3.1. Theory analysis
The definition of phase is:

$$f(\omega) = \tan^{-1}\frac{\operatorname{Im}(S(\omega))}{\operatorname{Re}(S(\omega))}, \quad \text{where } 0 < f(\omega) < 2\pi \qquad (4)$$

where $S(\omega)$ is the complex Fourier spectrum of the data and $\operatorname{Im}(\cdot)$, $\operatorname{Re}(\cdot)$ are its imaginary and real components, respectively. In audio signals, through phase analysis via the Fourier transform, Ioannis Paraskevas [5] proposed two types of phase discontinuity: extrinsic and intrinsic.

3.2. Features extraction

The phase coding method works by substituting the phase of an initial audio segment with a reference phase that represents the data. The phase of subsequent segments is adjusted in order to preserve the relative phase between segments. Based on Ioannis Paraskevas's theory, we know that phase coding corrupts the extrinsic continuities of the unwrapped phase in each segment, causing a change in the phase difference. So the statistical analysis of the phase difference in each segment can be used to monitor the change and to classify the embedded signal and the clean signal. We divide each signal into frames of a given length and then derive the phase differential spectra. The phase differential spectra are derived from the unwrapped phase spectra obtained using the FFT. Then, from each plot, five statistical features [6] (variance, skewness, kurtosis, median, and mean absolute deviation) are derived in order to compress the large amount of information that each spectrum conveys.
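A minimal sketch of the feature extraction for one frame follows. It assumes that the phase difference is taken between neighboring frequency bins of the unwrapped phase spectrum, that kurtosis is the non-excess fourth standardized moment, and that the mean absolute deviation is measured around the mean; the function name and the random test frame are hypothetical.

```python
import numpy as np

def phase_difference_features(frame):
    """Five statistics of the phase-difference spectrum of one frame."""
    spectrum = np.fft.rfft(frame)
    phase = np.unwrap(np.angle(spectrum))   # remove 2*pi wrapping jumps
    d = np.diff(phase)                      # difference between neighboring bins
    mean, std = d.mean(), d.std()
    z = (d - mean) / std                    # standardized differences
    return np.array([
        d.var(),                            # variance
        np.mean(z ** 3),                    # skewness
        np.mean(z ** 4),                    # kurtosis (non-excess)
        np.median(d),                       # median
        np.mean(np.abs(d - mean)),          # mean absolute deviation
    ])

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)           # stand-in for a 1024-sample audio frame
features = phase_difference_features(frame)
```

Each frame is thus reduced to a five-dimensional vector, which is the form of input the classifier works on.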


3.3. SVM classifier

The process of distinguishing audios with and without hidden data can be viewed as a classification problem. In this paper, we use the Support Vector Machine (SVM) because of its excellent performance. We use a set of audios (stego and normal audios) as the training data to construct the SVM classifier.
SVM is based on Vapnik's statistical learning theory [7]. It creates a maximum-margin hyperplane which separates the training vectors from different classes. When the margin is maximized, the probabilistic test error bound is minimized. A non-linear classifier can be created by mapping the original input space into a higher dimensional feature space using a non-linear kernel function. Some common kernels are linear, polynomial, radial basis function and sigmoid.
Given training vectors $x_k \in R^n$, $k = 1, \ldots, m$, in two classes, and a vector of labels $y \in R^m$ such that $y_k \in \{1, -1\}$, SVM solves a quadratic optimization problem:

$$\min_{\omega, b, \xi} \ \frac{1}{2}\omega^T\omega + C\sum_{k=1}^{m}\xi_k,$$

$$\text{subject to } y_k(\omega^T\phi(x_k) + b) \ge 1 - \xi_k,$$

$$\xi_k \ge 0, \quad k = 1, \ldots, m,$$

where training data are mapped to a higher dimensional space by the function $\phi$, and $C$ is a penalty parameter on the training error. For any test instance $x$, the decision function (predictor) is

$$f(x) = \operatorname{sgn}(\omega^T\phi(x) + b).$$

$K(x_i, x_j) \equiv \phi(x_i)^T\phi(x_j)$ is called the kernel function. In our work, we use the freely available package LIBSVM [8] and the radial basis function (RBF) kernel to train the SVM. The RBF kernel is

$$K(x_i, x_j) = \exp(-\gamma\|x_i - x_j\|^2), \quad \gamma > 0.$$

The training and test audios use the five statistical features derived from each plot, as described in Section 3.2.

4. Experimental results
The proposed phase steganalysis technique is implemented and tested on a set of 800 16-bit WAV files (44.1 kHz, 20 s). The audio files include music (piano, symphony, violin, and rock), songs, speech (male, female), nature noise, etc. In phase coding, there are five embedding parameters: the embedded messages, block length N, subblock length n, phase modifier, and frequency slots per bit.
In the phase coding algorithm, we must consider the phase dispersion caused by a break in the relationship of the phases between each of the frequency components. Minimizing phase dispersion constrains the data rate of phase coding. One cause of phase dispersion is the substitution of the phase $\phi_0(k)$ with the binary code. The magnitude of the phase modifier needs to be close to the original value in order to minimize dispersion. The difference between phase modifier states should be maximized in order to minimize the susceptibility of the encoding to noise. In our modified phase representation, a 0-bit is $-\pi/2$ and a 1-bit is $+\pi/2$.
Another source of distortion is the rate of change of the phase modifier. With an N-point DFT, theoretically, we can use up to N frequency slots of the phase matrix for coding. However, because of the noise in the decoded phase in a typical sound waveform, it is almost impossible to code one bit per frequency slot. Moreover, modifying the phase of each frequency component will cause severe phase dispersion. By changing the phase more slowly and transitioning between phase changes, the audible distortion is greatly reduced. Here we set the interval of phase modification to 16 in each subblock.
In addition, to simplify the calculation, we analyze only one segment of 1024 samples, the one with the most power in each audio file. In our experiment, we use 200 clean audios and their stego versions to train the SVM, and test on another 600 clean audios and their stego versions. The block length N and subblock length n take the values N=512, n=128; N=512, n=256; N=1024, n=128; N=1024, n=256; N=1024, n=512; N=2048, n=256; N=2048, n=512; and N=2048, n=1024, respectively. We choose N=512, n=128; N=1024, n=512; N=2048, n=512; and N=2048, n=1024 for training, and test all other parameter combinations.
The accuracy of testing the 600×2 audios is shown in Figure 1.
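The selection of the single 1024-sample segment with the most power, used in these experiments to reduce computation, could be sketched as follows. The use of non-overlapping segments and mean squared amplitude as the power measure are assumptions, since the text does not specify them; the function name and the synthetic signal are hypothetical.

```python
import numpy as np

def most_powerful_segment(signal, length=1024):
    """Return the non-overlapping segment of `length` samples with the most power."""
    signal = np.asarray(signal, dtype=float)
    count = len(signal) // length                     # trailing partial segment is dropped
    segments = signal[:count * length].reshape(count, length)
    power = np.mean(segments ** 2, axis=1)            # mean squared amplitude per segment
    return segments[np.argmax(power)]

# Synthetic signal (assumption): quiet noise with one loud 1024-sample stretch.
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(4096)
signal[2048:3072] *= 100.0
segment = most_powerful_segment(signal)
```

The returned segment is the one that would then be passed to the feature extraction of Section 3.2.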

Figure 1. Testing accuracy result

The alarm rate of detecting 600 clean audios is 6.1667%, 4.5%, 4.3333%, and 5%, respectively. The missing rate of detecting 600 embedded audios is shown in Figure 2.

Figure 2. Missing rate of embedded audio using various embedding parameters

Experimental results show that all trained models achieve high detection accuracy. There is a trade-off between alarm rate and missing rate. Considering this trade-off, we find it reasonable to train SVM models with audio embedded using N=2048, n=512, because both the alarm rate and the missing rate can be controlled. This conclusion rests on the assumption that each segment has embedded messages.

5. Conclusion
Phase coding is one of the most effective coding methods in terms of the signal-to-perceived noise ratio. In this paper, we present a novel method to detect messages hidden by typical phase coding in audio signals. We use statistical analysis of the phase difference to monitor the phase discontinuities and use an SVM classifier to capture the faint changes of phase caused by embedding. Experiments are conducted on a set of various types of audio, and the correct classification rate reaches 95%.
To monitor the statistical changes caused by other phase coding algorithms, future work may focus on analyzing more effective features in audio signals. An appropriate classifier also needs further study.

[1] W. Bender, D. Gruhl, N. Morimoto, et al., Techniques for data hiding, IBM Systems Journal, 1996, vol. 35, no. 3&4: 313-336.
[2] Chris Honsinger, Majid Rabbani, Data Embedding Using Phase Dispersion, IEE Seminar on Secure Images and Image Authentication (2000/039), London, UK, April 2000, p. 5.
[3] Kaliappan Gopalan, Stanley J. Wenndt, et al., Audio Steganography by Amplitude or Phase Modification, Security and Watermarking of Multimedia Contents V, Proceedings of the SPIE, Volume 5020, 2003: 67-76.
[4] Akira Takahashi, Ryouichi Nishimura, Yôiti Suzuki, Multiple Watermarks for Stereo Audio Signals Using Phase-Modulation Techniques, IEEE Transactions on Signal Processing, Vol. 53, No. 2, February 2005: 806-815.
[5] Ioannis Paraskevas, Edward Chilton, Combination of magnitude and phase statistical features for audio classification, Acoustical Society of America, ARLO 5(3), July 2004: 111-117.
[6] A. Papoulis, Probability and Statistics, Prentice-Hall, Englewood Cliffs, 1990: Chap. 12.
[7] V. N. Vapnik, The nature of statistical learning theory, New York: Springer-Verlag, 1995.
[8] LIBSVM: a library for support vector machines, 2007.