ABSTRACT

This paper presents the technique of embedding data in an audio signal by inserting low-power tones and its robustness to noise and cropping of embedded speech samples. Experiments on the embedding procedure applied to cover audio utterances from the noise-free TIMIT database and a noisy database demonstrate the feasibility of the technique in terms of imperceptible embedding, high data rate and accurate data recovery. The low power levels ensure that the tones are inaudible in the message-embedded stego signal. Besides imperceptibility in hearing, the spectrogram of the stego signal also conceals the existence of embedded information. Both of these features render the detection of embedding in the stego signal difficult to accomplish. Oblivious detection of the stego signal, instead of escrow detection, yields the embedded information accurately. In addition, results of two cases of attacks on the data-embedded stego audio, namely, additive noise and random cropping, show the technique is robust for covert communication and steganography.

Keywords: Audio Steganography, Imperceptible tone insertion

Steganography, in general, relies on the imperfection of the human auditory and visual systems. Audio steganography takes advantage of the psychoacoustical masking phenomenon of the human auditory system (HAS). The psychoacoustical, or auditory, masking property renders a weak tone imperceptible in the presence of a strong tone in its temporal or spectral neighborhood. This property arises because of the low differential range of the HAS even though the dynamic range covers 80 dB below ambient level [1, 2]. Frequency masking occurs when the human ear cannot perceive frequencies at a lower power level if these frequencies are present in the vicinity of tone- or noise-like frequencies at a higher level. Additionally, a weak pure tone is masked by wide-band noise if the tone occurs within a critical band. This property of inaudibility of weaker sounds is used in different ways for embedding information. Embedding of data by inserting inaudible tones in a cover audio signal has been presented recently [3, 4]. The following sections describe the tone insertion technique and its robustness in retrieving the embedded information in the presence of noise and in cropped frames of received speech.

2. EMBEDDING BY TONE INSERTION
[Spectrogram panel: Stego - 2 bits/frame; axes: Time, Frequency]

a listener may not be able to perceive the tone because of its low power, the spectrogram is likely to show continuous spectral nulls or ‘holes’ at the remaining three tone frequencies. To a malicious attacker, these artifacts are indicative of host manipulation even without the [...] sets the order of the tones for each frame by frequency hopping avoids such an obvious detection of embedding [3].
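The frequency-hopped four-tone idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyed permutation in `hop_order`, the 128-sample frame, the DFT bin choices, and the power fractions `tone_frac` and `null_frac` are all assumptions made for illustration. Each frame carries one 2-bit symbol by making one of four tones dominant and suppressing the other three, and detection needs only the stego signal and the key, in the spirit of oblivious detection.

```python
import numpy as np

def hop_order(key, frame_idx, n_tones=4):
    """Keyed pseudo-random tone order for one frame (hypothetical hopping scheme)."""
    rng = np.random.default_rng([key, frame_idx])
    return rng.permutation(n_tones)

def embed_frame(frame, symbol, bins, order, tone_frac=0.01, null_frac=1e-6):
    """Embed one 2-bit symbol (0..3): one tone is made dominant, three are suppressed.

    tone_frac and null_frac are assumed fractions of the frame power.
    """
    n = len(frame)
    spec = np.fft.rfft(frame)
    frame_power = np.mean(frame ** 2)
    for slot, b in enumerate(bins[order]):
        frac = tone_frac if slot == symbol else null_frac
        amp = np.sqrt(2.0 * frac * frame_power)  # sinusoid amplitude for that power
        # Keep the host phase at the bin, replace only the magnitude.
        spec[b] = (amp * n / 2.0) * np.exp(1j * np.angle(spec[b]))
    return np.fft.irfft(spec, n)

def detect_frame(frame, bins, order):
    """Recover the symbol as the hopped slot with the largest tone magnitude."""
    mags = np.abs(np.fft.rfft(frame)[bins[order]])
    return int(np.argmax(mags))

def embed(signal, symbols, bins, key, frame_len=128):
    """Embed one symbol per non-overlapping frame (frame_len is an assumption)."""
    out = np.array(signal, dtype=float)
    for i, s in enumerate(symbols):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        out[seg] = embed_frame(out[seg], s, bins, hop_order(key, i))
    return out

def detect(signal, n_frames, bins, key, frame_len=128):
    """Oblivious detection: only the stego signal, the tone bins and the key are used."""
    return [
        detect_frame(signal[i * frame_len:(i + 1) * frame_len], bins, hop_order(key, i))
        for i in range(n_frames)
    ]
```

Because the slot-to-frequency mapping changes from frame to frame, the suppressed tones no longer sit at fixed frequencies, which is the property the hopping is meant to provide. A receiver without the key still sees tone energy but cannot map it back to symbols.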
Figs. 2 and 3 show the host and the stego signals and their spectrograms using the frequency-hopped four-tone
Because of the high level of intrinsic noise in the host, the dominant tone power was raised to more than 10 percent. Although the stego signal did not show any perceptual difference from the host, the higher tone power started showing up in the spectrogram. To mask the dominant tone in the spectrogram, the tones were set to frequencies in the range where the host has significant energy. In the 400 Hz to 1000 Hz range, for example, the host has relatively high spectral energy over almost the entire duration. Hence, inserting tones at frequencies of

With additive noise power raised up to the power of each stego frame, all 2502 bits were correctly recovered. At higher noise levels of up to 2.5 times the frame power, the bit error was below 80, or a BER of 2.86 percent.

Cropping by zeroing or replacing from 3 to 50 samples in each stego frame caused no bit error because of the relatively higher power of the inserted tones. With a higher number of destroyed samples, the stego signal became highly noisy; still, the bit error was negligible.
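The two attacks reported above, additive noise scaled relative to the frame power and cropping by zeroing samples, can be simulated with a short Python sketch together with a bit error rate measure. The function names, the use of white Gaussian noise, and the random choice of cropped positions are assumptions; the text does not specify how the noise was generated or which samples were destroyed.

```python
import numpy as np

def add_noise(frame, ratio, rng):
    """Add white Gaussian noise whose power is `ratio` times the frame power."""
    power = np.mean(frame ** 2)
    noise = rng.standard_normal(len(frame))
    # Rescale so the noise power equals ratio * frame power.
    noise *= np.sqrt(ratio * power / np.mean(noise ** 2))
    return frame + noise

def crop_zero(frame, n_crop, rng):
    """Zero `n_crop` samples at randomly chosen positions (assumed cropping model)."""
    out = np.array(frame, dtype=float)
    idx = rng.choice(len(out), size=n_crop, replace=False)
    out[idx] = 0.0
    return out

def bit_error_rate(sent, received):
    """Fraction of bit positions where the recovered bit differs from the sent bit."""
    sent = np.asarray(sent)
    received = np.asarray(received)
    return float(np.mean(sent != received))
```

Setting `ratio=1.0` corresponds to noise at the frame power, and `ratio=2.5` to the higher noise level mentioned in the text. Applying `crop_zero` with `n_crop` between 3 and 50 mirrors the stated cropping range.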
[Waveform panel; x-axis: Sample index (x 10^4)]
Fig. 4 Greenflag utterance host (top) and GSM-coded bits embedded stego

[Spectrogram panel: Host - GF; axes: Time, Frequency]

At the embedding rate of approximately 250 bits/s, the technique has a high payload capacity. Any attempt to increase the capacity further must use more than four tones. However, use of eight tones for embedding 3 bits/frame, for example, may lead to audible and/or visible artifacts unless the selected tones are absent in the host audio. Also, noise, intentional or unintentional, may cause high bit errors at high capacity. Noisy host signals, on the other hand, can have a larger payload and use higher power without significant loss of data. In general, tones selected from high-energy regions of the host can be masked in hearing and spectrogram by their low power levels.
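The capacity figures above follow a simple relation that a few lines of Python make explicit: choosing one dominant tone out of N candidates per frame carries log2(N) bits, which gives 2 bits/frame for four tones and 3 bits/frame for eight. The 16 kHz sampling rate and the 128-sample non-overlapping frame used below are assumptions, picked only because they reproduce a rate of 250 bits/s; the frame length is not stated in this text.

```python
import math

def bits_per_frame(n_tones):
    """Bits carried per frame when one of n_tones candidate tones is made dominant."""
    return int(math.log2(n_tones))

def embedding_rate(n_tones, sample_rate_hz, frame_len):
    """Payload in bits/s for non-overlapping frames of frame_len samples (assumed framing)."""
    frames_per_second = sample_rate_hz / frame_len
    return bits_per_frame(n_tones) * frames_per_second
```

Under the same assumed framing, moving from four to eight tones would raise the rate by half, at the cost of more inserted frequencies that must stay hidden in both hearing and the spectrogram.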
Malicious attacks involving replacement of embedded samples with zeros or neighboring values appear to cause less loss of data when fewer samples are affected. Cropping of a large number of samples destroys the cover audio, however.

[Spectrogram panel: Stego - 2 bits/frame; axes: Time, Frequency]