
Introduction to Steganalysis


Multimedia Security

• Steganalysis of LSB encoding
• Steganalysis based on JPEG compatibility
• Some discussions

• Steganography
– The art of secret communication
– Stego content (e.g. images) should not
contain any easily detectable artifacts due
to message embedding
– The less information is embedded, the
smaller the probability of introducing
detectable artifacts

Watermarking vs. Steganography (figure: steganography and watermarking compared in terms of fidelity, robustness, and capacity)

Steganalysis of LSB Encoding

Goal
• To inspect one or possibly more color images for statistical artifacts caused by message embedding with the LSB method
– To find out which images are likely to contain secret messages
– To estimate the reliability of the decisions
  • Type I error (false alarm) and Type II error (miss)

Application Scenarios (figure)
• Automatic checking: images sent to a certain address, or images passing an Internet node with a special filter
• Forensics: images in a seized computer, examined by an expert

LSB Encoding
• Replacing the LSB of every gray level of each color channel with message bits
– On average, 50% of the LSBs are changed
– Logic behind this scheme
  • LSBs in scanned or camera-taken images are essentially random
  • Encrypted (randomized) messages are random
  • Hence no statistical artifacts will be introduced
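
The claim that about half of the LSBs change can be checked with a small simulation. This is a sketch on synthetic data: random 8-bit values stand in for camera LSBs and random bits stand in for an encrypted message. The helper names `embed_lsb` and `changed_fraction` are hypothetical.

```python
import random

def embed_lsb(values, bits):
    """Replace the LSB of each 8-bit channel value with one message bit."""
    return [(v & ~1) | b for v, b in zip(values, bits)]

def changed_fraction(cover, stego):
    """Fraction of channel values whose LSB was actually flipped."""
    return sum(c != s for c, s in zip(cover, stego)) / len(cover)

# Synthetic cover values and a random (encrypted-looking) message.
rng = random.Random(1)
cover = [rng.randrange(256) for _ in range(30000)]
message = [rng.getrandbits(1) for _ in cover]
stego = embed_lsb(cover, message)
print(changed_fraction(cover, stego))  # should be close to 0.5
```

A bit is only flipped when the message bit differs from the existing LSB, which for random bits happens about half the time.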

Important Observation
• Number of unique colors in cover images
– Typically smaller than the number of pixels in the image
  • About 1:2 for high-quality scans in BMP format
  • 1:6 or lower for JPEG images or video
• Many true-color images have a relatively small "palette"
• After LSB embedding, the new color palette has a distinct feature
– Many pairs of close colors
– Evidence of LSB-encoding-based steganography

Formulations
• U: number of unique colors in an image
• P: number of close color pairs
– Two colors (R1, G1, B1) and (R2, G2, B2) are close if |R1 − R2| ≤ 1 and |G1 − G2| ≤ 1 and |B1 − B2| ≤ 1
• R: ratio between the number of close pairs of colors and all pairs of colors
– $R = P / C(U, 2)$, where $C(U, 2) = U(U-1)/2$ is the number of combinations of two colors
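
The quantities U, P, and R can be computed directly from a list of pixels. This is a minimal sketch, assuming the image is given as (r, g, b) tuples; `close_pair_ratio` is a hypothetical helper name. Instead of comparing all C(U, 2) pairs, it probes the 26 neighbours of each unique color, which gives the same count of close pairs.

```python
from itertools import product
from math import comb

# The 26 offsets that move each channel by -1, 0 or +1 (excluding no move).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def close_pair_ratio(pixels):
    """Return (U, P, R) for an iterable of (r, g, b) tuples."""
    colors = set(pixels)                       # unique colors
    u = len(colors)
    # For every unique color, count unique colors within +-1 in all three
    # channels; each close pair is counted once from each end.
    hits = sum((r + dr, g + dg, b + db) in colors
               for r, g, b in colors for dr, dg, db in OFFSETS)
    p = hits // 2
    all_pairs = comb(u, 2)                     # C(U, 2)
    return u, p, (p / all_pairs if all_pairs else 0.0)
```

For example, the pixels (10, 10, 10), (11, 10, 10), (50, 50, 50), (10, 10, 10) give U = 3, P = 1, and R = 1/3.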

The Proposed Scheme
• After embedding, U will be increased to U', and we can evaluate the number of close color pairs P'
• The value of R for an image that does not have a message will be smaller than that of an image that already has a message embedded in it

The Proposed Scheme (cont.)
• It is impossible to find a single threshold of R that works for all images
– Due to the large variation of U
• Observations for reliable distinguishing
– For an image that already contains a large message
  • Embedding another message in it does not modify R significantly
– For an image not containing a message
  • R increases significantly
• Use the relative comparison of R as the decision criterion

Detection Algorithm
• To find out whether or not an image has a secret message
1. Calculate $R = P / C(U, 2)$
2. Embed a test message using LSB embedding in randomly selected pixels
– Size of the test message: 3·a·M·N bits (for M × N color images)
3. Calculate $R' = P' / C(U', 2)$
4. Decide whether an image is embedded
– R' ≈ R: the image already had a large message hidden
– R' > R: the image did not have a message in it
– R'/R: the separating statistic
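
Steps 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the test message is written into randomly chosen channel values (3·a·M·N of the 3·M·N available), the default a = 1/30 is taken from the experiments below, and `close_ratio`, `embed_test_message`, `separating_statistic`, and the `seed` parameter are hypothetical.

```python
import random
from itertools import product
from math import comb

OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def close_ratio(pixels):
    """R = P / C(U, 2) for a list of (r, g, b) tuples."""
    colors = set(pixels)
    hits = sum((r + dr, g + dg, b + db) in colors
               for r, g, b in colors for dr, dg, db in OFFSETS)
    pairs = comb(len(colors), 2)
    return (hits // 2) / pairs if pairs else 0.0

def embed_test_message(pixels, a, rng):
    """Write 3*a*M*N random bits into the LSBs of randomly chosen channel values."""
    flat = [c for px in pixels for c in px]       # 3*M*N channel values
    n_bits = round(a * len(flat))                 # 3*a*M*N bits
    for pos in rng.sample(range(len(flat)), n_bits):
        flat[pos] = (flat[pos] & ~1) | rng.getrandbits(1)
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def separating_statistic(pixels, a=1 / 30, seed=0):
    """Return R'/R; values close to 1 suggest a large message is already hidden."""
    rng = random.Random(seed)
    r = close_ratio(pixels)                                   # step 1
    test_embedded = embed_test_message(pixels, a, rng)        # step 2
    r_prime = close_ratio(test_embedded)                      # step 3
    return r_prime / r if r else float("inf")                 # step 4 statistic
```

The statistic would then be compared with a threshold Th, as described under Threshold Selection.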

Limitations
• If the secret message is too small
– The two ratios will be very close to each other
– We cannot distinguish images with and without messages

Experiments
• Using an image database of 300 color images
– 350 × 250 pixels
– JPEG compressed
– Capacity for each image: 32.8 KB (350 × 250 × 3 / 8 bytes)
• A message of length 20 KB (about 2/3 of the maximal capacity) was embedded into each image to form a new database of images with messages
• The detection algorithm is run on both databases, and message presence is tested by embedding a test message of size 1 KB (a = 1/30)
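
The sizes quoted above follow from one embedded bit per color channel. The arithmetic below assumes 1 KB = 1000 bytes.

```latex
\begin{aligned}
\text{capacity} &= \frac{350 \times 250 \times 3}{8} = 32\,812.5\ \text{bytes} \approx 32.8\ \text{KB} \\
3aMN &= \frac{3 \times 350 \times 250}{30} = 8\,750\ \text{bits} \approx 1.1\ \text{KB} \qquad (a = 1/30) \\
\frac{20\ \text{KB}}{32.8\ \text{KB}} &\approx 0.61
\end{aligned}
```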

Experimental Results (figure comparing the original database with the embedded database)

Parameter Optimization
• Model the density functions as Gaussian distributions
– N(μ, σ) and N(μs, σs), where the size of the secret message is denoted as s
• Different sizes of secret messages and test messages are tested
– Secret messages: 1% to 50%
– Test messages: a = 0.01 to 0.5
• Results
– μ > μs for all s
– s decreases ⇒ N(μs, σs) becomes flat and its peak moves right
– s increases ⇒ N(μs, σs) becomes narrower and its peak moves left
• It is easier to separate the two peaks for larger secret message sizes
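
Fitting the two Gaussians and measuring how well they separate takes only a few lines with Python's `statistics.NormalDist`. This is a sketch: the sample values are hypothetical, and using one minus the overlap coefficient as a separability score is our choice for illustration, not the measure used in the slides.

```python
from statistics import NormalDist

def fit_densities(clean_stats, stego_stats):
    """Fit N(mu, sigma) to R'/R of clean images and N(mu_s, sigma_s) to images with a message."""
    return NormalDist.from_samples(clean_stats), NormalDist.from_samples(stego_stats)

def separability(clean, stego):
    """1 - overlap of the two densities: 0 = indistinguishable, 1 = fully separated."""
    return 1.0 - clean.overlap(stego)

# Hypothetical R'/R samples; real values would come from the image databases.
clean, stego = fit_densities([1.10, 1.20, 1.30], [0.90, 1.00, 1.10])
```

A larger secret message moves the peak of N(μs, σs) further from N(μ, σ) and narrows it, which raises this score.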

Threshold Selection
• Choose the threshold Th so that Type I error = Type II error (this minimizes the overall error when both densities have equal variance and both classes are equally likely)
• Change Th to put more weight on not missing an image with a secret message, at the expense of more false alarms
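
Under the Gaussian model, the equal-error threshold has a closed form. This sketch assumes the rule "report a secret message if R'/R < Th", which follows from μ > μs. The function names are hypothetical.

```python
from statistics import NormalDist

def error_rates(clean, stego, th):
    """(Type I, Type II) for the rule: report a secret message if R'/R < th."""
    false_alarm = clean.cdf(th)          # clean image falls below th
    miss = 1.0 - stego.cdf(th)           # image with a message stays at or above th
    return false_alarm, miss

def equal_error_threshold(clean, stego):
    """Th with Type I = Type II; solves (Th - mu)/sigma = -(Th - mu_s)/sigma_s."""
    return ((clean.mean * stego.stdev + stego.mean * clean.stdev)
            / (clean.stdev + stego.stdev))
```

Raising Th above the equal-error value flags more images. Misses decrease, and false alarms increase, which matches the trade-off described above.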

Experimental Results (figure)

Experimental Results (cont., figure)

Conclusions
• The probability of a wrong prediction is mainly determined by the size of the secret message
– The influence of the test message size is much smaller
• The optimal test message size differs for different secret message sizes
• The detection algorithm mainly targets images with a smaller number of unique colors
– The results for high-quality scanned and losslessly compressed images (U > 0.5MN) may be unreliable

Steganalysis Based on JPEG Compatibility

Image Steganography
• Image formats
– Uncompressed (BMP)
  • Offering the highest capacity and best overall security
– Palette (GIF)
  • Difficult to provide security with reasonable capacity
– Lossy compressed (JPEG, JPEG 2000)
  • Difficult to hide a message in the JPEG stream in a secure manner while keeping the capacity practical

Goal of this Paper
• To show that images may be extremely poor candidates for cover images if
– They were initially acquired as JPEG images and later decompressed to a lossless format
• Steganographic methods aim for a minimal amount of distortion to avoid visible artifacts
– The act of message embedding will therefore not erase the characteristic structure created by JPEG compression
– Analyzing the DCT coefficients of the image can recover even the values of the JPEG quantization table
• Evidence for steganography
– An image stored in a lossless format that bears a strong fingerprint of JPEG compression, yet is not fully compatible with any JPEG-compressed image

JPEG Compression
• Uncompressed image → 8 × 8 block Borig → DCT → zigzag scan: $d_k(i)$, i = 0, …, 63
• Quantization with the JPEG quantization matrix Q: $D_k(i) = \mathrm{round}(d_k(i) / Q(i))$
• Huffman coder

JPEG Decompression
• Huffman decoding
• $QD_k(i) = Q(i) \cdot D_k(i)$
– Multiplying each quantized DCT coefficient by its quantization step
• $B_{raw} = \mathrm{DCT}^{-1}(QD)$
– Inverse DCT
• $B = [B_{raw}]$
– Rounded to integers in the range 0 to 255
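
The quantization and dequantization steps of the two slides above can be sketched for one grayscale 8 × 8 block. Several assumptions apply. The code uses an orthonormal 2-D DCT and the −128 level shift of baseline JPEG. Q is an 8 × 8 array indexed by frequency, so the zigzag scan, which only reorders the 64 coefficients, is omitted. Huffman coding is lossless and is also left out. The helper names and the uniform Q used in the example are hypothetical.

```python
import math

N = 8
# Orthonormal 8-point DCT-II basis C[u][x]; orthonormality makes Parseval hold.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def round_half_up(x):
    return math.floor(x + 0.5)

def dct2(f):
    """2-D DCT: F = C f C^T."""
    t = [[sum(C[u][x] * f[x][y] for x in range(N)) for y in range(N)] for u in range(N)]
    return [[sum(t[u][y] * C[v][y] for y in range(N)) for v in range(N)] for u in range(N)]

def idct2(F):
    """Inverse 2-D DCT: f = C^T F C."""
    t = [[sum(C[u][x] * F[u][v] for u in range(N)) for v in range(N)] for x in range(N)]
    return [[sum(t[x][v] * C[v][y] for v in range(N)) for y in range(N)] for x in range(N)]

def compress_block(b_orig, q):
    """D(i) = round(d(i) / Q(i)), with pixels level-shifted by -128."""
    d = dct2([[p - 128 for p in row] for row in b_orig])
    return [[round_half_up(d[u][v] / q[u][v]) for v in range(N)] for u in range(N)]

def decompress_block(D, q):
    """QD(i) = Q(i) * D(i); B_raw = DCT^-1(QD); B = [B_raw] clipped to 0..255."""
    qd = [[q[u][v] * D[u][v] for v in range(N)] for u in range(N)]
    b_raw = idct2(qd)
    return [[min(255, max(0, round_half_up(p + 128))) for p in row] for row in b_raw]
```

With a uniform step of 16, a flat block of value 200 has a single nonzero coefficient, D(0) = 576 / 16 = 36, and it decompresses back to the same flat block.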

Observations
• If the block B has no pixels saturated at 0 or 255
– $\|B_{raw} - B\|^2 \le 16$, where $\|\cdot\|$ is the L2 norm
– Since $|B_{raw}(i) - B(i)| \le 0.5$ for all 64 pixels i
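
The bound of 16 follows from summing the worst-case rounding error over the 64 pixels of an unsaturated block:

```latex
\|B_{raw} - B\|^2 = \sum_{i=0}^{63} \bigl(B_{raw}(i) - B(i)\bigr)^2 \le 64 \times \left(\tfrac{1}{2}\right)^2 = 16
```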

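The Proposed Scheme
• Question
– Given an arbitrary 8 × 8 block B of pixel values, could this block have arisen through the process of JPEG decompression with the quantization matrix Q (if available)?
• Let $QD' = \mathrm{DCT}(B)$. By Parseval's equality:
– $\|B - B_{raw}\|^2 = \|\mathrm{DCT}(B) - \mathrm{DCT}(B_{raw})\|^2 = \|QD' - QD\|^2 \le 16$
• Because each QD(i) is an integer multiple of Q(i):
– $16 \ge \|QD' - QD\|^2 \ge \sum_i \left|QD'(i) - Q(i)\,\mathrm{round}\left(QD'(i)/Q(i)\right)\right|^2 = S$
• Additional check
– Let $q_{p(i)}(i)$ be integer multiples of Q(i) close to QD'(i), and consider all combinations with $\sum_i \left(QD'(i) - q_{p(i)}(i)\right)^2 \le 16$
– For each combination, set $QD(i) = q_{p(i)}(i)$ and compute $[\mathrm{DCT}^{-1}(QD)]$; B is compatible only if this equals B for at least one combination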
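
The statistic S can be sketched for a grayscale block as shown below. The sketch assumes that Q is known, uses the −128 level shift of baseline JPEG, and applies an orthonormal DCT so that Parseval's equality holds. It does not implement the additional check for blocks with S ≤ 16. `s_statistic` and `is_incompatible` are hypothetical names.

```python
import math

N = 8
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def dct2(f):
    """Orthonormal 2-D DCT: F = C f C^T."""
    t = [[sum(C[u][x] * f[x][y] for x in range(N)) for y in range(N)] for u in range(N)]
    return [[sum(t[u][y] * C[v][y] for y in range(N)) for v in range(N)] for u in range(N)]

def s_statistic(block, q):
    """S = sum_i (QD'(i) - Q(i) * round(QD'(i) / Q(i)))^2 with QD' = DCT(B - 128)."""
    qd_prime = dct2([[p - 128 for p in row] for row in block])
    s = 0.0
    for u in range(N):
        for v in range(N):
            c, step = qd_prime[u][v], q[u][v]
            s += (c - step * round(c / step)) ** 2   # distance to nearest multiple
    return s

def is_incompatible(block, q):
    """True if S > 16: an unsaturated B cannot come from JPEG decompression with Q."""
    return s_statistic(block, q) > 16
```

With a uniform step of 16 (hypothetical), changing one pixel of a flat gray block by 10 spreads small values over all 64 coefficients. That gives S = 10² = 100 > 16, so the block is flagged. A change of 3 gives S = 9, which would still need the additional check.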

Algorithm
1. Divide the image into 8 × 8 blocks
2. Arrange the blocks in a list, and remove all saturated blocks from the list
– T: number of remaining blocks
3. Extract the quantization matrix Q from all T blocks
– If all elements of Q are 1s, the image was not previously JPEG compressed and the test is not applied

Algorithm (cont.)
4. For each block B, calculate S
5. If S > 16, B is not compatible with JPEG compression; otherwise, perform the additional check
6. If no incompatible blocks are found after going through all T blocks, no evidence of steganography is available
7. Repeat the algorithm for different 8 × 8 divisions to detect cropped images
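
Steps 1, 2, and 4 to 7 can be sketched as a loop over blocks. This sketch makes several assumptions. The image is grayscale, given as a list of rows. Q is supplied by the caller, so step 3 is not shown. Blocks with S ≤ 16 are only counted as pending, because the additional check is not implemented. All function names are hypothetical.

```python
import math

N = 8
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def dct2(f):
    t = [[sum(C[u][x] * f[x][y] for x in range(N)) for y in range(N)] for u in range(N)]
    return [[sum(t[u][y] * C[v][y] for y in range(N)) for v in range(N)] for u in range(N)]

def s_statistic(block, q):
    qd = dct2([[p - 128 for p in row] for row in block])
    return sum((qd[u][v] - q[u][v] * round(qd[u][v] / q[u][v])) ** 2
               for u in range(N) for v in range(N))

def blocks(image, dy=0, dx=0):
    """Yield the complete 8x8 blocks of a grid that starts at row dy, column dx."""
    for top in range(dy, len(image) - N + 1, N):
        for left in range(dx, len(image[0]) - N + 1, N):
            yield [row[left:left + N] for row in image[top:top + N]]

def check_image(image, q, dy=0, dx=0):
    """Return (T, incompatible, pending_additional_check) for one block grid."""
    t = incompatible = pending = 0
    for b in blocks(image, dy, dx):                       # step 1
        if any(p in (0, 255) for row in b for p in row):
            continue                                      # step 2: drop saturated blocks
        t += 1
        if s_statistic(b, q) > 16:                        # steps 4-5
            incompatible += 1
        else:
            pending += 1          # additional check not implemented in this sketch
    return t, incompatible, pending

def check_all_grids(image, q):
    """Step 7: try all 64 grid offsets in case the image was cropped."""
    return {(dy, dx): check_image(image, q, dy, dx) for dy in range(N) for dx in range(N)}
```

Under step 6, a result with `incompatible == 0` for the right grid provides no evidence of steganography. Any incompatible block in an image with a strong JPEG fingerprint points to a modification after decompression.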

Extracting the Quantization Matrix

Some Discussions

References
• J. Fridrich, R. Du, and M. Long, "Steganalysis of LSB encoding in color images," ICME 2000, New York, 2000.
• J. Fridrich, M. Goljan, and R. Du, "Steganalysis based on JPEG compatibility," SPIE Multimedia Systems and Applications IV, Denver, 2001.
• G. Goth, "Steganalysis gets past the hype," IEEE Distributed Systems Online, April 2005.