
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/228973369

Analysis of Detecting Steganography contents in corporate Emails

Article · July 2011


All content following this page was uploaded by M. Rajaram on 06 October 2014.



International Journal of Research and Reviews in Electrical and Computer Engineering (IJRRECE)
Vol. 1, No. 2, June 2011
ISSN: 2046-5149
Copyright © Science Academy Publisher, United Kingdom
www.sciacademypublisher.com

Analysis of Detecting Steganography contents in corporate Emails

P. T. Anitha1, M. Rajaram2 and S. N. Sivanandham3

1 Asst. Prof. /MCA, Karpagam College of Engineering, Coimbatore 641032, Tamilnadu, India
2 Vice Chancellor, Anna University, Tirunelveli, Tamilnadu, India
3 Educational Advisor, Karpagam Group of Institutions, Coimbatore 641032, Tamilnadu, India

Email: anitha_pt@yahoo.com, rajaramgct@redifmail.com, Snsprof25@yahoo.com

Abstract - The widespread use of Steganography inevitably leads to a need to detect hidden data. However, compared to steganography, steganalysis is still in its infancy. Our goal is to establish a solid framework for steganalysis, and design systems to detect state-of-the-art hiding systems. We are researching three approaches to accomplish this: 1) cryptography 2) Steganography 3) steganalysis. Steganography is used to hide the occurrence of communication. Today, email management is not only a filing and storage challenge. Because law firms and attorneys must be equipped to take control of litigation, email authenticity must be unquestionable with strong chains of custody, constant availability, and tamper-proof security. Email is insecure. The proposed work develops a steganalysis framework that checks the Email content of corporate mails by improving the S-DES algorithm with the help of a neural network approach. A new filtering algorithm is also developed, which is used to extract only the JPG images from the corporate emails. We anticipate that this paper can also give a clear picture of the current trends in Steganography so that we can develop and improve appropriate steganalysis algorithms.
Keywords: Steganalysis, Steganography, Information Hiding, LSB, Stegdetect, Stego, Outguess

1. Introduction

Cryptography and Steganography are well known and widely used techniques that manipulate information (messages) in order to cipher or hide their existence. These techniques have many applications in computer science and other related fields: they are used to protect military messages, E-mails, credit card information, corporate data, personal files, etc. The widespread use of Steganography inevitably leads to a need to detect hidden data. Steganalysis is the detection, and ultimately the extraction, of data hidden in an innocuous medium.
The goal of steganalysis is to detect and/or estimate potentially hidden information from observed data with little or no knowledge about the steganography algorithm and/or its parameters.
The current trend in steganalysis seems to suggest two extreme approaches: (a) little or no statistical assumptions are made about the image under investigation, and statistics are learnt using a large database of training images; and (b) a parametric model is assumed for the image and its statistics are computed for steganalysis detection. This research work developed a framework which is used to analyze the stego content in corporate emails. Our goal is to establish a solid framework for steganalysis, and design systems to detect state-of-the-art hiding systems.

2. Image Steganalysis

There are essentially three types of image formats: raw, uncompressed formats (BMP, PCX), palette formats (GIF), and lossy compressed formats (JPEG, Wavelet, JPEG2000). Only a few current steganographic programs offer the capability to embed messages directly in the JPEG stream. It is a difficult problem to devise a steganographic method that would hide messages in the JPEG stream in a secure manner while keeping the capacity practical. Far more programs use the BMP, PCX, or the GIF palette-based format. The GIF format is a difficult environment for secure steganography with reasonable capacity.
Consequently, if the cover-image was initially stored in the JPEG format, the act of message embedding will not erase the characteristic structure created by the JPEG compression and one can still easily determine whether or not a given image has been stored as JPEG in the past. Actually, unless the image is too small, one can reliably recover even the values of the JPEG quantization table by carefully analyzing the values of DCT coefficients in all 8×8 blocks. After message embedding, however, the cover-image will become (with a high probability) incompatible with the JPEG format
in the sense that it may be possible to prove that a particular 8×8 block of pixels could not have been produced by JPEG decompression of any block of quantized coefficients. This finding provides strong evidence that the block has been modified. It is highly suspicious to find an image stored in a lossless format that bears a strong fingerprint of JPEG compression, yet is not fully compatible with any JPEG compressed image. This can be interpreted as evidence for steganography. Figure 1 presents an example of a hidden message inside a picture.

3. Proposed Idea

3.1. Hybrid Algorithm

A new hybrid algorithm is developed by combining the S-DES algorithm and the back propagation algorithm of neural networks, which will effectively detect the stego content in the images. S-DES is a simplified version of DES, the best known and most widely used cryptosystem for civilian applications. DES was developed at IBM and adopted by the National Bureau of Standards in the mid 1970s, and for many years withstood the attacks published in the open literature.
The proposed work developed a framework which contains the following tasks: image separation from corporate mails using the newly developed capturing algorithm, compression, encryption, hiding, decryption, and decompression steps.

3.2. S-DES method of encryption

This method is an example of a block cipher: the plain text is split into blocks of a certain size, in this case 8 bits.

Plaintext = b1b2b3b4b5b6b7b8

key = k1k2k3k4k5k6k7k8k9k10

For subkey generation, first produce two subkeys K1 and K2:

K1 = P8 (LS1 (P10 (key)))

K2 = P8 (LS2 (LS1 (P10 (key))))

where P8, P10, LS1 and LS2 are bit rearrangement operators (permutations and circular left shifts). For example, P10 takes 10 bits and returns the same 10 bits in a different order:

P10 (k1k2k3k4k5k6k7k8k9k10) = k3k5k2k7k4k10k1k9k8k6.

The plain text is split into 8-bit blocks; each block is encrypted separately. Given a plaintext block, the cipher text is defined using the two subkeys K1 and K2, as follows:

ciphertext = IP^-1( fK2( SW( fK1( IP( plaintext ) ) ) ) )

and fK( ) is computed as follows. We write exclusive-or (XOR) as +:

fK(L, R) = (L + FK(R), R)

FK(R) = P4( S0( lhs( EP(R) + K ) ), S1( rhs( EP(R) + K ) ) )

Once the sample image and the embedded information are finalized, the result is compressed with the help of the JPEG compression algorithm.

Fig 1 Simplified DES Scheme

3.3. The Capturing algorithm

This new capturing algorithm checks the mail inbox only for JPEG files. This filtering concept helps us to minimize the seeking time of filtering the JPEG files. After filtering, those files are stored in a large database for further processing. A sample image is taken from the database as a covert channel which is used to hide the secret information. For our experiments, we created a database containing more than 20000 JPG images obtained from corporate mails. For each image, we embedded a random binary stream of different lengths using the S-DES algorithm. The proposed research analyzes the performance of the improved version of image steganalysis algorithms in corporate mails. A large database is used to store the images. The performance and the detection ratio are going to be measured in corporate mails.

4. Detection based on back propagation method

The neural network back propagation approach is used to check for the discrepancy patterns and train itself for better accuracy by automating the whole process [7]. This study used a neural network to analyze the object digital image based on three different types of transformation, which are the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT).
In this paper, we only consider the following transforms: DFT, DCT and DWT. First, we analyze the object digital image according to these three kinds of transforms. The object image is transformed into transform domain data according to these three transforms. Then we calculate statistical features of the transformed data which can be exploited to detect hidden information. The reason for selecting DFT, DCT and DWT is that most data hiding methods operate in these domains. The selected features should be significantly impacted by the data hiding processing. But it is difficult to find those features, so we select a neural network to process this problem, since a neural network has a strong capability to approximate any nonlinear
functions.
For those features that are more affected by the data hiding process, the neural network will assign larger weight coefficients, and for those features that are less affected by the data hiding process, it will assign smaller weight coefficients.
Let us denote the i-th DCT coefficient of the k-th block as dk(i), 1 ≤ i ≤ 64, k = 1, …, T, where T is the total number of blocks in the image. In each block, all 64 coefficients are further quantized to integers Dk(i) using the JPEG quantization matrix Q.
The quantized coefficients Dk(i) are arranged in a zig-zag manner and compressed using the Huffman coder. The resulting compressed stream together with a header forms the final JPEG file.
The decompression works in the opposite order. The JPEG bit-stream is decompressed using the Huffman decoder and the quantized DCT coefficients Dk(i) are multiplied by Q(i) to obtain DCT coefficients QDk, QDk(i) = Q(i)Dk(i) for all k and i. Then, the inverse DCT is applied to QDk and the result is rounded to integers in the range 0−255.

Algorithm description:
1. Divide the image into a grid of 8×8 blocks, skipping the last few rows or columns if the image dimensions are not multiples of 8.
2. Arrange the blocks in a list and remove all saturated blocks from the list (a block is saturated if it has at least one pixel with a gray value 0 or 255). Denote the total number of blocks as T.
3. Extract the quantization matrix Q from all T blocks as described in Appendix A. If all the elements of Q are ones, the image was not previously stored as JPEG and our steganalytic method does not apply (exit this algorithm). If more than one plausible candidate exists for Q, the steps 4−6 need to be carried out for all candidates and the results that give the highest number of JPEG compatible blocks will be accepted as the result of this algorithm.
4. For each block B calculate the quantity S (see equation (3)).
5. If S>16, the block B is not compatible with JPEG compression with quantization matrix Q. If S≤16, for each DCT coefficient QDi' calculate the closest multiples of Q(i), order them by their distance from QDi', and denote them qp(i), p=1, …. For those combinations for which the inequality (4) is satisfied, check if expression (5) holds. If, for at least one set of indices {p(1), …, p(64)}, the expression (5) is satisfied, the block B is JPEG compatible, otherwise it is not.
6. After going through all T blocks, if no incompatible JPEG blocks are found, the conclusion is that our steganalytic method did not find any evidence for presence of secret messages. If, on the other hand, there are some JPEG incompatible blocks, we can attempt to estimate the size of the secret message, locate the message-bearing pixels, and even attempt to obtain the original cover image before secret message embedding started.
7. If all blocks are identified as JPEG incompatible or if the image does not appear to be previously stored as JPEG, we should repeat the algorithm for different 8×8 divisions of the image (shifted by 0 to 7 pixels in the x and y directions). This step may be necessary if the cover image has been cropped prior to message embedding.

5. Performance Analysis and Experiment Results

From the measured statistics of training sets of images with and without hidden information, our aim is to determine whether an image contains hidden information or not. Artificial Neural Networks have the ability to adapt, learn, generalize, cluster or organize data.
There are many structures of Artificial Neural Network, including Perceptron, Adaline, Madaline, Kohonen, BackPropagation and many others. The BackPropagation Artificial Neural Network is probably the most commonly used, as it is very simple to implement and effective. In this work, we deal with the BackPropagation Artificial Neural Network. A neural network has an excellent capability to simulate any nonlinear relation, so we make use of a neural network to classify images [7]. In this paper we use a BP neural network to train and simulate images [6]. This BP neural network uses three levels: Input level, Hidden level and Output level. In neural networks, an important issue is slow convergence.
In practice, this is the main limitation of neural network applications, and many new algorithms claiming fast convergence have been developed. In this paper a single parameter dynamic search algorithm is used to accelerate network training. Each time, only one parameter is searched to achieve the best performance, so this learning algorithm improves on older algorithms ([9, 10]). We set the number of network inputs to the number of features, the number of hidden-level nodes to 40, and the output to either yes or no.
A typical BackPropagation ANN is depicted in Fig. 2. The black nodes (on the extreme left) are the initial inputs. Training such a network involves two phases. In the first phase, the inputs are propagated forward to compute the outputs for each output node. Then, each of these outputs is subtracted from its desired output, giving an error for each output node.
In the second phase, each of these output errors is passed backward and the weights are adjusted. These two phases are repeated until the sum of the squares of the output errors reaches an acceptable value.

Fig. 2 Back propagation
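The two-phase back propagation training described in Section 5 can be illustrated with a minimal sketch. It assumes a single hidden layer of sigmoid units trained by plain gradient descent on a toy OR data set; the names `train` and `OR_SAMPLES`, the learning rate, the hidden-node count, and the stopping tolerance are illustrative assumptions and not the authors' implementation. It does not include the single parameter dynamic search algorithm mentioned in the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, n_hidden=3, rate=0.5, max_epochs=5000, tolerance=0.01, seed=0):
    """Train a one-hidden-layer network; return (predict, sse_per_epoch)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Every weight row carries one extra entry that acts as the bias.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
        return hidden, out

    history = []
    for _ in range(max_epochs):
        sse = 0.0
        for x, target in samples:
            # Phase 1: propagate the inputs forward and measure the output error.
            hidden, out = forward(x)
            error = target - out
            sse += error * error
            # Phase 2: pass the error backward and adjust the weights.
            delta_out = error * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] += rate * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] += rate * delta_hidden[j] * v
        history.append(sse)
        # Stop once the sum of squared output errors is acceptable.
        if sse < tolerance:
            break

    return (lambda x: forward(x)[1]), history


# Toy data (an assumption for illustration): the logical OR of two inputs.
OR_SAMPLES = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
              ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

predict, history = train(OR_SAMPLES)
print(f"SSE in first epoch: {history[0]:.3f}, in last epoch: {history[-1]:.3f}")
```

In a steganalysis setting, the two-element input lists would be replaced by the feature vectors computed from the transformed images, and the target would be 1 for a stego image and 0 for a clean one; the loop structure stays the same.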
Training the network can be summarized as follows:
• Apply input to the network.
• Calculate the output.
• Compare the resulting output with the desired output for the given input. This is called the error.
• Modify the weights for all neurons using the error.
• Repeat the process until the error reaches an acceptable value (e.g. error < 1%), which means that the NN was trained successfully, or until we reach a maximum count of iterations, which means that the NN training was not successful.
The program trains the network using JPEG images that are located in a folder. This folder must be in the following format:
• There must be one (input) folder that contains input images [*.jpg].
• Each image's name is the target (or output) value for the network (the pixel values of the image are the inputs, of course).

6. Test Results

The cover image was taken from the image database. The image was originally in JPEG format in 680x480 resolution. Since a BMP image was also required for the evaluation, a second image in BMP format was generated using the same JPEG image. Once both cover images have been obtained, the proposed method generates the secret code for both images. The encrypted image thus obtained was steganographically concealed in the carrier image.

Fig. 3 Steganography based document (cover file + secret = steganography document)

The compression ratio and detection ratio of stego content are also analyzed. From the analysis of the images in the sampled database, the probability of occurrence of images with stego content in the corporate mails is zero.

7. Discussion

In this paper, we have analyzed the steganalysis algorithms available for Image Steganography. The proposed mathematical web search model admits a wide variety of resource constraints. Depending on the application, implementation, hardware, and steganalysis probability of error constraints, a suitable resource model can be used to derive an optimal web search strategy using the proposed technique. Depending on the reliability of the steganalysis algorithms employed and the storage constraint, one of two strategies, namely coordinated search or random search, can be chosen. It is seen that for a certain range of steganalysis reliability, both these methods give comparable performance.

8. Conclusion

In summary, each carrier medium has its own special attributes and reacts differently when a message is embedded in it. Therefore, steganalysis algorithms have also been developed in a manner specific to the target stego file, and the algorithms developed for one cover medium are generally not effective for a different medium. We conclude that it is possible to design efficient web search algorithms to detect covert messages in corporate emails.

References

[1] Ahmed Ibrahim, 2007, Steganalysis in Computer Forensics, Security Research Centre Conferences, Australian Digital Forensics Conference, Edith Cowan University.
[2] Avcibas, I., Memon, N. and Sankur, B., 2003, "Steganalysis using image quality metrics," IEEE Trans. on Image Processing, vol. 12, no. 2, pp. 221–229.
[3] Chandramouli, R., 2002, A Mathematical Approach to Steganalysis, Proc. SPIE Security and Watermarking of Multimedia Contents IV, California.
[4] Geetha, S., Siva, S. and Sivatha Sindhu, 2009, Detection of Stego Anomalies in Images Exploiting the Content Independent Statistical Footprints of the Steganograms, Department of Information Technology, Thiagarajar College of Engineering, Madurai, Informatica (25–40).
[5] Greg Goth, 2005, Steganalysis Gets Past the Hype, IEEE Distributed Systems Online 1541-4922 © 2005 Published by the IEEE Computer Society Vol. 6, No. 4.
[6] Sujay Narayana and Gaurav Prasad, 2010, Two new approaches for secured image Steganography using cryptographic Techniques and type conversions, Department of Electronics and Communication, NITK, Surathkal, INDIA.
[7] Liu Shaohui, Yao Hongxun, and Gao Wen, 2003, Neural network based steganalysis in still images, Department of Computer Science, Harbin Institute of Technology, ICME.
[8] Niels Provos, Peter Honeyman, 2003, Hide and Seek: Introduction to Steganography, University of Michigan, Published by the IEEE Computer Society.
[9] Niels Provos, and Honeyman, P., 2007, Detecting steganographic content on the internet. Retrieved from http://www.citi.umich.edu/u/provos/papers/detecting.pdf
[10] Samir K Bandyopadhyay, and Debnath Bhattacharyya, 2008, A Tutorial Review on Steganography, University of Calcutta, Senate House, 87 /1 College Street, Kolkata, UFL & JIITU.

M. Rajaram, M.E., Ph.D., is a Professor and Head in Electrical and Electronics Engineering and Computer Science and Engineering in Government College of Engineering, Tirunelveli. He received the B.E Degree in Electrical and Electronics Engineering from Madurai University, and the M.E and PhD degrees from Bharathiyar University, Coimbatore, in 1981, 1988 and 1994 respectively. His research interests are Computer Science and Engineering, Electrical Engineering and Power Electronics. He is the author of over 120 Publications in various International and National Journals. 7 PhD scholars and 10 M.S (By Research) Scholars have been awarded under his supervision. At present, he is supervising 12 PhD Scholars. Further, Dr. Rajaram has become the Vice-Chancellor of Anna University of Technology, Tirunelveli, Tamilnadu, India.

S. N. Sivanandam completed his B.E (Electrical and Electronics Engineering) in 1964 from Government College of Technology, Coimbatore and M. Sc. (Engineering) in power system in 1966 from PSG College of Technology, Coimbatore (University Second Rank). He acquired Ph.D. in Control Systems in 1982 from Madras University. He has received the Best Teacher Award in the year 2001 and the Dhakshina Murthy Award for teaching Excellence from PSG College of Technology. He received the CITATION for best Teaching and Technical contribution in the year 2002, Government College of Technology, Coimbatore. He has teaching
experience (UG and PG) of over 44 years. The total number of undergraduate and postgraduate projects guided by him for both Computer Science and Engineering and Electrical and Electronics Engineering is around 950. Formerly he was a Professor and Head for the departments EEE and CSE, PSG College of Technology, Coimbatore. Further, he was a coordinator for seven government funded projects. Dr. Sivanandam has co-authored 14 books. He has delivered around 100 special lectures of different specializations in Summer/Winter schools and also in various Engineering Colleges. He has guided 32 Ph.D. research works and at present 10 Ph.D. research scholars are working under him. The total number of technical publications credited to him in various National and International journals and Conferences is around 750. He has chaired 12 International and 12 National Conferences. He is a member of various professional bodies like IE (India), ISTE, CSI, ACS, SSI and IEEE. He is a Technical Advisor to various reputed industries and reputed engineering Institutions. His research areas include Modeling and Simulation, Neural Networks, Fuzzy Systems and Genetic Algorithms, Pattern Recognition, Multi-dimensional System Analysis, Linear and Non-Linear Control Systems, Signal and Image Processing, Power Systems, Numerical Methods, Parallel algorithms, Data mining and Database Security.

P T Anitha received the B.Sc. Computer Applications and Master of Computer Applications degrees from Bharathiar University in 1993 and 1996 respectively. She is presently working as an Assistant Professor in the department of MCA, Karpagam College of Engineering, Coimbatore. She is pursuing a Doctorate degree in Computer Science under the guidance of Dr. M. Rajaram, Vice Chancellor of Anna University of Technology, Tirunelveli, Tamilnadu, India. Her area of research is Steganalysis. She has published four papers in international conferences, 2 papers in International Journals and 11 in national conferences. Currently she is working to improve the performance of the steganalysis algorithms used in corporate E-mails.
