
2016 Second International Conference on Computational Intelligence & Communication Technology

Analysis of Different Text Steganography Techniques: A Survey
1Shivani Sharma, 2Dr. Avadhesh Gupta, 3Munesh Chandra Trivedi, 4Virendra Kumar Yadav
1, 3, 4Department of Computer Science & Engineering
1, 3, 4ABES Engineering College, 2Institute of Management Studies
1, 2, 3, 4Ghaziabad, India
1shivanisharma028@gmail.com, 2avadhesh.gupta@imsgzb.com, 3muneshtrivedi@gmail.com, 4virendra.yadav@abes.ac.in

Abstract—Steganography helps individuals send confidential data between two parties. It enables a
user to hide data in different digital media. Steganography is of many types, such as image
steganography, text steganography and audio/video steganography. Text steganography is more
difficult than the other techniques because text has less redundancy and changes can be detected
quite easily. Some text steganography techniques are discussed along with their characteristics
and working.

Keywords—linguistic; steganography; cryptography; statistical; metamorphic.

I. INTRODUCTION

Information is an important asset of mankind, and its security is an essential concern. Risk
increases when working on real-time systems, which include banking systems, railways, flights etc.
The chances of attack increase when we transmit data via the internet. Several types of attack are
possible, such as eavesdropping, man-in-the-middle attacks, phishing attacks and denial of service.
To secure our data we are left with three main solutions: a private dedicated channel, cryptography
and steganography. A private dedicated channel is time consuming and restricts the user to a
physical point. Cryptography moulds the message into some other form. The duo of cryptography and
steganography can also be used, which is known as metamorphic cryptography [1].

A. Steganography

Steganography is the art of hiding data inside any digital medium such as audio, image, video,
text or protocol [2]. Frequent terms used in steganography are:

1) Cover Object: The text, audio, video or image used for embedding data is known as the cover
object. It is also known as the vessel object.
2) Secret data/message: The data which is to be embedded in a cover object is known as the secret
message.
3) Stego Object: The resultant output obtained after embedding is known as the stego object.

While designing a secure steganographic system, the following points were considered: (i) the
system is a single, practical, stand-alone system that should meet the user requirements of
confidentiality, authenticity and integrity; (ii) the user should be provided with a covert channel
to hide secret communication; (iii) based on the technical and physical requirements, the user
should have a balanced parameter selection option [3].

Before implementing any steganographic technique, both sender and receiver must agree on a mutual
key exchange mechanism. There are several text steganography techniques; some of them are discussed
in this paper.

B. Text Steganography

Text steganography is broadly divided into the following three categories [4]: (i) Format Based
Methods: the secret data is embedded in the cover text by changing the formatting of the cover
text. This can be done by resizing the font, inserting spaces between words or inserting
non-displayed characters. (ii) Linguistic Methods: linguistic analysis is done in this method.
(iii) Random and Statistical Generation Methods: comparison is not done with a known plain text,
and most steganographers generate their own cover texts.

Text steganography is the most difficult kind of steganography because a text file lacks
large-scale redundancy of information in comparison to other digital media like image, audio and
video [5]. The structure of a text document remains the same throughout, i.e. the text document
file is transparent during the saving, writing and retrieval phases. While embedding data in a text
file, the main concern is its structure, which should not change. If the structure is changed, the
whole meaning of the text file changes, while in other digital media changes can be made easily
without any notable change in the output. Many languages are used to hide data, such as Persian,
Arabic [6], Hindi and English. The English language possesses characteristics such as inflexion,
use of periphrases and fixed word order. Inflexion means that with a minimum change of shape, the
relationship of words in a sentence can be indicated. Periphrases enable something to be expressed
in different ways. In fixed order, the relationship of

978-1-5090-0210-8/16 $31.00 © 2016 IEEE
DOI 10.1109/CICT.2016.34
II. RELATED WORK

Text steganography methods are broadly classified into two groups:

A) Changing the format of the text: The format of the text file is altered in this method.

B) Changing the meaning of the text: The main focus of this method is to change the meaning of the
text.

There are limited methods based on changing the meaning of the text, so our main focus is to
describe the working of methods that change the format of the text. Some of the methods are
described below.

A. Changing the Format of the Text

1) Semantic Method

Semantic refers to the meaning of something; this method relies on the synonyms of certain words.
It hides data by substituting a word with its synonym. Synonym substitution may hide a single bit
or multiple bits of secret information. This method protects the hidden information against
retyping and OCR programs. Sometimes the meaning of the text is altered by using this method
[7][8][9]. M. Hassan Shirali-Shahreza [9] has used the semantic method for embedding a secret
message in a text file.

Table 1. Semantic method

WORD       SYNONYM
Lazy       Idle
Hard       Difficult
Unhappy    Sad

2) Text Abbreviation or Acronym

Abbreviations and acronyms are used for hiding data. The target word is replaced by its acronym;
for example, "as soon as possible" is replaced by ASAP. This technique is mostly used in SMS and
in social networking applications and sites. Mohammad Shirali-Shahreza and M. Hassan
Shirali-Shahreza from Iran have used this technique in [10]. Only a small amount of data can be
hidden in a file of several kilobytes [7][11] by using this method. This method can also be used to
reduce the size of the secret data text file [12], after which steganography is applied by using
some other method.

Table 2. Abbreviation or acronym method

ACRONYM    WORD
ID         Identification
DOB        Date Of Birth
ASAP       As Soon As Possible

3) Change Of Spelling

In [13] the author has used this method to embed secret data in a text file. A method was presented
that exploits words which are spelled differently in American and British English in order to hide
secret message bits. Methods that change the format of the text can hide a large amount of data.
The table below shows some words that have different spellings in the UK and US.

Table 3. Word Spelling method

American English    British English
Airplane            Aeroplane
Fiscal              Financial
Unalike             Unlike

4) Open Spaces Or White Spaces

This method works by inserting spaces in a cover text file. If one space is inserted in the cover
text, the hidden bit is ‘0’, while two consecutive spaces, for example at the end of a sentence,
represent ‘1’, or vice versa. White spaces can be inserted at the end of a line or paragraph, or
between words or sentences. The inserted white space does not create any suspicion in the mind of
a steganalyst. Some text editor programs automatically delete extra spaces while formatting, which
destroys the hidden information [8]. Inserting white spaces between HTML tags does not affect
viewing the source or the visibility of the web content. The drawbacks of this technique are that
the insertion of spaces increases the size of the text file and that only a small amount of data
can be hidden.

5) Format Based Text Steganography Method

Sangita Roy et al. [14] propose a novel approach to format based text steganography that combines
two popular text steganography methods, word shifting and line shifting, along with a copy
protection technique, with high capacity of the cover object. Using these methods, the approach
embeds data in binary form rather than character form. More than one bit is embedded in each line
of cover text, so the method has good hiding capacity and causes less distortion in the cover text.
The method is hybrid, as it fuses two methods (line shifting and word shifting) and uses a special
character for performing text steganography.

a) Encoding Procedure: The secret message and cover object are taken as input and then encoding is
applied. The bits of the secret message are counted to check whether their number is even or odd.
If the number of bits is odd then “0” is added to the left, otherwise there is no change. The
secret message is then divided into blocks of 2 bits each, and the blocks are stored in an array.
Embedding is done by finding the next embedding position for each block. Four cases arise while
embedding the secret message:
(i) If block[i] = ‘00’, the line shifting method is applied: go to the end of the line and call
the shrink procedure, which shrinks the font size of the line.
(ii) If block[i] = ‘11’, go to the end of the line and call the expand procedure, which expands
the font size of the line.
(iii) If block[i] = ‘01’, embed ‘0’: use two spaces instead of one space between two words.
(iv) If block[i] = ‘10’, embed ‘1’: use an extra space before any special character. If no special
character is there, add one special character to embed ‘1’.

The same procedure is applied to the other blocks, and the result is the stego text. This method is
robust and, with respect to the existing algorithm, a large amount of data can be hidden. It
requires a maximum of four inter spaces. The requirement can be further reduced by using a
combination of ‘00’ and ‘11’ bits. This algorithm is best for centrally aligned messages, better
for right and left justified messages and worst for justified messages.

b) Decoding Procedure: The stego text is stored in an array and is scanned by using OCR software.
All the spaces are then used for extraction. Four cases arise during extraction:
(i) If two spaces occur consecutively, then extract ‘01’.
(ii) If a special character is present after a space in the stego text, then extract ‘10’.
(iii) If the line size is less than the standard line size, then extract ‘00’.
(iv) If the line size is greater than the standard line size, then extract ‘11’.

All the extracted bits are combined into an array. The result is the secret message. In this method
the secret text is compressed and then embedded by using the proposed algorithm. This technique
encodes ‘0’ by using a single space and ‘1’ by using two spaces. The method is fully dependent on
the format (structure) of the text and can be used for preventing illegal duplication and
distribution of text, especially electronic data. It has a major disadvantage, as many word
processing programs remove spaces from the text file, which destroys the secret message. This
method can also be applied to hard copy documents. The time complexity of the proposed method is
O(n²), whereas the time complexity of the existing algorithm is O(n).

6) HTML Tags

HTML tags fine-tune their effects by using attributes, which can be in any order. Steganography can
be performed by using this ordering. In [15] the author has used the idea of hiding a secret
message by using a convention for the order of these attributes, and developed a text
steganographic technique in HTML using attribute reordering. This reordering does not add or remove
any content in the files. While using HTML files, certain constraints should be taken care of so
that the secret message remains undetectable: the size of the HTML file should not be modified and
its display should not be affected either in plain text format or in a web browser. A disadvantage
of using HTML is the lack of redundant bits. The size of the HTML file is directly proportional to
the message size. The author has used HTMLTidy [16] as the HTML parser for implementing
steganography [13]; it is used for cleaning HTML files and also makes sure that all the HTML tags
are closed properly. There are several approaches to hiding data, such as the Table Driven Approach
and the Lexicographic Approach. The table driven approach is applied to tags having two or more
attributes. One attribute is used as the key and the other is secondary. A database is created for
pairing key and secondary values. The lexicographic approach is more efficient than the table
driven approach, as the latter can encode at most n/2 bits of information, where n is the number of
attributes associated with the tag. The lexicographic approach can hide n-1 bits of information,
which is almost twice as much as the table driven approach.

Table 4. Lexicographical vs. Table Driven HTML Tags in different websites

7) XML Document

XML is an acronym of Extensible Mark-up Language. It is a platform independent language which is
universal in nature. It is mainly used for storing, exchanging and transferring information
electronically. XML documents are lightweight and can be used on the internet and in messaging. Any
user can manipulate the content of an XML document. The prime concern in XML documents is security.
It can be ensured by using different techniques which guarantee integrity and confidentiality. XML
documents can be used as the cover medium for text steganography [17]. When secret data is embedded
in an XML document it cannot be altered, traced back or intercepted back to the sender. XML
documents follow a database-like format. SGML (Standard Generalized Markup Language) and HTML
(Hyper Text Markup Language) are also used to send information over the internet. XML is a
shortened version of SGML. XML enables transmission, validation, definition and interpretation of
data between heterogeneous applications and computing platforms. XML provides a framework for
tagging structured data, along with flexible document definition and processing capabilities. One
of the special features of an XML document is flexibility, i.e. the user can format data to be
displayed on multiple devices and platforms. Performing steganography on XML documents is
efficient, as XML has been used widely for exchanging data and is considered a language of digital
contents and web pages. The author in his paper [17] has discussed four methods of performing
steganography on XML documents. The first
technique hides data by inserting random characters between XML tags and their values. This
technique is known as the Random Character Technique; the number of inserted random characters
increases from 1 to n after each word of the tag. This process is repeated until a full stop (.) is
encountered. The process is then applied recursively to all tags in an XML document. The second
technique shuffles tags that occur in a predetermined sequence: the 1st tag is swapped with the
last tag, the 2nd tag with the second-last tag, and the same procedure is repeated until all the
tags are swapped. The position as well as the value of the tag is swapped. The third technique is
known as Attribute Specified Shuffling of Tags, which saves the order of the tags in attributes
before shuffling. The last technique is known as the Reverse Character Technique, in which the
sequence of characters in a tag is reversed; for example, if the tag is ‘width’ then it is reversed
to ‘htdiw’. After reversing the tag, its value is also reversed. The procedure is repeated until a
full stop (.) is encountered.

III. CONCLUSION

Considerable research work has been carried out in the area of text steganography. With the
advancement of technology and available tools, it is now essential to develop steganography
algorithms which can withstand attacks.

ACKNOWLEDGMENT

I would like to thank Prof. Anuja Kumar Acharya, B.M. Mehtre and Prof. Munesh Chandra Trivedi for
their kind support and discussions.

REFERENCES

[1] Thomas Leontin Philjon J., Venkateshvara Rao N., “Metamorphic Cryptography - A Paradox between
Cryptography and Steganography Using Dynamic Encryption”, IEEE, 2011.
[2] Westfeld A., J. Camenisch et al., “Steganography for Radio Amateurs—A DSSS Based Approach for
Slow Scan Television”, Springer-Verlag Berlin Heidelberg, pp. 201-215.
[3] Mohammad Shirali-Shahreza, “Text Steganography by Changing Words Spelling”, ICACT 2008,
Feb. 17-20, 2008, ISBN 978-89-5519-136-3.
[4] Krista Bennett, “Linguistic Steganography: Survey, Analysis, and Robustness Concerns for Hiding
Information in Text”, CERIAS TR 2004-13, 2004.
[5] Shraddha Dulera, Devesh Jinwala and Aroop Dasgupta, “Experimenting with the Novel Approaches in
Text Steganography”, International Journal of Network Security & Its Applications (IJNSA), Vol. 3,
No. 6, November 2011.
[6] M. H. Shirali-Shahreza, M. Shirali-Shahreza, “A New Approach to Persian/Arabic Text
Steganography”, Proc. 5th Int. Conf. Computer and Information Science, Washington, 2006,
pp. 310-315.
[7] Khan Farhan Rafat, “Enhanced Text Steganography in SMS”, IEEE, 2008.
[8] Mohammad Shirali-Shahreza and M. Hassan Shirali-Shahreza, “Text Steganography in SMS”,
International Conference on Convergence Information Technology, 2007.
[9] M. Hassan Shirali-Shahreza and Mohammad Shirali-Shahreza, “A New Synonym Text Steganography”,
International Conference on Intelligent Information Hiding and Multimedia Signal Processing,
978-0-7695-3278-3/08, IEEE, 2008.
[10] Mohammad Shirali-Shahreza and Sajad Shirali-Shahreza, “Steganography in Text Documents”,
Proceedings of 2008 3rd International Conference on Intelligent System and Knowledge Engineering,
2008.
[11] M. Hassan Shirali-Shahreza and Mohammad Shirali-Shahreza, “Text Steganography in Chat”, IEEE,
2007.
[12] Shivani, Yadav V., Batham S., “A Novel Approach of Bulk Data Hiding using Text Steganography”,
accepted in Elsevier ICRTC 2015, in press.
[13] Khan Farhan Rafat, “Enhanced Text Steganography By Changing Word’s Spelling”, FIT’09,
December 16-18, 2009, CIIT, ACM.
[14] Sangita Roy, Manini Manasmita, “A Novel Approach to Format Based Text Steganography”,
ICCCS’11, February 12-14, 2011, ACM.
[15] Sudeep Ghosh, “StegHTML: A message hiding mechanism in HTML tags”, December 10, 2007,
http://www.cs.virginia.edu/~skg5n/main.pdf.
[16] D. Raggett, “Htmltidy”, tidy.sourceforge.net, 2004.
[17] Aasma Ghani Memon, Sumbul Khawaja and Asadullah Shah, “Steganography: A New Horizon For Safe
Communication Through Xml”, Journal of Theoretical and Applied Information Technology,
2005 - 2008 JATIT.
[18] V. K. Yadav, et al., “Zero Distortion Technique: An Approach to Image Steganography on color
images”, in Proc. International Conference on Information and Communication Technology for
Competitive Strategies, ICTCS '14, November 14-16, pp. 79-83 (published by ICPS-ACM, Proceedings
Volume ISBN No: 978-1-4503-3216-3).
[19] V. K. Yadav, et al., “ICSECV: An Efficient Approach of Video Encryption”, in Proc.
Contemporary Computing (IC3), 2014 Seventh International Conference, 7-9 Aug. 2014, pp. 425-430.
[20] V. K. Yadav, et al., “Zero Distortion Technique: An Approach to Image Steganography using
Strength of Indexed Based Chaotic Sequence”, in SSCC-2014, symposium proceedings published by
Springer in Communications in Computer and Information Science Series (CCIS), Volume 467, 2014,
pp. 407-416, ISSN: 1865-0929.
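
The open-space method described in Section II (Open Spaces Or White Spaces) can be illustrated with a minimal sketch. It assumes the convention that one space between two words encodes ‘0’ and two spaces encode ‘1’, and that the receiver already knows how many bits were hidden. The function names embed_spaces and extract_spaces are hypothetical and are not taken from any of the cited works.

```python
import re


def embed_spaces(cover_text: str, bits: str) -> str:
    """Hide a bit string in the gaps between words of the cover text.

    Assumed convention: one space = '0', two spaces = '1'.
    """
    words = cover_text.split()
    if not words or len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps for the message")
    # Gaps that carry a bit, followed by ordinary single spaces.
    gaps = [" " if bit == "0" else "  " for bit in bits]
    gaps += [" "] * (len(words) - 1 - len(bits))
    stego = words[0]
    for gap, word in zip(gaps, words[1:]):
        stego += gap + word
    return stego


def extract_spaces(stego_text: str, n_bits: int) -> str:
    """Read the first n_bits gaps back as a bit string."""
    gaps = re.findall(r" +", stego_text.strip())
    return "".join("0" if len(gap) == 1 else "1" for gap in gaps[:n_bits])


if __name__ == "__main__":
    stego = embed_spaces("hide this message in plain sight", "1011")
    print(repr(stego))
    print(extract_spaces(stego, 4))
    # An editor that collapses repeated spaces removes the hidden bits.
    print(repr(" ".join(stego.split())))
```

The last line of the usage example mirrors the weakness noted in the text: normalising white space returns the original cover text and the hidden bits are lost. The sketch also shows the low capacity of the method, since a cover text of k words carries at most k-1 bits.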

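
The block preparation step of the format based method in Section II (Encoding Procedure and Decoding Procedure) can be sketched as follows. Only the bit handling is shown: the secret bits are padded on the left with “0” when their count is odd, split into 2-bit blocks, and each block is mapped to the formatting action named in the four cases. The action labels (shrink_line, expand_line, double_space, space_before_special_char) are hypothetical names chosen for this illustration; actually changing font sizes or spacing in a document is outside the scope of the sketch.

```python
# Hypothetical labels for the four embedding cases described in the text.
ACTIONS = {
    "00": "shrink_line",                # line shifting: shrink font size of the line
    "11": "expand_line",                # line shifting: expand font size of the line
    "01": "double_space",               # two spaces instead of one between two words
    "10": "space_before_special_char",  # extra space before a special character
}
# The decoder maps each observed formatting feature back to its 2-bit block.
OBSERVATION_TO_BLOCK = {action: block for block, action in ACTIONS.items()}


def to_blocks(bits: str) -> list[str]:
    """Pad an odd-length bit string with '0' on the left and split it into 2-bit blocks."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    if len(bits) % 2 == 1:
        bits = "0" + bits
    return [bits[i:i + 2] for i in range(0, len(bits), 2)]


def plan_embedding(bits: str) -> list[str]:
    """Return the sequence of formatting actions that would embed the bits."""
    return [ACTIONS[block] for block in to_blocks(bits)]


def recover_bits(observations: list[str]) -> str:
    """Rebuild the (padded) bit string from the observed formatting features."""
    return "".join(OBSERVATION_TO_BLOCK[obs] for obs in observations)


if __name__ == "__main__":
    plan = plan_embedding("10110")
    print(plan)
    print(recover_bits(plan))
```

Note that the recovered string still contains the padding bit. The text does not say how the receiver tells a padded message from an unpadded one, so the sketch leaves that decision to the caller.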

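
Two of the XML techniques described in Section II (XML Document), tag shuffling and the Reverse Character Technique, are simple enough to sketch on a toy representation. The sketch assumes a document reduced to a list of (tag, value) pairs, which is a simplification made for illustration only. The function names shuffle_tags and reverse_characters are hypothetical, and the sketch shows only the transformations themselves, not how a particular secret message selects them.

```python
def shuffle_tags(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Swap the 1st pair with the last, the 2nd with the second-last, and so on.

    Swapping position and value together is the same as reversing the list.
    """
    return list(reversed(pairs))


def reverse_characters(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Reverse the characters of every tag name and of its value."""
    return [(tag[::-1], value[::-1]) for tag, value in pairs]


if __name__ == "__main__":
    document = [("width", "100"), ("height", "250"), ("color", "red")]
    print(shuffle_tags(document))
    print(reverse_characters(document))
    # Both transformations undo themselves when applied a second time.
    print(shuffle_tags(shuffle_tags(document)) == document)
    print(reverse_characters(reverse_characters(document)) == document)
```

Because each transformation is its own inverse, a receiver who knows which technique was used can restore the original pairs by applying the same function again.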