Robust Techniques of Web Watermarking

Published by ijcsis
Published by: ijcsis on Mar 14, 2011
Copyright: Attribution Non-commercial

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 2, 2011
 
Robust Techniques of Web Watermarking
Using Verbs, Articles and Prepositions
Nighat Mir
College of Engineering, Effat University, Jeddah, Saudi Arabia, nighat_mir@hotmail.com
Abstract
 
The Internet is an attractive, rapid and economical medium for distributing electronic information. With the advent and tremendous growth of the Internet, information is going paperless and is moving from paper distribution to electronic distribution. This, however, makes the protection of intellectual property very difficult. Once information is available on the Internet, it is open to threats such as illegal copying, redistribution, tampering and attacks on authentication. Intellectual property rights for information available on the web are therefore a serious issue. In this paper, natural language digital watermarks are proposed for web-based electronic data, and the problem of establishing the authorship of web-based text/data is investigated with improved security. Several robust, imperceptible web page watermarking techniques using verbs, articles and prepositions are studied for protecting content available on the WWW. On this basis, a web watermarking algorithm is designed and implemented. A key consisting of natural watermarks together with a unique author ID (issued by the CA) is integrated into any content to be published on the web. The key is further encrypted using AES (Advanced Encryption Standard) to add another layer of security. The algorithm is also tested on different web sites to check its functionality and robustness.
Keywords- Digital Watermarking, Verbs, Articles, Prepositions, Encryption, HTML, AES, CA.
I. INTRODUCTION
The Internet is an attractive, rapid and economical medium for distributing electronic information. With the advent and tremendous growth of the Internet, information is going paperless and is moving from paper distribution to electronic distribution. This, however, makes the protection of intellectual property very difficult. Once information is available on the Internet, it is open to threats such as illegal copying, redistribution, tampering and attacks on authentication. Intellectual property rights for information available on the web are therefore a serious issue.

Different techniques, such as steganography, cryptography and watermarking, are used to secure information, each in a different way. Steganography hides the existence of information and makes it imperceptible to a viewer. A cover medium is used as a carrier in which secret data is embedded, so that only the intended recipient knows that a secret message exists [1]. Cryptography encrypts the information using a key, and only the party holding the key can decrypt and reveal the message, so people are aware that some hidden communication exists. It makes data unreadable by writing it in secret code, and it ensures authentication, confidentiality and integrity [2]. Watermarking, in contrast, is the process of embedding secret information into a digital signal to identify the owner of that media [3].

In this paper, several robust techniques of web page digital watermarking using common verbs, articles and prepositions are studied for the protection of content available on the WWW. On this basis, a web watermarking algorithm is designed and implemented, and it is tested with different web sites to evaluate its functionality, robustness and capacity.

The Internet contains different types of data, i.e. image, video, audio and text. Based on this organization, digital watermarking may be classified as image watermarking, video watermarking, audio watermarking and text watermarking, but the basic principles and motives are the same: to secure information against different threats. Unauthorized copying, propagation and tampering are very common attacks and are difficult to overcome. Much research has been done on the other types of data, but web-based text has received little attention in this respect. Because digital content is easy to copy or process, it is likely to be misused.
A digital watermarking method is one of the efficient countermeasures against such misuse, and it can be categorized into perceptible and imperceptible techniques. Many perceptible techniques have been studied for text, but few imperceptible techniques are available for electronic text.

Digital watermarking has proved to be a means of identifying the creator, owner or distributor of data. Its aim is to put the ownership of the data beyond dispute. In case of illicit use, the watermark supports the claim of ownership and its successful examination. It makes large-scale distribution simple and economical.

HyperText Markup Language (HTML) is used by web browsers to understand, interpret and structure text, images and other types of data. All web browsers apply default characteristics to every HTML element. Web developers can use different languages and tools to create web pages, but the output of these is ultimately delivered to and interpreted by web browsers as HTML.
248 http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
Hence, HTML is a basic building block of web pages, but the source code of these pages is available with a single right-click on "View Source". Any data in general, and text in particular, is open to many threats and attacks. It is observed that intentional or unintentional illegal copying of data from the Internet has become a universal practice; this has a great effect on the privacy of information, and copyright alone is no longer an adequate solution. Digital watermarking methods are considered a strong mechanism to identify the original owner and to prove intellectual property. Imperceptible digital web page watermarking techniques can protect the intellectual property of content available on these pages.

In digital watermarking, a hidden marker is embedded in the data; the marker is generally unobservable and can only be extracted by a special detector. The goal is not to change the original characteristics of the data but to exploit the insensitivity of human perceptual organs.

With the ever-increasing number of Internet users all over the world, it is very important to secure web pages and their content. Unlike other carriers, web pages offer a wide bandwidth for information hiding or embedding watermarks, and many robust techniques can be developed for web page watermarking. Web page watermarking aims to achieve the integrity of web pages, which are a very popular and rich source of information.

II. RELATED WORK
J. Wu and D.R in [4] have proposed APS, an Authorship Proof Scheme based on natural language watermarks. A security level is predefined, and the scheme is considered secure as long as the probability measure stays below this level. Their solution caters for long texts and is robust. They have used meaning and literal representations to embed watermarks, and edit distance for fault tolerance.

Qijun Zhao and Hondtao Lu [5] have proposed a scheme for tamper-proof web pages in which watermarks are generated on the basis of the Principal Component Analysis (PCA) technique. The upper and lower case of HTML tags is used for embedding the watermarks.

Fei, Wang, Zhand and Li in [6] have presented a watermarking scheme that embeds different fingerprints in XML data, which can be used to trace illegal distribution. Their scheme attempts to resist modification attacks while maintaining robustness.

Shi, Kim and S. in [7] have studied approaches for secure embedding and detection of a watermark in an untrusted environment. They have considered Zero-Knowledge Watermark Detection (ZKWMD) protocols for authorship proof, and a Chameleon-like stream cipher that achieves simultaneous decryption and fingerprinting of data to trace illegal distribution of broadcast messages.

Further techniques based on HTML web files have been proposed in [8] and [9]. Mohammed and Sun in [8] have proposed digital watermarking techniques for HTML pages that exploit white space, line breaks, attribute ordering, string delimiters and color values. All of the above-mentioned techniques were only proposed and not implemented; however, some of them have been tested to show sample results. Ala'a and Mazin in [9] have also used HTML files to achieve secret communication. They have exploited white space to hide secret data in an HTML file and have further encrypted it, using colored data, with the Data Encryption Standard (DES) algorithm.

Wu, Jiwu, Huang and Shi in [10] have proposed a self-synchronization algorithm for audio watermarking to assure audio data transmission. Synchronization codes are embedded into the audio together with the informative data, so the embedded data have a self-synchronization ability. The codes and the hidden informative data are embedded into the low-frequency coefficients in the DWT (discrete wavelet transform) domain.

Hasan in [11] has explored morpho-syntactic tools for text watermarking and developed a syntax-based natural language watermarking scheme for the Turkish language. The unmarked text is first transformed into a syntactic tree diagram in which the syntactic hierarchies and functional dependencies are coded. The watermarking software then operates on the sentences in syntax-tree format and executes binary changes under the control of WordNet to avoid semantic drops.

Chang and Clark in [12] have described a method for checking the acceptability of paraphrases in context. They have used Google n-gram data and a CCG parser to certify the grammaticality and fluency of paraphrases. They have collected human judgments for the evaluation and have integrated text paraphrasing into a linguistic steganography system, using paraphrases to hide information in a cover text.

Zhu and Sang in [13] have adopted a watermarking scheme based on the DC component in the discrete cosine transform (DCT) domain. The watermarks are hidden by adjusting the block DCT coefficients of the image.
The selected image is divided into 8×8 pixel blocks, each 8×8 block is further divided into four non-overlapping 4×4 pixel sub-blocks, and the watermarks are embedded by adjusting their DCT coefficients.

Kim, Moon and Oh in [14] have proposed using word classification and inter-word space statistics. They segment the words and add information to the text content by modifying the statistics of the inter-word spaces.

Meral, Unkar, Sankor, OZ and Gunor in [15] have explored morphosyntactic tools for text watermarking and have come up with syntax-based natural language watermarks. They have developed the system for the Turkish language, in which sentences in syntax-tree format undergo binary changes
under the control of WordNet to avoid semantic drops. The algorithm transforms the raw sentences into their Treebank representation and syntactic tree by randomizing their occurrences.

III. SYSTEM MODELS
IV. PROPOSED METHODOLOGY
When an author or writer contributes text to the web, his or her intellectual rights need to be protected. In this paper, the copyright conventions to be integrated are studied in light of English grammatical categories (verbs, articles and prepositions), which form the structural part of any text. The articles, verbs and prepositions (natural language watermarks) used in this research are among the 100 most common words in English by frequency, which together make up about half of all written material. Below, a composite table as well as separate tables are given with respect to their frequencies.

To publish content and retain the copyright, an author is given a key; whenever the author publishes something on the web, he/she integrates this key with the content to be published. The key is the main part of the scheme and is made up of several components. To build a key, a unique author ID is first obtained from the CA (Certified Authority), and natural watermarks are then added to this author ID.

Key = ( … )    (1)

where A = Articles, V = Verbs, P = Prepositions, AID = Author ID, and Length = size of the author ID and watermarks.

Natural Language Watermarks (NLW) are extracted from the content, and the key is constructed depending on the numbers of these NLW. A different key can be generated for each publication, but always with the same author ID, since that ID is uniquely generated. So far, the sizes of the key and the author ID are not restricted to any specific length, although such a restriction could be taken into consideration.
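The key construction described above can be sketched in Python. This is a minimal, hypothetical illustration, not the author's implementation: equation (1) is not recoverable from this copy, so the field order, the `|` and `:` separators and the trailing Length field are assumptions, and the preposition set is a placeholder because the preposition list is not reproduced here. The helper names `extract_nlw` and `build_key` are likewise illustrative.

```python
import re
from collections import Counter

# Natural language watermark (NLW) word sets. Articles and verbs follow
# Tables 1 and 2; the preposition set is a HYPOTHETICAL placeholder.
ARTICLES = ("a", "an")
VERBS = ("is", "are")
PREPOSITIONS = ("of", "in", "to", "for", "on", "with")  # hypothetical


def extract_nlw(text: str) -> dict:
    """Count whole-word occurrences of each watermark word (case-insensitive)."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {w: words[w] for w in ARTICLES + VERBS + PREPOSITIONS}


def build_key(author_id: str, text: str) -> str:
    """Concatenate the author ID with article, verb and preposition counts.

    Field order and separators are assumptions; equation (1) is not
    available in this copy of the paper.
    """
    counts = extract_nlw(text)
    a = ",".join(f"{w}:{counts[w]}" for w in ARTICLES)
    v = ",".join(f"{w}:{counts[w]}" for w in VERBS)
    p = ",".join(f"{w}:{counts[w]}" for w in PREPOSITIONS)
    body = "|".join([author_id, a, v, p])
    # Length = size of the author ID plus watermarks, appended as the last field.
    return f"{body}|{len(body)}"
```

For example, `build_key("AID-001", text)` returns the author ID followed by the article, verb and preposition counts found in `text`, then the total length. A new text yields a new key with the same author ID, matching the behaviour the paragraph above describes. AES encryption of the resulting string is a separate step.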
The CA can be a registered company issuing IDs, or it can be run by the website owners.

In brief, a unique author ID is concatenated with three sets of natural watermarks (verbs, articles and prepositions) to generate a secret key, which is further encrypted using the cryptographic algorithm AES (Advanced Encryption Standard) before it is added to a web page. The sets of natural watermarks used are:
A. List of most frequently used verbs in English

Verb    Frequency
is      15%
are     34%
TABLE 1: Verbs and Frequencies
B. List of most frequently used indefinite articles in English

Article    Frequency
a          15%
an         23%
TABLE 2: Articles and Frequencies

Figure 1: Embedding Phase
 
Figure 2: Extraction Phase
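Figures 1 and 2 are not reproduced in this copy, so the embedding and extraction phases can only be sketched under assumptions. In the hypothetical Python below, the encrypted key is carried in an HTML comment placed before `</body>`; the carrier, the `nlw-key` marker and the function names are illustrative choices, not details from the paper. Python's standard library has no AES implementation, so the sketch takes `encrypt` and `decrypt` callables; in practice these would wrap an AES mode from a cryptographic library (for instance `AESGCM` from the `cryptography` package), which is also an assumption.

```python
import base64
import re
from typing import Callable, Optional

# HYPOTHETICAL carrier: an HTML comment holding the encrypted key.
# The paper does not state where in the page the key is placed.
MARKER = "nlw-key"


def embed_key(html: str, key: str, encrypt: Callable[[bytes], bytes]) -> str:
    """Embedding phase: encrypt the key and insert it as a comment before </body>."""
    token = base64.b64encode(encrypt(key.encode("utf-8"))).decode("ascii")
    comment = f"<!-- {MARKER}:{token} -->"
    if "</body>" in html:
        return html.replace("</body>", comment + "</body>", 1)
    return html + comment  # no </body>: append at the end of the document


def extract_key(html: str, decrypt: Callable[[bytes], bytes]) -> Optional[str]:
    """Extraction phase: locate the comment and return the decrypted key, or None."""
    m = re.search(rf"<!-- {MARKER}:([A-Za-z0-9+/=]+) -->", html)
    if m is None:
        return None
    return decrypt(base64.b64decode(m.group(1))).decode("utf-8")
```

With real AES callables, `embed_key(page, key, encrypt)` returns the marked page, and `extract_key(marked_page, decrypt)` recovers the key for comparison with the author ID on record. Passing identity functions exercises the plumbing only and gives no protection. An HTML comment is visible through "View Source", which is why the encryption step matters in this design.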
 