
Copyright Protection for Online Text Information

Using Watermarking and Cryptography

Nighat Mir
Computer Science Department
College of Engineering, Effat University
Jeddah, Saudi Arabia
nmir@effatuniversity.edu.sa

Mohammad A. U. Khan
Electrical and Computer Engineering Department
College of Engineering, Effat University
Jeddah, Saudi Arabia
Mohammad_a_khan@yahoo.com

Abstract- Information and security are interdependent elements. Information security has evolved into a matter of global interest, and achieving it requires tools, policies and assurance of technologies against any relevant security risks. The Internet influx, while providing a flexible and economical means of sharing online information, has rapidly attracted countless writers. Text, being an important constituent of online information sharing, creates a huge demand for intellectual copyright protection of text and of the web itself. Various visible watermarking techniques have been studied for text documents, but few for web-based text. In this paper, web page watermarking combined with cryptography is proposed for copyright protection of online content, utilizing semantic and syntactic rules with HTML (Hypertext Markup Language), and is tested for the English and Arabic languages.

Keywords- Information security; Digital watermarking; Web text; Internet technologies; semantic; syntactic; cryptography; HTML

I. INTRODUCTION

The Internet influx provides freedom of economical information sharing, strengthening the concept of a global community; however, it brings threats to authenticity and integrity [1]. Information is considered secure when it is not accessible to unauthorized entities while being distributed, and is accessible to and used by only authorized people. In today's world, information security management is a crucial issue given the innovation of new techniques of data destruction, namely attacks on communication networks and theft of confidential and secret information by malicious users.

Digital watermarking, steganography and cryptography are common approaches for providing information security, with different applications and scopes. However, they work under different domains. Digital watermarking and steganography are acknowledged as imperceptible techniques, whereas cryptography is perceptible. Information is watermarked or stegoed using an identification code in digital form that remains present in the cover file, whereas in cryptography a simple text is converted into a cipher text. The watermarking process can be either visible or invisible. Cryptography stands for hidden or secret writing, which is used to protect information from third parties and can generally be achieved by using diverse algorithms [2].

Text is an important medium for communicating information online and requires a secure mechanism against threats such as reusing, disrupting, redistributing, tampering with, authenticating, copying, modifying and forging information in an illicit way for dishonest purposes [4]. Digital watermarking is an efficient means of copyright protection, authentication and tamper proofing for a text document [3, 5, 6]. In this research, digital watermarking jointly with cryptography is used to propose web watermarking that utilizes HTML (Hypertext Markup Language) constituents. For this, semantic and syntactic approaches of watermarking have been unified with HTML. For the semantic approach, English verbs, articles and prepositions have been chosen based on their high frequency of occurrence in the language. Natural watermarks are generated based on these semantics; added security is provided by a cryptographic algorithm, namely AES (Advanced Encryption Standard), and the result is embedded into the HTML code to secure the copyright of web content. For the syntactic approach, constructs of the Arabic and English languages are used to offer copyright protection. The HTML <meta> tag is exploited to embed the natural watermarks, which is identified as a novel approach.

Digital watermarking has proved to be a mode of identification for the owner of data [7,8]. It aims to embed the ownership information inside the data. In case of illegal use, the watermark enables the assertion of possession and successful examination [9], while keeping large-scale distribution simple and economical.

Syntactic and semantic approaches based on language paradigms have been used for text watermarking [10] and can be extended to web content, considering that text is a dominant part of a web page. The available bandwidth in web pages can be employed optimally to hide extra information for protecting the copyright of a web page's content. Web browsers use HTML to understand, interpret and structure text, images and other types of data, and all browsers support the default characteristics of HTML elements. Web developers can use various tools to produce web pages, but these are eventually interpreted as HTML by the web browser itself. Hence, HTML is considered a basic building block of a web page. Any data in general, and text in particular, is open to many threats and attacks. It is observed that, intentionally or

978-1-7281-4213-5/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: University of Warwick. Downloaded on May 23,2020 at 23:38:10 UTC from IEEE Xplore. Restrictions apply.
unintentionally, illegal copying of data from the web has become a universal practice, where people think that all available information is free to use.

II. BACKGROUND

J. Wu and D. R. Stinson [11] have proposed an Authorship Proof Scheme (APS) based on natural language watermarks. A security level is predefined, and the scheme is considered secure if this level is less than the probability measure. Their solution caters for long text and is robust. They have used meaning and literal representations to embed watermarks and have also used edit distance for fault tolerance.

Zhao and Lu [12] have proposed a tamper-proof watermarking scheme based on PCA (principal component analysis) that uses letter case (upper and lower) to embed watermarks in HTML tags. Guo, Wang, Zhang and Li [13] have presented a watermarking scheme to embed different fingerprints in XML data, which can be used to trace illegal distribution. Their scheme attempts to reduce the modification attack and maintains the robustness level.

Shi, Kim and Katzenbeisser [14] have studied approaches using zero watermarking for an untrusted environment like the internet, offering protocols that combine stream ciphering with watermarking for encryption and decryption. Camenish [15] provided some important moves towards appropriate formal definitions for the robustness of watermarks, the core feature of watermark security. The given solutions are compatible with cryptographic definitions and security schemas.

Stanislav [16] has proposed two algorithms for hiding information in an HTML file by analysing and using HTML characteristics. The first method exploits the space and horizontal tab characters at the end of each line; the second hides a whole line or lines of the HTML file, which are saved in a separate file to protect the copyright information of the author or the information.

Gutub and Fattani [17] have presented a text steganographic approach for the Arabic language. The existence of points in Arabic letters, along with repeated extension characters, has been exploited to hold the secret information at bit level. Pointed characters hold a one bit, whereas un-pointed characters hold a zero bit for Arabic content. They have also proposed another approach using the extension character "-", known as Kashida, for Arabic content in [18]. Memon, Khowaja and Kaz [19] have proposed some feature coding techniques for Arabic/Urdu content and have considered harakaat/Araabs, i.e. Fatha, Kasra, Damma and Reverse Fatha, to hide secret information for protecting intellectual copyrights. Due to the popular use of graphs in different sectors such as education, business, finance, news and analysis, there is huge potential for hiding secret data in them [20]. Hidden information in graphs is not embedded as noise; instead, the secret message is masked in the plotted area of graphs.

Shirali-Shahreza and Shirali-Shahreza [21] have used Unicode for the Persian and Arabic languages using joining letters. Two special characters are used, "Fatha" to hide a 1 bit and "non-Fatha" to hide a 0 bit, producing a high capacity. These two characters, which are used as cover files, are the zero-width non-joiner (ZWNJ) and zero-width joiner (ZWJ) letters. They have also used the Unicode standards of two characters, "Ya" and "Kaf", for hiding information as a cover medium in [73]. The two characters are similar in shape but have different codes depending on their occurrence as initials or in the middle of a sentence.

III. METHODOLOGY

In this research, semantic and syntactic approaches of watermarking have been incorporated with HTML to offer web watermarking. English verbs, articles and prepositions have been selected based on their high frequency of use. A natural watermark is generated based on these semantics, added security is provided using cryptography, and the result is embedded into the HTML code to secure the copyright of web content.

The syntactic approach has been extended further to the Arabic language: it uses two common initials from the English language, one initial from the Arabic language and an important Arabic connecting letter. The location for embedding watermarks has also been identified as a novel approach: the <meta> tag of HTML has been utilized to embed the natural watermarks. On this basis, a web watermarking algorithm is designed and implemented. It is also tested with different web sites to assess its functionality, robustness and capacity.

Fig. 1 Watermarking Rules

The semantic copyright conventions to be integrated are studied for English grammatical rules (verbs, articles and prepositions), which are the structural part of any text. The articles, verbs and prepositions (natural language watermarks) used in this research are among the most common words and appear in the list of the first 100 words in English frequency order, which make up about half of all written material [22]. The classification of rules is presented in Fig. 1.

The syntactic conventions are also selected based on their high frequency in English and Arabic text. Specific rules have been used to create watermarks, which rely on the frequency and structural aspects of a language. Two frequently occurring English initial letter pairs, "Th" and "Wh", begin many words in context and are used to generate watermarks. The Arabic initial "Al-" (الـ) and the connecting letter "wow" (و) are noted as high-frequency symbols in the literature. The initial "الـ" occurs mostly with verbs

and nouns, and "و" provides readability of structure by connecting words and is highly used in context.

The idea discussed is described below and shown in Fig. 2 and Fig. 3. Fig. 2 shows the steps to parse the text, generate watermarks, encrypt them and embed them into a web page. Text is parsed and extracted based on the given rules of each language to generate the natural watermarks, which are encrypted and decrypted using the AES (Advanced Encryption Standard) cryptographic algorithm. The encrypted natural watermark is embedded into a web page using one of the tags of the markup language, <meta>, to secure the copyright of an online author.

[Flowchart boxes: Screen and parse web text; Extract defined features; Create natural watermarks; Generate natural watermark; Encrypt natural watermarks; <meta> Embed encrypted watermarks </meta>; Register with Certifying Authority; Protected HTML page to be communicated]

Fig. 2 Parsing and Embedding Process

Fig. 3 shows how to compare the original watermarks with those later extracted from the text itself. First, the reverse procedure of decryption is applied to convert the watermarks into understandable constituents, which are compared with the actual watermark. If they match, originality is confirmed; otherwise a breach of copyright is noted.

[Flowchart boxes: Protected watermarked web content; Extract encrypted watermarks and decrypt (reverse); Comparison check between original and generated watermarks; Verify for authenticity]

Fig. 3 Extraction and Verification Process

A. Semantic Watermarks

The frequencies of occurrence of verbs, articles and prepositions are shown in the tables below. Each table shows the set of natural watermarks used in the research.

List of         Letter/s    Frequency
Verbs           Is          15%
                Are         34%

Table 1: List of verbs with frequencies

List of         Letter/s    Frequency
Articles        A           15%
                An          23%

Table 2: List of articles and frequencies

List of         Letter/s    Frequency
Prepositions    Of          9%
                To          23%
                In          15%
                For         16%

Table 3: List of prepositions and frequencies

B. Syntactic Watermarks

The syntactic watermarks for both Arabic and English are shown below in Table 4.

List of         Letter/s    Language
Words           Th          English
                Wh          English
                الـ          Arabic
                و           Arabic

Table 4: Syntactic rules
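The parsing, embedding and verification steps described in this section can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C# implementation. The rule words come from Tables 1 to 4, but the watermark string format, the function names, the meta name "watermark" and the hex encoding are all assumptions made for illustration. The AES step is deliberately left out, because the Python standard library provides no AES, so the payload here is plain text in hex and offers no confidentiality.

```python
import re
from html.parser import HTMLParser

# Rule sets taken from Tables 1-4; everything else below is a hypothetical design.
SEMANTIC_WORDS = ["is", "are", "a", "an", "of", "to", "in", "for"]
ENGLISH_INITIALS = ["th", "wh"]
ARABIC_INITIAL = "ال"    # the initial "Al-"
ARABIC_CONNECTOR = "و"   # the connecting letter "wow"


def natural_watermark(text, author_id):
    """Count rule occurrences and join them with an author id (assumed format)."""
    words = re.findall(r"\w+", text.lower())
    parts = [f"{w}={words.count(w)}" for w in SEMANTIC_WORDS]
    for prefix in ENGLISH_INITIALS:
        parts.append(f"{prefix}={sum(1 for w in words if w.startswith(prefix))}")
    parts.append(f"al={sum(1 for w in words if w.startswith(ARABIC_INITIAL))}")
    # Simplification: any word starting with the letter counts as a connector.
    parts.append(f"wow={sum(1 for w in words if w.startswith(ARABIC_CONNECTOR))}")
    return author_id + "|" + ";".join(parts)


def embed_watermark(html, payload, name="watermark"):
    """Insert a <meta> tag carrying the payload right after the opening <head> tag."""
    # An AES-encrypted payload would be substituted here; this sketch only hex-encodes.
    hex_payload = payload.encode("utf-8").hex()
    tag = f'<meta name="{name}" content="{hex_payload}">'
    return re.sub(r"(<head(?:\s[^>]*)?>)", lambda m: m.group(1) + tag,
                  html, count=1, flags=re.IGNORECASE)


class _MetaFinder(HTMLParser):
    """Collects the content attribute of the <meta> tag with the given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.content = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == self.name:
            self.content = attributes.get("content")


def extract_watermark(html, name="watermark"):
    """Return the decoded payload, or None if the page carries no such <meta> tag."""
    finder = _MetaFinder(name)
    finder.feed(html)
    if finder.content is None:
        return None
    return bytes.fromhex(finder.content).decode("utf-8")


def verify(html, text, author_id):
    """Regenerate the watermark from the text and compare it with the embedded one."""
    return extract_watermark(html) == natural_watermark(text, author_id)
```

In this sketch a mismatch between the regenerated and the embedded watermark corresponds to the breach case of Fig. 3. A page without a `<head>` element is returned unchanged by `embed_watermark`, which a fuller implementation would need to handle.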

C. Algorithm

The constituents of the watermarks are described below in the equations of the algorithm for English and Arabic text. The algorithm steps are implemented in C# using the Visual Studio framework.

IV. CONCLUSION

Visible watermarking has been discussed for Arabic and English text based on the semantics and syntax of certain rules. Different natural language watermarks have been used in this research. Semantic- and syntactic-based watermarks have been proposed and implemented in C#. Natural watermarks are combined with an author ID to generate a watermark, which is further encrypted using the Advanced Encryption Standard (AES), a symmetric-key cryptographic algorithm, to protect the copyright of a web page. The encrypted natural watermarks are embedded in the cover web page to protect authorship rights.

ACKNOWLEDGMENT

This research is supported by the Deanship of Research and Graduate Studies (DGSR), Effat University, Jeddah, Saudi Arabia.

REFERENCES

[1] Nighat Mir, "Copyright for web content using invisible text watermarking", Computers in Human Behaviour, 648-653, 2014.
[2] Nighat Mir, Sayed Afaq Hussain, "Secure Web-based communication", Procedia Computer Science, Elsevier, 556-562, 2011.
[3] Saba, T., Bashardoost, M., Kolivand, H. et al., "Information security; Digital watermarking; Web text; Internet technologies; semantic; syntactic", Multimed Tools Appl, Springer US, August 2019.
[4] Nighat Mir, "Robust Techniques of Web watermarking", International Journal of Computer Science and Information Security, 248-252, 2011.
[5] Nurul Shamimi, Amirrudin Kamsin, Lip Yee Por, Hameedur Rahman, "A review of text watermarking: theory, methods, and applications", IEEE Access 6:8011-8028, 2018.
[6] Mohammed Hazim Alkawaz, Ghazali Sulong, Tanzila Saba, Abdulaziz S. Almazyad, Amjad Rehman, "Concise analysis of current text automation and watermarking approaches", Journal of Security and Communication Networks, Wiley, 6365-6378, 2017.
[7] Robert, L., and T. Shanmugapriya, "A Study on Digital Watermarking Techniques", International Journal of Recent Trends in Engineering, vol. 1, no. 2, pp. 223-225, 2009.
[8] Ingemar J. Cox, Matt L. Miller, "The first 50 years of electronic watermarking", NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, April 19, 2002.
[9] Ingemar J. Cox, Matt L. Miller, "The first 50 years of electronic watermarking", NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, April 19, 2002.
[10] Ching-Yun Chang and Stephen Clark, "Linguistic Steganography Using Automatically Generated Paraphrases", The 2010 Annual Conference of the North American Chapter of the ACL, pp. 591-599, California, June 2010.
[11] J. Wu and D.R. Stinson, "Authorship Proof for Textual Document", Springer-Verlag Berlin, Heidelberg, ISBN: 978-3-540-88960-1, 2008.
[12] Qijun Zhao, Hondtao Lu, "PCA-based web page watermarking", Elsevier Science Inc., vol. 4, 2007.
[13] Fei Guo, Jianmin Wang, Zhihao Zhang and Deyi Li, "A New Scheme to Fingerprint XML Data", Springer LNCS, vol. 3915, 2006.
[14] Y.Q. Shi, H.-J. Kim, and S. Katzenbeisser, "The Marriage of Cryptography and Watermarking - Beneficial and Challenging for Secure Watermarking and Detection", Springer-Verlag Berlin, LNCS 5041, 2008.
[15] J. Camenish et al., "A Computational Model for Watermark Robustness", LNCS, Springer-Verlag Berlin Heidelberg, 2007.
[16] Stanislav Sergeevich Baril'nik, "Adaptation of text steganographic algorithms for HTML", NSTU, 2009.
[17] Adnan Abdul-Aziz Gutub and Manal Mohammad Fattani, "A Novel Arabic Text Steganography Method Using Letter Points and Extensions", World Academy of Science, Engineering and Technology, 2007.
[18] Adnan Abdul-Aziz Gutub, Wael Al-Alwani, and Abdulelah Bin Mahfoodh, "Improved Method of Arabic Text Steganography Using the Extension 'Kashida' Character", (BUJICT), vol. 3, issue 1, pp. 68-72, December 2010.
[19] Jibran Ahmed Memon, Kamran Khowaja, Hameedullah Kaz, "Evaluation of Steganography for Urdu/Arabic text", Journal of Theoretical and Applied Information Technology, 2005-2008.
[20] Abdelrahman Desoky, Mohamed Younis, "Graphstega: Graph Steganography Methodology", Journal of Digital Forensic Practice, vol. 2, January 2008.
[21] M. Hassan Shirali-Shahreza, Mohammad Shirali-Shahreza, "Steganography in Persian and Arabic Unicode Texts using pseudo-space and pseudo Connection Characters", Journal of Theoretical and Applied Information Technology, 2005-2008.
[22] J. Zhang, Q. Li, C. Wang, and J. Fang, "A novel application for Text Watermarking in digital reading", in Proceedings of International Conference on Artificial Intelligence and Computational Intelligence, Shanghai, China, LNCS 5855, pp. 103-111, 2009.
