



A Novel Approach to Protect Web Content using Barcode

S.Govinda Rao, B.Srinivas, P.Satheesh and Dr. A Govardhan
Abstract: Information is the root of any action and of its performance; it is what turns plans into appropriate actions. Now that the electronic format mediates thousands of transactions, it is important to develop corresponding measures for information security that strengthen confidentiality and leave scope for further development. Phishing is a form of online theft used to steal users' information such as e-mail IDs and passwords. Over time this type of fraud has grown into one of the most effective scams. Of the phishing e-mails sent, about 90% are filtered by spam filters, 50% of the people who receive the e-mail eventually open it, and 10% of those who read it click on the link to the attacker's web page, thereby breaching security and exposing all the information on the affected system. Many anti-phishing solutions have been developed, such as content analysis and HTML analysis; however, these techniques fail because phishers now compose phishing pages from elements that are hard to analyze, such as images, layouts and URLs. The challenge has also grown because rapidly developing technology is available to the attacker as much as to the defender, which demands quick, efficient and dynamic techniques to withstand phishing attacks. The method developed in this paper is message based and uses the MD5 message-digest algorithm. It takes a message of arbitrary length as input and produces a 128-bit fingerprint, or message digest, as output. MD5 has been a widely used secure hash algorithm, particularly in Internet-standard message authentication. A hash function compresses a string of arbitrary length to a string of fixed length, so the hash code acts as a compact fingerprint of the input.
Index Terms: Webpage Protection, MD5, Barcode, Phishing.

S. Govinda Rao is with the Department of Computer Science and Engineering, GRIET, Hyderabad, Andhra Pradesh. B. Srinivas is with the Department of Computer Science and Engineering, MVGR College of Engineering, Vizianagaram, Andhra Pradesh. P. Satheesh is with the Department of Computer Science and Engineering, MVGR College of Engineering, Vizianagaram, Andhra Pradesh. Prof. A. Govardhan is with the Department of Computer Science and Engineering, JNTUH, Hyderabad, Andhra Pradesh.

Phishing is a typical attack in which a large number of spoofed e-mails, which seem to come from well-known organizations, are sent to random Internet users. The spoofed mail advises the victim to update personal information as a condition for keeping access to specific services. By clicking on the link in the mail, the victim is directed to a false website set up by the attacker. The phishing website is structured like the original website, so the victim is unable to distinguish which one is original. According to APWG [4], phishing is defined as online theft that uses both social engineering and technical exploits to steal customers' personal information and financial account details. Many methods have been developed to avoid phishing attacks, but none of them has been able to efficiently withstand the tide of harm that phishing attacks present.

San Martino presents multi-factor mutual authentication for securing e-banking applications. This authentication model was designed for easy applicability in current Internet banking systems. The central theme of the technique is to reduce phishing attacks by raising the security level of the Web banking environment through the rules defined in the proposal. Politecnico di Milano presents a new perspective in the form of DOMAntiPhish. This approach uses layout similarity information to distinguish between malicious and benign web pages. It reduces the involvement of the user and thereby significantly lowers the false alarm rate. Anthony Y. Fu proposed an effective approach for detecting phishing web pages that uses the Earth Mover's Distance (EMD) to calculate the visual similarity between web pages. In this approach, web pages are collected from URLs and converted into normalized images, whose image signatures are then represented by dominant color categories. The EMD is applied to the signatures to compute the visual similarity of the web pages. If the EMD-based visual similarity of a web page exceeds the threshold of the protected web page, that web page is classified as a phishing web page. These efforts were further enhanced by Mallikka Rajalingam et al., who implemented an effective image-based anti-phishing scheme based on discriminative key point features in web pages. This approach uses a content descriptor, the Contrast Context Histogram (CCH), to compute the degree of similarity between suspicious pages and authentic pages. The usual approach is to compute the distance between the feature vectors, which is taken as the degree of visual difference between two images. This approach achieves high accuracy and low error rates.

Phishing is a form of Internet threat in which attackers try to trick consumers into divulging sensitive personal information. Financial institutions are at risk of being exposed to large numbers of fraudulent transactions through the misuse of stolen information. Phishing attacks are often very large-scale events that target thousands of consumers, or more, in the hope that a percentage will be lured into responding. A relatively large percentage of recipients do respond to the e-mails, since they appear legitimate and their authenticity cannot be checked easily. Attackers can easily copy images, links, and text from legitimate web sites to make the e-mail appear authentic. The scale of the attacks raises the potential for huge financial losses; certain attacks involve one million or more phishing e-mails. As noted by the Anti-Phishing Working Group (APWG), customers of many banks and financial institutions have been the targets of phishing attacks. Customers in various other fields of business have also been targeted as victims of identity theft operations. Many anti-phishing solutions have been developed, such as content analysis and HTML code analysis, to detect fake web pages. However, these techniques have failed, as phishers are now composing phishing pages with non-analyzable elements, such as images, layouts and Flash objects.

2.1 Image based anti-phishing

This scheme detects phishing on the basis of URL domain identity and the webpage image. Initially, it identifies the similar authorized URL using the divide rule approach and an approximate string matching algorithm. The similar URL and the input URL are identified along with their IP addresses. If the IP addresses do not match each other, the input could be a phishing URL, and a phase-I phishing report is generated. The suspected URL's webpage is then treated as an image during phase II. In phase II, key points are detected and their features are extracted using the CCH descriptor. The features of the suspected image are then matched with the features of the authorized webpage [1, 2]. If this matching crosses the threshold value, the webpage is determined to be a phishing web page. At the end, a final phishing report is generated stating that the web page is a phishing web page.

2.2 Layout-similarity-based anti-phishing

This system, called DOMAntiPhish, aims at mitigating phishing attacks and is a significant improvement in the field. AntiPhish is a browser plug-in that keeps track of the sensitive information that the user enters into web forms [11]. Whenever a piece of sensitive information that is associated with one site is entered on another site, an alert is generated. DOMAntiPhish likewise detects when a password that is associated with a certain domain is reused on another domain. In this case, however, the system does not immediately raise an alert but compares the layout of the current page with the page where the sensitive information was originally entered. For this purpose, the Document Object Model (DOM) of the original web page and that of the new page are compared. Phishing pages try to mimic the legitimate page, so their layouts are expected to be similar. When the system determines that the pages have a similar appearance, a phishing attack is assumed. When the layouts of the two pages are different, the password is assumed to be reused on a legitimate page.

2.3 Textual Content-Based Anti-Phishing

This approach uses textual and visual contents to measure the similarity between the protected web page and suspicious web pages. The technique is based on a text classifier, an image classifier and a fusion algorithm. The text classifier classifies a given web page as phishing or normal based on the naive Bayes rule. The image classifier measures the visual similarity between web pages based on the Earth Mover's Distance, and a Bayesian model is designed to determine the threshold. In the data fusion algorithm, Bayes theory is used to synthesize the classification results from the visual and textual content.

3.1 Signature Generation

The signature generation algorithm produces a signature, which is the digest of the data to be signed, encrypted with the signer's private key. The combination of the signature and the data itself is referred to as signed data. Prior to the generation of a digital signature, a message digest is generated on the information to be signed using an appropriate approved hash function. The signatory may optionally verify the digital signature using the signature verification process and the associated public key [14]. This optional verification serves as a final check to detect otherwise undetected signature generation computation errors; it may be prudent when signing a high-value message, when multiple users are expected to verify the signature, or if the verifier will be verifying the signature at a much later time.

3.2 Signature verification

A digital signature is an electronic analogue of a written signature; it can be used to provide assurance that the claimed signatory signed the information. In addition, a digital signature may be used to detect whether or not the information was modified after it was signed. A digital signature algorithm includes a signature generation process and a signature verification process. A signatory uses the generation process to generate a digital signature on data; a verifier uses the verification process to verify the authenticity of the signature.
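The generation and verification processes described here can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: it uses textbook RSA with tiny hand-picked primes (`P`, `Q`, `E`, `D` are hypothetical values chosen for readability, not parameters from the paper), and it reduces the MD5 digest modulo the toy modulus so that it fits. A real deployment would rely on a vetted cryptographic library and proper key sizes.

```python
import hashlib

# Toy RSA key pair (illustration only; real keys are thousands of bits long).
P, Q = 61, 53
N = P * Q      # public modulus, 3233
E = 17         # public exponent
D = 2753       # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest_as_int(data: bytes) -> int:
    """MD5 digest of the data, reduced modulo N so it fits the toy modulus."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signature generation: transform the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Signature verification: recover the digest with the public key and
    compare it with a freshly computed digest of the data."""
    return pow(signature, E, N) == digest_as_int(data)


if __name__ == "__main__":
    page = b"<html>legitimate login page</html>"
    sig = sign(page)
    print("valid signature accepted:", verify(page, sig))
    print("altered signature accepted:", verify(page, (sig + 1) % N))
```

The signatory calls `sign` once; any verifier holding the public pair (`E`, `N`) can call `verify`. Because the toy modulus is so small, different messages can share a reduced digest, which is one reason this is a sketch and not a usable scheme.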
Fig. 1 Protecting web content for legitimate web pages (flow: protected web page, extracted content plus salt, hash function, hash value stored against the domain ID; example entry: domain ID 1, value 76asda5).

Each signatory has a public and a private key and is the owner of that key pair. Digital signatures enable the recipient of information to verify the authenticity of the information's origin, and also to verify that the information is intact. The method of disguising plaintext in a way that hides its substance is called encryption. Encrypting plaintext results in unreadable text called ciphertext. The process of converting ciphertext back into its original plaintext is called decryption.

Fig. 2 Testing suspected web content (flow: suspected web page, extracted content, hash function, hash value).

3.2 Signature Validation

To verify signed data, the attached signature is decrypted using the signer's public key, and the result is compared with a digest of the data for equivalence. The signer's public key is usually retrieved from a Certificate Authority. If the verification process fails, no inference can be made as to whether the data is correct or not; it can only be concluded that, using the specified public key and the specified signature format, the digital signature cannot be verified for that data. If the verification and assurance processes are successful, the digital signature and the signed data are considered valid. However, if a verification process fails, the digital signature is considered invalid.

3.3 MD5 Algorithm

The MD5 algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest". It is mainly intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private key under a public-key cryptosystem [18, 20]. The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, it does not require any large substitution tables, so it can be coded quite compactly. MD5 is an extension of the MD4 message-digest algorithm. It is slightly slower than MD4 but more "conservative" in design. MD5 has been the most widely used secure hash algorithm, particularly in Internet-standard message authentication.

The following steps are performed to compute the message digest of the message.

Step 1: Append Padding Bits
The message is "padded" so that its length is congruent to 448, modulo 512; that is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512. Padding is performed as follows: a single "1" bit is appended to the message, and then "0" bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended.

Step 2: Append Length
A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words, low-order word first.) At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.

Step 3: Initialize MD Buffer
A four-word buffer (A, B, C, D) is used to compute the message digest. Each of A, B, C, D is a 32-bit register. These registers are initialized to the following values (in hexadecimal, low-order bytes first):
word A: 01 23 45 67



word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10

Step 4: Process Message in 16-Word Blocks
We first define four auxiliary functions that each take as input three 32-bit words and produce as output one 32-bit word.

    F(X,Y,Z) = XY v not(X) Z
    G(X,Y,Z) = XZ v Y not(Z)
    H(X,Y,Z) = X xor Y xor Z
    I(X,Y,Z) = Y xor (X v not(Z))

In each bit position F acts as a conditional: if X then Y else Z. The function F could have been defined using + instead of v, since XY and not(X)Z will never have 1's in the same bit position. It is interesting to note that if the bits of X, Y, and Z are independent and unbiased, then each bit of F(X,Y,Z) will be independent and unbiased. The functions G, H, and I are similar to the function F, in that they act in "bitwise parallel" to produce their output from the bits of X, Y, and Z, in such a manner that if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bit-wise "xor" or "parity" function of its inputs.

This step uses a 64-element table T[1 ... 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. The elements of the table are given in the appendix of RFC 1321 [20].

Do the following:

    /* Process each 16-word block. */
    For i = 0 to N/16-1 do

      /* Copy block i into X. */
      For j = 0 to 15 do
        Set X[j] to M[i*16+j].
      end /* of loop on j */

      /* Save A as AA, B as BB, C as CC, and D as DD. */
      AA = A
      BB = B
      CC = C
      DD = D

      /* Round 1. */
      /* Let [abcd k s i] denote the operation
           a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
      /* Do the following 16 operations. */
      [ABCD  0  7  1]  [DABC  1 12  2]  [CDAB  2 17  3]  [BCDA  3 22  4]
      [ABCD  4  7  5]  [DABC  5 12  6]  [CDAB  6 17  7]  [BCDA  7 22  8]
      [ABCD  8  7  9]  [DABC  9 12 10]  [CDAB 10 17 11]  [BCDA 11 22 12]
      [ABCD 12  7 13]  [DABC 13 12 14]  [CDAB 14 17 15]  [BCDA 15 22 16]

      /* Round 2. */
      /* Let [abcd k s i] denote the operation
           a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
      /* Do the following 16 operations. */
      [ABCD  1  5 17]  [DABC  6  9 18]  [CDAB 11 14 19]  [BCDA  0 20 20]
      [ABCD  5  5 21]  [DABC 10  9 22]  [CDAB 15 14 23]  [BCDA  4 20 24]
      [ABCD  9  5 25]  [DABC 14  9 26]  [CDAB  3 14 27]  [BCDA  8 20 28]
      [ABCD 13  5 29]  [DABC  2  9 30]  [CDAB  7 14 31]  [BCDA 12 20 32]

      /* Round 3. */
      /* Let [abcd k s i] denote the operation
           a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
      /* Do the following 16 operations. */
      [ABCD  5  4 33]  [DABC  8 11 34]  [CDAB 11 16 35]  [BCDA 14 23 36]
      [ABCD  1  4 37]  [DABC  4 11 38]  [CDAB  7 16 39]  [BCDA 10 23 40]
      [ABCD 13  4 41]  [DABC  0 11 42]  [CDAB  3 16 43]  [BCDA  6 23 44]
      [ABCD  9  4 45]  [DABC 12 11 46]  [CDAB 15 16 47]  [BCDA  2 23 48]

      /* Round 4. */
      /* Let [abcd k s i] denote the operation
           a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
      /* Do the following 16 operations. */
      [ABCD  0  6 49]  [DABC  7 10 50]  [CDAB 14 15 51]  [BCDA  5 21 52]
      [ABCD 12  6 53]  [DABC  3 10 54]  [CDAB 10 15 55]  [BCDA  1 21 56]
      [ABCD  8  6 57]  [DABC 15 10 58]  [CDAB  6 15 59]  [BCDA 13 21 60]
      [ABCD  4  6 61]  [DABC 11 10 62]  [CDAB  2 15 63]  [BCDA  9 21 64]

      /* Then perform the following additions. (That is, increment each
         of the four registers by the value it had before this block
         was started.) */
      A = A + AA
      B = B + BB
      C = C + CC
      D = D + DD

    end /* of loop on i */

Step 5: Output
The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.
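The five steps above can be followed end to end in a short Python sketch. It is a compact illustration, not a replacement for a library implementation: the names `pad`, `rotl` and `md5_hex` are hypothetical helpers, and the index formulas used for k in rounds 2 to 4 are a condensed rewriting of the operation lists above (an assumption of this sketch, not notation from the text). Its output can be cross-checked against Python's standard `hashlib.md5`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Shift amounts s for the 64 operations, four values repeated within each round.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# T[i] = integer part of 4294967296 * abs(sin(i)) for i = 1..64 (stored 0-based).
T = [int(4294967296 * abs(math.sin(i + 1))) for i in range(64)]


def rotl(x: int, s: int) -> int:
    """Rotate a 32-bit word left by s bits (the <<< operation)."""
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK


def pad(message: bytes) -> bytes:
    """Steps 1 and 2: append a 1 bit, 0 bits up to 448 mod 512, then the 64-bit length."""
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"                        # the single "1" bit
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", bit_len)        # low-order word first


def md5_hex(message: bytes) -> str:
    # Step 3: registers A, B, C, D (the listed bytes read low-order first).
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    data = pad(message)
    # Step 4: process each 16-word (64-byte) block.
    for off in range(0, len(data), 64):
        X = struct.unpack("<16I", data[off:off + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:                                # round 1, function F
                f, k = (b & c) | (~b & d), i
            elif i < 32:                              # round 2, function G
                f, k = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                              # round 3, function H
                f, k = b ^ c ^ d, (3 * i + 5) % 16
            else:                                     # round 4, function I
                f, k = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + X[k] + T[i]) & MASK
            a, d, c, b = d, c, b, (b + rotl(f, SHIFTS[i])) & MASK
        a0, b0, c0, d0 = [(x + y) & MASK for x, y in zip((a0, b0, c0, d0), (a, b, c, d))]
    # Step 5: output A, B, C, D, low-order byte of A first.
    return struct.pack("<4I", a0, b0, c0, d0).hex()


if __name__ == "__main__":
    for sample in (b"", b"abc", b"message digest"):
        print(md5_hex(sample), md5_hex(sample) == hashlib.md5(sample).hexdigest())
```

In practice `hashlib.md5(data).hexdigest()` would be used directly; the sketch only makes the padding, the table T and the four rounds visible.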

3.4 Barcode Generation

A barcode is an optical, machine-readable representation of data that describes the object to which it is attached. Barcodes are of great use today in many security, identification and data capture applications. They evolved gradually: the first barcodes represented data by varying the widths and spacings of parallel lines and are called linear or one-dimensional (1D) barcodes. Later they expanded to rectangles, dots, hexagons and other geometric patterns in two dimensions (2D). Barcodes were originally scanned by special optical scanners called barcode readers; later, scanners and interpretive software became available on devices such as desktop printers and smartphones, which widened their use. Barcodes can be classified as numeric-only, alphanumeric and 2D barcodes. Each type is defined by a barcode symbology, which specifies the technical details of a particular type of barcode, such as the width of the bars, the character set, the method of encoding and the checksum specification, and thereby determines the amount and kind of data that can be held and the common uses.

Code 128 is a very high-density barcode symbology. It is used for alphanumeric or numeric-only barcodes. It is capable of encoding all 128 characters of ASCII and gains additional range through the use of an extension character (FNC4). A Code 128 barcode has six sections and 107 symbols, among which 103 are data symbols, with 3 start codes and 1 stop code. The six sections of a Code 128 barcode are the quiet zone, start character, encoded data, check character, stop character and quiet zone, each with its own significance.
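As an illustration of the start character, encoded data, check character and stop character sections, the sketch below computes the Code 128 symbol values for a short string. It assumes code set B (ASCII 32 to 127, start value 104, stop value 106) and the usual weighted modulo-103 check character; the function name `code128b_values` is hypothetical, and the paper does not state which code set or tool it used. Rendering the values as bars and quiet zones is left out.

```python
START_B = 104   # start character for Code 128 code set B
STOP = 106      # stop character


def code128b_values(text: str) -> list[int]:
    """Return the symbol values [start, data..., check, stop] for code set B."""
    values = []
    for ch in text:
        if not 32 <= ord(ch) <= 127:
            raise ValueError(f"character {ch!r} is not encodable in code set B")
        values.append(ord(ch) - 32)   # code set B value of an ASCII character
    # Check character: start value plus position-weighted data values, modulo 103.
    weighted = sum(pos * val for pos, val in enumerate(values, start=1))
    check = (START_B + weighted) % 103
    return [START_B, *values, check, STOP]


if __name__ == "__main__":
    # A lowercase hexadecimal MD5 digest fits entirely within code set B.
    print(code128b_values("900150983cd24fb0d6963f7d28e17f72"))
```

Since an MD5 digest in hexadecimal uses only the characters 0 to 9 and a to f, a hash value can be carried by a single Code 128 symbol without switching code sets.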


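S.No  Trained legitimate domain barcode  Testing suspected domain barcode  Result
1     (barcode image)                    (barcode image)                   Yes
2     (barcode image)                    (barcode image)                   No
3     (barcode image)                    (barcode image)                   No
4     (barcode image)                    (barcode image)                   Yes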

Table 1 shows the legitimate hash value obtained from the protected web page and its comparison with the hash values of the suspected domains under test. There is a possibility that the hash code can be replicated by the phisher, but this problem is overcome by completing the verification process only when both the hash value and the salt value match. Testing was performed for four cases: a case is considered authentic only when both the hash and the salt of the trained legitimate domain match those of the suspected domain under test. Of the four cases considered, only the 1st case met the requirements, with the same salt value leading to a successful match, while the 2nd, 3rd and 4th cases did not.
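The two-part check described above (the hash value must match and the salt must match) can be sketched as follows. This is a minimal illustration under assumptions: the paper does not say how the salt is combined with the extracted content, so plain concatenation is assumed here, and the names `protect` and `is_authentic`, the record layout and the sample salt string are hypothetical.

```python
import hashlib
import hmac


def salted_md5(content: str, salt: str) -> str:
    """MD5 digest of the extracted content combined with the salt (assumed: concatenation)."""
    return hashlib.md5((content + salt).encode("utf-8")).hexdigest()


def protect(domain_id: int, content: str, salt: str) -> dict:
    """Training phase: store the salt and the hash value against the domain ID."""
    return {"domain_id": domain_id, "salt": salt, "hash": salted_md5(content, salt)}


def is_authentic(record: dict, suspect_content: str, suspect_salt: str) -> bool:
    """Testing phase: accept only when both the salt and the hash value match."""
    same_salt = hmac.compare_digest(record["salt"].encode("utf-8"),
                                    suspect_salt.encode("utf-8"))
    same_hash = hmac.compare_digest(record["hash"].encode("utf-8"),
                                    salted_md5(suspect_content, suspect_salt).encode("utf-8"))
    return same_salt and same_hash


if __name__ == "__main__":
    page = "<html>legitimate login page</html>"
    record = protect(1, page, "76asda5")
    print(is_authentic(record, page, "76asda5"))          # same content, same salt
    print(is_authentic(record, page, "guessed-salt"))     # same content, wrong salt
    print(is_authentic(record, page + "<!-- x -->", "76asda5"))  # altered content
```

A page copied byte for byte by a phisher still fails the check unless the phisher also knows the salt, which is the point of requiring both values to match.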

For Table 3, 400 cases were taken into consideration and tested with the proposed approach in five batches of different sizes. In the first batch of 27 cases there were 2 false acceptances and just 1 false rejection. Even in the large batches of 123 and 108 cases, false acceptances and false rejections were limited to at most 3 and 2 respectively, as seen in the 2nd and 3rd rows. For the medium-sized batches the numbers remained low as well, as shown in the 4th and 5th rows. This indicates that the approach maintains a consistently high rate of accuracy regardless of the number of cases taken.

TABLE 3: COMPARISON RESULTS BASED ON SALT
S.No  Hash with salt (number of cases)  False acceptance  False reject
1     27                                2                 1
2     123                               3                 2
3     108                               3                 1
4     67                                1                 2
5     75                                2                 2


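S.No  Trained legitimate domain hash  Testing suspected domain hash  Same salt  Result
1     (hash value)                    (hash value)                   Yes        Match
2     (hash value)                    (hash value)                   No         No Match
3     (hash value)                    (hash value)                   No         No Match
4     (hash value)                    (hash value)                   No         No Match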

Table 4 shows the results for the same batches of cases when barcodes are used; the numbers of false acceptances and false rejections again remain consistently low.

TABLE 4: TEST RESULTS USING BARCODES FA AND FR
S.No  Number of cases  False acceptance  False reject
1     27               1                 1
2     123              2                 1
3     108              1                 4
4     67               2                 1
5     75               2                 3
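The counts reported in Tables 3 and 4 can be turned into overall rates with a few lines of Python. This is a rough sketch: the tables give only the batch size, not how many of the cases in each batch were legitimate or phishing, so dividing by the total number of cases is an assumption made here, and the names `Batch` and `overall_rates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    cases: int          # number of cases tested in the batch
    false_accept: int   # phishing pages accepted as legitimate
    false_reject: int   # legitimate pages rejected


# Rows of Table 3 (hash with salt) and Table 4 (barcode).
TABLE_3 = [Batch(27, 2, 1), Batch(123, 3, 2), Batch(108, 3, 1), Batch(67, 1, 2), Batch(75, 2, 2)]
TABLE_4 = [Batch(27, 1, 1), Batch(123, 2, 1), Batch(108, 1, 4), Batch(67, 2, 1), Batch(75, 2, 3)]


def overall_rates(batches):
    """Return (total cases, false acceptance rate, false rejection rate)."""
    total = sum(b.cases for b in batches)
    accepted = sum(b.false_accept for b in batches)
    rejected = sum(b.false_reject for b in batches)
    return total, accepted / total, rejected / total


if __name__ == "__main__":
    for name, table in (("hash with salt", TABLE_3), ("barcode", TABLE_4)):
        total, far, frr = overall_rates(table)
        print(f"{name}: {total} cases, FAR {far:.2%}, FRR {frr:.2%}")
```

Under this assumption both variants stay below three percent for either error type across the 400 cases.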

The barcode-based comparison was tested in four cases: the 1st case shows a successful match, the 2nd and 3rd cases do not meet the requirements, and the 4th case is again a successful match.

Fig. 3 shows the performance of the entire proposed approach, taking into consideration the number of domains, the hash with salt, the barcode, and their respective numbers of false acceptances and false rejections, arranged serially. It can be observed from the diagram that all the domains are covered and that the numbers of false rejections and false acceptances indicate an effective accuracy rate.

Fig. 3 Combined approach on content protection.

The technique described above protects the information stored in web pages and assures its originality, security and confidentiality [16]. It provides protection at two levels by requiring both the hash value and the salt value to complete the verification process. The algorithms used are well suited to the application: MD5 is a security-oriented algorithm that yields a hash value which is hard to breach. To guard against web pages being mimicked and to raise the level of security, a dynamic dimension is added through salt value verification. The use of barcodes widens the scope of application, with benefits ranging from reduced revenue losses caused by data collection errors and the maintenance of necessary inventory levels to, above all, faster access to information. Together, the chosen algorithms and measures strengthen the prospects of a successful anti-phishing approach.

[1] Wenyin Liu, Xiaotie Deng, Guanglin Huang, and Anthony Y. Fu, An Antiphishing Strategy Based on Visual Similarity Assessment. IEEE Computer Society, City University of Hong Kong, pages 58-65, 2006.
[2] Anthony Y. Fu, Liu Wenyin and Xiaotie Deng, Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD). IEEE Transactions on Dependable and Secure Computing, pages 301-311, 2006.
[3] Haijun Zhang, Gang Liu, Tommy W. S. Chow and Wenyin Liu, Textual and Visual Content Based Anti-Phishing: A Bayesian Approach. IEEE Transactions on Neural Networks, pages 1532-1546, 2011.
[4] APWG, 2004. Phishing Attack Trends Report.
[5] APWG, 2006. Origins of the Word Phishing.
[6] BBC News, 2003. Crackdown on spam. 053.stm
[7] Earthlink, 2006. Internet Scams and ScamBlocker.
[8] Gregg Tally, Roshan Thomas, Tom Van Vleck, Anti-Phishing: Best Practices for Institutions and Consumers. McAfee Research, September 2004.
[9] Mallikka Rajalingam, Saleh Ali Alomari and Putra Sumari, Prevention of Phishing Attacks Based on Discriminative Key Point Features of WebPages. International Journal of Computer Science and Security (IJCSS), 2012.
[10] Madhuri S. Arade, P. C. Bhaskar, Antiphishing Model with URL & Image based Webpage Matching. International Journal of Computer Science and Technology, 2011.
[11] Angelo P. E. Rosiello, Engin Kirda, Christopher Kruegel, and Fabrizio Ferrandi, A Layout-Similarity-Based Approach for Detecting Phishing Pages. IEEE International Conference on Security and Privacy in Communication Networks, 2007.
[12] APWG, Phishing Activity Trends - Report for the Month of December, 2007. Technical report, Anti Phishing Working Group, Jan. 2008. Available at report_dec_2007.pdf
[13] Tom Jagatic, Nathaniel Johnson, Markus Jakobsson, and Filippo Menczer, Social Phishing. Communications of the ACM, 2005.
[14] G. Julius Caesar, John F. Kennedy, Cryptography. Security Engineering: A Guide to Building Dependable Distributed Systems.
[15] Ian Curry, An Introduction to Cryptography and Digital Signatures. Entrust, 2001.
[16] Dibia Victor, A RESTful Web Services Implementation. Master of Science in Information Networking (MSIN), April 2011.
[17] Gary Locke, Patrick Gallagher, Digital Signature Standard (DSS). Federal Information Processing Standards Publication, June 2009.
[18] Kostas Zotos, Andreas Litke, Cryptography and Encryption. Dept. of Applied Informatics, University of Macedonia, 54006 Thessaloniki, Greece, {zotos, litke}
[19] Janaka Deepakumara, Howard M. Heys and R. Venkatesan, FPGA Implementation of MD5 Hash Algorithm. Memorial University of Newfoundland, St. John's, NF, Canada A1B 3X5.
[20] R. Rivest, The MD5 Message-Digest Algorithm, RFC 1321, MIT LCS & RSA Data Security, Inc., April 1992.

S. Govinda Rao received his M.Tech in Computer Science and Systems Engineering from Andhra University. He is currently working as an Associate Professor in the Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology. He has 7 years of teaching experience.

B. Srinivas received his M.Tech in Computer Science and Engineering in 2008 from Acharya Nagarjuna University. He has two and a half years of industry experience and four years of teaching experience. He is currently employed as an Assistant Professor in the CSE department, MVGR College of Engineering. He has more than eight papers in journals.

P. Satheesh received his M.Tech in Computer Science and Systems Engineering from Andhra University. He has ten years of teaching experience. He is currently employed as an Associate Professor in the CSE department, MVGR College of Engineering. He has more than eight papers in journals.

Dr. A. Govardhan received his BE in Computer Science and Engineering from Osmania University College of Engineering, Hyderabad in 1992, his M.Tech from Jawaharlal Nehru University (JNU), Delhi in 1994, and his Ph.D from Jawaharlal Nehru Technological University, Hyderabad (JNTUH) in 2003. He has guided 123 M.Tech projects. He has 108 research publications in international and national journals and conferences.