



Submitted in partial fulfillment of the requirements for the degree



Submitted by

Under the guidance of



July 2012


I do hereby declare that the work presented in this thesis titled DEVELOPING A GUI BASED PLATFORM FOR PLUGIN INTEGRATION IN WEBSITES, submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology in Information Technology at the Indian Institute of Information Technology, Allahabad, is an authentic record of my original work carried out under the guidance of DR. MANISH MANGLA.



Date : 26-07-2012



I do hereby recommend that the thesis work prepared under my supervision by SACHIN JAIN be accepted in partial fulfillment of the requirements for the degree of BACHELOR OF TECHNOLOGY IN INFORMATION TECHNOLOGY.



Date: 26-07-2012

It is my privilege to express my sincerest regards to our project coordinator, DR. MANISH MANGLA, for his valuable inputs, able guidance, encouragement, whole-hearted cooperation, and constructive criticism throughout the duration of the project. He has been very helpful throughout the project proceedings.
I would also like to thank my colleagues at the company, without whom the project would not have been successful. Their constant support and help have been extremely valuable for the completion of the project.

Place : NOIDA


Date : 26-07-2012



The proposed work is part of a project that aims to develop a GUI-based platform where plugin providers can publish their plugins and website administrators can use the platform to integrate those plugins into their websites without knowing anything about the plugin, or even about the code of the webpage where the plugin needs to be placed. The project aims to reduce the manual effort of including plugins on every page of a website. The interface of the platform is kept simple enough that even a non-technical user can place plugins on his website. In this document we implement several algorithms that optimize key aspects of the application, reduce human effort, and provide a much better user experience.

Table of Contents

Candidate's Declaration
Certificate
Chapter 1: Introduction
1.1 Background
1.2 Literature Survey
1.3 Process of adding plug-ins
1.4 Formulation of the present problem
Chapter 2: Frameworks used
2.1 Hardware
2.2 Framework and Software used
2.3 Database used
(Theoretical Developments)
Chapter 3: Loading a webpage in I-Frame
3.1 Cross Domain Resource Sharing
3.2 Various types of XSS attacks
3.4 CORS Filter
3.5 Proxy controller solution
Chapter 4: Algorithm for locating an HTML DOM object on a webpage
4.1 Pseudo-code for encoding a DOM object
4.2 Pseudo-code for decoding saved information to get a DOM object
Chapter 5: Algorithm to find whether two documents are structurally identical
Outline of Work
Results & Discussions
Snapshots of the Application
Recommendations for Future Work

List of Figures

Chapter 1
Fig. 1.1: Use of Social Plug-ins (Connect) on Web Pages
Fig. 1.2: Use of Like and +1 Plug-ins in E-Commerce Product Pages
Fig. 1.3: Steps of Integrating a Plug-in on a Webpage
Fig. 1.4: Step 1 - Configuring Plug-ins
Fig. 1.5: Step 2 - Fetching the Code of Customized Plug-ins
Chapter 2
Fig. 2.1: Example of a Maven Dependency in the Spring Framework
Fig. 2.2: Model and View Controller
Chapter 3
Fig. 3.1: Example of JSONP
Fig. 3.2: Use of a CORS Filter to Solve the Cross-Site Resource Sharing Problem
Fig. 3.3: Activity Diagram to Load a Webpage in an I-Frame
Outline of Work
Fig. 1: Flowchart of Fetching Plug-ins Using Similar Pages
Fig. 2: Snapshots of Social Plug-ins
Fig. 3: Snapshot of Webpage without Plug-ins
Fig. 4: Snapshot of Webpage after Placing the Add to Wish List Plug-in

A plug-in is a set of software components that adds specific abilities to a larger software application. If supported, plug-ins enable customizing the functionality of an application. For example, plug-ins are commonly used in web browsers to play video, scan for viruses, and display new file types. Well-known plug-in examples include Adobe Flash Player, QuickTime, and Oxy-tube.

It can be seen that almost every website today uses plug-ins in some way, whether social plug-ins that connect the website or webpage to social media or other kinds of plug-ins that extend the website's functionality. But no social platform gives an easy way to put these plug-ins into a website. E-commerce is one of the popular business streams today; to date, hundreds of e-commerce websites have sprung up and come into existence.
Electronic commerce refers to buying and selling products or services over the Internet. These days, the amount of trade conducted electronically has grown extraordinarily with widespread Internet usage. More than a hundred virtual stores run on the web every day. Online shopping has many advantages, such as convenience: online stores are usually available 24 hours a day, which is not the case with physical stores. Virtual stores also deliver items to the customer's home, which can save a lot of time. Another advantage of virtual stores is reviews of a particular product from other customers, so before purchasing a product a user can refer to reviews from customers who previously purchased the same product. This helps a customer purchase an item that fully meets his needs. Comparing prices across virtual stores is also much easier than comparing prices of a product across different physical stores. Besides these advantages, virtual stores also suffer from several disadvantages; fraud and security are the main concerns. SSL encryption has generally solved the problem of credit card numbers being intercepted in transit between consumer and merchant. Phishing is another hacking technique used to cheat customers. In spite of these problems, online shopping is quite successful at the present time.


Social commerce [1] is a subset of electronic commerce that involves social media: online media that supports social interaction and user contributions to assist in the online buying and selling of products and services. The term social commerce was introduced by Yahoo! in November 2005 to describe a set of online collaborative shopping tools such as shared pick lists, user ratings, and other user-generated sharing of online product information. The concept of social commerce was developed by David Biesel to include collaborative e-commerce tools that enable shoppers to get advice from trusted individuals, find goods and services, and then purchase them. Today, the area of social commerce has expanded to include the range of social media tools and content used in the context of e-commerce, especially in the fashion industry. Examples of social commerce include customer ratings and reviews, user recommendations and referrals, social shopping tools (sharing the act of online shopping), forums and communities, and social advertising.

Use of social plug-ins in e-commerce websites is the next big thing. The process has already started: many websites, especially US-based e-commerce websites, are using social plug-ins to move to e-commerce 2.0, also known as social commerce. Maybe one day someone will crack the magical code and create a true social commerce platform. Even then, the platform will need to be built on a firm foundation of social networks and commerce. Until then, it makes sense that plug-ins will be the key to building a successful social commerce business.


Online shopping continues to evolve every day, and CEOs and directors of many companies have their opinions about social commerce. "If I had to guess, social commerce is the next area to really blow up" [2], said Mark Zuckerberg, CEO of Facebook.
Wet Seal is one of the biggest companies in the e-commerce world. "At Wet Seal, about 20% of the revenue can be attributed to the social media initiatives" [2], said John Cubo, CEO of Wet Seal. Wet Seal is considered one of the successful companies that got its social media strategy right and is reaping rich dividends. This project aims at providing a solution that can make other companies as successful as Wet Seal.


"Amazon is e-commerce 1.0, with more than 10 years' worth of products on a site. What we are moving toward now is e-commerce 2.0, which is more about discovering and browsing" [3], said CEO Jason Goldberg. Social media is not only about Facebook pages and Twitter accounts; it is about building the community, facilitating conversations, listening, and responding.
"26% chose to sign up to a website using social media. Visitors who log in with their social media profiles are five times more likely to make purchases than those who create accounts on their site" [3], said CEO Randall Weidberg.
Facebook also provides various plug-ins to be placed on sites which can increase user engagement, but information from these plug-ins cannot be processed to improve personalization and re-targeting of users.

Examples of social plug-ins used on websites

1) Connect with Facebook plug-in

The Connect with Facebook [6] plug-in asks users to connect with Facebook and offers an extra discount for doing so. Basically, it tempts users with discounts, uses each customer as a potential seller, and advertises the company name with the help of social media.

FIG. 1.1: Use of social plug-ins (Connect) on webpages

2) Like and +1 plug-in

The Like plug-in from Facebook and the +1 plug-in from Google are available on product pages.

FIG. 1.2: Use of Like and +1 plug-ins in e-commerce product pages

FIG. 1.3: Steps of integrating a plug-in on a webpage


Example of integrating plug-ins from Facebook

Step 1: Configure the Like plug-in on Facebook.

FIG. 1.4: Step 1 - Configuring plug-ins [7]

Step 2: Fetch the code of the customized plug-in.

FIG. 1.5: Step 2 - Fetching the code of customized plug-ins [7]

Step 3: Add the above code to your website at the desired place.

Limitations of the existing technology:
1. A website administrator needs to understand the code of the webpage where he wants to place the plug-in; only then will he be able to place the plug-in at the appropriate place.
2. If the website administrator wants to place a plug-in on multiple pages, he has to repeat the process every time. There is no shortcut to reduce this enormous human effort.
3. There is no technique by which the administrator can know how his webpage will look after placing the plug-in.
4. No provision exists to visualize the webpage after placing the plug-in.
5. There is no way for the administrator to customize plug-in settings such as color and size according to the needs of the website. He has to rely on hit-and-trial, which is quite time consuming.


Developing a GUI based platform for plugin integration in websites

We aim to build a graphical platform with which a website administrator can apply plug-ins with ease. The main aim is to reduce the human effort involved in applying plug-ins: essentially the same steps are repeated again and again. One of the ideas is to utilize the page structure to identify the sections where the admin places the plug-in, so that the rest of the pages are taken care of by our algorithm. This approach reduces the human effort to a large extent: a website can be fitted with plug-ins not within hours but within minutes, whereas earlier the same task took days.

Webpage with plug-in visualization:

The first step in placing a plug-in with the existing technology is to customize the plug-in on the provider's platform. A better approach is to provide a platform where the merchant (website admin) can render the webpage that needs to be plug-insized and use a simple drag-and-drop interface, so that the merchant can visualize how his webpage will look after placing the plug-in. He should be able to customize the plug-ins on the page itself, which improves the user experience.

Manual effort reduction:

E-commerce websites typically use three to four templates for their pages: one kind of template for product pages, one for category pages, one for the home page, and the rest do not matter to us because they are not plug-insized anyway. Our concern is therefore to plug-insize the product pages and category pages. The problem is how to reduce the manual effort, so the idea is to develop an algorithm that uses the page layout structure to identify all the pages that share the same template, i.e. the same page structure. We then implement the concept of "one done, all done": the merchant needs to place a plug-in on only one product page, and since all other product pages receive plug-ins in the same way, our algorithm handles the thousands of similar pages on his website, saving days of manual effort.
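The template-matching idea above can be illustrated with a toy sketch (this is not the thesis algorithm itself): two pages share a template when their tag structure matches, even though their text content differs. Here pages are represented as nested {tag, children} objects, a simplifying assumption for illustration.

```javascript
// Toy sketch: decide whether two pages share a template by comparing
// their tag structure recursively while ignoring text content.
function sameTemplate(a, b) {
  if (a.tag !== b.tag) return false;
  const ac = a.children || [], bc = b.children || [];
  if (ac.length !== bc.length) return false;
  return ac.every((child, i) => sameTemplate(child, bc[i]));
}

// Two "product pages" with identical layout but different text,
// and a third page with a different structure:
const page1 = { tag: "div", children: [{ tag: "h1", children: [] }, { tag: "p", children: [] }] };
const page2 = { tag: "div", children: [{ tag: "h1", children: [] }, { tag: "p", children: [] }] };
const page3 = { tag: "div", children: [{ tag: "h1", children: [] }] };

console.log(sameTemplate(page1, page2)); // true  - same template
console.log(sameTemplate(page1, page3)); // false - different structure
```

A real implementation would walk actual DOM trees and tolerate minor differences, but the recursive shape of the comparison is the same.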


The platform developed needs to be hosted on a web server, so the only hardware required is a system with a web server running on it.
Web Server
A web server [8] can refer to either the hardware (the computer) or the software (the computer application) that helps to deliver web content that can be accessed through the Internet. The primary function of a web server is to deliver web pages to clients on request using the Hypertext Transfer Protocol (HTTP). This means delivery of HTML documents and any additional content that may be included by a document, such as images, style sheets, and scripts.
A user agent, commonly a web browser or web crawler, initiates communication by making a request for a specific resource using HTTP, and the server responds with the content of that resource, or with an error message if it is unable to do so. The resource is typically a real file on the server's secondary storage, but this is not necessarily the case and depends on how the web server is implemented.
While the primary function is to serve content, a full implementation of HTTP also includes
ways of receiving content from clients. This feature is used for submitting web forms,
including uploading of files.
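The request/response cycle described above can be sketched as a plain function, with simple objects standing in for a real server socket (all names here are illustrative, not part of any particular web server):

```javascript
// Minimal sketch of a web server's core decision: map a requested path
// to either the stored resource or an error response.
function handleRequest(path, files) {
  if (path in files) {
    return { status: 200, contentType: "text/html", body: files[path] };
  }
  // The server answers with an error message when the resource is missing.
  return { status: 404, contentType: "text/plain", body: "Not Found" };
}

const files = { "/index.html": "<html><body>Hello</body></html>" };
console.log(handleRequest("/index.html", files).status);   // 200
console.log(handleRequest("/missing.html", files).status); // 404
```

In a real server this lookup sits behind the HTTP parsing and socket handling that a library (for example Node's http module) provides.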



Spring Framework
The Spring Framework [9] provides a comprehensive programming and configuration model for modern Java-based enterprise applications, on any kind of deployment platform. A key element of Spring is infrastructural support at the application level: Spring focuses on the "plumbing" of enterprise applications so that teams can focus on application-level business logic, without unnecessary ties to specific deployment environments.

Spring includes:

Flexible dependency injection with XML and annotation-based configuration styles

Advanced support for aspect-oriented programming with proxy-based and AspectJ-based variants

Support for declarative transactions, declarative caching, declarative validation, and declarative formatting

Powerful abstractions for working with common Java EE specifications such as JDBC

First-class support for common open source frameworks such as Hibernate

A flexible web framework for building RESTful MVC applications and services

Rich testing facilities for unit tests as well as for integration tests

Spring is modular in design, allowing for incremental adoption of individual parts such as the core container or the JDBC support. While all Spring services are a perfect fit for the Spring core container, many services can also be used in a programmatic fashion outside of the container. Supported deployment platforms range from standalone applications to Tomcat and Java EE servers such as WebSphere. Spring is also a first-class citizen on major cloud platforms with Java support, e.g. on Heroku, Google App Engine, Amazon Elastic Beanstalk, and VMware's Cloud Foundry.


The Spring Framework serves as the foundation for the wider family of Spring open source
projects, including:

Spring Security

Spring Integration

Spring Batch

Spring Data

Spring Web Flow

Spring Web Services

Spring Mobile

Spring Social

Spring Android

FIG. 2.1: Example of a Maven dependency in the Spring Framework


Spring uses the Model-View-Controller Design Pattern

FIG. 2.2 : Model and View Controller

Spring uses the Model-View-Controller (MVC) design pattern throughout. Models
encapsulate application data, Views display and edit that data, and Controllers mediate the
logic between the two. By separating responsibilities in this manner, you end up with an
application that is easier to design, implement, and maintain.
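The division of responsibilities can be illustrated with a language-neutral toy (plain JavaScript here, purely illustrative and not Spring code): the model holds application data, the view formats it, and the controller mediates between the two.

```javascript
// Toy MVC: the controller updates the model, then asks the view to render.
const model = {
  items: [],
  add(item) { this.items.push(item); }  // encapsulates application data
};

const view = {
  render(items) { return "Cart: " + items.join(", "); }  // displays data
};

const controller = {
  addToCart(item) {
    model.add(item);                   // mediate: update the model...
    return view.render(model.items);   // ...then let the view display it
  }
};

console.log(controller.addToCart("book")); // Cart: book
console.log(controller.addToCart("pen"));  // Cart: book, pen
```

Because the view never touches the model directly, either side can be changed independently, which is the maintainability benefit the pattern promises.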
In Spring's web MVC framework, a central DispatcherServlet receives incoming requests and dispatches them to the appropriate Controller, which populates a Model; a View (for example a JSP page) then renders the model data into the response. This wiring is declared in configuration rather than written as glue code, so connecting Controllers to Views is largely a matter of configuration, and concerns such as localization are handled by the framework's resolver mechanisms.


H2 [10] is a relational database management system written in Java. It can be embedded in Java applications or run in client-server mode. The disk footprint (size of the jar file) is about 1 MB. A subset of the SQL (Structured Query Language) standard is supported. The main programming APIs are SQL and JDBC; however, the database also supports the PostgreSQL ODBC driver by acting like a PostgreSQL server.

It is possible to create both in-memory tables and disk-based tables. Tables can be persistent or temporary. Index types are hash table and tree for in-memory tables, and b-tree for disk-based tables. All data manipulation operations are transactional. Table-level locking and multi-session concurrency control are implemented. The two-phase commit protocol is supported as well, but no standard API for distributed transactions is implemented. The security features of the database are: role-based access rights, encryption of the password using SHA-256, and encryption of data using AES or the Tiny Encryption Algorithm (XTEA). The cryptographic features are available as functions inside the database as well. SSL/TLS connections are supported in client-server mode, as well as when using the console application. Two full-text search implementations are included: a native implementation and one using Lucene. A simple form of high availability is implemented: when used in client-server mode, the database engine supports hot failover (commonly known as clustering). However, the clustering mode must be enabled manually after a failure. The database supports protection against SQL injection by enforcing the use of parameterized statements; in H2, this feature is called 'disabling literals'.
JavaScript [11] (sometimes abbreviated JS) is a prototype-based scripting language that is dynamic, weakly typed, and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles.
JavaScript's use in applications outside web pages, for example in PDF documents, site-specific browsers, and desktop widgets, is also significant. Newer and faster JavaScript VMs and frameworks built upon them (notably Node.js) have also increased the popularity of JavaScript for server-side web applications.
JavaScript uses syntax influenced by that of C. JavaScript copies many names and naming conventions from Java, but the two languages are otherwise unrelated and have very different semantics. The key design principles within JavaScript are taken from the Self and Scheme programming languages.


The most common use of JavaScript is to write functions that are embedded in or included
from HTML pages and that interact with the Document Object Model (DOM) of the page.
Some simple examples of this usage are:

Loading new page content or submitting data to the server via AJAX without reloading the
page (for example, a social network might allow the user to post status updates without
leaving the page)

Animation of page elements, fading them in and out, resizing them, moving them, etc.

Interactive content, for example games, and playing audio and video

Validating input values of a web form to make sure that they are acceptable before being
submitted to the server.

Transmitting information about the user's reading habits and browsing activities to various
websites. Web pages frequently do this for web analytics, ad tracking, personalization or
other purposes.
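Form validation, one of the uses listed above, can be sketched as a plain function run before the form is submitted to the server (field names and rules here are illustrative):

```javascript
// Collect validation errors for a simple form; an empty array means the
// input is acceptable and the form may be submitted.
function validateForm(fields) {
  const errors = [];
  // A deliberately simple email pattern for illustration only.
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(fields.email)) {
    errors.push("invalid email");
  }
  if (!fields.name || fields.name.trim() === "") {
    errors.push("name is required");
  }
  return errors;
}

console.log(validateForm({ name: "John", email: "john@example.com" })); // []
console.log(validateForm({ name: "", email: "not-an-email" }));
```

Client-side checks like this only improve responsiveness; the server must still validate the data, since a script can be bypassed.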
Because JavaScript code can run locally in a user's browser (rather than on a remote server), the browser can respond to user actions quickly, making an application more responsive. Furthermore, JavaScript code can detect user actions which HTML alone cannot, such as individual keystrokes. Applications such as Gmail take advantage of this: much of the user-interface logic is written in JavaScript, and JavaScript dispatches requests for information (such as the content of an e-mail message) to the server. The wider trend of Ajax programming similarly exploits this strength.
A JavaScript engine (also known as a JavaScript interpreter or JavaScript implementation) is an interpreter that interprets JavaScript source code and executes the script accordingly. The first JavaScript engine was created by Brendan Eich at Netscape Communications Corporation, for the Netscape Navigator web browser. The engine, code-named SpiderMonkey, is implemented in C. It has since been updated (in JavaScript 1.5) to conform to ECMA-262 Edition 3. The Rhino engine, created primarily by Norris Boyd (formerly of Netscape; now at Google), is a JavaScript implementation in Java. Rhino, like SpiderMonkey, is ECMA-262 Edition 3 compliant.
A web browser is by far the most common host environment for JavaScript. Web browsers
typically use the public API to create "host objects" responsible for reflecting the Document
Object Model (DOM) into JavaScript. The web server is another common application of the


engine. A JavaScript webserver would expose host objects representing an HTTP request and
response objects, which a JavaScript program could then manipulate to dynamically generate
web pages.
Because JavaScript is the only language that the most popular browsers share support for, it has become a target language for many frameworks in other languages, even though JavaScript was never intended to be such a language. Despite the performance limitations inherent to its dynamic nature, the increasing speed of JavaScript engines has made the language a surprisingly feasible compilation target.
Cross-site vulnerabilities
A common JavaScript-related security problem is cross-site scripting, or XSS, a violation of
the same-origin policy. XSS vulnerabilities occur when an attacker is able to cause a target
web site, such as an online banking website, to include a malicious script in the webpage
presented to a victim. The script in this example can then access the banking application with
the privileges of the victim, potentially disclosing secret information or transferring money
without the victim's authorization. A solution to XSS vulnerabilities is to use HTML escaping
whenever displaying untrusted data.
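The HTML-escaping defense mentioned above can be sketched as a small helper (a minimal illustration; production code should use a vetted library rather than a hand-rolled function):

```javascript
// Replace the characters that carry meaning in HTML so untrusted data
// is displayed as text instead of being interpreted as markup.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")   // must run first, before other entities are added
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<script>alert("xss")</script>'));
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Note that escaping rules differ by context (HTML body, attribute, URL, JavaScript string); this helper covers only the HTML text and attribute cases.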
Some browsers include partial protection against reflected XSS attacks, in which the attacker provides a URL including a malicious script. However, even users of those browsers are vulnerable to other XSS attacks, such as those where the malicious code is stored in a database. Only correct design of web applications on the server side can fully prevent XSS. XSS vulnerabilities can also occur because of implementation mistakes by browser authors.
Another cross-site vulnerability is cross-site request forgery, or CSRF. In CSRF, code on an attacker's site tricks the victim's browser into taking actions the user did not intend at a target site (like transferring money at a bank). It works because, if the target site relies only on cookies to authenticate requests, then requests initiated by code on the attacker's site will carry the same legitimate login credentials as requests initiated by the user. In general, the solution to CSRF is to require an authentication value in a hidden form field, and not only in the cookies, to authenticate any request that might have lasting effects. Checking the HTTP Referer header can also help.


"JavaScript hijacking" is a type of CSRF attack in which a <script> tag on an attacker's site
exploits a page on the victim's site that returns private information such as JSON or
JavaScript. Possible solutions include:
1) requiring an authentication token in the POST and GET parameters for any response
that returns private information
2) using POST and never GET for requests that return private information
Browser and plug-in coding errors
JavaScript provides an interface to a wide range of browser capabilities, some of which may
have flaws such as buffer overflows. These flaws can allow attackers to write scripts which
would run any code they wish on the user's system. These flaws have affected major browsers including Firefox, Internet Explorer, and Safari.
Plug-ins, such as video players, Adobe Flash, and the wide range of ActiveX controls enabled by default in Microsoft Internet Explorer, may also have flaws exploitable via JavaScript, and such flaws have been exploited in the past.
In Windows Vista, Microsoft has attempted to contain the risks of bugs such as buffer overflows by running the Internet Explorer process with limited privileges. Google Chrome similarly limits page renderers in its own "sandbox".
jQuery[12] is a fast and concise JavaScript Library that simplifies HTML document
traversing, event handling, animating, and Ajax interactions for rapid web development.
jQuery is designed to change the way that you write JavaScript.
1) Lightweight Footprint
2) CSS3 Compliant
3) Cross-browser
jQuery is free, open source software, dual-licensed under the MIT License or the GNU
General Public License, Version 2. jQuery's syntax is designed to make it easier to navigate
a document, select DOM elements, create animations, handle events, and develop Ajax
applications. jQuery also provides capabilities for developers to create plug-ins on top of the


JavaScript library. This enables developers to create abstractions for low-level interaction and
animation, advanced effects and high-level, theme-able widgets. The modular approach to the
jQuery library allows the creation of powerful dynamic web pages and web applications.
jQuery includes the following features:
1) DOM element selections using the cross-browser open source selector engine Sizzle, a spin-off of the jQuery project
2) DOM traversal and modification (including support for CSS 1-3)
3) DOM manipulation based on CSS selectors that uses node element names and node element attributes (id and class) as criteria to build selectors
4) Events
5) Effects and animations
6) Ajax
7) Extensibility through plug-ins
8) Utilities, such as user agent information and feature detection
9) Compatibility methods that are natively available in modern browsers but need fallbacks for older ones, for example the inArray() and each() functions
10) Cross-browser support

Including the library

The jQuery library is a single JavaScript file, containing all of its common DOM, event,
effects, and Ajax functions. It can be included within a web page by linking to a local copy,
or to one of the many copies available from public servers. jQuery has a CDN sponsored by
Media Temple (previously at Amazon). Google and Microsoft host it as well.
The most popular and basic way to introduce a jQuery function is to use the .ready() function.

$(document).ready(function(){ /* script goes here */ });

or the shortcut

$(function(){ /* script goes here */ });

Usage styles
jQuery has two usage styles:
1) via the $ function, which is a factory method for the jQuery object. These functions, often called commands, are chainable as they all return jQuery objects.
2) via $.-prefixed functions. These are utility functions which do not work on the jQuery object per se.
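The chainable command style can be illustrated with a toy wrapper (plain JavaScript, not jQuery itself; the element objects are stand-ins for DOM nodes):

```javascript
// Toy chainable wrapper: each command returns the wrapper itself,
// so calls can be strung together as in $("...").addClass(...).hide().
function wrap(elements) {
  return {
    elements,
    addClass(name) {
      this.elements.forEach(el => el.classes.push(name));
      return this; // returning the wrapper is what makes the call chainable
    },
    hide() {
      this.elements.forEach(el => { el.hidden = true; });
      return this;
    }
  };
}

const el = { classes: [], hidden: false };
wrap([el]).addClass("highlight").hide();

console.log(el.classes); // [ 'highlight' ]
console.log(el.hidden);  // true
```

jQuery applies the same pattern to real DOM element sets, which is why its commands compose so naturally.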

jQuery plugins
Because of jQuery's architecture, other developers can use its constructs to create plug-in
code to extend its functionality. Currently there are thousands of jQuery plug-ins available on
the web that cover a wide range of functionality such as Ajax helpers, webservices, data
grids, dynamic lists, XML and XSLT tools, drag and drop, events, cookie handling, modal
windows, even a jQuery-based Commodore 64 emulator.
An important source of jQuery plug-ins is the Plugins subdomain of the jQuery Project website. However, in an effort to rid the site of spam, the plug-ins in this subdomain were accidentally deleted in December 2011. The new site will include a GitHub-hosted repository, which will require developers to resubmit their plug-ins and to conform to new submission requirements. There are alternative plug-in search engines that take more specialist approaches, such as listing only plug-ins that meet certain criteria (e.g. those that have a public code repository). The tutorials page on the jQuery site has a list of links to jQuery plug-in tutorials under the "Plug-in development" section.
Jsoup [13] is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. jsoup implements the HTML5 specification and parses HTML to the same DOM as modern browsers do.

1) scrape and parse HTML from a URL, file, or string
2) find and extract data, using DOM traversal or CSS selectors
3) manipulate the HTML elements, attributes, and text
4) clean user-submitted content against a safe white-list, to prevent XSS attacks
5) output tidy HTML

jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and
validating, to invalid tag-soup; jsoup will create a sensible parse tree.
Example:
1) Fetch the Wikipedia homepage
2) Parse it to a DOM
3) Select the headlines from the In the news section into a list of Elements:

Document doc = Jsoup.connect("https://en.wikipedia.org/").get();
Elements newsHeadlines ="#mp-itn b a");
Ext JS [14] is a pure JavaScript application framework for building interactive web applications using techniques such as Ajax, DHTML, and DOM scripting.

GUI controls: Ext JS includes a set of GUI-based form controls (or "widgets") for use within web applications:
1) text field and textarea input controls
2) date fields with a pop-up date-picker
3) numeric fields
4) list box and combo boxes
5) radio and checkbox controls
6) html editor control
7) grid control (with both read-only and edit modes, sortable data, lockable and draggable columns, and a variety of other features)

8) tree control
9) tab panels
10) toolbars
11) desktop application-style menus

12) region panels to allow a form to be divided into multiple sub-sections
13) sliders
14) vector graphics charts

Many of these controls are able to communicate with a web server using Ajax.
JSON [15], or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages.
The JSON format was originally specified by Douglas Crockford. The official Internet media type for JSON is application/json. The JSON filename extension is .json.
The JSON format is often used for serializing and transmitting structured data over a network connection. It is used primarily to transmit data between a server and a web application, serving as an alternative to XML. For example:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumbers": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
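A record like the one above can be read and written with JavaScript's built-in JSON object (a small Node-runnable sketch with illustrative data):

```javascript
// Parsing a JSON text into a JavaScript object and serializing it back.
const text = '{"firstName":"John","lastName":"Smith","age":25}';
const person = JSON.parse(text);              // text -> object
console.log(person.firstName);                // John
console.log(person.age + 1);                  // 26
console.log(JSON.stringify(person) === text); // round-trips to the same text
```

This built-in support is one reason JSON is so convenient for exchanging data between a server and a web application.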

Schema of JSON
There are several ways to verify the structure and data types inside a JSON object, much like an XML schema; however, unlike XML schemas, JSON schemas are not widely used. Additionally, a JSON schema has to be written manually; unlike with XML, there are currently no common tools available to generate a JSON schema from JSON data.
XML has been used to describe structured data and to serialize objects. Various XML-based
protocols exist to represent the same kind of data structures as JSON for the same kind of
data interchange purposes. When data is encoded in XML, the result is typically larger than
an equivalent encoding in JSON, mainly because of XML's closing tags. Yet, if the data is
compressed using an algorithm like gzip there is little difference because compression is
good at saving space when a pattern is repeated.
In XML there are alternative ways to encode the same information because some values can be
represented both as child nodes and attributes. This can make automated data exchange
complicated unless the used XML format is strictly specified as programs need to deal with
many different variations of the data structure. Both of the following XML examples carry
the same information as the JSON example above in different ways.

Using child elements:

<person>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <age>25</age>
  <address>
    <streetAddress>21 2nd Street</streetAddress>
    <city>New York</city>
    <state>NY</state>
    <postalCode>10021</postalCode>
  </address>
  <phoneNumber type="home">212 555-1234</phoneNumber>
  <phoneNumber type="fax">646 555-4567</phoneNumber>
</person>

Using attributes:

<person firstName="John" lastName="Smith" age="25">
  <address streetAddress="21 2nd Street" city="New York" state="NY" postalCode="10021"/>
  <phoneNumber type="home" number="212 555-1234"/>
  <phoneNumber type="fax" number="646 555-4567"/>
</person>

The XML encoding may therefore be shorter than the equivalent JSON encoding. A wide
range of XML processing technologies exist, from the Document Object
Model to XPath and XSLT. XML can also be styled for immediate display
using CSS. XHTML is a form of XML, so elements can be passed in this form ready for
direct insertion into webpages using client-side scripting.
Which is better: XML or JSON?
1) The XML format is more advanced than the example shows. You can for
example add attributes to each element, and you can use namespaces to partition
elements. There are also standards for defining the format of an XML file, the
XPATH language to query XML data, and XSLT for transforming XML into
presentation data.
2) The XML format has been around for some time, so there is a lot of software
developed for it. The JSON format is quite new, so there is a lot less support for it.
3) While XML was developed as an independent data format, JSON was developed
specifically for use with JavaScript and AJAX, so the format is exactly the same as a
JavaScript literal object.
4) JSON parsing is generally faster than XML parsing.
5) JSON is a more compact format, meaning it weighs far less on the wire than the more
verbose XML.
6) Formatted JSON is generally easier to read than formatted XML.
7) JSON specifies how to represent complex data types, whereas there is no single best
way to represent a data structure in XML.


Example: the JSON object { "foo": { "bar": "baz" } } could be represented in XML as
<foo bar="baz"/>, or <foo><bar>baz</bar></foo>, or
<object name="foo"><property name="bar">baz</property></object>.


Iframes are often used to load third party content, ads and widgets. The main reason to use
the iframe technique is that the iframe content can load in parallel with the main page: it
doesn't block the main page. Loading content in an iframe does however have two downsides
1) Iframes block onload of the main page
2) The main page and iframe share the same connection pool

But these problems would not matter if we did not want to put our JavaScript into the webpage
inside our iframe. Since we do, there was a need to develop a methodology by which we can
open a third-party webpage in our iframe with our JavaScript loaded into that webpage. Another
problem arises because we need to solve cross-site scripting issues, since we are rendering a
page in our iframe whose source address is different from ours. Basically, cross-site scripting uses known
vulnerabilities in web-based applications, their servers, or plug-in systems they rely on.
Exploiting one of these, they fold malicious content into the content being delivered from the
compromised site. When the resulting combined content arrives at the client-side web
browser, it has all been delivered from the trusted source, and thus operates under the
permissions granted to that system. By finding ways of injecting malicious scripts into web
pages, an attacker can gain elevated access-privileges to sensitive page content, session
cookies, and a variety of other information maintained by the browser on behalf of the user.
Cross-site scripting attacks are therefore a special case of code injection.
Exploit cases using XSS
Attackers intending to exploit cross-site scripting[16] vulnerabilities must approach each
class of vulnerability differently. For each class, a specific attack vector is described here.
The names below are technical terms, taken from the cast of characters commonly used in
computer security.



Non-persistent attack:
1. Alice often visits a particular website, which is hosted by Bob. Bob's website allows Alice to
log in with a username/password pair and stores sensitive data, such as billing information.
2. Mallory observes that Bob's website contains a reflected XSS vulnerability.
3. Mallory crafts a URL to exploit the vulnerability, and sends Alice an email, enticing her to
click on a link for the URL under false pretenses. This URL will point to Bob's website
(either directly or through an iframe or ajax), but will contain Mallory's malicious code,
which the website will reflect.
4. Alice visits the URL provided by Mallory while logged into Bob's website.
5. The malicious script embedded in the URL executes in Alice's browser, as if it came directly
from Bob's server (this is the actual XSS vulnerability). The script can be used to send Alice's
session cookie to Mallory. Mallory can then use the session cookie to steal sensitive
information available to Alice (authentication credentials, billing info, etc.) without Alice's
knowledge.
Persistent attack:
1) Mallory posts a message with malicious payload to a social network.
2) When Bob reads the message, Mallory's XSS steals Bob's cookie.
3) Mallory can now hijack Bob's session and impersonate Bob.
There are three methods to solve XSS:
1) JSON-P
2) CORS filter
3) Proxy Controller Solution

3.3 JSON-P
JSON is a lightweight data-interchange format. It was originally specified by Douglas
Crockford, and has since been received almost universally as a simple and powerful
representation of data for transmission between two entities, regardless of what computer
language those entities run in natively.


One such mechanism which can request content cross-domain is the <script> tag. JSON-P
(JSON with padding) [17] is used as a way to leverage this property of <script> tags to be able to
request data in the JSON format across domains. JSON-P works by making a <script>
element (either in HTML markup or inserted into the DOM via JavaScript), which requests to
a remote data service location. The response (the loaded "JavaScript" content) is the name of
a function pre-defined on the requesting web page, with the parameter being passed to it
being the JSON data being requested. When the script executes, the function is called and
passed the JSON data, allowing the requesting page to receive and process the data.
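The round trip can be illustrated with a small sketch (the callback name, URL and payload here are our own illustrative examples, not from the platform):

```javascript
// The requesting page pre-defines a callback function (name is illustrative):
function handleStats(data) {
  return data.likes;
}

// It then requests the remote service with the callback name in the URL:
//   <script src="http://example.com/stats?callback=handleStats"></script>
// The service's response is a script that calls that function, passing the
// requested JSON as its argument:
const jsonpResponse = 'handleStats({ "likes": 42 });';

// The browser simply executes the response as JavaScript; eval() stands
// in for that here.
eval(jsonpResponse);
```

When the response script runs, handleStats receives the parsed JSON object, completing the cross-domain data transfer.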


As you can see, the remote web service knew what function name to call based on being told
the function name in the URL that made the request. As long as that function is in fact
defined in the requesting page, it will be called and passed the data it was asking for.

The problem
Thus far, JSON-P has essentially just been a loose definition by convention, when in reality
the browser accepts any arbitrary JavaScript in the response. This means that authors who rely
on JSON-P for cross-domain Ajax are in fact opening themselves up to potentially just as
much mayhem as was attempted to be avoided by implementing the same-origin policy in the
first place. For instance, a malicious web service could return a function call for the JSON-P
portion, but slip in another set of JavaScript logic that hacks the page into sending back
private user data, etc.
JSON-P is, for that reason, seen by many as an unsafe and hacky approach to cross-domain
Ajax, and for good reason. Authors must be diligent to only make such calls to remote web
services that they either control or implicitly trust, so as not to subject their users to harm.


The proposed solution

For now, JSON-P[17] is a viable solution for cross-domain Ajax. While CORS may represent
a more direct and less hacky way of such communication, it should probably be deployed in
tandem with JSON-P techniques, so as to account for browsers and web services which do
not support CORS. However, the safety concerns around JSON-P are valid and should be addressed.
So, a stricter subset definition for JSON-P is called for. The following is the proposed
"standard" for only what should be considered valid, safe, allowable JSON-P.

The intention is that only a single expression (function reference, or object property function
reference) can be used for the function ("padding") reference of the JSON-P response, and
must be immediately followed by a single ( ) enclosing pair, inside of which must be a strictly
valid and parseable JSON object. The function call may optionally be followed by a single
semicolon. No other content, other than whitespace or valid JavaScript comments, may
appear in the JSON-P response, and whitespace and comments must be ignored by the
browser JavaScript parser (as would normally be the case).
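A rough sketch of what such filtering amounts to (the callback name, payload, and the checking function are our own illustrations, not part of the proposal's text):

```javascript
// Conforming response: one function reference, one ( ) pair enclosing
// strictly valid JSON, optionally followed by a single semicolon.
const conforming = 'handlePlugins({"widgetId": 7, "name": "like-button"});';

// Non-conforming: extra statements smuggled in alongside the callback.
const nonConforming = 'stealCookies(document.cookie); handlePlugins({"widgetId": 7});';

// A crude conformance check in the spirit of the proposal: a single
// identifier whose single parenthesized argument must parse as JSON.
function isStrictJsonP(resp) {
  const m = resp.trim().match(/^([A-Za-z_$][\w$.]*)\s*\(([\s\S]*)\)\s*;?$/);
  if (!m) return false;
  try {
    JSON.parse(m[2]);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isStrictJsonP(conforming));     // true
console.log(isStrictJsonP(nonConforming));  // false
```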
The most critical piece of this proposal is that browser vendors must begin to enforce
this rule for script tags that are receiving JSON-P content, and throw errors (or at least
stop processing) on any non-conforming JSON-P content.
In order for the browser to be able to know when it should apply such content-filtering to
what would otherwise be seen as regular JavaScript content, the MIME type
"application/json-p" and/or "text/json-p" must be declared on the requesting <script> element.

Furthermore, the browser can enforce that the response must be of the matching MIME-type,
or fail with errors as well.
It is fully known that existing browsers which don't support CORS also will not likely be
updated to support this JSON-P strict enforcement, which means that users in those browsers
will not be protected by their browser. However, all current browsers can add support for this
content filtering, which would provide safer JSON-P for current browser users who are


consuming data from web services which do not yet support CORS (or for which the author
does not want to use CORS for whatever reason).
It's also recognized that this stricter definition may cause some "JSON-P" transmissions,
which rely on the looser interpretation of just arbitrary JavaScript content, to fail. But this
could easily be avoided by having the author (and the server) avoid referring to that content
with the strictly defined JSON-P MIME-types as described above, which would then prevent
the browser from selectively turning on such filtering.


CORS Filter [18]
1) Cross-Origin Resource Sharing (CORS) for your Java web apps
2) Implements the new W3C mechanism for cross-domain requests
3) Quick and transparent fit to new and existing Java web apps

CORS Filter is the first universal solution for fitting Cross-Origin Resource Sharing (CORS)
support to Java web applications. CORS is a recent W3C effort to introduce a standard
mechanism for enabling cross-domain requests in web browsers and participating servers.

The philosophy of CORS

CORS works two-fold:

1) From a browser script perspective: by allowing cross-domain requests, which
however are subject to tighter controls on the types of data that is exchanged.
Cookies, for instance, are blocked unless specifically requested by the XHR author
and allowed by the remote web service. This is done to reduce the risk of data
leakage.
2) From a web service perspective: By utilizing the origin URL reported by the
browser the target cross-domain web service can determine, based on its origin policy,
whether to allow or deny the request.

The original CORS specification is available at

Note that in order for CORS to work, it must be supported by both browser and web server.


Bear in mind that CORS is not about providing server-side security. The controls that it
imposes are primarily to protect the browser, and more specifically - the legitimate JavaScript
apps that run in it as well as any confidential user data (cookies) from some cross-site
exploits. Remember, after all, that the Origin request header is supplied by the browser and
the server has no direct means to verify it.

FIG. 3.2 : Use of CORS Filter to solve cross site resource sharing problem
The CORS Filter, as the name implies, implements the clever javax.servlet.Filter interface. It
intercepts incoming HTTP requests and if they are identified as cross-origin, it applies the
proper CORS policy and headers, before passing them on to the actual targets (servlets, JSPs,
static XML/HTML documents).
This transparent nature of the CORS Filter makes it very easy to retrofit existing Java web
services with a CORS capability. Just put the CORS JAR file into your CLASSPATH and
enable it with a few lines of XML in your web.xml file. The CORS Filter implementation is
extremely efficient too - it takes less than 25K of bytecode.
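A hypothetical web.xml fragment of the kind described (the filter class name follows the CORS Filter distribution, but should be checked against the version actually used):

```xml
<!-- Register the CORS Filter and apply it to all incoming requests. -->
<filter>
    <filter-name>CORS</filter-name>
    <filter-class>com.thetransactioncompany.cors.CORSFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>CORS</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```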


The final solution involves using an intermediate server as a proxy. All XHR requests are
handled by the proxy, which in turn sends a request to the concerned server for the data. The
proxy receives the data and transfers it back to the source that initially made the request to the
proxy. This solution seems the most elegant for our purpose as it solves both of the problems.
We want to insert our JavaScript into the third-party webpage, and since the proxy is
implemented by us, we can augment the proxy server code to process the document (or
webpage) and insert the JavaScript link into the page.
We also need to process the document, either at the proxy side or after receiving it in the
iframe. I have processed the document at the proxy level itself because this reduces the
document size shipped over the network, and the client-side work gets reduced. Processing the
document means removing hrefs, stopping all click events, and updating the links and sources
of images, JavaScript files and style sheets from relative to absolute paths, so that when the
page makes a request for a resource from inside the iframe, that request can be completed.
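The path-rewriting step can be sketched as follows (in JavaScript for illustration; the platform itself does this server-side with the Jsoup parser, and the URLs here are made up):

```javascript
// Resolve a resource path found in the proxied document against the
// page's original URL, so the iframe copy can still fetch the resource.
function absolutize(resourcePath, pageUrl) {
  return new URL(resourcePath, pageUrl).href;
}

console.log(absolutize("img/logo.png", "http://shop.example.com/products/item.html"));
// → http://shop.example.com/products/img/logo.png
```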

FIG. 3.3: Activity Diagram to load a webpage in I-frame


Locating an element on web page
HTML provides an attribute named id which is used to identify a particular element on
the web page. The problem occurs when we want to locate an element that has no id attribute
specified. So we need to develop an algorithm to find an element when the id attribute is
not specified.


Algorithm to encode an element's location:
encodeElement(element)
    Array path = empty;
    1) if (element.id is not null)
           save element.id and path; stop;
    2) prepend the index of element among its parent's children to path;
       element = element.parent;
       goto 1;

Algorithm to locate the element again:
getElement(element_id, path[])
    Element = getElementFromId(element_id);
    for (i = 0; i < path.length; i++)
        Element = Element.child(path[i]);
    return Element;
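A runnable sketch of this idea over a minimal mock DOM (all names here are ours; the real implementation works on actual browser DOM nodes):

```javascript
// Encode an element with no id as its nearest id-bearing ancestor plus
// the child-index path leading back down to the element.
function encode(el) {
  const path = [];
  while (!el.id) {
    path.unshift(el.parent.children.indexOf(el));
    el = el.parent;
  }
  return { anchorId: el.id, path };
}

// Resolve the encoding: find the anchor (stand-in for getElementById),
// then walk down the stored child indices.
function resolve(root, anchorId, path) {
  const stack = [root];
  let anchor = null;
  while (stack.length) {
    const n = stack.pop();
    if (n.id === anchorId) { anchor = n; break; }
    stack.push(...n.children);
  }
  return path.reduce((node, i) => node.children[i], anchor);
}

// Tiny mock DOM: <div id="main"><ul><li/><li/></ul></div>
const li1 = { id: null, children: [] };
const li2 = { id: null, children: [] };
const ul = { id: null, children: [li1, li2] };
const main = { id: "main", children: [ul] };
ul.parent = main; li1.parent = ul; li2.parent = ul;

const { anchorId, path } = encode(li2);             // anchorId "main", path [0, 1]
console.log(resolve(main, anchorId, path) === li2); // true
```

As noted later in this chapter, the encoding is positional, so any reshuffling of the document invalidates the stored path.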



Algorithm for identifying whether two web pages are structurally identical
There are two approaches by using which we can identify the structural identicalness of two
documents. These are:
1) URL structure
2) DOM structure
Generally, e-commerce websites whose pages share a similar layout have some common URL
structure. For example, analyzing the URL structure of some e-commerce websites:
1) all product pages end with .html
2) all product pages end with /buy
3) all product page URLs contain the pid of the product

But there is no such information that can be generalized. Although this kind of
comparison would be quite fast, such an algorithm is not scalable and is not applicable to all
e-commerce websites.
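Such site-specific patterns could be checked like this (the patterns are invented examples, which is precisely the problem: each site would need its own):

```javascript
// Per-site URL heuristics of the kind described above; none generalize.
const productUrlPatterns = [
  /\.html$/,      // site A: product pages end with .html
  /\/buy$/,       // site B: product pages end with /buy
  /[?&]pid=\d+/,  // site C: product page URLs carry a pid parameter
];

function looksLikeProductPage(url) {
  return productUrlPatterns.some((p) => p.test(url));
}

console.log(looksLikeProductPage("http://shop.example.com/item?pid=123")); // true
console.log(looksLikeProductPage("http://shop.example.com/about-us"));     // false
```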
Moving on to the second approach, i.e. utilizing the DOM structure, which seems more appropriate as
there is much more information that can be utilized, like the number of elements, the location of
specific elements, etc. This approach seems to work for all websites.
Algorithm for Similar Page Matching using DOM Structure
1) Fetch all elements of doc1 and doc2.
2) Traverse both the lists; whenever a tag name matches, compare the elements' id attributes.
3) If the ids match, then recursively call the isSimilar function on these two elements.
4) After removing all the elements with the same id in both documents, we compare the
remaining elements' tag names along with their parent tag names and parent class names.
5) If all these things match, then we suspect these are similar elements and we recursively
call the isSimilar method over these two elements.
6) After removal of these elements we are left with elements which are either additional
in one of the documents,
7) or common to both but without any identifier like a class or an id.
8) For the latter we make use of the location parameter: all the elements which have the same
tag name, the same location and the same parent are checked recursively to see if they are similar.
9) Criterion for similarity: after removal of similar elements, the final count reduces to at
most 10% of the initial number of elements.
10) If so, we say both of the documents are structurally similar.
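A heavily simplified sketch of the matching idea (our own reduction: it pairs elements only by tag name plus parent tag, whereas the actual algorithm also uses ids, class names, location, and recursive isSimilar calls):

```javascript
// Pair up elements across the two documents by a crude signature;
// the pages count as similar if few elements are left unmatched.
function signature(el) {
  return el.tag + "|" + (el.parentTag || "");
}

function isStructurallySimilar(doc1, doc2, threshold = 0.1) {
  const counts = new Map();
  for (const el of doc1) {
    const sig = signature(el);
    counts.set(sig, (counts.get(sig) || 0) + 1);
  }
  let unmatched = 0;
  for (const el of doc2) {
    const sig = signature(el);
    const left = counts.get(sig) || 0;
    if (left > 0) counts.set(sig, left - 1);
    else unmatched++;
  }
  for (const leftover of counts.values()) unmatched += leftover;
  // Similar if the leftovers are at most `threshold` of all elements.
  return unmatched <= threshold * (doc1.length + doc2.length);
}

// Two product pages with the same layout match; a bare page does not.
const page1 = [
  { tag: "div", parentTag: "body" }, { tag: "img", parentTag: "div" },
  { tag: "h1", parentTag: "div" }, { tag: "button", parentTag: "div" },
];
const page2 = [
  { tag: "div", parentTag: "body" }, { tag: "img", parentTag: "div" },
  { tag: "h1", parentTag: "div" }, { tag: "button", parentTag: "div" },
];
console.log(isStructurallySimilar(page1, page2)); // true
console.log(isStructurallySimilar(page1, [{ tag: "p", parentTag: "body" }])); // false
```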


The various steps in the entire problem can be summarized and divided into the following
broad headings:
1) Developing the common social connect plug-ins (like, login, wishlist).
2) Developing a GUI-based platform where any merchant can open his webpage and place plug-ins on it.
3) Developing an intermediate proxy server.
4) Using Jsoup HTML Java parser to process the document.
5) Developing a javascript to be inserted by the proxy which shows the selected element on the page.
6) Developing a javascript which shows all the plug-ins available and sends the information
corresponding to the plug-in selected and placed by the website administrator on his
website to the H2 database.
7) Developing a javascript which the merchant will include in his website after placing the plug-in, and which queries the database for the plug-ins corresponding to the URL.
8) Developing an algorithm to find whether two html pages are structurally identical or not.
This algorithm is used to plug-inize all the pages similar to the page on which merchant
has placed the plug-ins.

Database Schema
1. Table FBUserActivity (userId, action, objectName, objectUrl, activityTime, client );
2. Table SimilarPages(pageUrl, baseUrl);
3. Table WidgetStore(pageUrl, widgetName, widgetId, JSONdata);
4. Table pageSketchStore(pageUrl, pageSketch);


On Page Load Steps to fetch Plug-ins

FIG. 1 FlowChart of fetching plug-ins using similar pages



The following are some of the observations made with respect to the proposed platform:
1) Current Approach: Integrating each plug-in into a webpage is in itself a very time-consuming
job. Imagine what life becomes when you have to repeat these steps
thousands of times. So the current approach is not well suited, as it is very time-consuming
and requires a lot of manual work.

2) Live demo of how the page looks after integrating a plug-in: This feature of our
algorithm allows the merchant to get live feedback from his website about how he
should customize his plug-in. He does not need to set the properties again and again
and check the interface separately. All these steps have been integrated into one, and
the interface for integrating a plug-in becomes a lot easier.
3) "One done, all done" algorithm saves a lot of time and manual effort: This
significantly decreases manual effort and saves a lot of time for the merchants. Now a
merchant is set to integrate social plug-ins into his website, and his website is up and
ready within minutes, while earlier it required hours or days to do the same.

4) Development of an algorithm to identify an element uniquely on a page: We have
developed an algorithm which takes an HTML DOM object as input and encodes the
element such that, using the encoded information, we can reach the element again. This
algorithm works even if the HTML DOM object does not contain an id attribute. The
algorithm is inspired by the concept of XPath, but it also has certain limitations.
Since it is based on the concept of XPath, this algorithm will not work if the document
changes, or even if the document objects are shuffled. Any modification in the
document which disturbs the stored path of the element will make the
algorithm fail. But since e-commerce product pages are not changed frequently, the
algorithm works well for such cases.









Integrating plug-ins into an e-commerce site was never so easy. Imagine what life becomes
when you have to insert some piece of code thousands of times and you also need to keep
track of where to place the plug-ins, i.e. where the respective code will lie. So the current
approach is not well suited, as it is very time-consuming and requires a lot of manual work. Still, many
e-commerce websites have to follow the same process, as there is no other alternative in the
market. So here we step in with our new, enriched and advanced plug-in integration
algorithm. Our algorithm has two main features. Firstly, the merchant can visualize the page and
customize the plug-in settings accordingly. This feature of our algorithm allows the merchant
to get live feedback from his website about how he should customize his plug-in. He does
not need to set the properties again and again and check the interface separately. All these
steps have been integrated into one, and the interface for integrating a plug-in becomes a lot easier.
Secondly, we have tried to reduce manual effort by a significant amount. The "one done, all done"
algorithm significantly decreases manual effort and saves a lot of time for the merchants. Now a
merchant is set to integrate social plug-ins into his website, and his website is up and ready
within minutes, while earlier it required hours or days to do the same process.



Integration of Auto-customization of plug-ins

In the current architecture, the merchant (website admin) has to customize the plug-ins manually. We
have tried to reduce this manual effort to some extent. But it is also possible to develop an
algorithm which can mark all the empty spaces on the merchant's page and list all the plug-ins
which can fit into each space. Moreover, a plug-in could customize its color and other styles
according to the merchant's page cascading style sheet. This automation would make
the product even more usable and attractive.

Integration of Search
Currently, a merchant pays a lot of money to search engines, providing a list of keywords and
corresponding URLs. Traffic coming from a search engine is of two types: organic and
inorganic (paid search). Paid search results are not fully accurate, so there is a high chance
that some users instantly leave the site when they do not see the expected products. If a
widget addressing this comes into the market, integration of such widgets will be very easy
using our platform; there is no need to dive into the code. Work on this is currently in
progress and results will be seen in the near future.
Improve performance of similar pages algorithm
We have developed an algorithm which utilizes the DOM structure to identify the structural
identicalness of two documents. Using this algorithm we can tell whether two documents
are structurally identical or not. This algorithm is highly useful when we integrate plug-ins
into a website which has a lot of pages with the same structure. For example, take an
e-commerce website: all the product pages have the same structure and all the category pages
have the same structure. So we just need to integrate plug-ins into two pages, i.e. one product page
and one category page. All the remaining pages will be handled by the algorithm, which will
integrate the plug-ins into them. Currently the algorithm is not time-efficient: it uses
multiple recursive calls, and its performance in terms of time and space is not good. It can
be optimized by handling various cases.


1. as cited on Feb 5, 2012

2. as cited on Feb 5, 2012

3. as cited on Feb 10, 2012

4. as cited on Feb 15, 2012

5. as cited on Feb 15, 2012

6. as cited on Feb 15, 2012

7. as cited on Feb 25,2012

8. as cited on Mar 13, 2012

9. as cited on Mar 23, 2012

10. as cited on Mar 30, 2012

11. as cited on June 10, 2012

12. as cited on June 10, 2012

13. as cited on June 15, 2012

14. as cited on June 15, 2012

15. as cited on June 17, 2012


16. as cited on June 20, 2012

17. as cited on June 20, 2012

18. as cited on June 20, 2012