
Designing Rich Internet Applications

For Search Engine Accessibility


Introduction

Rich Internet Applications create new opportunities. The most fundamental of these is the ability
to create Single Page Interfaces (SPIs). An SPI is an interface that consists of a single HTML page.
Additional information that is required when the user clicks on a ‘link’, or when some other event
occurs, is not supplied by means of a traditional full page reload, but is instead retrieved via an
XML message. The original page remains intact; its contents or state are simply updated by the
contents of the XML message. JavaScript is used to facilitate this whole process. Although it is not
mandatory to create an SPI when using Backbase’s software, an SPI provides a more intuitive user
interface and a smoother user experience. There are, however, a few questions that need to be
answered when you make use of this new paradigm. One of the main questions is that of search engine
accessibility and deep linking.

The web sites that have been created up until now consist almost entirely of Multi Page Interfaces
(MPIs). These web sites and applications consist of multiple unique pages, which may or may not have
been dynamically generated. Since each page, and for dynamic pages every page state, has a unique
URI, it is very easy to link to any page or state within the site. Navigation between pages is done
by the user clicking on links or submitting forms, both of which contain the location and state
information for the new page. It is these unique URIs that make deep linking possible. Deep linking
does not just link to a particular web site, but links directly to a specific page within the site.
It is this MPI paradigm which informs the robots that are used by search engines such as Google or
Yahoo to index the information in web sites. Search bots are software agents that ‘crawl’ through
web sites: they start at the index page and, after categorizing all of the information on the page,
they follow the links on this page to other pages on the site. In this way they crawl through the
entire web site, visiting any page that has been linked to using a link tag of the type:

    <a href="nextPage.html">Next Page</a>

However, in an SPI, the linked page structure that the search bot is expecting has been extended
with BXML commands, which indicate the use of include files, load commands and form submissions
that only partially update the page instead of causing a full reload, as is the case with normal
forms. Since search bots aren’t proper web browsers, they don’t understand or execute any
JavaScript. This means that a Backbase SPI needs to be specifically designed to work with these
search bots.

This article puts forward a set of guidelines that you can use to design your SPI for maximal
search engine accessibility, and shows you techniques to allow for deep linking into your SPI.



Making SPIs Search Engine Accessible

Several approaches are available for making your web site accessible to search engines; these
approaches differ in the level of indexing that is obtainable and in how it is achieved. For certain
sites, it is not necessarily a requirement that every part of the site can be indexed by search
engines. For example, a site that provides a web-based e-mail service does not require every single
piece of information on the site to be indexed by a search bot. Other sites, however, do require
that every piece of information can easily be found and indexed by search engines. A web site with
information about the courses provided by a university is such a case. Backbase has identified the
following strategies for getting an SPI indexed by search engines:

• Lightweight Indexing: no structural changes are made to your site; existing tags such as meta,
  title and h1 are leveraged.
• Extra Link Strategy: extra links are placed on the site, which search bots can follow and thereby
  index the whole site.
• Secondary Site Strategy: a secondary site is created, which is fully accessible to the search
  engine.

For each of these strategies the following questions will be answered:

• To what extent is the content of the page indexed?
• Can links be followed on the page (e.g. link elements (<a href="xx">) or s:include elements)?
• When a link is followed by the search bot, what is the status of the URL that is being indexed?
  Can this URL be displayed by browsers or will some type of redirection be required?

Lightweight Indexing

This strategy should be used if only certain key information needs to be indexed by search engines.
In this case it is recommended that you take the following steps when designing your SPI:

• Use a title element in the document head, preferably containing one or more keywords that
  specifically relate to the contents of the site. For example:

    <title>BXML WebMail – Sign In</title>

• Use a keywords meta element with a content attribute containing some appropriate keywords. For
  example:

    <meta name="keywords" content="WebMail, e-mail, bxml, mail" />

• Use a description meta element with a content attribute that contains a relevant description of
  the web page. The value of this element is often printed as part of a search result by Google.
  For example:

    <meta name="description" content="A Free BXML WebMail application. This unique
      WebMail application offers the look and feel of a normal Windows application,
      with the ease and portability of a web-based client." />

• Place key content within the main HTML structure and not in an include file or some other
  dynamically loaded content. If possible, place this important content within an h1, h2 or h3
  element, since search bots deem these to contain more important information. Remember that these
  tags can be styled in any way you want using CSS.

It should be noted that these points can also be put to good use in the design of your SPI in
conjunction with the extra link strategy or the secondary site strategy.

In summary, by using this lightweight indexing strategy, only the content supplied by the title and
meta elements and those elements that are directly located on the index page is indexed. No links
of type s:include are followed; therefore there is no requirement to deal with redirection. This is
not a very full indexing scheme, but it is extremely simple to apply to your site.

The Extra Link Strategy

There are two main approaches to making a site fully indexable by search engines: the extra link
strategy and the secondary site strategy. The extra link strategy is the easier of the two to
implement and it can make the site entirely indexable by search engines, but it does not create a
secondary site in normal HTML and is therefore not accessible to older browsers, which are
incompatible with BXML. The essence of this strategy is to create an extra link on the main SPI
index page for each include file whose contents you wish to be indexed. Some experimentation has
revealed that the extra links must be of the type:

    <a href="include1.html">include 1</a>



The following points must be followed if you want Google to index these pages:

• The link must be made by an a element and the include file must be indicated by the href
  attribute.
• The include file must have the .html or .htm file extension. This is a bit of a workaround, since
  in reality include files aren’t proper HTML files but are instead XML files. However, if you use
  a div element or a similar HTML element as the root tag, then all modern browsers will be able to
  read the files as if they were HTML and Google will index them. As far as the BPC (Backbase
  Presentation Client) is concerned, it merely stipulates that an include file should be well-formed
  XML and isn’t interested in which file-type extension it uses.
  NB: The include files should not have an XML declaration or a document type definition, otherwise
  Internet Explorer will be unable to accept .html or .htm files as include files.
• The link tag must have some text content. Without this Google will simply ignore it.
• No attempt should be made at using HTML to hide these links, since Google frowns on this and may
  not index such pages. You can however use BXML to remove or hide these links, by way of a
  construct event handler, as shown in the example below:

    <div>
        <s:event b:on="construct">
            <s:setstyle b:display="none" />
        </s:event>
        <a href="leftPanel.html">Left Panel</a>
        <a href="rightPanel.html">Right Panel</a>
    </div>

It is not necessary to detect the user agent of the search bots (see the appendix at the end of
this article for full details of this process), since they will simply follow the extra links that
are provided for them. However, it is necessary to do some detection when these include files are
being served up. This is tricky, since these include files can be requested in two different ways.
When a user is directed to one of these pages through a search engine, they need to be redirected
to the main index page. On the other hand, when the BPC requests these pages as include files, no
redirection should occur. Because both search bots and the BPC ignore meta refresh tags, it is
possible to solve this problem. Such a meta refresh tag must be included directly inside the body
of the include file. Even though these tags are normally placed inside the head element, they will
still be executed anywhere in the body by all BXML-compatible browsers. Below is an example of such
a meta refresh tag:

    <meta http-equiv="refresh" content="0;url=index.html" />

Once the browser has been redirected to the SPI index page, this page must parse out the referrer
and trigger an event handler, which will update the state of the SPI accordingly. This process of
detecting deep linking and updating the page state is explained in much more detail in the appendix
at the end of this document.

In summary, the extra link strategy makes the whole site fully indexable. By adding extra link
elements, search bots are able to index all pages of the site. However, since the URLs of the pages
that get indexed point to include files, which aren’t fully BXML-capable pages, it is necessary to
redirect normal browsers back to the SPI version of the site and then update the state of this SPI
accordingly.

The Secondary Site Strategy

The secondary site strategy is the most complete of all of the indexing strategies. It is also the
most labor-intensive. The secondary site should be made out of plain HTML and contain a linked
multi-paged structure. Though this may seem laborious, having a secondary site to fall back upon
makes your site available to people who are using older browsers that aren’t supported by
Backbase, to browsers on mobile devices and to disabled people. This gives you a chance to make
your site accessible to all users, not just search engines.

This strategy has three important components:

1. Generating the secondary site’s pages.
2. User-agent detection of both the search bots and BXML-compatible browsers.
3. Redirection of browsers and the detection of this redirection, which allows the status of the
   SPI to be updated to reflect this deep linking.

Generating the Search Engine Accessible Pages

The search engine accessible pages can be generated in several ways. It is possible to manually
generate the secondary fall-back site. It is also possible to automate this process using XSLT.

Manual Site Generation. This is a simple, lo-tech solution, but it is also labor-intensive, since
you have to build two versions of your web site. There is also a danger that when you update your
site with new information, you will forget to update the secondary pages. This will cause the two
versions of the site to be out of sync with each other and the information found on search engines
to be out of date.



XSLT-Driven Generation. An alternative strategy, which is especially effective if you use a content
management system (CMS), is to store all of the information, or at least the ‘copy’ for your site,
as plain XML. This can be in a format defined by yourself or your CMS. This XML must then be
transformed into BXML using an XSLT. A second, much simpler, XSLT is used to transform the XML into
the secondary, search-engine accessible site. Although this approach requires a little more effort
when you initially develop the site, once both XSLTs are ready, new content can easily be added to
the XML data source and then both versions will be generated automatically.

User-Agent Detection

A vital component of this two-site strategy is browser detection. Techniques that can be used for
user-agent detection are discussed in the appendix at the end of the article. Once the user agent
has been detected, it is necessary to make sure that the BXML-compatible browsers get sent to the
BXML site and that the search bots and non-BXML-compatible browsers get sent to the accessible
site.

Deep Linking and Browser Redirection

This section looks at an issue that arises from having a secondary multi-paged version of your site
that is indexed by search engines. The solution to this problem also immediately offers a solution
to the issue of deep linking in an SPI. The issue boils down to the fact that a site with multiple
pages is being used to represent a site that consists of only a single page. Let’s take an example
to illustrate this problem: a simple SPI, which consists of a main index page that itself consists
of a tabbed interface containing three different tabs. The contents of each of these tabs will be
stored in a separate include file and be loaded into the SPI as and when they are required.
Therefore, to make this site indexable by a search engine, an MPI version of this site would
presumably have been made with one index page (e.g. index.html) and three separate HTML pages
representing the include files for each of the tabs (e.g. tab1.html, tab2.html and tab3.html). Now
if a user’s search term closely matched something indexed on the third tab, then the search engine
would point the user to tab3.html. However, in reality, you do not want your user to end up on
tab3.html. Instead, you want the user to be sent to the index.html page of the SPI version of your
site and, when this page is opened, the third tab, which corresponds to tab3.html, should be
selected.

The full solution to this problem consists of two parts. Firstly, BXML-compatible browsers need to
be redirected to the SPI version of the site. Secondly, the SPI version needs to detect that it has
been redirected from one of these deep linked pages and then update the state of the page
accordingly, so that the information relevant to this link is shown.

Browser Redirection. When one of the MPI pages intended for the search engine is requested, the
user agent must be detected again. However, in this case, when a BXML-compatible browser is
detected, it is the browser that is redirected and not the search bot. The browser is sent to the
index page of the SPI version of the site.

Detecting Deep Linking. The BXML version of index.html needs to ascertain from which page it was
referred. This must be done as soon as the page is loaded, so that the transition appears to be
seamless to the user. Full details of how to detect deep linking and how to update the page state
can be found in the appendix at the end of this article.

In summary, the secondary site strategy makes the whole site fully indexable. Since the search bot
is redirected to a normal HTML site, all links are followable by the search bot. However, since the
URLs of the pages that get indexed when the links are followed point to non-BXML pages, it is
necessary to redirect normal browsers back to the SPI version of the site and then update the state
of this SPI accordingly.

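As a rough illustration of generating the secondary site from a single content source, the sketch
below builds the plain HTML pages for the three-tab example. It is not the XSLT approach described
above: plain JavaScript string templates are substituted for the XSLT step purely to keep the
example short. The content structure, the function names (escapeHtml, buildNav, buildPage,
buildSecondarySite) and the sample text are all assumptions made for illustration.

```javascript
// Hypothetical content source: one entry per tab of the example SPI.
var aTabs = [
  { file: "tab1.html", title: "Tab 1", body: "Contents of the first tab." },
  { file: "tab2.html", title: "Tab 2", body: "Contents of the second tab." },
  { file: "tab3.html", title: "Tab 3", body: "Contents of the third tab." }
];

// Escape text so that it can be placed safely inside HTML.
function escapeHtml(sText) {
  return String(sText)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the navigation shared by every page. Each link is a plain
// a element with an href attribute and some text content.
function buildNav(aPages) {
  var aLinks = ['<a href="index.html">Home</a>'];
  for (var i = 0; i < aPages.length; i++) {
    aLinks.push('<a href="' + escapeHtml(aPages[i].file) + '">' +
      escapeHtml(aPages[i].title) + "</a>");
  }
  return aLinks.join(" ");
}

// Build one plain HTML page with the key content in title and h1.
function buildPage(sTitle, sBody, sNav) {
  return "<html><head><title>" + escapeHtml(sTitle) + "</title></head>" +
    "<body><h1>" + escapeHtml(sTitle) + "</h1>" +
    "<p>" + escapeHtml(sBody) + "</p>" + sNav + "</body></html>";
}

// Returns an object that maps each file name to its HTML source.
function buildSecondarySite(sSiteTitle, sIntro, aPages) {
  var sNav = buildNav(aPages);
  var oSite = { "index.html": buildPage(sSiteTitle, sIntro, sNav) };
  for (var i = 0; i < aPages.length; i++) {
    oSite[aPages[i].file] = buildPage(aPages[i].title, aPages[i].body, sNav);
  }
  return oSite;
}
```

Calling buildSecondarySite("Example Site", "Welcome to the example site.", aTabs) yields four
pages (index.html and one per tab), each of which links to all of the others, so a search bot that
starts at the index page can reach every tab page. Because both versions of the site would be
produced from the same content source, the risk of the two drifting out of sync is reduced.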


Ethics

Google especially, and presumably other search engines, deeply frown upon any attempt to unfairly
manipulate search results. Any site that is caught willfully trying to manipulate Google will be
banned from Google’s index. Redirection to another site with different content, based on the user
agent, is technically called cloaking and is frowned upon. Therefore, you should make sure that the
information conveyed by any secondary web site that has been set up with the intention of making
your site indexable by Google and other search engines is exactly the same as the information
contained in your BXML site.



Appendix

User-Agent Detection

A vital component of both the secondary site strategy and the extra link strategy is browser
detection. The technical term for a web browser, a search robot or any other piece of software
that approaches a web site is a user agent. When a user agent requests a particular page, it
supplies details of itself by way of one of the HTTP headers that are sent along with the request.
The Firefox browser, for instance, sends the following request header:

    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

It is therefore relatively straightforward to write a script that determines what the user agent
is and then redirects the user agent to the appropriate version of the site. The most
straightforward technique is not to try to find the search bots or other incompatible browsers,
since this group is relatively large and hard to qualify. It is easier to determine whether the
user agent is a BXML-compatible browser and then assume that, if the user agent isn’t one of these,
it is either a search bot or an incompatible browser. The following browsers are BXML compatible:

• Internet Explorer 5.0 and newer
• Mozilla 1.5 and newer
• Firefox 1.0 and newer
• Netscape 7.2 and newer

User-agent detection can be done on the server using a PHP, ASP or JSP script. There are standard
libraries which help take care of this. Alternatively, if you cannot or do not wish to use
server-side scripts to determine the user agent, it is possible to do this in JavaScript. If you
take this approach, you should be aware of the fact that search bots cannot be expected to execute
any JavaScript. Therefore, if you are using the secondary site strategy in conjunction with
JavaScript-based detection, the default page provided by the initial page request must be the
non-BXML site, which is intended for the search engine bot. When you ascertain that the user agent
is a BXML-compatible browser, JavaScript should redirect the browser to the BXML version of your
site. The following code fragment shows a simple JavaScript function, which tests whether a
BXML-compatible Mozilla-based browser is in use and then redirects the browser based on this.

    function testUA(){
        var bCompatible = false;
        var sUA = window.navigator.userAgent;
        //Test if the User-Agent string contains
        //the string Gecko
        var iIOGecko = sUA.indexOf("Gecko");
        if (iIOGecko >= 0){
            //extract the string directly after rv:
            //and check value
            var iIOrv = sUA.indexOf("rv:");
            var sRv = sUA.substr(iIOrv + 3, 3);
            //only accept the value if rv: was actually found,
            //and compare it numerically rather than as a string
            if (iIOrv >= 0 && parseFloat(sRv) >= 1.5)
                bCompatible = true;
        }

        //now if compatible redirect
        if (bCompatible)
            window.location.href = "bxmlIndex.html";
    }

This function is relatively straightforward but certain parts may need explaining. Firstly, both
Netscape and Firefox browsers use the same Gecko core as Mozilla does. They also have similar
User-Agent strings. Therefore, the function above first searches for a ‘Gecko’ sub-string, which
all of their User-Agent strings will contain. Once this sub-string has been found, the function
searches for the ‘rv:’ sub-string. This is short for revision and it is followed by the version
number of the Gecko engine. If this number is 1.5 or higher, then the Gecko engine is BXML
compatible. Therefore, this relatively simple function is able to test for all compatible
Netscape, Firefox and Mozilla browsers.

Obviously, it is also necessary to test for compatible versions of Internet Explorer. This can be
done in a similar way, but there is one added complication. All compatible versions of Internet
Explorer have a User-Agent string that contains the sub-string ‘MSIE’, which is directly followed
by the version number. Below is an example of such a header from an Internet Explorer browser.

    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Unfortunately, Opera browsers have a very similar User-Agent string:

    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00

Therefore you must first test that the User-Agent string doesn’t contain the ‘Opera’ sub-string
and, once this has been ascertained, simply parse out the version number which follows the ‘MSIE’
sub-string.

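The Internet Explorer test described above is not given as code in this article, so the following
is only a minimal sketch of how it might look. The function name isCompatibleIE is an assumption,
and the minimum version of 5.0 is taken from the compatibility list earlier in this appendix. The
function takes the User-Agent string as a parameter rather than reading window.navigator.userAgent
itself, so that it can also be tried outside a browser.

```javascript
// Sketch only: returns true when the User-Agent string looks like a
// BXML-compatible Internet Explorer (version 5.0 or newer).
function isCompatibleIE(sUA) {
  // Opera can announce itself as MSIE, so rule it out first.
  if (sUA.indexOf("Opera") >= 0) {
    return false;
  }
  // Look for the MSIE token that precedes the version number.
  var iIOMsie = sUA.indexOf("MSIE ");
  if (iIOMsie < 0) {
    return false;
  }
  // parseFloat reads the leading number and stops at the semicolon.
  var fVersion = parseFloat(sUA.substring(iIOMsie + 5));
  return !isNaN(fVersion) && fVersion >= 5.0;
}
```

In a browser this could be called as isCompatibleIE(window.navigator.userAgent) alongside the
Gecko test in testUA, with the redirect performed when either test succeeds.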


Detecting Deep Linking and Updating the Page’s State

This section looks at how redirection based on deep linking can be detected and then at how the
state of a page can be updated using this information. Deep linking can be detected on the server
by reading the Referer HTTP request header using a server-side script. Once the referrer has been
read, an appropriate construct event handler must be created, which updates the initial state.
Alternatively, if you do not have access to server-side scripting, you can use a JavaScript
function to do this. The js action is a special BXML action, which is used to call JavaScript
functions. The following behavior takes care of calling this function when the page is loaded:

    <s:behavior b:name="updateState">
        <s:event b:on="construct">
            <s:task b:action="js" b:value="updateState();" />
        </s:event>

        ... Other event handlers go here ...
    </s:behavior>

The updateState function, which this action calls, then needs to parse out the referrer. Once this
value has been found, the JavaScript function triggers an appropriate BXML event, thereby passing
control back to the BPC. This is done by calling the execute method of the bpc object with a BXML
string. A simple version of such a function looks like this:

    function updateState(){
        //first parse out the value of the referrer
        var sReferrer = document.referrer;
        //do quick test to make sure that referrer
        //is from the same host
        if(sReferrer.indexOf(window.location.hostname) >= 0){
            var iLastSlash = sReferrer.lastIndexOf('/');
            var sValue = sReferrer.substr(iLastSlash + 1);
            //trigger an event with the same name as
            //the referrer
            var sExecute = '<s:task b:action="trigger" b:event="' +
                sValue + '" b:target="id(\'main\')" />';
            bpc.execute(sExecute);
        }
    }

You should note that this is a very simplistic implementation of such a referrer parsing function.
For a more complicated web site structure, it is important that it is totally unambiguous which
page the referrer was, otherwise mistakes can be made. For such cases, more complicated JavaScript
will be required to verify this.

Finally, let’s look at an example of the type of event handler that could be triggered by such an
updateState function:

    <s:behavior b:name="redirect">

        ... Other event handlers go here ...

        <s:event b:on="tab3.html">
            <s:task b:action="select" b:target="id('tab3')" />
        </s:event>
    </s:behavior>

This behavior contains an event handler for the custom event tab3.html, which is triggered by the
JavaScript function when redirection has occurred from the tab3.html page. All it does is perform
a select action on a target with an id of ‘tab3’. If this corresponds to the appropriate tab, then
simply by selecting this tab, the tab should be loaded and become visible.

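The article does not show the "more complicated JavaScript" that a stricter referrer check would
need, so the following is only one possible sketch. It compares the referrer’s host exactly
instead of searching for the host name anywhere in the string, ignores any query string or
fragment, and only accepts file names from a known list. The function name eventNameFromReferrer
and the KNOWN_PAGES list are assumptions; the page names follow the tab example used in this
article.

```javascript
// Hypothetical whitelist: secondary-site pages that have a matching
// BXML event handler in the SPI.
var KNOWN_PAGES = ["tab1.html", "tab2.html", "tab3.html"];

// Returns the name of the event to trigger, or null when the
// referrer should be ignored.
function eventNameFromReferrer(sReferrer, sHostname) {
  // Expect scheme://host[:port]/path, optionally followed by
  // a query string or a fragment.
  var aMatch = /^https?:\/\/([^\/:?#]+)(?::\d+)?(\/[^?#]*)?/.exec(sReferrer || "");
  if (!aMatch) {
    return null;
  }
  // The host must match exactly, not merely appear in the URL.
  if (aMatch[1].toLowerCase() !== sHostname.toLowerCase()) {
    return null;
  }
  // Take the file name after the last slash of the path.
  var sPath = aMatch[2] || "/";
  var sFile = sPath.slice(sPath.lastIndexOf("/") + 1);
  // Only accept pages that are known to have an event handler.
  return KNOWN_PAGES.indexOf(sFile) >= 0 ? sFile : null;
}
```

In the browser, updateState could call eventNameFromReferrer(document.referrer,
window.location.hostname) and build the trigger task only when the result is not null, which
avoids triggering events for unknown pages or for referrers from other hosts.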