You are on page 1of 11

Mark D.

Wilkinson
is a Research Associate of
BioMOBY: An open source
Bioinformatics at the Plant
Biotechnology Institute of the
National Research Council of
biological web services
Canada. His research interests
include database
interoperability, user interface
proposal
design and floral developmental
Mark D. Wilkinson and Matthew Links
Date received (in revised form): 24th September 2002
mutants.
Matthew Links
is a bioinformatician at the Abstract
University of Saskatchewan.
His research interests are
BioMOBY is an Open Source research project which aims to generate an architecture for the
focused on the application of discovery and distribution of biological data through web services; data and services are

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


Beowulf supercomputers to decentralised, but the availability of these resources, and the instructions for interacting with
bioinformatics problems, and them, are registered in a central location called MOBY Central. BioMOBY adds to the web
the application of wavelet
services paradigm, as exemplified by Universal Data Discovery and Integration (UDDI), by
analysis to genome annotation.
having an object-driven registry query system with object and service ontologies. This allows
users to traverse expansive and disparate data sets where each possible next step is presented
based on the data object currently in-hand. Moreover, a path from the current data object to a
Keywords: BioMOBY, web desired final data object could be automatically discovered using the registry. Native BioMOBY
services, interoperability, I3C, objects are lightweight XML, and make up both the query and the response of a simple object
UDDI access protocol (SOAP) transaction.

INTRODUCTION • The project must be fully Open Source


A vast number of biological data hosts and (OS).6
analytical services have arisen in the wake
of the explosion of available genome • Existing standards should be used,
sequence information over the past where appropriate, to enhance
decade. With few exceptions, these interoperability.
disparate hosts and services distribute their
data in an uncoordinated manner via • The system should treat retrieval and
distinct CGI-based interfaces. The analytical services identically.
BioMOBY project was established to
address the problem of discovering and • Applications should not need to be
retrieving related pieces of biological data hard-coded to a remote, possibly
from multiple hosts and services by unstable application program interface
attempting to generate a standardised (API).
query and retrieval interface using
consensus object models. • Biological data centralisation should be
Mark D. Wilkinson, In September 2001, representatives avoided.
BioMOBY Project, from the model organism databases, and
Plant Biotechnology Institute,
other interested parties met at the first • User-defined queries should be
National Research Council Canada,
110 Gymnasium Place, MOBY-DIC (Model Organism Bring facilitated as much as possible.
Saskatoon, Your own Database Interface
Saskatchewan,
Conference) meeting. After considering • Preference should be given to simple
Canada S7N OW9
examples of successful data/service and lightweight data objects to
Tel: +1 306 975 5279 integration such as AceDB,1 DAS2,3 and minimise unnecessary network traffic,
Fax: +1 306 975 4839
E-mail:
ISYS,4,5 principles were established upon simplify the creation of clients and
mwilkinson@gene.pbi.nrc.ca which a solution would be founded: servers, and exploit the most basic

HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002 331


Wilkinson and Links

threads of commonality between pieces complexity of interacting with numerous


of data. disparate data retrieval systems by
providing a simple, common format able
The ‘investment’ for
• The ‘investment’ for service providers to describe a wide range of host interfaces
service providers should implementing BioMOBY should be in a machine-readable manner. Finally, it
be minimal minimised to encourage rapid and provides a common format for the
widespread participation in the project. representation of retrieved data, regardless
of origin, eliminating the need for endless
The BioMOBY project shares ‘cutting and pasting’ and enhancing the
similarities with other interoperability and connections between disparate hosts.
integration projects, but has some novel Each host is a ‘service provider’, and all
aspects, such as a knowledge-driven query service providers, and their service types,
system and service-type hierarchy, which are recorded in a central registry. Services

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


strive to better reflect the way biologists may be as simple as the retrieval of a
explore and manipulate their data. In sequence based on an ID, or as complex
addition, the OS and community-driven as performing the determination of, and
nature of the project allows the alignment between, a functional domain
framework to quickly evolve to include of interest among several hundred gene
All competent services new data types and services as they arise. sequences. From the perspective of the
are presented to the
end-user, all services that are competent
end-user
BIOMOBY OVERVIEW to act on the data in-hand are presented,
Scientific questioning rarely begins in a and execution of a chosen service can be
void; researchers usually will have a piece automated. In addition, cross-references
of data in-hand that they wish to may be provided at any step, allowing the
investigate more deeply. This initiates a scientist to branch off and gather
knowledge-driven line of questioning supplementary information about the data
spanning the entirety of the expansive they have just received. Thus, the
biological data space: discovery and analysis process quite
literally becomes as simple as ‘surfing’.
What known genes are similar to the
The BioMOBY project is limited in
gene I just cloned? Of those, which are
scope, focusing on the areas of service
annotated as cyclins? Show me an
description, discovery, transaction and
alignment of those genes. . . . This one
simple input/output object type
looks quite different from the others
definitions. Individual implementations of
and more similar to mine – show me
BioMOBY may add to this foundational
the alignment in this active site
set of functionality; similarly, client
compared to the consensus. What folds
programs may also expand on the
or domains does it have? Who
specification to include additional useful
annotated this as a cyclin? What did
features such as logging and automated
they describe as its mutant phenotype?
workflow discovery, however these
What is its expression pattern through
things do not exist in the BioMOBY
the cell cycle? Bring up the article
specification.
where this information is published.
What is the mailing address of this
author so that I can write to them for a OVERALL ARCHITECTURE
reprint? The primary components of the
The BioMOBY project facilitates this BioMOBY architecture are:
BioMOBY facilitates the
scientist’s flow of
flow of thought by providing an
thought information discovery system that • MOBY Objects – the structured data
identifies and retrieves related data based passed between client and server, the
on a priori or in-hand knowledge. Further, templates for which are stored at
it insulates the bench scientist from the MOBY Central.

332 HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002


BioMOBY

• MOBY Central – a registry holding as the input to new service requests; and
the input and output object types of all backwards compatibility is achieved by
registered services, the URL for these allowing an existing client to interpret
services, and their service types. new complex objects by examining the IS
A/HAS A relationships of those objects
• Object and Service hierarchies – two and extracting only the data it can
hierarchies describing the relationships manage; finally, data providers are
between MOBY Objects and MOBY encouraged to conform to pre-existing
Service types respectively. data formats rather than create new ones.
Non-native objects can
be passed by BioMOBY Non-native objects, though they can
be discovered and transported by the
MOBY OBJECTS BioMOBY system, are currently not fully
An ID number is often sufficient to described by the inheritance hierarchy,

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


uniquely describe a piece of data, and is nor are they fully described by the XML
common to all representations of that data Schema. Though the BioMOBY project
regardless of its instantiation, or the is more concerned with the discovery and
application displaying it. Similar to the transport of objects, it is likely that
ISYS application-integration system,4 additional ontologies will be developed to
BioMOBY attempts to avoid issues of describe the full relationship between the
data standardisation, preferring to focus various object types passed in the
on data integration. In contrast to many BioMOBY system. These issues are
other data representation projects,7,8 discussed in more detail in a later section.
Objects in BioMOBY BioMOBY is ‘minimalist’ in its data
are ‘minimalist’ templates. MOBY Objects are The BioMOBY Triple
lightweight XML, whenever possible
The smallest unit of information that is
consisting of only a unique data identifier,
passed by BioMOBY is the MOBY
the ‘MOBY Triple’ (described below).
Triple – a unique identifier consisting of
This minimal data may be supplemented
three elements: the MOBY Object type,
with additional ‘payloads’. A payload may
a commonly used identifier namespace
be native to the MOBY system, or may
(eg a Genbank accession number), and a
be an object from a foreign object model
value representing an instance of that
The MOBY Triple is the system such as Microarray Gene
namespace. The ‘root’ of the MOBY
root of all objects Expression Markup Language
Object hierarchy has the name ‘Object’,
(MAGE-ML9 ) Objects from the Object
thus the base Triple takes on this type
Management Group (OMG).7 The
identifier. Triples may or may not enclose
payloads of native MOBY Objects are
additional information, described as their
organised in a hierarchy.
‘payload’.
Thus, for example, a base MOBY
BioMOBY Object hierarchy Object representing the Genbank record
The structures of native MOBY Objects,
for the Arabidopsis APETALA3 mRNA
defined using World Wide Web
would be represented by the following
Consortium XML Schema (XSD10 ),
Triple:
reflect a hierarchical relationship. This
organisation provides several advantages: ,Object namespace¼"Genbank/AC"
input and output data have a predictable ID¼"AY070397.1"/.
structure; ‘heavy’ objects can be derived
by extending existing objects (Child IS A A slightly more complex object with the
Parent; inheritance-type relationship), or same identifier would be the
combining simple objects to create more VirtualSequence object. VirtualSequence
complex objects (Child HAS A parent; objects inherit from Object, but have an
container-type relationship); sub-objects additional ‘Length’ element as the payload.
within complex objects may be extracted Thus the same data could be cast as:

HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002 333


Wilkinson and Links

,VirtualSequence namespace¼ carry more than simply synonymous


"Genbank/AC" ID¼"AY070397.1". references to the primary triple, but also
,Length.960,/Length. to related data objects, as known by the
,/VirtualSequence.
service provider.
Sequence objects further inherit from A fully fleshed-out MOBY Sequence
VirtualSequence, and add a object for the Apetala3 mRNA might be
‘SequenceString’ payload attribute: structured as follows:
,Sequence namespace¼"Genbank/AC"
,Sequence namespace¼"Genbank/AC"
ID¼"AY070397.1".
ID¼"AY070397.1".
,CrossReferences.
,Length.960,/Length.
,Object namespace¼
,SequenceString.
"PubMed/PMID" ID¼"11959818"/.
aacaaaaagattaaacaaagagag
,Object namespace¼
aagaat

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


"Genbank/GI" ID¼"17979335"/.
atggcgagag ggaagatccagat
,Object namespace¼
caaga. . .
"Genbank/Taxa" ID¼"3702"/.
,/SequenceString.
,Object namespace¼
,/Sequence.
"GO/Acc" ID¼"GO:0016563"/.
,/CrossReferences.
Since every piece of data passed by ,Length.960,/Length.
BioMOBY must inherit from the base ,SequenceString.
‘Object’, all objects in the BioMOBY aacaaaaagattaaacaaagagag
system must have a Triple regardless of aagaat
atggcgagag ggaagatccagat
the contents of their payload. These
caaga. . .
lightweight objects represent the core of ,/SequenceString.
all MOBY transactions, allowing the rapid ,/Sequence.
‘skimming’ over data sets without the
overhead of passing fully fleshed-out data
models, particularly when working with
MOBY CENTRAL
large query/result sets. At any time more
BioMOBY enables the discovery of
details may be retrieved, in the form of
services by having a centralised registry.
either a more complex native MOBY
Service providers register themselves with
Object, or alternatively an object from
‘MOBY Central’, including the
another object standard.
following:
The MOBY Triple is similar in many
ways to the recently proposed Life
Sciences Identifier (LSID11 ). The primary • The service name.
components of the LSID are contained in
the Triple – Authority, Namespace and • Service type – from the Service-type
ID – with the addition of a type field hierarchy.
indicating how the data indicated by the
identifier are instantiated in the MOBY • Input object(s) – by name from the
Object payload. Object-type hierarchy.

BioMOBY cross-references • Output object(s) – by name from the


In addition to the Triple and the payload, Object-type hierarchy.
Objects contain cross- MOBY Objects may optionally carry an
referencing Triples additional Cross-Reference Information • A URI identifying the service provider.
Block (CRIB). The CRIB is a list of
cross-referencing MOBY Triples which, • The URL to the service script.
being the base form of a MOBY Object,
may be used directly as the input to new • A human readable description of the
queries. Cross-references are meant to service.

334 HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002


BioMOBY

The MOBY Central The MOBY Central API allows broadly applicable system in the
Registry API allows the queries for service providers based on biological context.
end-user to wander service-type, input, output and/or
through the data
authority. Queries based on input type Recently, development of UDDI has
allow the user to ‘wander’ through the been assumed by the Organization for the
data space with each next step being Advancement of Structured Information
offered as a list of services able to act on Standards (OASIS),13 and the restrictive
the data objects they have received from licence has been removed. Thus none of
the previous service transaction (or one of the arguments above preclude our use of
their cross-references). In addition, since UDDI at some point in the future;
output objects may immediately be used however, BioMOBY is a research project
as input to a new service, this registry attempting to define biologically intuitive
structure also allows the discovery of a registry behaviour. Once this behaviour is

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


‘workflow’ or ‘pipeline’ through a series determined, we can assess if UDDI is
of services to obtain a desired output sufficient to achieve this end and, if so,
object based on the object in-hand. build the BioMOBY system on top of
Once the desired service has been UDDI, or alternately request an extension
selected, the client requests more detailed of UDDI that is agreed upon by OASIS
service instructions. MOBY Central and other interested parties.
communicates the required transaction
information in the form of a Web MOBY Central API
Services Description Language (WSDL) The current MOBY Central API has
document, which is generated by MOBY three types of calls: register/deregister,
Central on the fly. All information locate and retrieve. Registration calls
needed to generate this document is include adding new object and service
present in the service registration types, registering a new service instance,
information, and thus neither the service or deregistering a service that no longer
The WSDL service provider, nor the registry, are required to exists, has been modified, or has moved.
description is auto- store these documents. WSDL is an Locate calls return lists of service instances
generated at MOBY emerging standard, and was chosen that meet certain specifications. Retrieve
Central
because of its modular structure, which calls allow more general browsing of the
simplifies automated document creation, registry contents. In general, the API calls
as well as its use of another emerging return simple XML documents in
standard, XSD, to describe the structure response. An overview of the current API
of the input and output objects in a for MOBY Central is presented in Table
Simple client programs hierarchical manner. 1. A simple Client could be built using
require only two
registry API calls
At the present time, BioMOBY uses its only two of these methods –
own registry system rather than the draft locateServiceByInput, and retrieveService
UDDI12 registry system. There were – and still exploit the power of the
several considerations that led to this MOBY Central registry model. Such a
decision early in the project’s inception: client could iteratively locate services that
take in-hand data as input, transact the
• The UDDI API had a restrictive service, and use the result as the input to
licence agreement at the time the the next locateServiceByInput query,
BioMOBY project was initiated. leading to a ‘surfing’-style user
interaction.
• The UDDI registry system is business-
centric and complex. BioMOBY Services
A key consideration in the design of the
Ease of deployment • Freedom to explore alternative registry BioMOBY system was ease of
architectures is a useful endeavour and deployment. This led to several design
may lead to a more powerful or more decisions that shifted the burden of

HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002 335


Wilkinson and Links

Table 1: Overview of the current API for MOBY-Central. The API consists of three types of
actions: registration, location/search and retrieval

API method Use

RegisterObject Register a new Object type and its relationships


DeregisterObject Deregister or deprecate an existing Object type
registerServiceType Register a new Service type and its relationships
deregisterServiceType Deregister or deprecate an existing Service type
registerNamespace Register a new name space
registerService Register a new instance of a MOBY Service
registerServiceWSDL Register a new instance of a MOBY Service based on a WSDL service description (not yet
implemented)
deregisterService Deregister an instance of a MOBY Service
locateServiceByType Locate all services which perform a particular type of service (from hierarchy; plus child-
types)
locateServiceByInput Locate all services which accept this Class type(s) as input

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


locateServiceByOutput Locate all services which return this Class type as output
retrieveService Retrieve the WSDL specification for a particular named service. This is called immediately
prior to a service transaction.
retrieveServiceProviders Retrieve list of all registered service providers
retrieveServiceNames Retrieve names of all registered services
retrieveServiceTypes Retrieve all registered Service types
retrieveObjectNames Retrieve all registered Objects
retrieveNamespaces Retrieve all registered name spaces
retrieveObject Retrieve the XSD schema for a particular MOBY Class

API = Application Program Interface


WSDL = Web Services Description Language
XSD = XML Schema Description

implementing and interacting with web providers should implement Services that
services onto either MOBY Central or generate not only the lightweight
the Client program. The use of SOAP,14 Objects, but also the rich Objects and/or
rather than common object request Objects from external representation
broker architecture (CORBA)15 is one schemas, in order to provide clients with
such decision. Modules supporting SOAP all of the available information if they
transactions are freely available for most require it.
Instructions for common programming languages, and Deploying a MOBY Service requires
deploying a service deployment of SOAP services can often the following simple steps:
be achieved with only a few lines of code
in addition to the underlying script that • Set up SOAP server on your web site,
generates service output. Auto-generation with subroutines for each Service you
of WSDL16 documents by MOBY wish to provide.
Central (as described below) alleviates the
need for these somewhat complex • Register these Services with MOBY
documents to be created by the service Central
host for each of their services, and
removes the burden of updating these as • Handle incoming requests
MOBY Objects and Services evolve.
Defined input/output Object structures, – parse Objects from the arguments
the Object hierarchy, and the overall passed to your Service,
simplicity of most MOBY objects were
Simplicity requires
additional services
also intended to simplify service creation. – validate these Objects (and/or
The trade-off to having this simplicity, Namespaces) as correct input to
however, is in the number of services that your service,
must be created to achieve comprehensive
coverage of the available data; service – extract the necessary data from each

336 HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002


BioMOBY

Object (use hierarchy to discover problem BioMOBY is focused on. In


internal object data), some cases, their approach is similar to
BioMOBY, while in other cases our
– perform computation on this data. approach is novel. Below is a brief
description of several select projects
• Generate output dealing with biological data
interoperability, and a comparison of their
– assemble output according to the pursuits with the approach that we have
XSD of the output Class(es), adopted.

– wrap output objects in a MOBY Other biological object model


envelope indicating the URL of the systems
service provider, Numerous projects are building rich

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


XML representations of biological data. It
– return it. may then seem redundant that
BioMOBY builds its own data models.
Example code for deployed MOBY The response to this is twofold: (1)
Re-coding existing CGI Services is available on the BioMOBY BioMOBY function requires ‘minimalist’
services is straight web site.17 Re-deployment of existing objects, and this necessitates the (relatively
forward
CGI services can usually be achieved with simple) task of creating our own
minimal coding. lightweight object models and (2)
It is critical to note that the BioMOBY BioMOBY is more concerned with data
system currently does not enforce a one to discovery and transport than
one correlation between the input standardisation or representation.
MOBY facilitates the namespace and the output object. For BioMOBY is capable of, and intends to,
discovery of objects
from other modelling
example, passing a PubMed/ID to a pass rich objects from other data
systems service may not return the associated modelling standards. It is already the case
Citation object. It is up to the service that complex data such as MAGE-ML,
provider what output he or she wishes to and plain-text data such as BLAST18
generate from the input object. A service reports, are registered in MOBY Central
provider given a PubMed/ID may, for and are passed verbatim by enclosing them
example, return the Sequence objects in a MOBY Triple. Non-native objects
described in that publication. We believe are not part of the object hierarchy per se,
this flexibility is crucial to successful but are passed as the payload of an object
interconnection of the vastly different that inherits directly from the base
biological data types, while registration of Object, having only a MOBY Triple
Input and output object
input, output, service type and a human- indicating the name space, ID and type of
types may be readable description of the service assists foreign object contained in the payload.
tangentially related users in navigating through and In this way, the BioMOBY project
interpreting these tangential relationships. enhances the efforts of these other data
As the BioMOBY project develops, a modelling projects by providing a
new focus on developing richer mechanism for their objects to be
ontologies for service description will discovered and transported between both
emerge in order to enable fully automated MOBY and non-MOBY Services.
navigation of these indirect relationships
between disparate pieces of data. I3C
At the inception of the BioMOBY
COMPARISON TO OTHER project, there was little publicly accessible
PROPOSALS AND information about the activities
PROTOCOLS happening within the Interoperable
Several projects have recently emerged Informatics Infrastructure Consortium
that deal with various aspects of the (I3C19 ), though it was known that

HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002 337


Wilkinson and Links

intellectual property issues were a addition, it is difficult to anticipate all


concern. Thus, it seemed unlikely that the desirable methods an end-user might
resulting I3C proposal would be Open want to invoke – the continually
Source. The desire for an Open Source increasing complexity of the BioPerl21
alternative, together with the need to API shows this to be the case. BioMOBY
move forward in a timely way, led us to avoids the considerable effort of creating
initiate the BioMOBY project in parallel of a new biological manipulation library
with I3C. Since that time, the I3C has by passing only data; however it suffers
become considerably more open; the expense of having ‘flat’ objects whose
however, membership fees are still data must be interconnected through calls
imposed. Nevertheless there is now a fair to the registry or by cross-references.
amount of crossover and discussion Experiments have been planned where
between I3C and BioMOBY developers. serialised rich caBIO objects will be

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


It is hoped that, in the future, I3C passed in MOBY payloads. If this is
software will be developed as truly Open successful, then the power of both
Source, which will allow even closer ties approaches might be realised.
between the two projects.
Like BioMOBY, the current I3C SRS
architecture is SOAP/web services-based; The Sequence Retrieval System (SRS)22
however, I3C has chosen to use UDDI as is an example of a unifying solution to the
its registry system, while BioMOBY is interoperability problem. The major
experimenting with its own registry downside to the use of an SRS-type
MOBY discourages data architecture(s) to ensure that it has the solution to the bioinformatics problem is
centralisation freedom to explore more biologically that it is monolithic. In order to
intuitive data discovery methodologies. In comprehensively bring data and services
addition, the I3C web services model together you would require every tool
currently lacks object and service type and database to be installed and/or
ontologies, which are critical to the configured into one SRS installation. The
functionality of BioMOBY. BioMOBY system aims to reduce this
type of redundancy, and the
caBIO administrative load of maintaining such a
caBIO8 (Cancer Bioinformatics system, by keeping the data distributed
Infrastructure Objects) is a standards- and having individual database
based set of genomic components administrators create/modify their own
developed in cooperation with the data and service registrations. Thus, data
National Cancer Institute Center for under BioMOBY is updated at the
Bioinformatics (NCICB).20 caBIO objects moment that the remote database
are very rich, but are primarily genome- becomes updated, and no re-indexing or
centric; this reflects the questions being mirroring of remote data sources is
studied more than any inherent limitation required. Finally, while the centralisation
in the system itself. The objects are passed of services under SRS allows only the
from server to client in various ways, queries that have been configured on that
including SOAP, using UDDI as a particular installation, BioMOBY can
registry system. support true dynamic ad hoc queries.
One of the most significant differences
between caBIO and BioMOBY is that myGrid
native BioMOBY objects are data-only, myGrid23 is one of a number of multi-
MOBY objects are data-
only
while caBIO objects allow method calls to organisational projects funded by the
obtain pieces of internal or related data. Engineering and Physical Sciences
This approach simplifies client Research Council24 as part of the United
development, but is more difficult to Kingdom Research Council’s e-Science
implement on the Server side. In programme.25 It intends to build on and

338 HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002


BioMOBY

extend the Grid framework of distributed simple prototype MOBY Client;26


computing through the use of a web- moreover, this workflow transparently
services architecture and discovery system. requests services hosted by various
With respect to data discovery and service institutions in Canada and the USA.
execution methodologies, myGrid and
MyGrid and BioMOBY BioMOBY are nearly identical in their OUTLINE OF A MOBY
share many similarities approach, and will likely become more INTERACTION
so. Both projects have similar ideas for Overview
representing knowledge of data and Figure 1 presents an overview of a
services in ontologies, and passing data- MOBY interaction. There are three
only objects through web services. phases: Registration, Query and
myGrid, however, is more ambitious in Transaction.
its inclusion of bench-scientist’s tools such

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


as workflow, personalised data • Registration: this is a one-time only
repositories, provenance and update event, where a Service host registers
notification; such issues are not directly the availability of a new service with
within the scope of the BioMOBY MOBY Central (deregistration of
project, but are seen as ‘client-side’ tools services is also possible). The Service
that could be built in various ways upon a host then waits for incoming service
BioMOBY data discovery and execution requests.
infrastructure. Nevertheless, the two
projects have surprisingly similar visions, • Query: this phase may be quite
and both will benefit from the expanding complex. The client may request a
collaborations that have already begun search for available services based on
between the participants. In particular, their input, output, service type,
plans are already underway for the two authority or various combinations of
projects to adopt the same ontology for these. What is returned from each
representation of the service hierarchy, query is a list of authorities providing
and myGrid will be able to recognise and such services, the service type and a
pass BioMOBY objects. human readable description of the
service. At any time, the client may
Current status select one of these services and request
A number of services are already available a WSDL document describing how to
through the MOBY Central Registry transact the service. Additional methods
which combine to create useful data are currently being added to the
discovery and retrieval workflows. As an MOBY Central API to assist in
example, starting from a keyword it is mapping multi-service paths to desired
possible to retrieve all Genbank entries output Class types.
that share homology to Arabidopsis
sequences annotated to that keyword. A • Transaction: having interpreted the
keyword can be used to retrieve Gene WSDL document, the client then sends
Ontology Term objects, from which the Service the appropriate MOBY
Example MOBY corresponding Arabidopsis sequences may Object(s) (these are generally going to
services exist and are be retrieved. These sequences can be be the in-hand Objects resulting from a
publically available searched against Genbank and the prior Service transaction) and retrieves
Genbank BLAST reports may be parsed the response.
to obtain the Genbank GI numbers for all
hits. Genbank GI numbers may then be During the Query phase, MOBY
used to retrieve the corresponding Central can be made aware of the object
Genbank records as sequence objects. All and service type ontologies, and
of these transactions may be done en masse automatically return appropriate responses
through single mouse clicks using the based on inheritance relationships of the

HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002 339


Wilkinson and Links

amplify the range of next-step Services


MOBY MOBY MOBY MOBY
Client Central Service 1 Service 2 presented to the user after each Service
transaction. This is achieved by adding
Register Service

OK
the cross-referencing Triples as additional
Register Service input object types to the MOBY-Central
OK query phase. What is returned is thus a list
DATA
of services available to work not only on
Data Object Type

Available Services
the requested data object, but also any of
Service Types the cross-references. We also anticipate
**
Selected Service that services will be constructed whose
Service Def. Request
sole purpose is to generate cross-
WSDL

Data Object
references as an additional way to ensure
Data Object
that the user is presented with all possible

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


DATA services that exist for the data in-hand.
Data Type(s)

Available Services
Service Types Collaborations and data
** Selected Service
Service Def. Request
security
WSDL
The BioMOBY architecture facilitates
Data Object collaborative data-sharing in a local area
Data Object network (LAN) environment or a wider
DATA
network. Local installations of MOBY
Data Type(s)
Available Services
Central may exist on which only locally
Service Types available data services are registered.
Client programs would be configured to
check the local MOBY Central in
Figure 1: Progression of a typical BioMOBY session. Messages with a addition to a public server in order to
light grey background represent the Registration phase, those with a present users with both local and publicly
white background represent the Query phase, and a dark grey available data. Alternately, data may be
background represents the Transaction phase. Apart from the initial securely shared by an authentication
seeding of data into the system, user interaction is required only once
system running over an HTTPS
per transaction ( ) in order to select the service they desire to transact.
Other transactions are done between Client and MOBY-Central, or connection to the server. In the latter
Client and Service provider case, HTTPS connections could also be
made to the individual secure services
during the transaction phase.
MOBY Central can be input object type. For example, a request
deployed in a LAN for services that are able to use the Class CONCLUSION
environment ‘Sequence’ as input, could additionally BioMOBY is an Open Source research
stipulate the inclusion of services that are project with the aim of exploring novel
able to use any parent object (eg ways of implementing a web-services
VirtualSequence, since Sequence ISA registry to facilitate the discovery and
VirtualSequence). sharing of biological data. The approach
extends the current web-services
Service ‘amplification’ paradigm by implementing an innovative
Where appropriate, a Service may pass as registry model that allows search and
many cross-references as it is aware of in retrieval based on object and service
Cross-references its response CRIB. Cross-references allow hierarchies. In addition, by passing
amplify the number of us to branch out from our flow of minimalist data objects whenever possible
available next-steps exploration to obtain related information as both the input and output of a query
about the Object in hand. Further, we the BioMOBY system facilitates the
expect that many Client programs will ‘wandering’ through large data sets in a
take advantage of these cross-references to manner similar to the thought process

340 HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002


BioMOBY

biologists use when approaching their heterogenous bioinformatics resources’,


problems. Bioinformatics, Vol. 17(1), pp. 83–94.
BioMOBY is a product of the Model 6. URL: http://www.opensource.org/docs/
Organism database community, and definition_plain.php
enjoys a great deal of support from these 7. URL: http://www.omg.org
data hosts, who are already in the process 8. URL: http://ncicb.nci.nih.gov/NCICB/
of deploying prototype MOBY Services. core/caBIO
In addition, the project has recently won 9. URL: http://www.mged.org/Workgroups/
funding from both Canadian and MAGE/mage-ml.html
American public funding agencies, 10. URL: http://www.w3.org/XML/Schema
ensuring that the project will expand and 11. URL: http://www.i3c.org/workgroup/
develop rapidly. Moreover, the growing technical_architecture/resources/
lsid0/lsid-tawg-5-03-2002a.pdf
collaboration between BioMOBY and

Downloaded from https://academic.oup.com/bib/article/3/4/331/196070 by guest on 13 June 2023


European initiatives such as myGrid will 12. URL: http://www.uddi.org
help to ensure global applicability and 13. URL: http://www.oasis-open.org/cover/
uddi.html
participation in the project.
14. URL: http://www.w3.org/TR/
soap12-part1/
Acknowledgments
Numerous BioMOBY project members have 15. URL: http://www.omg.org/gettingstarted/
contributed to this work through discussions and corbafaq.htm
software development. 16. URL: http://www.w3.org/TR/2001/
NOTE-wsdl-20010315
#National Research Council of Canada, 17. URL: http://www.biomoby.org
Plant Biotechnology Institute 2002. 18. URL: http://www.ncbi.nlm.nih.gov/
BLAST/
References 19. URL: http://www.i3c.org
1. URL: http://www.acedb.org 20. URL: http://ncicb.nci.nih.gov
2. Dowell, R. D., Jokerst, R. M., Day A. et al. 21. Stajich, J. E., Block, D., Boulez, K. et al.
(2001), ‘The distributed annotation system’, (2002), ‘The Bioperl Toolkit: Perl modules for
BMC Bioinformatics, Vol. 2(7). the life sciences’, Genome Res. (in press).
3. URL: http://www.biodas.org 22. URL: http://www.lionbioscience.com/
solutions/srs/srs-7
4. Siepel, A. C., Tolopko, A. N., Farmer, A. D.
et al. (2001), ‘An integration platform for 23. URL: http://www.mygrid.org.uk
heterogeneous bioinformatics software
24. URL: http://www.epsrc.ac.uk
components’, IBM Syst. J., Vol. 40(2), pp.
570–591. 25. URL: http://www.research-councils.ac.uk/
escience
5. Siepel, A., Farmer, A., Tolopko, A. et al.
(2001), ‘ISYS: a decentralized, component- 26. URL: http://mobycentral.cbr.nrc.ca/cgi-bin/
based approach to the integration of MOBY-Client.cgi

HENRY STEWART PUBLICATIONS 1467-5463. B R I E F I N G S I N B I O I N F O R M A T I C S . VOL 3. NO 4. 331–341. DECEMBER 2002 341

You might also like