
Semi-Automatic Composition of Ontologies for

ASKALON Grid Workflows

Muhammad Junaid Malik and Thomas Fahringer

Institute of Computer Science, University of Innsbruck,


Technikerstraße 21a, A-6020 Innsbruck, Austria
{malik,tf}@dps.uibk.ac.at

Abstract. Automatic workflow composition with the help of ontologies has been addressed by numerous researchers in the past. While ontologies are very useful for automatic and semi-automatic workflow composition, ontology creation itself remains a very difficult and important task. In this paper we present a novel technique to synthesize ontologies for the Abstract Grid Workflow Language (AGWL). AGWL has been used for years to create Grid workflow applications at a high level of abstraction. In order to semi-automatically generate ontologies we use an AGWL Ontology (AGWO, an ontological description of the AGWL language), structural information of one or several input workflows of a given application domain, and semantic enrichment of the structural information with the help of the user. Experiments demonstrate ...

1 Introduction

The Grid [1] has eliminated the need for dedicated computational facilities and made it possible for scientists to leverage the computational power of geographically remote resources.
The Grid workflow programming model [2] has been highly successful over the last decade in facilitating the composition of medium- to large-scale applications from existing basic activities that define a sequence of tasks to manage computational science processes. A Grid workflow application represents a collection of computational tasks (activities) interconnected in a directed graph through control and data flow dependencies that is suitable for execution on the Grid. The complexity of workflows has increased over the years with the increasing complexity of scientific applications.
Domain scientists create Grid workflows with the help of Grid workflow composition tools and execute them with runtime environments such as [3], [9], [10], [11], [12], and [13]. Although these tools have greatly simplified the effort of a workflow designer through features such as auto-completion and design suggestions for workflows [14], [13], manual workflow creation is still a very demanding and tedious task. It not only requires comprehensive application domain knowledge but also sufficient skills with the workflow composition tool being used.

A Grid workflow designer is responsible for creating efficient Grid workflows, so that the execution of the workflow activities takes place in a certain predefined order and all input and output dependencies of these activities are satisfied. Composing a Grid workflow is so demanding and complex that domain scientists often have to work jointly with workflow composition tool experts to create a usable workflow. Commonly, a large variety of workflows or sub-workflows exists for each domain, targeting specific scientific or technical experiments. The workflow designer must know all of the details about these workflows and their corresponding activities. The traditional workflow design approach relies greatly on the capabilities of a human workflow composer and is thus error-prone and very time consuming.
Numerous research projects, including [3], [12], [13], and [19], have tried to automatically generate workflows. Most of these approaches are based on the analysis of the input data and functional descriptions of the workflow activities with the help of domain ontologies [20]. Successful approaches rely on the completeness of domain ontologies (a formal representation of knowledge as a set of concepts within a given application domain), which commonly must be created manually.
Based on our experience with manual ontology creation for Grid workflows used in Askalon, we found that there are two major problems when a scientist tries to create an ontology by hand. (i) Manual ontology creation not only requires extensive domain knowledge and excellent ontology development skills but is also very time consuming. (ii) Changes of the domain knowledge require the domain ontology to be updated as well. As an example, consider a workflow that supports movie rendering. A new activity could be introduced that converts the movie into a new format which supports better movie quality. Such changes to the domain knowledge must be reflected in the domain ontology as well.
Semi-automated ontology synthesis not only expedites the process of ontology creation but also supports automatic updates of ontologies. To address the above mentioned problems, this paper proposes a new method for semi-automatic ontology synthesis for Grid workflows specified with AGWL. Although the system is not fully automatic and requires user interaction, most of the ontology structure (the intermediate ontology) is created automatically.
Our approach for ontology creation is performed in three phases. In the first step, one or several AGWL compliant input workflows of the same application domain are parsed and analyzed for the information they contain. We mainly extract information about activities and their input and output ports, which we refer to as concepts in the following. The next step associates the concepts with the AGWL ontology, an ontological description of the AGWL language (for instance, all data ports are associated with an AGWL data class), which results in an intermediate ontology. Section-3 gives more details about the AGWL ontology. Thereafter, the user has to provide semantic information for all concepts, which is used to derive the final ontology associated with the domain of the input workflows. The final ontology can then be used to automatically create workflows of the given domain.

The paper is organized as follows: Section-2 describes the terminology used in the paper; Section-3 describes the AGWL Ontology; Section-4 provides the details of the implementation of the proposed system; Section-5 describes the algorithm used to perform the core ontology synthesis operations; Section-6 gives an account of the experiments performed with the proposed system and their results; Section-7 discusses related work targeted at ontology synthesis; and Section-8 provides concluding remarks and potential future research directions for the system.

2 Prerequisites

An account of the terminology used in this paper is given below to better understand the process of ontology creation with the proposed ontology synthesis system. We use a real-world movie rendering workflow based on Povray [7] as an example. Figure-1 shows the flowchart of the application. The Povray workflow is composed of three independent activities. The first activity takes scene data as input and initializes the process by creating multiple ini files for the parallel execution of the Render activity. The Render activity takes this input from the first activity and generates images for the individual frames of the movie. It then packs the frames into a tarball, which is consumed by the third activity, Convert, which combines these individual frames into a movie.
A Grid workflow is a collection of computational tasks (activities) orchestrated in a predefined order to achieve the required execution goals on the Grid. Apart from activities, Grid workflows also contain other elements, including control flow, data flow, and data ports. Each element contains a number of attributes which describe different properties of the workflow; for example, an activity contains name, type, and function attributes. Askalon uses AGWL to represent workflows. An AGWL workflow comprises four parts: Workflow Header, Workflow Input, Workflow Output, and Workflow Body. The Workflow Header contains information about the domain, name, author, and version of the workflow. The Workflow Input and Workflow Output parts describe the input data and output data of the workflow, respectively. The Workflow Body contains the AGWL description of the workflow activities. Each workflow activity contains data about the name, type, and function of the activity and its corresponding data inports and outports, specifying the input data required by the activity and the output data produced by the activity. An OWL ontology is a formal representation of domain knowledge using the Web Ontology Language OWL [17]. An ontology is said to be valid if it contains sufficient domain knowledge to express all the domain concepts; this is also referred to as the completeness of the ontology.
To synthesize ontologies, the proposed system extracts and processes a large amount of information. Two databases, the lexicon database LDB and the ontology database ODB, are used to store this information. The LDB contains the lexicon of semantic concepts, whereas the ODB contains the workflow data, the intermediate ontology structure, and other artifacts created during the ontology synthesis process.

Fig. 1. Povray movie rendering workflow (flowchart: scene data input, Initialize, parallel Render region, Convert, output).

3 Abstract Grid Workflow Ontology, AGWO

The Abstract Grid Workflow Language (AGWL) [4] is the XML-based workflow language used to create Grid workflows for the Askalon Grid development environment. AGWL provides a rich set of constructs to create Grid workflows for many types of scientific applications. AGWL has already been used to create workflows from diverse scientific domains such as material science, meteorology, astronomy, mathematics, and computer graphics. AGWL provides different activity type constructs to represent computational tasks of different types, as well as control and data flows to specify the execution flow and data flow during the execution of the workflow. The AGWL ontology is the ontological representation of the basic AGWL constructs in the form of an OWL ontology. These constructs, encoded as OWL concepts, are inherited from the &OWL;Thing concept (the base concept of all OWL ontologies). These concepts are used as parent concepts for the domain specific concepts when creating a new ontology. AGWO contains two basic types of concepts that provide the foundations for the extended domain concepts. These basic concepts are Function and Data. Each of these concepts may have one or more associated properties or subconcepts.
An AGWL activity and AGWL data are represented in the ontology using the concepts Function and Data, respectively. Further semantic information is added to these concepts by specifying owl:onProperty and by adding their parent concepts using the owl:SubclassOf construct. The object properties defined for the Function concept include hasInput and hasOutput, whereas the concept Data has the object properties hasRepresentation, isInputTo, isOutputOf, isPartOf, and hasPart. AGWO thus contains two distinct types of concepts which are extended for each new domain ontology. The Function and Data base concepts are also associated with each other using the hasInput and hasOutput object properties. Figure-2 shows the definition of these ontological concepts for AGWO.
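To make this structure concrete, the following Python sketch (using the rdflib library) builds a minimal AGWO-like skeleton. The namespace URI and the exact domains and ranges of the object properties are assumptions made for this illustration and are not taken from the actual AGWO file used by ASKALON.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

AGWO = Namespace("http://example.org/agwo#")  # hypothetical namespace URI

g = Graph()
g.bind("agwo", AGWO)

# The two base concepts, Function and Data, both derived from owl:Thing.
for concept in (AGWO.Function, AGWO.Data):
    g.add((concept, RDF.type, OWL.Class))
    g.add((concept, RDFS.subClassOf, OWL.Thing))

# Object properties linking Function and Data; domains and ranges are assumptions.
object_properties = [
    (AGWO.hasInput, AGWO.Function, AGWO.Data),
    (AGWO.hasOutput, AGWO.Function, AGWO.Data),
    (AGWO.isInputTo, AGWO.Data, AGWO.Function),
    (AGWO.isOutputOf, AGWO.Data, AGWO.Function),
    (AGWO.hasRepresentation, AGWO.Data, OWL.Thing),
    (AGWO.isPartOf, AGWO.Data, AGWO.Data),
    (AGWO.hasPart, AGWO.Data, AGWO.Data),
]
for prop, domain, rng in object_properties:
    g.add((prop, RDF.type, OWL.ObjectProperty))
    g.add((prop, RDFS.domain, domain))
    g.add((prop, RDFS.range, rng))

print(g.serialize(format="turtle"))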

4 Methodology

The proposed ontology synthesis system is built as part of the ASKALON [3] Grid workflow development environment. The process of ontology synthesis is performed in three phases: the analysis and parsing of the workflow, followed by the creation of the intermediate ontology structure based on the information collected in the parsing phase, and finally the semantic enrichment of the intermediate ontology. A block diagram of the system is shown in Figure-3.

Fig. 2. AGWL ontology (AGWO): Function and Data concepts.

Fig. 3. Block diagram of the ontology synthesis system. Inputs: AGWL workflows, the AGWL base ontology, the lexicon database, and user input. Components: the workflow parser (AGWL analyzer, AGWL parser, canonicalizer), the ontology structure builder (concept extractor, AGWL-to-OWL mapper), and the ontology semantic updater (semantic concept builder, ontology constructor), connected through the intermediate ontology. Output: the target ontology.

4.1 Workflow Parsing and Canonicalization


The first component of the system is the workflow parser. It takes one or a set of workflows belonging to a specific domain, provided by the user, and performs the canonicalization operation on these workflows. Figure-5(a) and 5(b) show the AGWL representation of an activity and its respective canonical form. After converting a workflow into the canonical format, the data in the resulting canonical relations can easily be analyzed and processed using standard SQL commands. As the proposed system is used to synthesize ontologies for AGWL [4] based workflows, the parser is specifically designed to parse AGWL workflows. AGWL workflows are XML documents comprising the four sections shown in Figure-4.
The workflow parser parses and analyzes the different parts of the workflow and extracts information to convert the workflow into the canonical format. The initial parsing of the workflow yields the datasets contained in the workflow (e.g. an activity dataset looks like (A, name, type, function) and a data inport dataset like (I, name, type, category, source)). As not all workflow elements are structurally identical, and the attributes of similar workflow elements may vary in number, these datasets need to be normalized before they can be canonicalized and stored in the database. For example, some workflow headers only contain the name attribute and do not contain information such as domain, author, or version.

<?xml version="1.0" encoding="UTF-8"?>
<cgwd author="Peter" domain="POVRAY" name="PovrayWorkflow" version=""> {Workflow Header}
<cgwdInput>
<dataIn category="" name="povFile" source="gsiftp://karwendel.dps.uibk.ac.at//home/kassian/pov/scherk.pov" type="agwl:file"/>
<!--more data ports -->
</cgwdInput> {Workflow Input}
<cgwdBody>
<parallelFor name="ParallelFor_1">
<activity function="" name="Render2" type="Povray:Render2">
<dataIns><dataIn category="" name="iniFile" source="PovrayWorkflow/iniFile" type="agwl:file"/></dataIns>
<dataOuts><dataOut category="" name="frameTarball" saveto="" type="agwl:file"/></dataOuts>
</activity>
</parallelFor>
<activity function="" name="Convert" type="Povray:Convert">
<dataIns><dataIn category="" name="frameTarballs" source="ParallelFor_1/frameTarballs" type="agwl:collection"/></dataIns>
<dataOuts><dataOut category="" name="outFile" saveto="" type="agwl:file"/></dataOuts>
</activity>
{Workflow Body}
</cgwdBody>
<cgwdOutput>
<dataOut category="" name="finalMovie" saveto="gsiftp://karwendel.dps.uibk.ac.at/~/mymovie.gif" source="Convert/outFile" type="agwl:file"/>
<!--Data representation ommited -->
{Workflow Output}
</cgwdOutput>
</cgwd>

Fig. 4. AGWL representation of Povray workflow

Similarly, activities in the workflow may also have a varying number of attributes. Normalization is the process of adding missing attributes to the workflow elements so that all elements become identical in terms of the number and types of attributes.
After the normalization process, the next step is to add missing information to these attributes. The missing information is collected from the existing data present in the workflow lexicon database and, if no information is found in the database, with the help of user input. This process is known as canonicalization. In the canonicalization process, all the attributes of the extracted workflow structures are analyzed. When an attribute without a value is found, the user is asked to provide a value for it. The process continues until all attribute values in these datasets are filled in. The canonicalized information is then stored in the ontology database ODB. The database schema of the canonical relational database is shown in Figure-5. The four database relations in the schema are used to store the required information. Workflow data is stored in the workflow relation, whereas activity related data is stored in the activity relation. The other two relations, act_port and wf_port, are used to store the data ports of activities and workflows, respectively.
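One possible realization of such a canonical store, sketched with SQLite, is shown below; the column lists are only an approximation inferred from Figure-5 and do not claim to be the exact ODB definition.

import sqlite3

# Approximate canonical schema; the column lists are inferred from Figure-5
# and are not the exact ASKALON ODB definition.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workflow (
    sid TEXT, id INTEGER PRIMARY KEY, name TEXT,
    domain TEXT, version TEXT, author TEXT);
CREATE TABLE IF NOT EXISTS activity (
    sid TEXT, id INTEGER PRIMARY KEY, wf_id INTEGER,
    name TEXT, type TEXT, function TEXT);
CREATE TABLE IF NOT EXISTS act_port (
    sid TEXT, id INTEGER PRIMARY KEY, a_id INTEGER, wf_id INTEGER,
    element TEXT, name TEXT, category TEXT, type TEXT, source TEXT, saveto TEXT,
    storage TEXT, content TEXT, archive TEXT, cardinality TEXT);
CREATE TABLE IF NOT EXISTS wf_port (
    sid TEXT, id INTEGER PRIMARY KEY, wf_id INTEGER,
    element TEXT, name TEXT, category TEXT, type TEXT, source TEXT, saveto TEXT);
"""

odb = sqlite3.connect("odb.sqlite")  # the ontology database (ODB)
odb.executescript(SCHEMA)
odb.commit()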

4.2 Structural foundations of the Ontology


After the canonical relational database has been created from the candidate workflows, the next step is to create the intermediate structure of the ontology. This is performed by retrieving the canonical information from the ontology database ODB and transforming that information into unique class concepts by representing the canonical data in OWL format. Each newly created concept is assigned a parent Data or Function class concept in the AGWO. This process is performed for all workflow components, i.e. activities and data ports, which results in an intermediate ontology. At this point the structure of the ontology is similar to the one shown in Figure-6(a). We have used just one activity, Convert, to create this intermediate ontology.
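A minimal sketch of this step is shown below, again using rdflib and the hypothetical namespaces from the earlier example: each canonical record becomes an OWL class attached to the AGWO Function or Data base concept.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

AGWO = Namespace("http://example.org/agwo#")   # hypothetical AGWO namespace
DOM = Namespace("http://example.org/povray#")  # hypothetical domain namespace

def add_placeholder_concept(graph, record):
    # Turn one canonical record into an OWL class and attach it to the
    # matching AGWO base concept (activities -> Function, data ports -> Data).
    concept = DOM[record["name"]]
    parent = AGWO.Function if record["element"] == "activity" else AGWO.Data
    graph.add((concept, RDF.type, OWL.Class))
    graph.add((concept, RDFS.subClassOf, parent))
    return concept

intermediate = Graph()
add_placeholder_concept(intermediate, {"name": "Convert", "element": "activity"})
add_placeholder_concept(intermediate, {"name": "frameTarball", "element": "dataOut"})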

<activity function="" name="Convert" type="">
  <dataIns>
    <dataIn category="GenericData" name="MovieYuv" source="gsiftp://" type="">
      <dataRepresentation>
        <storageType>FileSystem</storageType>
        <contentType>File</contentType>
        <archiveType>none</archiveType>
        <cardinality>single</cardinality>
      </dataRepresentation>
    </dataIn>
  </dataIns>
  <dataOuts>
    <dataOut category="" name="MovieAvi" saveto="Movie.avi" type="agwl:file">
      <dataRepresentation>
        <storageType>FileSystem</storageType>
        <contentType>File</contentType>
        <archiveType>tgz</archiveType>
        <cardinality>single</cardinality>
      </dataRepresentation>
    </dataOut>
  </dataOuts>
</activity>

Activity: ID | Sid | wf-id | Name | Type | Function
          1 | xxxxxxxx | 1 | Convert | Activity | Conversion

Act_Port: Wf-id | sid | a-id | Parent | Element | Name | Category | Type | Source | Saveto
          1 | xxxxxxxx | 1 | 1 | DataIn | MovieYuv | GenericData | agwl:file | gsiftp:// | NULL
          1 | xxxxxxxx | 2 | 1 | DataOut | MovieAvi | GenericData | agwl:file | NULL | gsiftp://

Act_Port_Data_representation: storage | content | archive | cardinality
          FileSystem | File | None | Single
          FileSystem | File | tgz | Single

(a) Activity data          (b) Canonical form of the activity in the database

[Schema diagram: relation workflow (sid, id, name, domain, version, author); relation activity (sid, id, name, wf-id, type, function); relations act_port and wf_port (sid, parent-id, element, name, category, type, source, saveto) with the data representation attributes (storage, content, archive, cardinality).]

Fig. 5. Database schema for the canonicalized information

Fig. 6. (a) Structure of the intermediate ontology; (b) structure of the target ontology.

4.3 Semantic Association and Meaning Specification

The next step in the process is the semantic enrichment of the intermediate ontology. Both of the former steps (parsing and structure creation) are carried out automatically, except for the canonicalization process, which may request user input if the lexicon database does not contain a matching record. This step requires much more user interaction than the steps mentioned earlier. In this step, semantic information is acquired from the user to semantically enrich the existing concepts in the intermediate ontology.
In this phase, every concept from the intermediate ontology is presented to the user, and the user is asked to classify the concept by giving it a meaningful concept name and by associating it with the appropriate parent. The user can also create additional concepts if a concept cannot be associated with any of the existing parent concepts. As discussed earlier, there are two basic types of concepts, Data and Function; similarly, an element of a workflow can be data or an activity, so the user specifies meanings representing exactly what an element is. In the case of data, a name is given to the data element which specifies its contents, i.e. the data meaning; in the case of an activity, a meaningful name is given which describes the function performed by that activity. During this interactive process, all the information given by the user is stored in the lexicon database, so that for further workflow components the user can choose from among the existing meanings in the lexicon.
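The sketch below illustrates one way this interaction could be implemented, with an SQLite-backed lexicon and a console prompt; the table layout, prompts, and function name are illustrative assumptions, not the actual LDB interface.

import sqlite3

ldb = sqlite3.connect("ldb.sqlite")  # the lexicon database (LDB)
ldb.execute("""CREATE TABLE IF NOT EXISTS lexicon (
                   concept TEXT PRIMARY KEY, meaning TEXT, parent TEXT)""")

def semantic_meaning(concept_name, is_activity):
    # Return (meaning, parent) for a concept, asking the user only when the
    # lexicon has no entry yet, and remembering the answer for next time.
    row = ldb.execute("SELECT meaning, parent FROM lexicon WHERE concept = ?",
                      (concept_name,)).fetchone()
    if row:
        return row
    kind = "activity" if is_activity else "data element"
    meaning = input("Meaning of %s '%s': " % (kind, concept_name))
    parent = input("Parent concept for '%s': " % concept_name)
    ldb.execute("INSERT INTO lexicon VALUES (?, ?, ?)",
                (concept_name, meaning, parent))
    ldb.commit()
    return meaning, parent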

4.4 Formation and Merger of Semantic / Structural Constructs

Upon completion of this semantic association, the intermediate (structural) and semantic representations of the ontology are combined to form the target ontology, which describes the domain knowledge as represented by the workflows or activities used for the creation of the ontology. The merging of structural and semantic concepts is performed partially with the help of the user for atomic activities, nested activities, and other complex activity types, whereas the merging of structural and semantic concepts for data elements is performed automatically. Figure-6(b) shows a simple ontology with the semantically enriched Povray activity type Convert.
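As a small, assumption-laden illustration of this merging step (reusing the hypothetical domain namespace from the earlier sketches), the user-supplied meaning and parent could be attached to a placeholder concept as follows; data elements could be processed the same way without user involvement.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

DOM = Namespace("http://example.org/povray#")  # hypothetical domain namespace

def merge_semantics(graph, placeholder, meaning, parent_name):
    # Attach the user-supplied meaning to a placeholder concept and move it
    # under its semantic parent, creating the parent class if necessary.
    parent = DOM[parent_name]
    if (parent, RDF.type, OWL.Class) not in graph:
        graph.add((parent, RDF.type, OWL.Class))
    graph.add((placeholder, RDFS.label, Literal(meaning)))
    graph.add((placeholder, RDFS.subClassOf, parent))
    return graph

# Example: merge_semantics(intermediate, DOM.Convert, "movie format conversion", "Rendering")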

5 Algorithm

Algorithm 1 provides a pseudo-code representation of the ontology synthesis process. The procedure synthesizeOntology() takes two parameters, AGWO and AGWL[]. AGWO is the AGWL ontology, whereas AGWL[] is a set of input AGWL workflows or AGWL activities. The procedure uses the two databases LDB and ODB for information storage and produces the target ontology. Two subprocedures, Normalize() and Canonicalize(), are invoked by the main program on lines 20 and 21 of the algorithm; the objective of these function calls is straightforward and has already been explained in Section-4.1. Other functions, such as getWFInports(), getActivities(), and getWFOutports(), are information extraction functions, and their functionality is described in the comment next to each function call. The OWL_structure() function creates an OWL structure without any semantic information. The OWL encoded structures are used as placeholders in the intermediate ontology. The function CreateIntermediateOntology() creates the intermediate ontology representation using these OWL structures as placeholders for actual concepts. Once the intermediate ontology is created, each of the OWL structures is enriched with semantic information using make_Concept() function calls. The make_Concept() function requests user input and converts the OWL structures into meaningful ontological concepts for the domain. The resulting conceptual representation of these OWL structures is then combined and added to the target ontology O using the add_Concept() function call.

6 Experiments

7 Related work

The use of ontologies to leverage knowledge in the field of the Semantic Web is well known. Ontologies have been widely used in many scientific domains and have given promising results. In acknowledgement of their importance, automatic ontology synthesis has become a focus of many researchers. The use of ontologies in the context of automatic workflow composition has driven the scientists in this domain towards the development of automatic ontology synthesis techniques, and several approaches have been adopted to automate the process. Our approach to ontology synthesis is based on a three step process of AGWL parsing and analysis, concept extraction, and semantic enrichment of these concepts.
A similar approach has been used to generate ontologies in the OBO-Edit tool [22]. The process of ontology creation in the OBO-Edit DOG4DAG plugin also uses three steps: extracting the terms or keywords for ontologies from the source text, creating definitions for these terms using online information sources, and then creating relationships between the selected terms, which now represent the ontology concepts. The OBO framework loads concepts only from existing ontologies, which can act as potential parents for the newly discovered concepts. In our approach, besides the use of the lexicon database for searching existing concepts, we extend this functionality by giving the user the chance to create new concepts on-the-fly if no suitable parent concept is found in the database. This makes the target ontology applicable in a broader perspective rather than restricting it to a predefined set of known concepts. In addition, in contrast to the approach of predicting a relationship, our system automatically establishes relations based on the occurrences of such concepts in the reference ontology AGWO.

Algorithm 1: Ontology Synthesizer synthesizeOntology(AGWO, AGWL[])

Input:  AGWO      /* Abstract Grid Workflow Ontology */
        AGWL[]    /* AGWL based workflows or workflow activities */
Data:   ODB       /* database for information storage */
        LDB       /* lexicon database */
Result: O         /* target ontology for the given AGWL workflows/activities */

 1 begin
 2   if (isSetOfWF(AGWL[]))                /* input is a set of workflows */
 3   then
 4     foreach (agwl in AGWL[]) do
 5       WF_in[]  <- getWFInports(agwl)    /* workflow inports */
 6       AT[]     <- getActivities(agwl)   /* get activity types */
 7       WF_out[] <- getWFOutports(agwl)   /* workflow outports */
 8       ODB <- WF_in                      /* put inports in the database */
 9       ODB <- WF_out                     /* put outports in the database */
10     end
11   end
12   else
13     AT[] <- AGWL[]                      /* input is a set of activities */
14   end
15   DomainWF <- user
     foreach (at in AT[])                  /* process all activity types */
16   do
17     A_a[] <- getAttr(at)                /* get activity attributes */
18     A_i[] <- getActIP(at)               /* get activity inports */
19     A_o[] <- getActOP(at)               /* get activity outports */
20     N_a, N_i, N_o <- Normalize(at, A_a, A_i, A_o) <- user, LDB
                                           /* Normalize adds missing attributes */
21     C_a, C_i, C_o <- Canonicalize(at, N_a, N_i, N_o) <- user, LDB
                                           /* Canonicalize adds missing information */
22     storeInDB(C_a, C_i, C_o)
23   end
24   foreach (C_at in ODB)                 /* get AT canonical data from the DB */
25   do
26     at_OWL <- OWL_structure(C_at)       /* OWL code for the activity type */
27     ai_OWL <- OWL_structure(C_i)        /* OWL code for the inports */
28     ao_OWL <- OWL_structure(C_o)        /* OWL code for the outports */
29     storeInDB(at_OWL, ai_OWL, ao_OWL)   /* store OWL concepts */
30   end
31   IO <- CreateIntermediateOntology(ai_OWL, at_OWL, ao_OWL)
                                           /* create intermediate ontology */
32   foreach (WF_port in ODB)              /* get WF ports from the DB */
33   do
34     WF_OWL <- OWL_structure(WF_port)
35     IO <- addToIO(WF_OWL)
36   end
     /* add semantic meanings to the concepts with the help of the
        concept vocabulary from LDB and input from the user */
37   C_atOWL <- make_Concept(IO, at_OWL) <- user
     C_aiOWL <- make_Concept(IO, ai_OWL) <- user
     C_aoOWL <- make_Concept(IO, ao_OWL) <- user
     C_wfOWL <- make_Concept(IO, WF_OWL) <- user
     LDB <- C_atOWL, C_aiOWL, C_aoOWL, C_wfOWL
     O <- CreateOntologyFile(Domain.owl)   /* create the ontology file */
38   O <- add_Concept(C_atOWL, C_aiOWL, C_aoOWL, C_wfOWL)
                                           /* add concepts to the ontology */
39   return OntologyFile O
40 end

This reduces the chance of wrong relations between two concepts, because the AGWO is proven to be correctly defined for all AGWL related concepts.
Another approach to automatic ontology synthesis, presented in [18], performs the ontology synthesis operation using existing ontologies for similar input information. Again, the process of synthesis is based on the three steps of concept extraction, analysis, and generation, and this approach introduces two additional steps of validation and evolution. The three major steps which actually create the ontology are very similar to the three steps proposed in our approach. However, the internal functionality of the analysis and generation phases of that system relies on existing ontologies; it matches and aligns the newly found concepts with existing concepts in the reference ontologies and makes the decision about the use of these concept classes. The quality of an alignment/matching based approach is limited by and dependent on the availability and richness of the existing ontologies, which is not the case in our approach, where an ontology can be generated from scratch without the help of any existing domain ontologies.
There are numerous other attempts at ontology synthesis, such as [26], [23], [27], and [21], but these works address one or another aspect of the whole ontology synthesis scenario. For example, in [23] a comprehensive study of the classification of concepts and of matching and alignment techniques is presented. In [21] we see a detailed analysis of ontology alignment techniques. Moreover, in [24] a tutorial is presented which helps in understanding the ontology learning process from raw text input. One interesting approach quite similar to ours is the XML-to-OWL transformation [25]; it is based on the idea of mapping XSD file entities to OWL concepts and classes. The limitation of that approach is the absence of relation building between different types of concepts.

8 Conclusion and Future Work


This paper proposed a semi-automated mechanism for synthesizing domain ontologies for Grid workflows created using AGWL. The ontologies synthesized with the system have been put to the test, and domain workflows have successfully been created using them.
At this point the user has to provide a fairly large amount of semantic information, resulting in a lot of user interaction. To improve the automation and to move the burden from the user to the system, annotations will be introduced into the workflow design, making it more semantically enriched. This approach is expected to help the system extract the semantic information automatically while requiring much less user interaction.

References
1. Foster, I. and Kesselman, C. 2003. The Grid: Blueprint for a New Computing Infrastructure (Elsevier Series in Grid Computing). Morgan Kaufmann, 2nd edition.

2. Taylor, I. J., Deelman, E., and Gannon, D. B. (2006). Workflows for e-Science:
Scientific Workflows for Grids. Springer.
3. Fahringer, T., Jugravu, A., Pllana, S., Prodan, R., Seragiotto, C., and Truong, H. 2005. ASKALON: a tool set for cluster and Grid computing. Concurrency and Computation: Practice and Experience 17, 2-4 (Feb. 2005), 143-169. DOI=http://dx.doi.org/10.1002/cpe.v17:2/4
4. Fahringer, T., Qin, J., and Hainzer, S. 2005. Specification of grid workflow applica-
tions with AGWL: an Abstract Grid Workflow Language. In Proceedings of the Fifth
IEEE international Symposium on Cluster Computing and the Grid (Ccgrid’05) -
Volume 2 - Volume 02 (May 09 - 12, 2005). CCGRID. IEEE Computer Society,
Washington, DC, 676-685.
5. Hongyi Zhou and Jeffrey Skolnick. 2009. Protein structure prediction by pro-Sp3-
TASSER. In Proceedings of the Biophysical journal. Volume 96. Pages 2119-27
6. Vipul Kashyap and Amit Sheth. 1996. Semantic and schematic similar-
ities between database objects: a context-based approach. The VLDB
Journal 5, 4 (December 1996), 276-304. DOI=10.1007/s007780050029
http://dx.doi.org/10.1007/s007780050029
7. POVray, http://www.povray.org/.
8. Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context
similarity. In Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining (KDD ’02). ACM, New York, NY, USA,
538-543. DOI=10.1145/775047.775126 http://doi.acm.org/10.1145/775047.775126
9. E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K.
Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. Katz, Pegasus: a
Framework for Mapping Complex Scientific Workflows onto Distributed Systems,
Scientific Programming Journal, vol. 13, no. 2, November 2005.
10. I. Taylor, I. Wang, M. Shields, and S. Majithia, Distributed computing with Triana
on the Grid, Concurrency and Computation: Practice and Experience, 2005.
11. T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li, Taverna: A tool for the composition and enactment of bioinformatics workflows, Bioinformatics Journal, vol. 20, no. 17, pp. 3045-3054, June 2004.
12. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock, Kepler: An
Extensible System for Design and Execution of Scientific Workflows, in 16th Intl.
Conf. on Scientific and Statistical Database Management (SSDBM04). Santorini
Island, Greece: IEEE Computer Society Press, June 21-23, 2004.
13. Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger, Cláudio T. Silva, and Huy T. Vo. 2006. VisTrails: visualization meets data management. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06). ACM, New York, NY, USA, 745-747. DOI=10.1145/1142473.1142574 http://doi.acm.org/10.1145/1142473.1142574
14. Malik Muhammad Junaid, Maximilian Berger, Tomas Vitvar, and Thomas
Fahringer, Workflow Composition through Design Suggestions using Design-Time
Provenance Information, In Proceedings of the 5th IEEE International Conference
on e-Science, December 09-11, 2009, Oxford, UK. Accepted for publication.
15. http://www.w3.org/TR/xml-c14n
16. Bernstein A, Kaufmann E, Buerki C, Klein M (2005) How similar is it? Towards
personalized similarity measures in ontologies. In: Proceedings of the internationale
Tagung Wirtschaftsinformatik, Germany.

17. http://www.w3.org/2004/OWL/
18. Ivan Bedini, Benjamin Nguyen. Automatic Ontology Generation: State of the Art.
19. Renata Slota, Joanna Zieba, Bartosz Kryza, and Jacek Kitowski. 2006. Knowledge
Evolution Supporting Automatic Workflow Composition. In Proceedings of the Sec-
ond IEEE International Conference on e-Science and Grid Computing (E-SCIENCE
’06). IEEE Computer Society, Washington, DC, USA, 37-. DOI=10.1109/E-
SCIENCE.2006.97 http://dx.doi.org/10.1109/E-SCIENCE.2006.97
20. Qin, J. and Fahringer, T. A novel domain oriented approach for scientific Grid workflow composition. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2008), pp. 1-12, 2008. DOI=10.1109/SC.2008.5214432
21. Euzenat J., Le Bach T., Barrasa J., Bouquet P., De Bo J., Dieng R., Ehrig M.,
Hauswirth M., Jarrar M., Lara R., Maynard D., Napoli A., Stamou G., Stucken-
schmidt H., Shvaiko P.,Tessaris S., Van Acker S. Zaihrayeu I., State of the Art
on Ontology Alignment. Knowledge Web Deliverable D2.2.3, INRIA, Saint Ismier,
2004.
22. Thomas Wächter, Michael Schroeder. Semi-automated ontology generation within OBO-Edit. Bioinformatics, Vol. 26, No. 12 (15 June 2010), pp. i88-i96.
23. S. Castano, A. Ferrara, S. Montanelli, G. N. Hess, S. Bruno. State of the Art on Ontology Coordination and Matching. Deliverable 4.4, Version 1.0 Final, March 2007. BOEMIE Project.
24. Paul Buitelaar, Philipp Cimiano, Marko Grobelnik, Michael Sintek. Ontology
Learning from Text Tutorial. ECML/PKDD 2005 Porto, Portugal; 3rd October
- 7th October, 2005. In conjunction with the Workshop on Knowledge Discovery
and Ontologies (KDO-2005)
25. Bohring H, Auer S. Mapping XML to OWL Ontologies, Leipziger Informatik-Tage.
2005: 147-156.
26. York Sure, Rudi Studer. On-To-Knowledge Methodology Final Version Project
Deliverable D18, 2002.
27. Denny Vrandecic, H. Sofia Pinto, York Sure, Christoph Tempich. The DILIGENT
Knowledge Processes. Journal of Knowledge Management 9 (5): 85-96. October
2005. ISSN: 1367-3270.
