Professional Documents
Culture Documents
1 Introduction
The Grid [1] has eliminated the need for dedicated computational facilities and
made it possible for the scientists to leverage the computational power of re-
sources located at geographically remote distances.
The Grid workflow programming model [2] has been highly successful in the last
decade in facilitating the composition of medium to large-scale applications from
existing basic activities defining a sequence of tasks to manage computational
science processes. A Grid workflow application represents a collection of com-
putational tasks (activities) interconnected in a directed graph through control
and data flow dependencies that are suitable for execution on the Grid. The
complexity of the workflows has increased over the years with the increasing
complexity of scientific applications.
Domain scientists create Grid workflows with the help of Grid workflow composi-
tion tools and execute them with runtime environments such as available like [3],
[9], [10], [11], [12], and [13]. Although these tools have greatly simplified the ef-
fort of a workflow designer through features such as auto-completion and design
suggestions [14], [13] for workflows. Nevertheless the manual workflow creation is
still a very demanding and tedious task. It not only requires comprehensive ap-
plication domain knowledge but also sufficient skills of the incorporated workflow
composition tool.
2 Muhammad Junaid, Thomas Fahringer
The paper is organized as follows; section-2 gives the description of the ter-
minology used in the paper, which is followed by section-3 describing the AGWL
Ontology, followed by section-4 which provides the details of the implementa-
tion of the proposed system. Section-5 describes the different algorithm used to
perform the core of ontology synthesis operations. Section-6 gives an account of
the experiments performed using the proposed system and the results. Section-7
provides the related work targeted at ontology synthesis followed by the section-8
which provides the concluding remarks and potential future research prospective
for the system.
2 Prerequisite
An account of the terminology used in this paper is given below to better un-
derstand the process of the ontology creation with the proposed ontology syn-
thesis system. We use a real-world movie rendering algorithm workflow known
as Povray [7] as an example workflow. Figure-2 shows the flowchart of the appli-
cation. The Povray workflow is composed of three independent activities. The
first activity takes scene data as input and initializes the process by creating
multiple ini files for parallel execution of the Render activity. Render activity
takes this input from first activity and generates images for individual frames of
the movie. It then packs the frames into a tarball, this is consumed by the third
activity Convert which combines these individual frames and creates a movie
using these frames.
A Grid workflow is a collection of computa-
tional tasks (activities) orchestrated in a prede-
fined order to achieve the required execution goals input
on the Grid. Apart from activities, the Grid work- Scene
data start
flows also have other elements as well which include
control-flow, data-flow and data-ports etc. Each el-
ement contains a number of attributes which ex-
plain different properties of the workflow for ex- Initialize
ing the input data required by the activity and output data produced by the
activity. An OWL ontology is formal representation of the domain knowledge
using the Web Ontology Language OWL [17]. Ontology is said to be valid if it
contains sufficient domain knowledge to express all the domain concepts. This
is also referred to as completeness of the ontology.
To synthesize ontologies, the proposed system extracts and processes a lot of
information, two databases Lexicon database LDB and the Ontology database
ODB are used to store this information. The LDB database contains the lex-
icon of the semantic concepts, whereas the ODB contains the workflow data
and intermediate ontology structure etc created during the ontology synthesis
process.
Abstract Grid Workflow Language [4] (AGWL) is the UML based workflow
language used to create Grid Workflows for Askalon Grid Development environ-
ment. AGWL provides a rich set of constructs to created grid workflows for any
type of scientific applications. AGWL has already been put to use for creating
workflows from diverse scientific domains, like material sciences, meteorology, as-
trology, mathematics and computer graphics. AGWL provides different activity
type constructs to represent computational tasks of different types, control and
data flows to specify execution flow and data flow during the execution of the
workflow. AGWL ontology is the ontological representation of the basic AGWL
constructs in the form of and OWL ontology. These constructs encoded as OWL
concepts are inherited from the &OWL;Thing concept (the base concepts for all
OWL ontologies). These concepts are used as parent concepts for the domain
specific concepts when creating a new ontology. AGWO contains two basic types
of concepts that provide foundations for the extended domain concepts. These
basic concepts present in the AGWO are Function and Data. Each of these
concepts may have one or more associated properties or subconcepts.
An AGWL activity and AGWL data are represented using the concepts
Function and Data respectively in the ontology. Further semantic information
to these concepts is added by specifying owl:onProperty, and by adding their
parent concepts using owl:SubclassOf construct. The object-Properties defined
for the Function concept include hasInput and hasOutput. Whereas the concept
Data has object-Properties hasRepresentation, isInputTo, isOutputOf, isPartOf,
and hasPart etc. AGWO thus contains two distinct types of concepts which are
extended for each new domain ontology. The Function and Data base concepts
are also associated together using hasInput and hasOuput object-Properties.
Figure-2 shows the definition of these ontological concepts for AGWO.
4 Methodology
Semi-Automatic Composition of Ontologies for ASKANCE Grid Workflows 5
Input
AGWL Ontology/
AGWL Base Lexicon User Input
workflows Ontology Database
author, or version. Similarly activities in the workflow may also have varying
number of attributes. Normalization is the process of adding missing attributes
to the workflow elements so that all the elements become identical in terms of
the number and types of attributes.
After the normalization process, the next step is to add information to these
attributes if it is missing. The missing information is collected from existing
data present in the workflow lexicon database, and if no information is found in
the database then with the help of user input. This process is known as canon-
icalization. In the canonicalization process, all the attributes of the extracted
workflow structures are analyzed. When an attribute without an attribute value
is found the user is asked to provide value for this attribute. The process contin-
ues until all the attributes values are filled in these datasets. This canonicalized
information is then stored in the ontology database ODB. Database schema of
the canonical relational database is shown in Figure-4.1. The four database re-
lations in the schema are used to store the required information. Workflow data
is stored in the workflow relation, where as the activity related data is store in
the activity relation. The other two relations act port and wf port are used to
store dataports for activities and workflows respectively.
Fig. 6.
8 Muhammad Junaid, Thomas Fahringer
The next step in the process is semantic enrichment of the intermediate ontology.
Both the former steps (parsing and structure creation) are carried out automat-
ically except for the canonicalization process which may request user input if
the lexicon database does not contain a matching record. This step requires
much more user interaction in comparison to the steps mentioned earlier. In this
step semantic information is acquired from the user to semantically enrich the
existing concepts in the intermediate ontology.
In this phase every concept from the intermediate ontology is presented to
user and the user is asked to classify the concept by giving the meaningful
concept name and by associating it with the appropriate parent. User can also
create additional concepts if a concept cannot be associated to any of the existing
parent concepts. As discussed earlier there are two basic type of concepts, i.e.
Data and Functions, similarly an element from a workflow can be data or an
activity, so the user specifies appropriate meanings representing exactly what an
element is. In case of data a name is given to the data element which specifies
the contents of the element, i.e. data meanings and in case of an activity a
meaningful name is given to the activity which describes the function performed
by this activity. During this interactive process all the information given by the
user is stored in the lexicon database. So for further workflow components if
a user has to specify meanings for an activity he can choose from among the
existing meanings from the lexicon.
5 Algorithm
pretty straight forward and have already been explained in the section-4.1. The
other functions like getWFInports(), getActivities(), getWFoutports() etc. are
the information extraction functions, and their functionality is mentioned in the
comment line appearing next to function call. The functions OWL Structure()
perform the task of creating OWL structure without any semantic information.
The OWL encoded structures are used as placeholders in the intermediate ontol-
ogy. The function createIntermediateOntology() creates the intermediate Ontol-
ogy representation using these OWL structures as placeholders for actual con-
cepts. Once the intermediate Ontology is created each one of the OWL structures
is enriched with semantic information using makeConcept() function calls. The
makeConcept() function requests user input and converts the OWL structures
into meaningful ontological concepts for the domain. The resulting conceptual
representation of these OWL structures is then combined together and added to
the target ontology O using the add Concept function call.
6 Experiments
7 Related work
chance of having wrong relations between two concepts because the AGWO is
proven to be correctly defined for all AGWL related concepts.
Another attempt on automatic ontology synthesis presented by [18] performs
the ontology synthesis operation using the existing ontologies for similar input
information. Again the process of synthesis is based on three steps of concepts ex-
traction, analysis and generation. This approach introduces two additional steps
of validation and evolution. The three major steps which actually create the
ontology are much similar to the three steps proposed in our approach towards
ontology synthesis. But the internal functionality of the analysis and genera-
tion phases of this system relies on existing ontologies; it matches/ align the
newly found concepts to existing concepts in the reference ontologies and makes
the decision about the use of these concept classes. The quality of alignment /
matching based approach becomes limited and dependent upon the availability
and richness of the existing ontologies which is not the case in our where an
ontology can be generated from scratch without the help of any existing domain
ontologies.
There are numerous other attempts on the process of ontology synthesis like
[26], [23], [27] and [21], but these researchers have addressed one or the other as-
pects of the whole ontology synthesis scenario. For example [23] a comprehensive
study of the classification of concepts and matching and alignment techniques
is presented. In [21] we see a detailed analysis of the ontology alignment tech-
niques. Moreover in [24] a tutorial is presented which helps understanding the
ontology learning process from raw text input. One interesting approach much
similar to our approach is XML to OWL transformation [25]; it is based on the
idea of mapping XSD file entities to OWL concepts and classes. The limitation
of the approach is the absence of relation building between different types of
concepts.
References
1. Foster, I. and Kesselman, C. 2003. The Grid. Blueprint for a New Computing In-
frastructure.: Blueprint for a New Computing Infrastructure (Elsevier Series in Grid
Computing). Morgan Kaufmann, 2. a. edition.
12 Muhammad Junaid, Thomas Fahringer
2. Taylor, I. J., Deelman, E., and Gannon, D. B. (2006). Workflows for e-Science:
Scientific Workflows for Grids. Springer.
3. Fahringer, T., Jugravu, A., Pllana, S., Prodan, R., Seragiotto, C., and Truong,
H. 2005. ASKALON: a tool set for cluster and Grid Computing: Research Ar-
ticles. Concurr. Comput. : Pract. Exper. 17, 2-4 (Feb. 2005), 143-169. DOI=
http://dx.doi.org/10.1002/cpe.v17:2/4 18th ACM international symposium on High
performance distributed computing (HPDC ’09). ACM, New York, NY, USA, 167-
176. DOI=10.1145/1551609.1551637 http://doi.acm.org/10.1145/1551609.1551637
4. Fahringer, T., Qin, J., and Hainzer, S. 2005. Specification of grid workflow applica-
tions with AGWL: an Abstract Grid Workflow Language. In Proceedings of the Fifth
IEEE international Symposium on Cluster Computing and the Grid (Ccgrid’05) -
Volume 2 - Volume 02 (May 09 - 12, 2005). CCGRID. IEEE Computer Society,
Washington, DC, 676-685.
5. Hongyi Zhou and Jeffrey Skolnick. 2009. Protein structure prediction by pro-Sp3-
TASSER. In Proceedings of the Biophysical journal. Volume 96. Pages 2119-27
6. Vipul Kashyap and Amit Sheth. 1996. Semantic and schematic similar-
ities between database objects: a context-based approach. The VLDB
Journal 5, 4 (December 1996), 276-304. DOI=10.1007/s007780050029
http://dx.doi.org/10.1007/s007780050029
7. POVray, http://www.povray.org/.
8. Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context
similarity. In Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining (KDD ’02). ACM, New York, NY, USA,
538-543. DOI=10.1145/775047.775126 http://doi.acm.org/10.1145/775047.775126
9. E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K.
Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. Katz, Pegasus: a
Framework for Mapping Complex Scientific Workflows onto Distributed Systems,
Scientific Programming Journal, vol. 13, no. 2, November 2005.
10. I. Taylor, I. Wang, M. Shields, and S. Majithia, Distributed computing with Triana
on the Grid, Concurrency and Computation: Practice and Experience, 2005.
11. T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K.
G. and Matthew R. Pocock, A. Wipat, and P. Li, Taverna: A tool for the composition
and enactment of bioinformatics workflows, Bioinformatics Journal, vol. 20, no. 17,
pp. 30453054, June 2004.
12. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock, Kepler: An
Extensible System for Design and Execution of Scientific Workflows, in 16th Intl.
Conf. on Scientific and Statistical Database Management (SSDBM04). Santorini
Island, Greece: IEEE Computer Society Press, June 21-23, 2004.
13. Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger,
Cláudio T. Silva, and Huy T. Vo. 2006. VisTrails: visualization meets data
management. In Proceedings of the 2006 ACM SIGMOD international conference
on Management of data (SIGMOD ’06). ACM, New York, NY, USA, 745-747.
DOI=10.1145/1142473.1142574 http://doi.acm.org/10.1145/1142473.1142574
14. Malik Muhammad Junaid, Maximilian Berger, Tomas Vitvar, and Thomas
Fahringer, Workflow Composition through Design Suggestions using Design-Time
Provenance Information, In Proceedings of the 5th IEEE International Conference
on e-Science, December 09-11, 2009, Oxford, UK. Accepted for publication.
15. http://www.w3.org/TR/xml-c14n
16. Bernstein A, Kaufmann E, Buerki C, Klein M (2005) How similar is it? Towards
personalized similarity measures in ontologies. In: Proceedings of the internationale
Tagung Wirtschaftsinformatik, Germany.
Semi-Automatic Composition of Ontologies for ASKANCE Grid Workflows 13
17. http://www.w3.org/2004/OWL/
18. Ivan Bedini, Benjamin Nguyen. Automatic Ontology Generation:State of the Art.
19. Renata Slota, Joanna Zieba, Bartosz Kryza, and Jacek Kitowski. 2006. Knowledge
Evolution Supporting Automatic Workflow Composition. In Proceedings of the Sec-
ond IEEE International Conference on e-Science and Grid Computing (E-SCIENCE
’06). IEEE Computer Society, Washington, DC, USA, 37-. DOI=10.1109/E-
SCIENCE.2006.97 http://dx.doi.org/10.1109/E-SCIENCE.2006.97
20. Jun Qin, Fahringer T. A novel domain oriented approach for scientific Grid
workflow composition. International Conference for High Performance Comput-
ing, Networking, Storage and Analysis 2008. SC 2008. vol., no. pp.1-12. 2008
DOI=10.1109/SC.2008.5214432
21. Euzenat J., Le Bach T., Barrasa J., Bouquet P., De Bo J., Dieng R., Ehrig M.,
Hauswirth M., Jarrar M., Lara R., Maynard D., Napoli A., Stamou G., Stucken-
schmidt H., Shvaiko P.,Tessaris S., Van Acker S. Zaihrayeu I., State of the Art
on Ontology Alignment. Knowledge Web Deliverable D2.2.3, INRIA, Saint Ismier,
2004.
22. Thomas Wchter, Michael Schroeder. Semi-automated ontology generation within
OBO-Edit. Biotechnology Center (BIOTEC),Bioinformatics (Oxford, England),
Vol. 26, No. 12. (15 June 2010), pp. i88-96. Technische Universitt Dresden, 01062
Dresden,
23. S. Castano, A. Ferrar, S. Montanelli, G. N. Hess, S. Bruno. State of the Art on
Ontology Coorination and Matching. Deliverable 4.4 Version 1.0 Final, March 2007.
BOEMIE Project
24. Paul Buitelaar, Philipp Cimiano, Marko Grobelnik, Michael Sintek. Ontology
Learning from Text Tutorial. ECML/PKDD 2005 Porto, Portugal; 3rd October
- 7th October, 2005. In conjunction with the Workshop on Knowledge Discovery
and Ontologies (KDO-2005)
25. Bohring H, Auer S. Mapping XML to OWL Ontologies, Leipziger Informatik-Tage.
2005: 147-156.
26. York Sure, Rudi Studer. On-To-Knowledge Methodology Final Version Project
Deliverable D18, 2002.
27. Denny Vrandecic, H. Sofia Pinto, York Sure, Christoph Tempich. The DILIGENT
Knowledge Processes. Journal of Knowledge Management 9 (5): 85-96. October
2005. ISSN: 1367-3270.