
CIPRes in Kepler:
An integrative workflow package for
streamlining phylogenetic data analyses

Zhijie Guan1, Alex Borchers1, Timothy McPhillips2,
Shirley Cohen3, Mark A. Miller1, Ilkay Altintas1

1San Diego Supercomputer Center, UCSD
2University of California, Davis

3University of Pennsylvania

biology.sdsc.edu

What is a Scientific Workflow?
• Combination of data integration, analysis, and visualization steps into a larger, automated “scientific process”
Mission of scientific workflow systems
• Promote “scientific discovery” by providing tools and methods to generate scientific workflows
• Create an extensible and customizable graphical user interface for scientists from different scientific domains
• Support computational experiment creation, execution, sharing, reuse and provenance
• Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
• Make technology useful through the user’s monitor!!!

Promoter Identification Workflow
[Figure. Source: Matt Coleman (LLNL)]

A Workflow for Phylogeny Analysis

Kepler is a Scientific Workflow System
www.kepler-project.org
• … and a cross-project collaboration
• June 2, 2006: Beta release
• Builds upon the open-source Ptolemy II framework
  - Ptolemy II: a software system used for prototyping engineering systems
• KEPLER: a platform to design and execute Scientific Workflows
• KEPLER = “Ptolemy II + X” for Scientific Workflows

Some Kepler Contributors
Ptolemy II, Griddles, SKIDL, Resurgence, SRB, NLADR, LOOKING, …
Other contributors:
• Cheshire (UK Text Mining Center)
• DART (Great Barrier Reef, Australia)
• National Digital Archives + UCSD-TV (US)
• …
Contributor names and funding info are at the Kepler website!!

A co-development in KEPLER: GEON Dataset Generation & Registration
[Figure callouts: “% Makefile $> ant run”; “SQL database access (JDBC)”]

Phylogeny Analysis Workflows
[Figure labels: Local Disk; Multiple Sequence Alignment; Phylogeny Tree Analysis; Visualization]

Kepler Workflow: Actors
Actor-Oriented Design
• Actor
  - Encapsulation of parameterized actions
  - Interface defined by ports and parameters
• Port
  - Communication between input and output data
  - The place where data get in/out
• Model of computation
  - Flow of control
  - Sequential / parallel execution
  - Implementation is a framework
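To make the actor, port, and model-of-computation vocabulary concrete, here is a minimal Python sketch. It is not Kepler's API (Kepler and Ptolemy II are written in Java), and the names `Actor`, `SequentialDirector`, `fire`, `connect`, and `run` are hypothetical. Each actor wraps a parameterized action, ports are named inputs and outputs, channels link an output port to an input port, and the director (the Ptolemy II term for the component that implements a model of computation) fires an actor once all of its input ports hold a token.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Actor:
    """Hypothetical actor: a parameterized action whose interface is its ports and parameters."""
    name: str
    inputs: List[str]                       # input port names
    outputs: List[str]                      # output port names
    action: Callable[..., Dict[str, Any]]   # input tokens + parameters -> output tokens
    params: Dict[str, Any] = field(default_factory=dict)

    def fire(self, tokens: Dict[str, Any]) -> Dict[str, Any]:
        produced = self.action(**tokens, **self.params)
        missing = set(self.outputs) - set(produced)
        if missing:
            raise ValueError(f"{self.name} produced no token for {sorted(missing)}")
        return produced


class SequentialDirector:
    """Toy sequential model of computation: fire each actor once every input port holds a token."""

    def __init__(self) -> None:
        self.actors: Dict[str, Actor] = {}
        # Channels: (source actor, output port) -> [(target actor, input port), ...]
        self.channels: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}

    def add(self, actor: Actor) -> Actor:
        self.actors[actor.name] = actor
        return actor

    def connect(self, src: str, src_port: str, dst: str, dst_port: str) -> None:
        self.channels.setdefault((src, src_port), []).append((dst, dst_port))

    def run(self) -> Dict[str, Dict[str, Any]]:
        received: Dict[str, Dict[str, Any]] = {name: {} for name in self.actors}
        results: Dict[str, Dict[str, Any]] = {}
        # Actors without input ports (sources) can fire immediately.
        ready = deque(name for name, a in self.actors.items() if not a.inputs)
        while ready:
            name = ready.popleft()
            results[name] = self.actors[name].fire(received[name])
            # Send every output token along its channels.
            for port, token in results[name].items():
                for dst, dst_port in self.channels.get((name, port), []):
                    received[dst][dst_port] = token
                    complete = set(self.actors[dst].inputs) <= set(received[dst])
                    if complete and dst not in results and dst not in ready:
                        ready.append(dst)
        return results


if __name__ == "__main__":
    director = SequentialDirector()
    director.add(Actor("ReadSequences", [], ["seqs"],
                       lambda: {"seqs": ["ACGT", "ACG", "TTGCA"]}))
    director.add(Actor("FilterByLength", ["seqs"], ["kept"],
                       lambda seqs, min_len: {"kept": [s for s in seqs if len(s) >= min_len]},
                       params={"min_len": 4}))
    director.connect("ReadSequences", "seqs", "FilterByLength", "seqs")
    print(director.run()["FilterByLength"]["kept"])  # ['ACGT', 'TTGCA']
```

In this sketch each actor fires at most once, which keeps the director simple; a real model of computation (for example, a parallel or streaming one) would schedule repeated firings differently, which is exactly why the flow of control is kept separate from the actors.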

CIPRes Workflow: Actors
[Figure: a CIPRes actor with an input port and output ports; port labels: Data Matrix, Nexus File Content, Tree, Taxa Info]

Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, Gridding, Vis Support
• Textual and Graphical Output
• …more generic and domain-oriented actors…
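The command-line wrapper idea can be sketched in Python for the local case only (the ssh, scp, and ftp variants listed above are not shown). The function `command_line_actor` and its output names `stdout`, `stderr`, and `exitCode` are hypothetical; the sketch runs a program with `subprocess.run` and returns the results as a dictionary of tokens that downstream actors could consume.

```python
import subprocess
import sys
from typing import Dict, List, Optional


def command_line_actor(argv: List[str], stdin_text: Optional[str] = None,
                       timeout: float = 60.0) -> Dict[str, object]:
    """Hypothetical local command-line wrapper: run a program, return its results as tokens."""
    completed = subprocess.run(
        argv,
        input=stdin_text,      # data arriving on the actor's input port, if any
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,             # exchange str rather than bytes
        timeout=timeout,
        check=False,           # report failures through exitCode instead of raising
    )
    return {"stdout": completed.stdout,
            "stderr": completed.stderr,
            "exitCode": completed.returncode}


if __name__ == "__main__":
    # Uses the current Python interpreter as a stand-in for an external tool.
    tokens = command_line_actor(
        [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
        stdin_text="acgt",
    )
    print(tokens["stdout"], end="")  # ACGT
```

Returning the exit code as an ordinary token, rather than raising, lets a workflow route failed runs to an error-handling actor.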

CIPRes Workflow
[Figure callouts: Actor; GUIGen: Parameter Setting; Channel: Convey the data]
• Choose the input file
• Run ClustalW
• Get the subset of the aligned sequences
• Run PAUP for Tree Inference
• Read the tree
• Parse the tree
• Results: Display the tree
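The read, parse, and display steps for the tree can be illustrated with a small Newick parser. This is a sketch under the assumption that the inferred tree arrives as a plain Newick string (for example, a tree description taken from a NEXUS TREES block); it is not the CIPRes actors' actual implementation. The names `Node`, `parse_newick`, `leaf_names`, and `render` are hypothetical, and quoted labels, comments, and other Newick extensions are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (unquoted labels, optional branch lengths)."""
    s = "".join(text.split())  # unquoted Newick labels contain no blanks
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":  # internal node: comma-separated children in parentheses
            pos += 1
            node.children.append(read_node())
            while s[pos] == ",":
                pos += 1
                node.children.append(read_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos  # optional label
        while s[pos] not in "(),:;":
            pos += 1
        node.name = s[start:pos]
        if s[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while s[pos] not in ",);":
                pos += 1
            node.length = float(s[start:pos])
        return node

    root = read_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return root


def leaf_names(node: Node) -> List[str]:
    """Taxa in left-to-right order."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def render(node: Node, depth: int = 0) -> str:
    """Indented text view of the tree; unnamed internal nodes are shown as '*'."""
    label = node.name or "*"
    if node.length is not None:
        label += f" ({node.length:g})"
    return "\n".join(["  " * depth + label] + [render(c, depth + 1) for c in node.children])


if __name__ == "__main__":
    tree = parse_newick("((Human:0.1,Chimp:0.12):0.3,Gorilla:0.4);")
    print(leaf_names(tree))  # ['Human', 'Chimp', 'Gorilla']
    print(render(tree))
```

A recursive-descent parser mirrors Newick's nested-parentheses grammar directly, which keeps the parse step short; a production actor would more likely rely on an established tree library.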

CIPRes Workflows: Demo
• Read Sequences
• Multiple Sequence Alignment
• Display the Alignment
• Matrix Alignment
• Tree Inference
• Consensus Tree
• Tree Visualization

Summary
• Kepler is good at:
  - Integrating data, programs, and computing resources
  - Capturing your ideas and realizing them
  - Supporting computational experiment creation, execution, sharing, and reuse
  - Quickly prototyping scientific workflows
  - Building streamlined applications
  - Visual programming language: don’t write your application, “draw”/compose it
• The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses

Future Work
• Cipres-Kepler can help you
• There is (always) a lot more to work on:
  - More actors for phylogeny analyses
  - Automatically generating actors based on CORBA services
  - Database (TreeBase) support to store large amounts of data
  - More computing power for large dataset processing
• Need your collaboration:
  - Sharing experiences
  - Teaching each other the domain knowledge
  - Locating a specific problem and solving it

Questions?
Zhijie Guan
guan@sdsc.edu
1-858-822-3620
www.sdsc.edu
Cipres-Kepler Release: ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz