CIPRes in Kepler

An integrative workflow package for
streamlining phylogenetic data analyses

Zhijie Guan1, Alex Borchers1, Timothy McPhillips2,
Shirley Cohen3, Mark A. Miller1, Ilkay Altintas1

1San Diego Supercomputer Center, UCSD
2University of California, Davis
3University of Pennsylvania

What is a Scientific Workflow?
• Combination of data integration, analysis, and visualization steps
• Design frameworks which define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources
• Make technology useful through the user's monitor
• Mission of scientific workflow systems
  • Promote "scientific discovery" by providing tools and methods to generate scientific workflows
  • Create an extensible and customizable graphical user interface for scientists from different scientific domains
  • Support computational experiment creation, sharing, reuse, and provenance
• Automated "scientific process"

A Workflow for Phylogeny Analysis

Kepler is a Scientific Workflow System
• June 2006 Beta release
• Builds upon the open-source Ptolemy II framework
  • Ptolemy II: A software system used for prototyping engineering systems
  • KEPLER: A platform to design and execute Scientific Workflows
• KEPLER = "Ptolemy II + X" for Scientific Workflows

Some Kepler Contributors
• Ptolemy II
• GriddLeS
• SKIDL
• Resurgence
• SRB
• NLADR
• National Digital Archives + UCSD-TV (US)
• DART (Great Barrier Reef, Australia)
• LOOKING
• Cheshire (UK Text Mining Center)
• A co-development in KEPLER: GEON Dataset Generation & Registration
  • % Makefile $> ant run
  • SQL database access (JDBC)

Phylogeny Analysis Workflows
[Figure: workflow diagram; labels: Local Disk, Phylogeny Tree Analysis, Visualization, Multiple Sequence Alignment]

Kepler Workflow: Actors
Actor-Oriented Design
• Actor
  • Encapsulation of parameterized actions
  • Interface defined by ports and parameters
• Port
  • Communication between input and output data
  • The place where data get in/out
• Model of computation
  • Flow of control
  • Sequential / parallel execution
  • Implementation is a framework
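The actor, port, and model-of-computation ideas above can be sketched in a few lines of Python. This is an illustration only, not Kepler's or Ptolemy II's API (both are Java systems): the names `Actor`, `SequentialDirector`, `fire`, and `connect` are hypothetical, and the director shown here simply fires each actor once when all of its input ports hold a token.

```python
from collections import deque


class Actor:
    """A parameterized action whose interface is its ports and parameters."""

    def __init__(self, name, inputs, outputs, action, **parameters):
        self.name = name
        self.inputs = list(inputs)    # input port names
        self.outputs = list(outputs)  # output port names
        self.action = action          # callable(tokens, parameters) -> dict of outputs
        self.parameters = parameters

    def fire(self, tokens):
        """Consume one token per input port, produce one per output port."""
        result = self.action(tokens, self.parameters)
        return {port: result[port] for port in self.outputs}


class SequentialDirector:
    """Hypothetical model of computation: fire each actor once, in dependency order."""

    def __init__(self):
        self.actors = {}
        self.channels = []  # (source actor, source port, target actor, target port)

    def add(self, actor):
        self.actors[actor.name] = actor
        return actor

    def connect(self, src, src_port, dst, dst_port):
        self.channels.append((src.name, src_port, dst.name, dst_port))

    def run(self):
        pending = {name: {} for name in self.actors}  # tokens waiting at input ports
        produced = {}
        ready = deque(n for n, a in self.actors.items() if not a.inputs)
        while ready:
            name = ready.popleft()
            produced[name] = self.actors[name].fire(pending[name])
            # Move each output token along its channel to the downstream port.
            for src, sp, dst, dp in self.channels:
                if src != name:
                    continue
                pending[dst][dp] = produced[name][sp]
                complete = set(pending[dst]) == set(self.actors[dst].inputs)
                if complete and dst not in produced and dst not in ready:
                    ready.append(dst)
        return produced


director = SequentialDirector()
source = director.add(
    Actor("source", [], ["out"], lambda t, p: {"out": p["value"]}, value=3))
scale = director.add(
    Actor("scale", ["in"], ["out"], lambda t, p: {"out": t["in"] * p["factor"]}, factor=2))
director.connect(source, "out", scale, "in")
results = director.run()
```

Keeping the scheduling logic in the director rather than in the actors is what lets the same actors run under a different execution model (for example a parallel one) without being rewritten.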

CIPRes Workflow: Actors
[Figure: an actor and its ports]
• Input Port: Nexus File Content
• Output Ports: Data Matrix, Tree, Taxa Info
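Reading the labels above as one input port (Nexus file content) and three output ports (data matrix, tree, taxa info), such an actor could be sketched roughly as follows. This is an assumption about the figure, not the CIPRes implementation: `parse_nexus` is a hypothetical helper that handles only simple, non-interleaved NEXUS text without quoted names or comments.

```python
import re


def parse_nexus(content):
    """Split NEXUS text into taxa labels, a character matrix, and tree strings."""
    taxa, matrix, trees = [], {}, {}
    # Each NEXUS block runs from "begin <name>;" to "end;".
    for _block, body in re.findall(r"(?is)begin\s+(\w+)\s*;(.*?)\bend\s*;", content):
        # Commands inside a block are terminated by semicolons.
        for command in body.split(";"):
            words = command.split()
            if not words:
                continue
            keyword = words[0].lower()
            if keyword == "taxlabels":
                taxa.extend(words[1:])
            elif keyword == "matrix":
                # Rows alternate taxon name and sequence (non-interleaved only).
                for taxon, sequence in zip(words[1::2], words[2::2]):
                    matrix[taxon] = sequence
            elif keyword == "tree":
                label, newick = command.split("=", 1)
                trees[label.split()[1]] = newick.strip()
    # One dictionary entry per output port.
    return {"taxa_info": taxa, "data_matrix": matrix, "tree": trees}


example = """#NEXUS
begin taxa;
  dimensions ntax=3;
  taxlabels A B C;
end;
begin characters;
  dimensions nchar=4;
  format datatype=dna;
  matrix
    A ACGT
    B ACGA
    C TCGA
  ;
end;
begin trees;
  tree t1 = ((A,B),C);
end;
"""
ports = parse_nexus(example)
```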

Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, Gridding, Vis Support
• Textual and Graphical Output
• …more generic and domain-oriented actors…
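As an illustration of the command-line wrapper idea in the list above, the sketch below wraps a local program so that its standard input and output behave like ports. It is not Kepler's command-line actor: `CommandLineActor` is a hypothetical class, and a Python one-liner stands in for a real analysis tool so the example runs anywhere.

```python
import subprocess
import sys


class CommandLineActor:
    """Wrap an external program: the command is the parameter, stdin/stdout are the ports."""

    def __init__(self, command, timeout=60):
        self.command = list(command)
        self.timeout = timeout

    def fire(self, stdin_text=""):
        completed = subprocess.run(
            self.command,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=self.timeout,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        return completed.stdout


# Stand-in for a real tool: a Python one-liner that upper-cases its input.
upper = CommandLineActor(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"])
output = upper.fire("acgt\n")
```

Remote variants (ssh, scp, ftp) would change how the command is launched and how files are moved, but the port-like interface could stay the same.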

CIPRes Workflow
[Figure: annotated workflow; callouts listed below]
• Actor: GUIGen: Parameter Setting
• Choose the input file
• Run ClustalW
• Channel: Convey the data
• Get the subset of the aligned sequences
• Run PAUP for Tree Inference
• Read the tree
• Parse the tree
• Results: Display the tree
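The last three steps (read, parse, and display the tree) can be illustrated with a small Newick reader. This is a sketch under stated assumptions, not the CIPRes actors themselves: it assumes the tree arrives as a Newick string with unquoted names, ignores branch lengths, and uses the hypothetical helpers `parse_newick` and `render`.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, children) tuples; branch lengths are skipped."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ";"
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":" and the branch length that follows it
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
        return (name, children)

    return node()


def render(tree, depth=0):
    """Return an indented text view of the tree, one node per line."""
    name, children = tree
    lines = ["  " * depth + (name or "+")]  # "+" marks an unnamed internal node
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


tree = parse_newick("((A:0.1,B:0.2):0.05,C);")
print("\n".join(render(tree)))
```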

CIPRes Workflows: Demo
• Read Sequences
• Multiple Sequence Alignment
• Display the Alignment
• Matrix Alignment
• Tree Inference
• Consensus Tree
• Tree Visualization
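To give a feel for the consensus-tree step, the sketch below counts how often each clade appears across a set of rooted trees and keeps those found in more than half of them, which is the counting step behind a majority-rule consensus. It does not assemble the consensus tree, and it is not how PAUP or the demo workflow computes it; the nested-tuple tree encoding and the function names are assumptions made for the example.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade rooted at this internal node
    return leaves, found


def majority_clades(trees):
    """Clades that appear in more than half of the input trees."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        # Skip the trivial clade that contains every leaf.
        counts.update(c for c in found if c != all_leaves)
    return {c for c, n in counts.items() if n > len(trees) / 2}


trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
]
support = majority_clades(trees)
```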

Summary
• Kepler is good at:
  • Integrating data, programs, and computing resources
  • Capturing your ideas and realizing them
  • Supporting computational experiment creation, execution, and reuse
  • Quickly prototyping scientific workflows
  • Building streamlined applications
• Visual programming language
  • Don't write your application, "draw"/compose it
• The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses

Future Work
• Cipres-Kepler can help you
• There is (always) a lot more to work on:
  • More actors for phylogeny analyses
  • Automatically generating actors based on CORBA services
  • Database (TreeBase) support to store large amounts of data
  • More computing power for large dataset processing
• Need your collaboration:
  • Sharing experiences
  • Teaching each other the domain knowledge
  • Locating a specific problem and solving it

Questions?
Zhijie Guan
1-858-822-3620
zguan@sdsc.edu
Cipres-Kepler Release: ftp://ftp.sdsc.edu/pub/sdsc/biology/cipres/cipres-kepler.tgz