
Creating a Biologist-Oriented Interface and Code Generation System for a Computational Modeling Assistant

Jonathan M. Matthews (1), Scott Christley, Ph.D. (2), and Gary An, M.D. (2)
(1) Massachusetts Institute of Technology, Cambridge, MA; (2) University of Chicago, Chicago, IL

Abstract
We have previously developed an artificial intelligence system, the Computational Modeling Assistant (CMA), that augments the construction of dynamic computational models from biological conceptual models. To further reduce the threshold for the use of computational methods, we aim to develop a user-friendly interface that allows biologists to interact with the CMA and generate a computational model without the need for programming expertise.

The Computational Modeling Assistant (CMA)


The Computational Modeling Assistant aims to streamline the process of in silico hypothesis evaluation by augmenting computational model generation with an artificial intelligence system and providing a user-friendly interface to do so without the need for programming expertise.

Biologist-Oriented Interface
Biomedical researchers express their conceptual models through natural language. They describe the biological entities, processes, and interactions with a series of declarative statements.
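As a rough illustration of how such declarative statements might be broken into parts, the Python sketch below parses two of the statement forms that appear in the virulence activation example. The `Statement` tuple, the regular expression, and the `parse_statement` helper are assumptions made for this sketch; they are not the CMA's actual parser, and other statement forms (for example, statements without an ontology tag) are simply rejected.

```python
import re
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """Hypothetical container for one parsed declarative statement."""
    subject: str
    verb: str
    ontology_term: str
    target: Optional[str]


# Assumed form: "<subject> [is] <verb> [<ONTOLOGY:ID>] [into <target>]"
_PATTERN = re.compile(
    r"^(?P<subject>[^\[\]]+?) (?:is )?(?P<verb>\w+) "
    r"\[(?P<term>[A-Z]+:\d+)\]"
    r"(?: into (?P<target>.+?))?\.?$"
)


def parse_statement(text: str) -> Optional[Statement]:
    """Return the parsed statement, or None if the form is not recognized."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    return Statement(match["subject"], match["verb"],
                     match["term"], match["target"])


parsed = parse_statement(
    "LecA is transcribed [SBO:0000183] into PA-I lectin mRNA")
```

Here `parsed` holds the subject, the verb, the ontology term attached to the verb, and the target entity, which is the kind of structure a rule engine could then reason over.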

The T1 Translational Dilemma: Bottleneck at Evaluating Mechanistic Hypotheses


Modern technology generates enormous quantities of observational data, increasing the set of possible hypotheses. However, there is a procedural bottleneck in evaluating this plethora of hypotheses to identify those plausible candidates targeted in the next cycle of experiments. This step, unlike the generation of data, cannot yet be readily accomplished in silico because of the programming expertise required to develop dynamic computational models.

The CMA uses a logical framework based on rewriting logic to reason about biological processes and translate them into computational constructs.
It accepts near-natural language statements about biological entities and concepts. It utilizes ontologies to ascribe semantic context to nouns and verbs in biological statements. It contains a set of logical rewrite rules that map biological concepts into computational modeling methods. It supports multiple modeling methods such as ODEs, PDEs, Petri nets and agent-based models.
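The idea of mapping one biological concept into different modeling methods can be sketched with a small dispatch table. This is only a Python stand-in: the CMA itself expresses its mapping rules in rewriting logic, and the function names, the string form of the ODE term, and the transition dictionary below are all hypothetical choices made for illustration.

```python
# Hypothetical stand-in for mapping rules; not the CMA's rewrite rules.
PRODUCTION_PROCESSES = ("transcribed", "translated")


def to_ode_term(process, source, target):
    """Map a production process to a Hill-type ODE term (assumed form)."""
    if process in PRODUCTION_PROCESSES:
        return f"d[{target}]/dt += H({source})"
    raise KeyError(f"no ODE rule for process {process!r}")


def to_petri_transition(process, source, target):
    """Map a production process to a transition that keeps its template."""
    if process in PRODUCTION_PROCESSES:
        return {"inputs": [source], "outputs": [source, target]}
    raise KeyError(f"no Petri net rule for process {process!r}")


# One rule set per modeling method; the researcher picks the method.
MAPPING_RULES = {"ODE": to_ode_term, "Petri net": to_petri_transition}


def map_statement(method, process, source, target):
    """Apply the mapping rule of the selected modeling method."""
    return MAPPING_RULES[method](process, source, target)


ode_term = map_statement("ODE", "transcribed", "LecA", "PA-I lectin mRNA")
transition = map_statement(
    "Petri net", "translated", "PA-I lectin mRNA", "PA-I lectin")
```

Keeping the rules in a table keyed by modeling method mirrors the idea that new methods can be added without touching the biological statements themselves.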

A biological conceptual model of Pseudomonas aeruginosa virulence activation. A series of statements describes the gene, mRNA, proteins and small molecules, and processes such as protein/small-molecule binding and transcriptional regulation.

HTML and CSS were used to create the GUI; jQuery, a JavaScript library, was used to handle client-side scripting; PHP was used for server-side scripting; and PostgreSQL was used for the database.

Code Generation System


The CMA interface employs Maude, a logic rewrite framework, to generate a model specification from the biological knowledge, creating the necessary Maude code on the fly from the database system. A model specification represents a particular mathematical modeling formalism, and the researcher selects the desired modeling method when generating the specification.
[...] order reasoning that cannot be performed with first-order logical inference. Furthermore, because the exact sequence of rules can be recovered, this enables an explanatory description to be provided to the user about how the biological model was transformed into a computational model, which might be useful for pedagogical or debugging purposes.

System Architecture
The CMA interface utilizes a three-tier architecture: a web-based graphical user interface (GUI) for the researcher to input biological conceptual models, a client- and server-side scripting system to manage models and run the back-end logic code, and a database system to store knowledge for future use.

Biological model statements (excerpt):
17. LecA is transcribed [SBO:0000183] into PA-I lectin mRNA
18. PA-I lectin mRNA is translated [SBO:0000184] into PA-I lectin
19. PA-I lectin mRNA decays
20. Pseudomonas secretes [GO:0030528] PA-I lectin into extracellular compartment [GO:0005615].

For the biological model given in Figure 4, the CMA produces the model specification shown in Figure 6. This model is composed of eight ODEs and three PDEs represented by these equations:
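\[
\begin{aligned}
\frac{d[\mathrm{muc2}_{\mathrm{mRNA}}]}{dt} &= H(\mathrm{muc2}_{\mathrm{gene}}) \\
\frac{d[\mathrm{muc2}]}{dt} &= H(\mathrm{muc2}_{\mathrm{mRNA}}) - S(\mathrm{muc2}) \\
\frac{d[A_{\mathrm{mRNA}}]}{dt} &= H(A_{\mathrm{gene}}) \\
\frac{d[A]}{dt} &= H(A_{\mathrm{mRNA}}) - B(A, B) \\
\frac{d[B_{\mathrm{mRNA}}]}{dt} &= H(B_{\mathrm{gene}}) \\
\frac{d[B]}{dt} &= H(B_{\mathrm{mRNA}}) - B(A, B) \\
\frac{d[AB]}{dt} &= B(A, B) - S(AB) \\
\frac{d[\mathrm{muc2}_E]}{dt} &= S(\mathrm{muc2}) \\
\frac{d[A_E]}{dt} &= \nabla^2 A_E + D(AB) - k_1 A_E \\
\frac{d[B_E]}{dt} &= \nabla^2 B_E + D(AB) - k_2 B_E \\
\frac{d[AB_E]}{dt} &= \nabla^2 AB_E + S(AB) - D(AB_E)
\end{aligned}
\]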
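To make the ODE part of this specification concrete, the following Python sketch integrates the eight ODEs with a forward Euler step. The model specification only names the interaction functions generically, so the functional forms used here (a Hill function for H, first-order secretion for S, mass-action binding for B), all parameter values, and the treatment of gene levels as constants are assumptions for illustration. The three PDEs are left out because they would additionally need a spatial grid.

```python
def hill(x, vmax=1.0, K=0.5, n=2):
    """Assumed form of H: Hill activation."""
    return vmax * x**n / (K**n + x**n)


def secrete(x, rate=0.1):
    """Assumed form of S: first-order secretion."""
    return rate * x


def bind(a, b, rate=0.05):
    """Assumed form of B: mass-action binding."""
    return rate * a * b


SPECIES = ["muc2_mRNA", "muc2", "A_mRNA", "A", "B_mRNA", "B", "AB", "muc2_E"]


def derivatives(s, genes):
    """Right-hand sides of the eight ODEs; gene levels are held constant."""
    return {
        "muc2_mRNA": hill(genes["muc2"]),
        "muc2": hill(s["muc2_mRNA"]) - secrete(s["muc2"]),
        "A_mRNA": hill(genes["A"]),
        "A": hill(s["A_mRNA"]) - bind(s["A"], s["B"]),
        "B_mRNA": hill(genes["B"]),
        "B": hill(s["B_mRNA"]) - bind(s["A"], s["B"]),
        "AB": bind(s["A"], s["B"]) - secrete(s["AB"]),
        "muc2_E": secrete(s["muc2"]),
    }


def euler(state, genes, dt, steps):
    """Forward Euler integration; returns the state after `steps` steps."""
    state = dict(state)
    for _ in range(steps):
        rates = derivatives(state, genes)  # evaluated at the old state
        for name in SPECIES:
            state[name] += dt * rates[name]
    return state


initial = {name: 0.0 for name in SPECIES}
genes = {"muc2": 1.0, "A": 1.0, "B": 1.0}
final = euler(initial, genes, dt=0.1, steps=50)
```

A production simulation would more likely use an adaptive solver; forward Euler is used here only to keep the sketch dependency-free.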

2.1.13 Mapping rules and model specification creation

PDE model

As in the process delineated above, the virulence pathway biological model was presented to the CMA and translated into a set of Maude logical statements; applying the Maude logical rewrite rules shown in Fig. 7 then produced a Petri net model. The resulting model is shown in Fig. 8. It should be noted that an additional output of this process was an error statement:

polypeptide chain entities OprF, interferon-gamma, RhlRI, has no translate

This statement resulted from the fact that some of the entities listed in the biological model did not have a rule leading to their production. However, rather than treating this error statement as rendering the biological model invalid, these entities would be presented back to the researcher as variables in the model that require initialization values, i.e. the input values for simulation execution. The above model specification can also be described in standard notation for biochemical rules.
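The check behind such an error statement can be sketched in a few lines of Python: collect every entity that some translation statement produces, then report the polypeptide entities that never appear as a product. The function name and the simplified `(verb, target)` statement tuples are hypothetical; the entity names are those mentioned in the error statement and in the biological model statements.

```python
def missing_translate_rules(polypeptides, statements):
    """Return polypeptide entities that no 'translated' statement produces."""
    produced = {target for verb, target in statements if verb == "translated"}
    return [name for name in polypeptides if name not in produced]


# Simplified (verb, target) pairs; only PA-I lectin has a translation rule.
statements = [
    ("transcribed", "PA-I lectin mRNA"),
    ("translated", "PA-I lectin"),
]
polypeptides = ["PA-I lectin", "OprF", "interferon-gamma", "RhlRI"]

# These entities would be handed back to the researcher as model inputs
# that need initialization values, rather than invalidating the model.
needs_initial_value = missing_translate_rules(polypeptides, statements)
```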

In Silico Augmentation of Scientific Cycle


The ability to execute in silico experiments offers the potential to substantially accelerate and enhance the scientific cycle by rejecting implausible hypothesis structures, helping direct traditional experimental design to separate sets of plausible hypotheses, and providing a wider search capability for plausible solutions.

[Figure: system architecture diagram with components Web-based GUI, PHP Scripts, Logic Engine, Knowledgebase and Back-end Server]

Petri net model


[Figure: biochemical rules for the Petri net model, relating the species PA-I-lectin-mRNA, PA-I-lectin, PA-I-lectin[e], OprF, interferon-gamma, interferon-OprF-complex, RhlRI, Lux-box, RhlRI-Lux-box-complex and lecA]

In the PDE model equations, H, S, B and D are the hillFunction, secreteFunction, bindFunction and dissociateFunction functions.

Model Parameters

Management of Knowledgebase
The GUI provides management of the CMA's knowledgebase of modeling methods and mapping rules, and allows the knowledgebase to be extended to support new methods and new biological concepts.

The CMA uses an internal XML format for model specification (BioSwarm simulation system), but compatible models can be provided in standard formats such as SBML.

In our example for gut mucus stratification, the model specification uses generic names for various interaction functions such as hillFunction, bindFunction, etc. However, these functions can be defined in more detail, allowing the CMA to perform additional modeling and simulation capabilities. For example, the CMA could query the number and [...]

As with the gut mucus model example, the conversion of the base Petri net model into simulation code would involve a selection by the researcher among a set of subcategories of Petri nets based on their specific properties, such as deterministic or stochastic, and the characteristics of the system under study. Furthermore, as described in Example #1, the CMA could also have generated an ODE model of the virulence activation pathway; in practice, the biological model of virulence activation would have been inputted into the CMA and both a Petri net and an ODE model (among, perhaps, other types) would be generated, each with suggested parameters based on the requirements of the particular modeling method, and the researcher would select one or the other (or perhaps even both) based on the initialization data available or the desired dynamics to be investigated. Additionally, future development of the CMA would include the capability to parse the biological model and [...]
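To illustrate what choosing a stochastic Petri net variant could mean in simulation code, here is a minimal Python sketch of token firing. The two transitions are modeled on the translation and secretion statements for PA-I lectin; the data layout, the function names, and the uniform random choice among enabled transitions are assumptions for this sketch, not the simulation code the CMA generates.

```python
import random
from collections import Counter

# Each transition is (input tokens, output tokens); names are illustrative.
TRANSITIONS = {
    "translate": ({"PA-I-lectin-mRNA": 1},
                  {"PA-I-lectin-mRNA": 1, "PA-I-lectin": 1}),
    "secrete": ({"PA-I-lectin": 1}, {"PA-I-lectin[e]": 1}),
}


def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking[place] >= n for place, n in inputs.items())


def fire(marking, transition):
    """Consume the input tokens and produce the output tokens."""
    inputs, outputs = transition
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return new_marking


def run_stochastic(marking, transitions, steps, seed=0):
    """Fire a uniformly chosen enabled transition at each step."""
    rng = random.Random(seed)
    marking = Counter(marking)
    for _ in range(steps):
        choices = [t for t in transitions.values() if enabled(marking, t)]
        if not choices:
            break
        marking = fire(marking, rng.choice(choices))
    return marking


start = Counter({"PA-I-lectin-mRNA": 1})
result = run_stochastic(start, TRANSITIONS, steps=10)
```

A deterministic variant would replace the random choice with a fixed firing order or fixed rates; the marking and firing logic would stay the same.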

Summary and Conclusions


In silico experiments can augment the standard scientific cycle to accelerate the process of scientific discovery. The CMA uses a knowledgebase of controlled vocabulary (ontologies), computational modeling methods, and rules for mapping biological concepts into computational methods. Researchers describe their biological conceptual model using structured natural language. The CMA uses logical inference to translate a biological conceptual model into a computational model specification.


[Figure labels: Modeling Methods, Mapping Rules]