Xingdong Bian - X-Machine Model of A Biological System

X-Machine Model of a Biological System
Third year undergraduate dissertation project Final Dissertation
Department of Computer Science University of Sheffield
Author: Xingdong Bian
Supervisor: Prof. Mike Holcombe
Module code: COM3021
Date: 29/03/2006
This report is submitted in partial fulfilment of the requirement for the degree of Bachelor of Science with Honours in Computer Science by Xingdong Bian.
Signed declaration:
All sentences or passages quoted in this dissertation from other people's work have been specifically acknowledged by clear cross-referencing to author, work and page(s). Any illustrations which are not the work of the author of this dissertation have been used with the explicit permission of the originator and are specifically acknowledged. I understand that failure to do this amounts to plagiarism and will be considered grounds for failure in this dissertation and the degree examination as a whole.
Name: XINGDONG BIAN
Signature:
Date: 02/05/2006
Abstract:
This project is in the field of computational biology: it uses computer simulation to display the spatial and temporal aspects of biological systems in detail. The aim of this project is to develop a simulation of a vital part of the immune system using the X-machine framework and tools such as the Xparser and XML. Existing models written in Matlab are converted into XML, and the Xparser then parses the XML into a runnable C programme. Three models are involved in this project: a chemical interaction model, an NF-κB signalling pathway model and a combined NF-κB and MAP kinase signalling model. The first two models have existing Matlab versions to be converted, but the last model required further research in order to add a new pathway to the NF-κB model.
Acknowledgments
Thanks to everyone who helped me with this project, especially my supervisor Prof. Mike Holcombe, for leading me in the right direction and for many ideas and much advice on this project. Thanks also to Mr. Simon Coakley for helping me with the XML specification, the Xparser and visualisation, and to Mr. Mark Pogson for helping me with the Matlab example models. Lastly, thanks to Prof. Eva Qwarnstrom for helping me with biological knowledge and experimental data.
Contents
Title
Signed declaration
Abstract
Acknowledgments
Contents
Figure List
Chapter 1: Introduction
Section 1.1: Background
Section 1.2: About the Project
Section 1.2.1: Agent-Based Modelling
Section 1.2.2: X-machine
Section 1.2.3: HPCx
Section 1.3: About This Dissertation
Chapter 2: Literature Review
Section 2.1: Overview
Section 2.2: Agent-Based Intracellular Chemical Interactions Model
Section 2.3: Agent-Based the NF-κB Signalling Pathway Model
Section 2.4: NF-κB Signalling Pathway and MAP Kinase Signal Pathway Combined Model
Section 2.5: Some Agent-Based Modelling Approaches
Section 2.5.1: Swarm Agent-Based Modelling
Section 2.5.2: MASON Multi-Agent Simulations
Section 2.5.3: X-machine Framework and XML
Chapter 3: Requirements and Analysis
Section 3.1: Objectives and Requirement for the Project
Section 3.2: Analysis for Intracellular Chemical Interaction Model
Section 3.2.1: Importance and User Requirements
Section 3.2.2: Conversion from Matlab
Section 3.2.3: Concentrations Rates
Section 3.3: Analysis for the NF-κB Signalling Pathway Model
Section 3.3.1: Importance and User Requirements
Section 3.3.2: Conversion from Matlab
Section 3.4: Analysis for the NF-κB & MAP Kinase Signalling Pathway Combined Model
Chapter 4: Design
Section 4.1: Associated Language with the Project
Section 4.1.1: XML
Section 4.1.2: Matlab
Section 4.1.3: C
Section 4.2: Overall Design
Section 4.2.1: X-machine Frameworks Architecture
Section 4.2.2: Main XML File Structure
Section 4.2.3: Iteration XML File Structure
Section 4.3: Design of Chemical Interaction Model
Section 4.4: Design of NF-κB Signalling Pathway Model
Section 4.5: Design of NF-κB & MAP Kinase Signalling Pathway Combined Model
Chapter 5: Implementation and Testing
Section 5.1: Implementation of Three Models
Section 5.1.1: Implementation of Chemical Interaction Model
Section 5.1.2: Implementation of the NF-κB Signalling Pathway Model
Section 5.1.3: Implementation of NF-κB & MAP Kinase Signal Pathway Combined Model
Section 5.2: Testing Methods
Section 5.2.1: Unix Tool for Single Iteration Testing
Section 5.2.2: Getdata Programme for Whole Iteration Files Testing
Chapter 6: Results and Discussion
Section 6.1: Results and Discussion of Chemical Interaction Model
Section 6.2: Result and Discussion of NF-κB Pathway model
Section 6.3: Result and Discussion of NF-κB & MAP kinase pathways combined model
Chapter 7: Conclusions
Section 7.1: Summary of the Dissertation and Project
Section 7.2: Future Work of this Project
References
Appendices
Figure List
Figure 2.1 Transmembrane Signalling Biomechanical and Soluble Mediators
Figure 2.2 Chemical Interaction Model Visualisation (Matlab)
Figure 2.3 Chemical Interaction Model Results (Matlab)
Figure 2.4 NF-κB Pathway Model Visualisation (Matlab)
Figure 2.5 NF-κB Pathway Model Results (Matlab)
Figure 2.6 Summary of MAP kinase pathway
Figure 3.1 Process of combination
Figure 3.2 Chemical reactions
Figure 3.3 Concentration of molecule A, B and C against time
Figure 3.4 Possible states and transition of an NF-κB
Figure 3.5 Simplification of the MAP Kinase pathway
Figure 4.1 Structure of the Main file (a)
Figure 4.2 Structure of the Main file (b)
Figure 4.3 NF-κB & MAP Kinase Signalling Pathway Relation
Figure 5.1 States and relations in X-machine
Figure 5.2 Visualisation of Chemical Interaction Model
Figure 5.3 Visualisation of NF-κB signalling pathway model
Figure 5.4 Visualisation of NF-κB MAP kinase combined model
Figure 5.5 Concentration against Iterations (time steps) graph
Figure 6.1 Chemical interaction agent model graph one
Figure 6.2 Chemical interaction agent model graph two
Figure 6.3 Visualisation for chemical interaction model
Figure 6.4 NF-κB pathway agent model result (a)
Figure 6.5 NF-κB pathway agent model result (b)
Figure 6.6 Result of the combined model
Chapter 1: Introduction
Section 1.1: Background
This project is in the field of computational biology, an interdisciplinary field that joins computer technology and biology. Computational biology has emerged only in recent years. The field is located at the interface between the two scientific and technological disciplines that can be argued to drive a significant if not the dominating part of contemporary scientific innovation [1]. As biology discovers more about the structure, organisation and behaviour of cells, tissues, organisms and communities, more understanding, and possibly simulation, of these biological systems is needed. Computer technology can help to meet this need by providing predictions of important aspects of the behaviour of biological systems, and it has brought new vitality to biological research. A famous example is the Human Genome Project, which has generated an extraordinary amount of data. Biologists are now faced with the challenge of extracting meaning from linear sequences composed of billions of base pairs. The work of computational biologists is indispensable for this task and for many other biological problems that lend themselves to computational solutions [2]. This is why the field of computational biology has developed so dramatically: more and more people in both areas are working together to find the best solutions for their research. There are currently 10 major research areas in computational biology: sequence analysis, computational evolutionary biology, gene expression analysis, regulation analysis, protein expression analysis, analysis of mutations in cancer, structure prediction, measuring biodiversity, modelling biological systems and high-throughput image analysis. My project is in the ninth area listed above, modelling biological systems, which uses computer simulations of cellular subsystems to analyse both the spatial and temporal aspects of the complex connections between these cellular processes.
Biological computer modelling uses a computer programme to simulate an abstract model of a particular biological system. Biological computer simulation is a subset of computer simulation, which is very useful for modelling many natural systems because it gives insight into how the modelled systems operate. Before computer simulation, people relied on mathematical models; computer simulation took modelling into a new stage. The history of computer simulation is summarised below (quoted from the Wikipedia article "Computer Simulation", which is licensed under the GNU Free Documentation License - http://www.gnu.org/copyleft/fdl.html):
Computer simulation was developed hand-in-hand with the rapid growth of the computer, following its first large-scale deployment during the Manhattan Project in World War II to model the process of nuclear detonation. It was a simulation of 12 hard spheres using a Monte Carlo algorithm. Computer simulation is often used as an adjunct to, or substitution for, modelling systems for which simple closed form analytic solutions are not possible. There are many different types of computer simulation; the common feature they all share is the attempt to generate a sample of representative scenarios for a model in which a complete enumeration of all possible states of the model would be prohibitive or impossible. Computer models were initially used as a supplement for other arguments, but their use later became rather widespread. The physicist Richard Feynman, was not fond of such models and once called them "a disease"[3].
Section 1.2: About the Project

About my project: the aim of my project is to develop a simulation of a vital part of the immune system using an existing framework and tools. The framework, developed by the Computational Biology Research Group in our department, can model different kinds of biological systems, defined in terms of individual agents that play the roles of different biological entities such as molecules and receptors. The simulations built with it can handle thousands of these agents operating and communicating with each other. This approach is called Agent-Based Modelling.
Section 1.2.1: Agent-Based Modelling

Agent-Based Modelling was developed to deal with the complexity of such systems and to extend the capabilities of previous chemical modelling attempts [4][5]. It can provide a better understanding of cellular reactions in both their spatial and temporal aspects. Agent-Based modelling (also known as individual-based modelling) treats each individual component of a system as a single entity (or agent) obeying its own pre-defined rules and reacting to its environment and neighbouring agents accordingly [4][6]. Agents are therefore well suited to representing the components of a system. The agents themselves can be represented by various computational models; the approach chosen here is the X-machine, which provides an intuitive and precise method to model the functional behaviour of systems in a flexible and modular manner [5]. A single stream X-machine is used to describe each individual agent, and communication channels are identified between machines to deal with agent interactions [7]. An essential feature of the X-machine approach for modelling complex systems is that a model can be developed simply by adding new agents to the system, which makes the modelling process extensible.
Section 1.2.2: X-machine

We use the X-machine because of its particular features. X-machines are similar to finite state machines, which are models of behaviour based on states and transitions, but X-machines have an additional feature, memory: transitions between states can read the memory and modify it [9]. Memory gives the X-machine an important and novel property. Because data such as an agent's physical location are held in memory rather than encoded as states, the number of states required to model the system remains manageably small. The framework is used as follows: the model is written in XML in an X-machine-specific format, and the Xparser (also built by the Computational Biology Research Group in our department) then produces a C programme from this X-machine XML specification. Running the programme simulates the agents' behaviour, and the simulation can also be visualised with a special visualisation C programme built for the model. The framework is based on XML rather than on writing C code directly because XML is a simple and flexible text format derived from SGML: it shows all the states of each agent clearly and is much simpler to write than C. Once the XML code has been created, the Xparser parses it into C code automatically.
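To make the role of memory more concrete, the following C sketch shows a stream X-machine for a single molecule agent. It is only an illustration under assumed names (State, Memory, Input, phi_move, phi_bind, BIND_RADIUS and xm_step are all hypothetical) and is not the C code that the Xparser generates. The machine has just two states, FREE and BOUND, while the molecule's position is kept in memory; a plain finite state machine would instead need a separate state for every possible position. The output alphabet is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream X-machine for one molecule agent (not Xparser output). */

typedef enum { STATE_FREE, STATE_BOUND } State;   /* Q: only two states */

typedef struct {                                   /* M: memory */
    double x, y, z;        /* physical location lives in memory, not in Q */
    int partner_id;        /* -1 when not bound */
} Memory;

typedef struct {                                   /* one input symbol from the stream */
    double dx, dy, dz;     /* displacement proposed for this step */
    int neighbour_id;      /* nearest neighbour, -1 if none */
    double neighbour_dist; /* distance to that neighbour */
} Input;

#define BIND_RADIUS 1.0    /* assumed binding distance */

/* Processing functions (phi): partial functions on (input, memory).
   Each returns false, without touching memory, where it is undefined. */
static bool phi_move(const Input *in, Memory *m) {
    if (in->neighbour_id >= 0 && in->neighbour_dist < BIND_RADIUS)
        return false;                 /* undefined here: binding applies instead */
    m->x += in->dx;
    m->y += in->dy;
    m->z += in->dz;
    return true;
}

static bool phi_bind(const Input *in, Memory *m) {
    if (in->neighbour_id < 0 || in->neighbour_dist >= BIND_RADIUS)
        return false;
    m->partner_id = in->neighbour_id;
    return true;
}

/* Next-state function F: which phi may fire in which state, and the target state. */
typedef struct {
    State from;
    bool (*phi)(const Input *, Memory *);
    State to;
} Transition;

static const Transition F[] = {
    { STATE_FREE, phi_bind, STATE_BOUND },
    { STATE_FREE, phi_move, STATE_FREE  },
};

/* Consume one input: fire the first transition defined for (q, input, memory). */
State xm_step(State q, const Input *in, Memory *m) {
    for (size_t i = 0; i < sizeof F / sizeof F[0]; ++i) {
        if (F[i].from == q && F[i].phi(in, m))
            return F[i].to;
    }
    return q;   /* no transition defined: remain in the current state */
}
```

Each call to xm_step consumes one input symbol from the stream; the processing function that is defined for the current state, input and memory fires, updates the memory and selects the next state.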
Section 1.2.3: HPCx

The Computational Biology Research Group has already built a model of this vital part of the immune system in Matlab; my task is to convert the model into the X-machine framework, whose programmes are compiled and run with a C compiler. This is necessary because the HPCx supercomputer cannot run Matlab, but it can run C, so in order to use this supercomputer for our simulation we have to convert our model into C. The hardware specification of the supercomputer is as follows (quoted from http://www.epcc.ed.ac.uk/msc/systems_HPCx.htm):
The HPCx system is located at the UK's CCLRC's Daresbury Laboratory and operated by the HPCx Consortium. The HPCx system uses IBM p690+ Regatta nodes for the compute and IBM p690 Regatta nodes for login and disk I/O. Each Regatta node contains 32 processors. At present there are two p690 service nodes. At the beginning of the user service on HPCx phase2 in April 2004, twenty p690+ nodes were used for compute jobs, offering a total of 640 processors. From Monday, 10 May, there were 38 frames, i.e. 1216 processors, available to users. Then the system had a throughput of at least 4.8 Tflops (4800 AU/hr). This was increased to 50 nodes offering 1600 processors end of May 2004. The peak computational power of the HPCx system is 10.8 Tflops peak, or at least 6 Tflops sustained. The complete new platform gave a value of 6,188 Gflops for the Rmax value of the Linpack benchmark. The service can thus provide 6,188 AUs per hour, 148,512 AUs per day.
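As a quick consistency check on the quoted figures (our own arithmetic, not part of the quoted specification), the processor counts follow from 32 processors per Regatta node, and the daily capacity is 24 hours at the hourly rate:

```latex
\begin{aligned}
20 \times 32 &= 640 \ \text{processors}\\
38 \times 32 &= 1216 \ \text{processors}\\
50 \times 32 &= 1600 \ \text{processors}\\
6188 \ \text{AU/hour} \times 24 \ \text{hours} &= 148\,512 \ \text{AU/day}
\end{aligned}
```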
The HPCx service is provided by a consortium led by the University of Edinburgh, with the Council for the Central Laboratory of the Research Councils and IBM. This supercomputer could help us by running the simulation on thousands of processors, with different agents on different processors, to obtain a much more accurate result. However, my project does not involve HPCx directly.
Section 1.3: About This Dissertation

This dissertation consists of seven chapters. After this introduction, the second chapter is the literature review, which covers the related background literature, the X-machine framework in detail and the three programming languages associated with my project. The third chapter, requirements and analysis, discusses the project's objectives and requirements and analyses them in more detail; it also describes how the project will be evaluated. The fourth chapter covers the design techniques used in this project. The fifth chapter, implementation and testing, describes the coding methods and how the models are tested. The sixth chapter, results and discussion, is an important chapter that presents the main results of the models and discusses them. The last chapter, conclusions, summarises the project and the dissertation.
Chapter 2: Literature Review

Section 2.1: Overview
Three models are involved in my project: an intracellular chemical interactions model, the NF-κB signalling pathway model, and a combined NF-κB signalling pathway and MAP kinase signalling pathway model. There are also three programming languages associated with my project: XML, Matlab and C. The picture below, produced by Prof. Eva Qwarnstrom, shows part of the signalling pathways in a cell, including some of the molecules that will appear in the models:
Figure 2.1 [26]

Section 2.2: Agent-Based Intracellular Chemical Interactions Model

First, I will introduce the intracellular chemical interactions model. Even the simplest life forms require the interaction of more than 400 chemical processes that are encoded by genes [9]. To track and understand intracellular chemical interactions, the intracellular signalling pathways must be considered, because they are central to the control and regulation of cell behaviour. Agent-based modelling can show intracellular signalling pathways in both their spatial and temporal aspects, and it can provide a framework for calculating chemical interactions accurately. Complex interactions of genes, proteins and other molecules within the cell must be addressed in order to gain a better understanding of how these pathways operate [6][10][11]. Combining mathematical models with information about the physical components of the cell also makes it easier to understand the activities of signalling pathways. Traditionally, intracellular signalling pathways have been modelled with reaction kinetics, using ordinary differential equations to describe the quantity of each chemical over time. This is valid only when the chemicals in the cell are well mixed. However, because of the cell's internal structure and the low numbers and non-uniform distributions of certain key molecules, this is certainly not true [12]. Moreover, because signalling pathways are complex, reaction kinetics models need a large number of ordinary differential equations, so the description becomes huge and the solutions are difficult to express. Models of this kind have other problems as well: they are limited in how well they function, and large systems of ordinary differential equations are sensitive, so that small changes to the equations can cause big changes in behaviour.
So, even though models of this kind can sometimes provide useful results, they give a narrow view of the real behaviour of cells. An important factor that must be addressed in intracellular modelling is time delays: time delays in certain cellular processes such as transcription can have very significant effects on pathway behaviour [6]. Differential equation models do not consider this factor, because such delays cannot easily be included in ordinary differential equations. An even more important factor for intracellular modelling is spatial effects, which differential equation models also find hard to capture. In summary, although differential equation models are important, they have many disadvantages and limitations for modelling intracellular interactions. To gain a higher level of understanding of the mechanical and structural effects on intracellular pathways, more transparent and abstract models are needed.
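To illustrate the reaction kinetics approach discussed above, the sketch below integrates the mass-action equations for the simple reaction A + B → C with the forward Euler method. The rate constant, time step and the names Concentrations and euler_abc are assumptions for illustration, not taken from any existing model. Each species is reduced to a single number: the code has no notion of where molecules are, assumes they are well mixed, and has no way to express a delay, which is exactly the limitation described in this section.

```c
#include <stddef.h>

/* Hypothetical reaction-kinetics sketch for A + B -> C, assuming a
   well-mixed compartment and mass-action kinetics:
       dA/dt = -k*A*B,   dB/dt = -k*A*B,   dC/dt = +k*A*B          */

typedef struct { double a, b, c; } Concentrations;

/* Advance the system by n_steps forward-Euler steps of size dt. */
Concentrations euler_abc(Concentrations s, double k, double dt, size_t n_steps) {
    for (size_t i = 0; i < n_steps; ++i) {
        double rate = k * s.a * s.b;   /* reaction flux at the current time */
        s.a -= rate * dt;              /* A is consumed */
        s.b -= rate * dt;              /* B is consumed */
        s.c += rate * dt;              /* C is produced */
    }
    return s;
}
```

For a pathway with n chemical species, one such update line is needed for every species, which is why large pathways lead to large systems of equations.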
A good modelling approach here is agent-based modelling, which models each individual component of a system as a single agent obeying its own pre-defined rules and reacting to its environment and neighbouring agents accordingly [6]. Agent-based modelling therefore provides new methods for modelling spatial systems at much finer spatial and temporal scales, where activity is represented at the level of the individual agent. Processes enter these systems naturally as agent behaviour, and so they also take on a spatial context naturally. Agent-based modelling has recently been applied to a variety of biological systems, including insect communities and epithelial tissue [13][14][15][16]. In a biochemical pathway, anything from a molecule to a signalling receptor to an entire chain of interactions can be modelled as an agent, thus providing a modular and extensible modelling framework which allows abstraction of details as necessary [5]. Agent-based modelling is therefore well suited to spatial modelling, which is good for monitoring intracellular interactions and the changes in cell structure caused by those interactions. Compared with differential equation models, agent-based models have much more freedom: they can model different quantities and positions of molecules without limit, provided the computer is powerful enough. The two important factors, time delays and spatial effects, can also be included in the model easily. Note, however, that the number of agents must be a whole, non-negative number. Unlike differential equation models, agent-based models do not need many ordinary differential equations, but they do need details of each agent's position and properties, which is a large amount of information to specify. Finally, the agent-based model should agree with the associated kinetics model.
The two images below show an agent-based model coded in Matlab by Mark Pogson in our department. Figure 2.2 shows a step in the middle of the interaction; it clearly displays the position and number of all three kinds of molecule in a three-dimensional box. Figure 2.3 shows the number of each kind of molecule against time in seconds. Over time, molecule A interacts with molecule B to produce molecule C, and the numbers of the three molecules change together. An agent-based intracellular interaction model (A + B → C) coded in Matlab:
Figure 2.2
Figure 2.3
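The agent-based alternative can be sketched for the same reaction. In the following C sketch, which is not Mark Pogson's Matlab model, each molecule is an agent with a type and a position in a cubic box; at every time step each agent makes a small random move, and an A agent that comes within a reaction distance of a B agent turns into C while the B agent is removed. The box size, step length, reaction distance, the xorshift random-number generator and all function names are assumptions. Curves like those in Figure 2.3 would be obtained by calling count_type for each kind of molecule after every step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical agent-based sketch of A + B -> C in a cubic box. */

typedef enum { MOL_A, MOL_B, MOL_C, MOL_GONE } MolType;

typedef struct {
    MolType type;
    double pos[3];          /* x, y, z inside the box */
} Molecule;

#define BOX_SIZE   10.0     /* edge length of the box (arbitrary units) */
#define STEP_LEN    0.5     /* maximum displacement per axis per step */
#define REACT_DIST  0.5     /* A and B react when closer than this */

/* xorshift64 pseudo-random generator; the state must start non-zero. */
static uint64_t rng_next(uint64_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double rng_uniform(uint64_t *s) {
    return (double)(rng_next(s) >> 11) / 9007199254740992.0; /* 2^53 */
}

/* Random walk with reflecting walls (one reflection suffices since STEP_LEN < BOX_SIZE). */
static void random_walk_step(Molecule *m, uint64_t *s) {
    for (int d = 0; d < 3; ++d) {
        double p = m->pos[d] + (2.0 * rng_uniform(s) - 1.0) * STEP_LEN;
        if (p < 0.0) p = -p;
        if (p > BOX_SIZE) p = 2.0 * BOX_SIZE - p;
        m->pos[d] = p;
    }
}

static double dist2(const Molecule *a, const Molecule *b) {
    double sum = 0.0;
    for (int d = 0; d < 3; ++d) {
        double diff = a->pos[d] - b->pos[d];
        sum += diff * diff;
    }
    return sum;
}

/* One time step: every remaining agent moves, then each A reacts
   with at most one B closer than REACT_DIST. */
void abc_step(Molecule *mols, size_t n, uint64_t *rng) {
    for (size_t i = 0; i < n; ++i)
        if (mols[i].type != MOL_GONE)
            random_walk_step(&mols[i], rng);

    for (size_t i = 0; i < n; ++i) {
        if (mols[i].type != MOL_A)
            continue;
        for (size_t j = 0; j < n; ++j) {
            if (mols[j].type == MOL_B &&
                dist2(&mols[i], &mols[j]) < REACT_DIST * REACT_DIST) {
                mols[i].type = MOL_C;     /* A becomes the product C */
                mols[j].type = MOL_GONE;  /* B is consumed */
                break;
            }
        }
    }
}

/* Number of agents of a given type at the current step. */
size_t count_type(const Molecule *mols, size_t n, MolType t) {
    size_t c = 0;
    for (size_t i = 0; i < n; ++i)
        c += (mols[i].type == t);
    return c;
}
```

Because every C agent is created from exactly one A and one B, the counts always satisfy A + C = A₀ and B + C = B₀, and they are whole, non-negative numbers, unlike the continuous quantities in a differential equation model.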
Section 2.3: Agent-Based the NF-κB Signalling Pathway Model

After the intracellular chemical interaction model, we now move on to the second model involved in my project, the NF-κB signalling pathway. NF-κB, nuclear factor kappa B, is a heterodimeric protein composed of different combinations of members of the Rel family of transcription factors. The Rel/NF-κB family of transcription factors are involved mainly in stress-induced, immune, and inflammatory responses. In addition, these molecules play important roles during the development of certain hemopoietic cells, keratinocytes, and lymphoid organ structures. More recently, NF-κB family members have been implicated in neoplastic progression and the formation of neuronal synapses. NF-κB is also an important regulator in cell fate decisions, such as programmed cell death and proliferation control, and is critical in tumorigenesis [17]. The intracellular NF-κB signalling pathway is therefore important to the immune system, and because it controls cell death and proliferation, research into it is very important. If the pathway could be controlled, for example to make cancer cells kill themselves while normal cells stay alive, this would be a major step towards treating cancer. However, it is not easy to control, so a good model of the intracellular NF-κB signalling pathway is needed that shows both the spatial and temporal details of the pathway for research purposes. NF-κB activation is tightly controlled by inhibitors of NF-κB (IκB) proteins [5][18]. IκB sequesters the majority of NF-κB in the cytoplasm as complexes by masking their nuclear localisation signals [19]. During activation, IκB is phosphorylated by IκB kinases (IKK), causing its ubiquitination and proteasome-mediated degradation. The newly freed NF-κB is consequently transported into the nucleus, inducing genes bearing cognate binding motifs [5]. The information above shows how important the NF-κB signalling pathway is and how NF-κB is activated. We now need a computational model that shows how NF-κB controls the signalling pathway and agrees with the experimental results. As with the intracellular chemical interaction model, people have used differential equations to model the behaviour of the inhibitors.
However, as mentioned above, differential equation models are limited in how well they can show the actual pathway, so the best approach here is agent-based modelling. Agent-based modelling gives the intracellular NF-κB signalling pathway a wider scope of analysis and a more complete view of the regulatory mechanisms, because it shows what is actually happening inside the cell. In this model a single agent is a molecule inside the cell, and its behaviour is controlled by the rules of interaction and by its environment. Although it is sometimes not possible to model every individual molecule because of biological or computational limitations, using other agents to separate the system into useful components still provides a complete view of the pathway. Again, in this model agent-based modelling has a wider scope than reaction kinetics modelling, but the agent-based model must agree with the corresponding reaction kinetics model. The two images below show a second agent-based model coded in Matlab by Mark Pogson from our department. Figure 2.4 shows a step in the middle of the NF-κB signalling pathway simulation; it clearly displays the cell model and the position of each kind of molecule. Figure 2.5 shows the concentration of each kind of molecule against time in seconds.
Figure 2.4
Figure 2.5
As we can see, the model is made up of many different molecules in a spherical cell with a spherical nuclear region at its centre. In reality, however, some cells have irregular, non-spherical shapes. To model those cells, we would need special software to handle the boundary, but such a model would still be based on the spherical model, with suitable coordinates describing the cell boundary.
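The activation sequence described in this section (IκB sequestering NF-κB, phosphorylation of IκB by IKK, degradation of IκB, and transport of the freed NF-κB into the nucleus) maps naturally onto the states of a single agent. The following C sketch is a hypothetical illustration, not the model used in this project: the state names, the NUCLEUS_RADIUS value, the per-step degradation probability p_degrade and the ikk_nearby flag are all assumptions, nuclear import is simplified to the free molecule entering the nuclear region, and the agent's movement and the cell membrane boundary are left to the caller. Positions are measured from the centre of a spherical cell with a spherical nucleus, as in the Matlab model.

```c
#include <stdbool.h>

/* Hypothetical NF-kB agent whose states follow the activation sequence. */
typedef enum {
    NFKB_BOUND_TO_IKB,        /* held in the cytoplasm by IkB (NLS masked) */
    NFKB_IKB_PHOSPHORYLATED,  /* IKK has phosphorylated the attached IkB */
    NFKB_FREE,                /* IkB degraded: NF-kB free in the cytoplasm */
    NFKB_NUCLEAR              /* NF-kB has entered the nucleus */
} NfkbState;

typedef struct {
    NfkbState state;
    double pos[3];            /* position relative to the cell centre */
} NfkbAgent;

#define NUCLEUS_RADIUS 3.0    /* assumed radius of the spherical nucleus */

static bool in_nucleus(const double p[3]) {
    return p[0] * p[0] + p[1] * p[1] + p[2] * p[2]
           < NUCLEUS_RADIUS * NUCLEUS_RADIUS;
}

/* One update of the agent's state.
   ikk_nearby: whether an active IKK agent is within range (assumed input)
   u:          uniform random number in [0, 1) supplied by the caller
   p_degrade:  assumed per-step probability that phosphorylated IkB is degraded */
void nfkb_update(NfkbAgent *a, bool ikk_nearby, double u, double p_degrade) {
    switch (a->state) {
    case NFKB_BOUND_TO_IKB:
        if (ikk_nearby)
            a->state = NFKB_IKB_PHOSPHORYLATED;
        break;
    case NFKB_IKB_PHOSPHORYLATED:
        /* ubiquitination and proteasome-mediated degradation, as a chance per step */
        if (u < p_degrade)
            a->state = NFKB_FREE;
        break;
    case NFKB_FREE:
        /* only free NF-kB can move into the nucleus */
        if (in_nucleus(a->pos))
            a->state = NFKB_NUCLEAR;
        break;
    case NFKB_NUCLEAR:
        break;                /* terminal in this simplified sketch */
    }
}
```

Only an agent in the NFKB_FREE state can enter the nucleus, which reflects IκB masking the nuclear localisation signal while the complex is intact.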
Section 2.4: NF-κB Signalling Pathway and MAP Kinase Signal Pathway Combined Model
MAP kinase stands for mitogen-activated protein kinase. In cell biology, mitogen-activated protein kinases are serine/threonine-specific protein kinases that respond to extracellular stimuli (mitogens) and regulate various cellular activities, such as gene expression, mitosis, differentiation, and cell survival/apoptosis. Extracellular stimuli lead to activation of a MAPK via a signalling cascade composed of MAPK, MAPK kinase (MAPKK), and MAPKK kinase (MAPKKK). A MAPKKK that is activated by extracellular stimuli phosphorylates a MAPKK on its serine and threonine residues, and then this MAPKK activates a MAPK through phosphorylation on its threonine and tyrosine residues. This MAPK signalling cascade has been evolutionarily well-conserved from yeast to mammals. [27]
Figure 2.6 [25]
Figure 2.6 only shows a summary of the MAP kinase pathway, whereas Figure 2.1 shows more complex and complete signalling pathways, including the cross-talk between the NF-κB and MAP kinase pathways. This combined pathway can also be modelled with the agent-based approach, by representing each molecule as an agent. As with the NF-κB signalling pathway, agent-based modelling can provide a wider scope of analysis and a more complete view of the regulatory mechanisms. However, the combined model is more complex and more important for research purposes, so the computer model must show what is actually happening inside the cell. The most important question is whether the two pathways interfere with each other when they are in the same model; the cross interactions between their members are also critical. If the two pathways behave normally in the same model, this shows that the X-machine framework is capable of modelling more than one pathway. This is also the basis for future models containing three or more pathways.
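The three-tier cascade described at the start of this section (MAPKKK activates MAPKK, which in turn activates MAPK) can be sketched in C by tracking the active, phosphorylated fraction of each tier. This is a hypothetical, well-mixed illustration with assumed first-order activation and deactivation rate constants (k_act, k_deact), an assumed stimulus level and hypothetical names (Cascade, cascade_step); it is not the combined agent-based model, and cross-talk with NF-κB is not included.

```c
/* Hypothetical well-mixed sketch of the MAPKKK -> MAPKK -> MAPK cascade.
   Each field is the active (phosphorylated) fraction of that tier, 0..1. */
typedef struct {
    double kkk;   /* active MAPKKK */
    double kk;    /* active MAPKK  */
    double k;     /* active MAPK   */
} Cascade;

/* One forward-Euler step. Each tier is activated in proportion to the active
   tier above it (the external stimulus for MAPKKK) and deactivated at an
   assumed first-order rate k_deact. All rates use the old values of c. */
Cascade cascade_step(Cascade c, double stimulus,
                     double k_act, double k_deact, double dt) {
    Cascade next;
    next.kkk = c.kkk + dt * (k_act * stimulus * (1.0 - c.kkk) - k_deact * c.kkk);
    next.kk  = c.kk  + dt * (k_act * c.kkk    * (1.0 - c.kk)  - k_deact * c.kk);
    next.k   = c.k   + dt * (k_act * c.kk     * (1.0 - c.k)   - k_deact * c.k);
    return next;
}
```

Starting from an inactive cascade, the first step activates only MAPKKK, the second step reaches MAPKK and only the third reaches MAPK, which shows how the signal is relayed down the cascade.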
Section 2.5: Some Agent-Based Modelling Approaches

Section 2.5.1: Swarm Agent-Based Modelling
Swarm is a multi-agent software platform for the simulation of complex adaptive systems. In the Swarm system the basic unit of simulation is the swarm, a collection of agents executing a schedule of actions. Swarm supports hierarchical modelling approaches whereby agents can be composed of swarms of other agents in nested structures. Swarm provides object oriented libraries of reusable components for building models and analyzing, displaying, and controlling experiments on those models. Swarm is currently available as a beta version in full, free source code form. It requires the GNU C Compiler, Unix, and X Windows. [33] The modelling formalism that Swarm adopts is a collection of independent agents interacting via discrete events. Within that framework, Swarm makes no assumptions about the particular sort of model being implemented. There are no domain specific requirements such as particular spatial environments, physical phenomena, agent representations, or interaction patterns. Swarm simulations have been written for such diverse areas as chemistry, economics, physics, anthropology, ecology, and political science. [33] In Swarm, each individual agent is a basic unit that generates events affecting itself and other agents, and a Swarm simulation consists of a number of such agents interacting with each other.
Swarm needs libraries to run a simulation. Swarm libraries serve two major functions. The libraries are a set of classes that model builders can use by direct instantiation. For many objects, especially highly technical ones such as schedule data structures, it's likely that all a user will ever do is use the classes as provided. But in addition, one can use Swarm libraries by subclassing them, specializing particular classes for particular modelling needs. Both modes of using the Swarm libraries are important; Swarm is designed to facilitate both as appropriate. [33] This dependence on the Swarm libraries is also a limitation of Swarm agent-based modelling.
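The idea of "a collection of agents executing a schedule of actions" can be illustrated with a small discrete-event schedule in C. This is a generic sketch written for this dissertation, not Swarm's own library API, and the names Agent, Event, run_schedule, record_action and action_log are hypothetical. Swarm's nested swarms (agents made of swarms of other agents) are not modelled here.

```c
#include <stdlib.h>

/* Generic discrete-event schedule (not Swarm's API; all names hypothetical). */
typedef struct Agent Agent;
typedef void (*Action)(Agent *self);

struct Agent {
    int id;
    int times_acted;
};

typedef struct {
    double time;      /* simulated time at which the action fires */
    Agent *agent;     /* agent that performs the action */
    Action action;
} Event;

/* qsort comparator: earlier events first. */
static int by_time(const void *a, const void *b) {
    double ta = ((const Event *)a)->time;
    double tb = ((const Event *)b)->time;
    return (ta > tb) - (ta < tb);
}

/* Execute all events in ascending order of simulated time. */
void run_schedule(Event *events, size_t n) {
    qsort(events, n, sizeof *events, by_time);
    for (size_t i = 0; i < n; ++i)
        events[i].action(events[i].agent);
}

/* Example action: log which agent acted, in execution order. */
int action_log[16];
size_t action_log_len = 0;

void record_action(Agent *self) {
    if (action_log_len < sizeof action_log / sizeof action_log[0])
        action_log[action_log_len++] = self->id;
    self->times_acted++;
}
```

Events are sorted by their simulated time and executed in that order, regardless of the order in which they were added to the schedule.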
Section 2.5.2: MASON Multi-Agent Simulations

MASON stands for Multi-Agent Simulator Of Neighbourhoods... or Networks... or something... MASON is a fast discrete-event multiagent simulation library core in Java, designed to be the foundation for large custom-purpose Java simulations, and also to provide more than enough functionality for many lightweight simulation needs. MASON contains both a model library and an optional suite of visualization tools in 2D and 3D. MASON is a joint effort between George Mason University's ECLab Evolutionary Computation Laboratory and the GMU Center for Social Complexity, and was designed by Sean Luke, Gabriel Catalin Balan, and Liviu Panait, with help from Claudio Cioffi-Revilla, Sean Paus, Keith Sullivan, Daniel Kuebrich, Joey Harrison, and Ankur Desai. [34] MASON has some special features: Simulations can be serialized to checkpoints (freeze-dried and written to disk), which can be recovered from at any time, even to different Java platforms and new MASON visualization toolkits. MASON can be set up to be guaranteed duplicatable, meaning that the same simulation parameters will produce the same results regardless of platform. Libraries are provided for visualizing in 2D and in 3D (using Java3D), to manipulate the model graphically, to take screenshots, and to generate movies (using Java Media Framework). While the visualization toolkits are fairly large, the core simulation model is intentionally very small, fast, and easy to understand. [34] However, as the description above shows, MASON uses Java to simulate models. As explained in the previous chapter, we need to run our models on HPCx, which does not support Java, so this simulation system cannot be used for my project. As the last two sections show, neither of these systems suits my project as well as the X-machine framework; the next section explains why the X-machine framework is the most suitable one for my project.
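MASON's guarantee that the same parameters give the same results on every platform depends in part on the simulation's randomness being reproducible. One common way to achieve this, sketched here in C as an illustration of the general technique rather than MASON's Java implementation, is to seed explicitly a pseudo-random number generator whose algorithm is fully specified (here SplitMix64) instead of relying on the C library's rand(), whose algorithm differs between implementations. The function random_walk_end is a hypothetical stand-in for a simulation.

```c
#include <stdint.h>

/* SplitMix64: a fully specified 64-bit generator, so a given seed yields the
   same sequence with any conforming C compiler (unlike rand()). */
uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical stand-in for a simulation: end point of a 1-D random walk. */
long random_walk_end(uint64_t seed, int steps) {
    uint64_t s = seed;
    long x = 0;
    for (int i = 0; i < steps; ++i)
        x += (splitmix64(&s) & 1u) ? 1 : -1;
    return x;
}
```

Running random_walk_end twice with the same seed and number of steps always returns the same value, which is the property that makes a simulation run repeatable.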
Section 2.5.3: X-machine Framework and XML

Because agent-based modelling is widely used for intracellular interactions, it is necessary to develop a common architecture for the large number of agent systems. The approach here is a framework based on the X-machine, which standardises the way agents are expressed. Models in the X-machine framework are written in XML, and a C-coded parser, the Xparser, translates them into runnable C code. There are many tools for computational biology research, but few are designed for agents, and those that exist have inflexible structures that do not suit our models. Some agent-based frameworks already exist, but they cannot meet the needs of intracellular modelling, because a real cell contains millions of molecules and the associated cellular signalling. With such a huge number of agents, a common architecture is essential. Running on a supercomputer such as HPCx, as mentioned in the introduction, makes the modelling results more accurate, because far more agents can be simulated. The reason the models can run on supercomputers lies in the definition of agents: agents are autonomous computing machines that communicate with messages, so the processing of the agents can be spread across many processors and computers that are connected on a network [8]. Messaging between agents resembles message communication between computers, so agent messages map naturally onto messages between computers. MPI (Message Passing Interface) is a library that allows the creation of programs that can be spread across computers and that communicate with messages, and it has become the de facto standard for distributed memory parallel processing [8]. We can therefore use computers to simulate the agents and the messages between them, and it is possible to define a cell as a system which processes a parallel collection of communications.
So we need a good model to define the behaviour of agents that run in parallel, send each other data and process it. The X-machine meets all of these needs. Like other finite state machines, an X-machine has states and input and output alphabets, but it also has something other state machines lack: memory. This additional memory makes it especially suitable for agent-based modelling, because transitions between states can carry the memory with them and modify it. A stream X-machine is defined as an 8-tuple [16]: $X = (\Sigma, \Gamma, Q, M, \Phi, F, q_0, m_0)$
$\Sigma$ and $\Gamma$ are the input and output alphabets respectively.

$Q$ is the finite set of states. $M$ is the (possibly) infinite set called memory. $\Phi$, the type of the machine $X$, is a set of partial functions $\varphi$ that map an input and a memory state to an output and a possibly different memory state, $\varphi: \Sigma \times M \to \Gamma \times M$. $F$ is the next-state partial function, $F: Q \times \Phi \to Q$, which, given a state and a function from the type $\Phi$, determines the next state; $F$ is often described as a state transition diagram. $q_0$ and $m_0$ are the initial state and initial memory respectively. From now on, the term X-machine refers to a stream X-machine [8]. Because X-machines can communicate, we can use Communicating X-machines. A Communicating X-machine model consists of X-machines which can exchange messages, and it can be defined as the tuple [8]:
$((C_i^x)_{i=1..n}, R)$
where $C_i^x$ is the $i$-th Communicating X-machine in the system, and $R$ is a communication relation between the $n$ X-machines.
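To make the 8-tuple concrete, the following is a minimal C sketch of a toy stream X-machine; it is not code from the X-machine framework or the Xparser. It assumes a hypothetical molecule with two states (free and bound), three input symbols and a one-field memory. Each processing function is a partial function $\varphi: \Sigma \times M \to \Gamma \times M$ that reports whether it is defined on its arguments, and $F$ is a lookup table from (state, function) to the next state.

```c
#include <stdbool.h>

/* Hypothetical toy stream X-machine: a molecule that is FREE or BOUND. */
typedef enum { STATE_FREE, STATE_BOUND, NUM_STATES } State;          /* Q     */
typedef enum { IN_PARTNER_NEAR, IN_PARTNER_FAR, IN_RELEASE } Input;  /* Sigma */
typedef enum { OUT_NONE, OUT_BOND, OUT_UNBOND } Output;              /* Gamma */
typedef struct { int bonds_made; } Memory;                           /* M     */

/* A processing function phi: Sigma x M -> Gamma x M. It is partial:
   it returns false when it is not defined for the given input. */
typedef bool (*Phi)(Input in, Memory *m, Output *out);

static bool phi_bind(Input in, Memory *m, Output *out) {
    if (in != IN_PARTNER_NEAR) return false;
    m->bonds_made++;                 /* the transition modifies memory */
    *out = OUT_BOND;
    return true;
}
static bool phi_ignore(Input in, Memory *m, Output *out) {
    (void)m;
    if (in != IN_PARTNER_FAR) return false;
    *out = OUT_NONE;
    return true;
}
static bool phi_release(Input in, Memory *m, Output *out) {
    (void)m;
    if (in != IN_RELEASE) return false;
    *out = OUT_UNBOND;
    return true;
}

/* Phi (the type of the machine) and F: Q x Phi -> Q as a table;
   -1 marks "F undefined for this (state, function) pair". */
enum { NUM_FUNCS = 3 };
static const Phi type_of_machine[NUM_FUNCS] = { phi_bind, phi_ignore, phi_release };
static const int next_state[NUM_STATES][NUM_FUNCS] = {
    /* FREE  */ { STATE_BOUND, STATE_FREE, -1 },
    /* BOUND */ { -1,          -1,         STATE_FREE },
};

/* One step on one input symbol: try each function that F allows from the
   current state; the first one defined on (input, memory) fires.
   Returns false (and leaves q and m unchanged) if none applies. */
static bool xm_step(State *q, Memory *m, Input in, Output *out) {
    for (int f = 0; f < NUM_FUNCS; f++) {
        int target = next_state[*q][f];
        if (target < 0) continue;
        Memory trial = *m;           /* only commit memory if phi is defined */
        if (type_of_machine[f](in, &trial, out)) {
            *m = trial;
            *q = (State)target;
            return true;
        }
    }
    return false;
}
```

Starting from $q_0$ = `STATE_FREE` and $m_0$ = `{0}`, the input `IN_PARTNER_NEAR` moves the machine to `STATE_BOUND` and increments the memory counter; a second `IN_PARTNER_NEAR` is rejected, because no function that $F$ permits from `STATE_BOUND` is defined on that input.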
Different ways of defining $R$ give different definitions of Communicating X-machines. One of the most accepted approaches uses a communication matrix as the means of communication between X-machines [8]; the cells of the matrix contain the messages between X-machines. However, this approach has disadvantages when X-machines are used as agents. When there are many agents, the communication matrix becomes too large to link them all. Also, from the point of view of a single agent, the target of a message is unclear, because the communication changes over time. In Communicating X-machine agent-based models, agents only interact with their surrounding agents, so the distance over which messages are sent between agents is restricted.
In the approach used here, the communication relation $R$ between the X-machines consists of two lists: a message list and a message type list. All X-machines understand and can read the messages in the message list. This is central to the concept of this kind of implementation, because the actions of each X-machine are based on its input messages. If the source of an input message is too far from the X-machine, the message is ignored; if the source is within a reasonable distance, it is processed. The method can also be extended by tagging each message with extra information, e.g. the maximum distance for the sending X-machine and the possible receiving X-machines.
There are many ways of communicating and handling messages. A useful case is communication between two agents that are processed on distinct computers in a computer cluster or a grid system. The current practice is to keep a local message list for each CPU in the cluster. An agent only sends and receives messages through its local CPU, and a separate calculation determines whether any agent on a different CPU needs the message. This calculation uses the distances between agents: by giving each agent an influence boundary, it is easy to decide whether another agent needs the message.
XML is used for the implementation architecture of the X-machine here: the X-machine architecture is defined in an XML text file. This is easy for most people to use, since the XML can be modified with any text editor. It would also be possible to develop a graphical interface for modifying the XML without seeing the implementation directly. A parser is needed to turn the XML into a runnable C programme that runs the X-machine agents with the message list relation. This parser, called the Xparser, is itself written in C and is universal for all XML-coded X-machine agent models. To run the model, another XML text file is needed to define the starting state and details of each agent as the initial point of the programme. By using different versions of these files, it is possible to make different runs of the model with different results for research.
The X-machine model can be visualised by a specially written visualisation programme, which is also written in C. The visualisation gives a direct view of the model's structure and interaction procedures. Each frame of the visualisation can also be saved as an image file, and a set of these screenshots can be converted into a video file with a free program called VirtualDub (see http://www.virtualdub.org/).
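The distance rule for reading messages and the per-CPU influence boundary described above can be illustrated with a small C sketch. It is an illustration under stated assumptions, not the framework's code: the message layout, the per-message `range` field standing for the influence boundary, and the idea that each CPU's agents occupy an axis-aligned box are all hypothetical choices made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical location message as posted on the global message list. */
typedef struct {
    int    sender_id;
    double x, y, z;
    double range;   /* assumed "influence boundary" carried by the message */
} LocationMessage;

static double dist_sq(double ax, double ay, double az,
                      double bx, double by, double bz) {
    double dx = ax - bx, dy = ay - by, dz = az - bz;
    return dx * dx + dy * dy + dz * dz;
}

/* A receiver only processes messages whose sender lies within the sender's
   influence range; everything else on the list is ignored.
   Squared distances avoid the need for sqrt(). */
static bool message_in_range(const LocationMessage *msg,
                             double rx, double ry, double rz) {
    return dist_sq(msg->x, msg->y, msg->z, rx, ry, rz) <= msg->range * msg->range;
}

/* Copies into 'out' the messages a receiver at (rx, ry, rz) should read,
   returning how many were kept. 'out' must hold at least n entries. */
static size_t filter_messages(const LocationMessage *list, size_t n,
                              double rx, double ry, double rz,
                              LocationMessage *out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (message_in_range(&list[i], rx, ry, rz))
            out[kept++] = list[i];
    return kept;
}

/* Cluster case: must a message be copied to another CPU whose agents all lie
   in the box [lo, hi] (per axis)? Yes if the sender's influence sphere
   touches that box, i.e. the squared distance to the box <= range^2. */
static bool needed_on_partition(const LocationMessage *msg,
                                const double lo[3], const double hi[3]) {
    const double p[3] = { msg->x, msg->y, msg->z };
    double d2 = 0.0;
    for (int k = 0; k < 3; k++) {
        if (p[k] < lo[k])      d2 += (lo[k] - p[k]) * (lo[k] - p[k]);
        else if (p[k] > hi[k]) d2 += (p[k] - hi[k]) * (p[k] - hi[k]);
    }
    return d2 <= msg->range * msg->range;
}
```

The same squared-distance test serves both purposes: locally it decides which messages an agent reads, and across the cluster it decides whether a message has to leave its CPU at all.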
Chapter 3: Requirements and Analysis

Section 3.1: Objectives and Requirements for the Project
This chapter is mainly about the objectives and requirements of the project, and each of the three models is discussed in detail. The aim of my project is to develop a simulation of a vital part of the immune system by using an existing framework and tools. The framework is the X-machine framework developed by the computational biology research group in our department, and it can easily model the biological systems involved in my project. Each individual agent plays the role of a molecule or a receptor, and agent-based modelling can handle thousands of these agents operating and communicating with other agents. The objective of my project is to convert the two existing Matlab models into X-machine models. Mark Pogson has modelled both the intracellular chemical interaction model and the NF-κB signalling pathway model in Matlab, and I have already received his code. However, there is no existing Matlab model combining the two pathways for the third model, so this model is more challenging and needs to be fully tested to see whether it works properly in the X-machine framework.
As for requirements, the first step is to understand all the Matlab models in detail and then to work out the architecture and method of X-machine modelling. I also need to understand how to use the Xparser developed by Simon Coakley. Once I fully understand a Matlab model, I can convert it into an X-machine model, which is represented by an XML file. I then need to create an initial state file called 0.xml (also in XML) that gives the details of the model's starting agents: the Matlab code generates its initial agents at the start of every run, but in the X-machine framework I need to create them myself. Next, the Xparser parses the XML into C, and if the C code compiles without problems, the result is a runnable .exe programme.
To use the programme, I assign a number of iterations and point it to the 0.xml initial state file; it then runs the whole process and produces an XML file for each iteration. Simon Coakley has also developed a visualisation programme specialised for X-machine models, which gives a direct view of the model in 3D. After the two Matlab models have been converted into X-machine models, I can start the third model, which is built up by defining each molecule as an agent, a set of binding rules for each new kind of molecule and a set of movement rules for them.
As shown in Figure 3.1, it is possible to start with two individual models for the NF-κB and MAPK pathways and then put them together into a single model. One important point is that the state numbers of each pathway's molecules must be unique, so that they do not clash when the models are combined. The cross-talk between the NF-κB pathway and the MAPK pathway also needs to be shown in the model, if detailed data for it are available. I will discuss the combined model further in the following section and chapter.
Figure 3.1: Process of combination (the NF-κB and MAPK models are mixed into a combined model, with cross-talk between the two pathways)
Section 3.2: Analysis for Intracellular Chemical Interaction Model

Section 3.2.1: Importance and User Requirements
This is a very basic and simple model, but every complex model is built up from basic ones. Many aspects of life involve the interaction of multiple components and subunits and the corresponding emergence of both form and function. This is true whether we are dealing with molecules within an individual cell, cells within tissue, organs within an organism or organisms within a community or ecology. [28] By working out how each kind of molecule interacts with another kind, it is possible to build a large and complex model with many different kinds of molecules or pathways. The key feature of agent-based modelling is that each molecule is modelled as an agent. Figure 3.2 contrasts the two approaches: (a) reaction kinetics differential equations treat reacting chemicals as well mixed and uniform; (b) the agent-based approach models each individual molecule [28].
Figure 3.2: Chemical reactions [28]
Agent-based models have greater scope than reaction kinetics differential equation models, but they need many more details to be defined. For example, the movement of each single molecule must be defined, as must the rules for binding A molecules to B molecules, and incorrect data can cause a large difference in the results. Agent-based models must also agree with reaction kinetics differential equation models, because when an agent-based model contains a large number of well-mixed molecules, the differential equations apply. However, little information is available about individual molecular interactions, so some of the data for the agent-based model have to be obtained from reaction kinetics.
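Because the agent-based model has to agree with the well-mixed reaction kinetics, it helps to have the equation-based reference at hand. The C sketch below integrates the standard mass-action rate law for the elementary reaction A + B → C, $d[A]/dt = d[B]/dt = -k[A][B]$ and $d[C]/dt = k[A][B]$, with the forward Euler method. The rate constant `k`, the time step and the starting concentrations are illustrative assumptions, not values taken from the Matlab model.

```c
/* Well-mixed reaction kinetics for A + B -> C (mass action):
     d[A]/dt = d[B]/dt = -k[A][B],   d[C]/dt = +k[A][B]
   integrated with forward Euler. k and dt are illustrative values. */
typedef struct { double a, b, c; } Concentrations;

static Concentrations euler_step(Concentrations s, double k, double dt) {
    double rate = k * s.a * s.b;   /* reaction rate at the start of the step */
    s.a -= rate * dt;              /* one A consumed per reaction */
    s.b -= rate * dt;              /* one B consumed per reaction */
    s.c += rate * dt;              /* one C produced per reaction */
    return s;
}

static Concentrations integrate(Concentrations s, double k, double dt, int steps) {
    for (int i = 0; i < steps; i++)
        s = euler_step(s, k, dt);
    return s;
}
```

Each step removes the same amount from A and from B and adds it to C, so [A] + [C] and [B] + [C] stay constant; this is the same balance that the molecule counts of the agent-based model must satisfy.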
Section 3.2.2: Conversion from Matlab

During conversion, one big change has to be made first: defining the state of each molecule. The X-machine is a special kind of state machine, so when modelling intracellular actions each molecule is an X-machine with its own state, and I need to work out the possible states of each kind of molecule. The intracellular chemical interaction model initially has only two kinds of molecule, so the states are easy to define: molecule A has two states, free and bound to molecule B, and molecule B has one state, free. Molecule A receives messages from B molecules and decides whether to bind. Once bound, A and B form a third kind of molecule, C. When marking the states, we let molecule B disappear and change molecule A to the state "bound to B"; it is actually molecule C now, but this is easier to compute and display. The requirement for binding is also important: normally the interaction boundary depends on the radii of the molecules, so the radius and interaction boundary of each kind of molecule must be defined.
Section 3.2.3: Concentration Rates

There is a simple way to check whether the result is correct: count the molecules. For each bond, the numbers of A and B molecules each decrease by one unit and the number of C molecules increases by one unit, all in the same time step; looking back at Figure 2.3, these concentration changes are easy to see, and the model is built on them. The evaluation of this model is also simple. The interaction is between A and B and produces C, so if the change in the amount of A over a time step $\Delta t$ is $\Delta a$ and the change in B is $\Delta b$, then $\Delta a = \Delta b$, as Figure 3.3 shows [6]:
Figure 3.3: concentration of molecule A, B and C against time [6]
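The counting check above can be automated over consecutive iteration files. The sketch below assumes that the agent states have already been read from two iterations into integer arrays (the file reading is omitted), and it uses the state codes from the design of this model in Section 4.3 (0 for free A, 1 for A bound to B, 100 for free B). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* State codes from the chemical interaction model design:
   0 = A free, 1 = A bound to B (displayed as C), 100 = B free. */
enum { A_FREE = 0, A_BOUND = 1, B_FREE = 100 };

typedef struct { int a, b, c; } Counts;

/* Count free A, free B and C (= bound A) among the agents of one iteration.
   A B molecule that has bound is eliminated, so it is simply not present. */
static Counts count_states(const int *states, size_t n) {
    Counts k = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (states[i] == A_FREE)       k.a++;
        else if (states[i] == A_BOUND) k.c++;
        else if (states[i] == B_FREE)  k.b++;
    }
    return k;
}

/* Every bond removes one A and one B and adds one C, so between two
   iterations the changes must satisfy delta_a == delta_b == -delta_c. */
static bool conserved(Counts before, Counts after) {
    int da = after.a - before.a;
    int db = after.b - before.b;
    int dc = after.c - before.c;
    return da == db && da == -dc;
}
```

A failed check points either to a bond in which the B molecule was not removed, or to an A molecule changing state without a matching partner.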
Section 3.3: Analysis for the NF-κB Signalling Pathway Model

Section 3.3.1: Importance and User Requirements
As discussed in the last chapter, the NF-κB signalling pathway is vital to immune response regulation. Alterations in pathway regulation underlie many diseases, including atherosclerosis and arthritis. The modelling of individual molecules, receptors and genes provides a more comprehensive outline of regulatory network mechanisms than previously possible with equation-based approaches. [28] For this model, all the data come from single-cell experimental analysis by the Academic Unit of Cell Biology, Division of Genomic Medicine, University of Sheffield. Users of this model should be able to change each kind of molecule's movement speed, radius and initial quantity (concentration), and to define the colour of each kind of molecule. This means that even if the data from the Matlab code turn out to be incorrect, users can correct the model as soon as new experiments finish. Each kind of molecule is independent of the others, so changing the details of one kind does not affect the others, and the model still gives correct results. The interaction between NF-κB and IκBα should follow the interaction requirement described in the last section: NF-κB can be seen as molecule A in the last model and IκBα as molecule B, so the bound NF-κB-IκBα complex can be seen as molecule C. The concentration changes should therefore follow Figure 3.3, but because many other kinds of molecule are involved, the situation is a lot more complex.
Section 3.3.2: Conversion from Matlab

From the details in the Matlab code: activation of the NF-κB pathway is controlled by inhibitors of NF-κB (IκB) proteins, which sequester the majority of NF-κB in the cytoplasm as complexes by masking their nuclear localisation signals. During activation, IκB is phosphorylated by IκB kinases (IKK), causing its degradation. The newly freed NF-κB is consequently transported into the nucleus, inducing inflammatory genes, including those encoding IκB, thus regulating the pathway through negative feedback [28][29][30][31]. The Matlab code models NF-κB, IκBα, IκBβ, IκBε, nuclear importing receptors and nuclear exporting receptors as agents.
The conversion from Matlab is complicated, because each kind of molecule has a set of states. However, the amounts of IκBβ and IκBε in a real cell are tiny, so, following Mark Pogson's suggestion, these two molecules are not included in the model. Next we can look at the possible states of each kind of molecule. NF-κB is the most complicated molecule in this model; Figure 3.4 shows its possible states and transitions. A single NF-κB molecule can be free in the cytoplasm or the nucleus, bound to or unbound from other molecules in either compartment, and bound to or unbound from the importing and exporting receptors. In more detail, NF-κB has eight states: bound to IκBα in the cytoplasm; free in the cytoplasm; bound to a nuclear importing receptor; free in the nucleus; bound to IκBα in the nucleus; bound to a nuclear exporting receptor; bound to IκBα and then to a nuclear importing receptor; and bound to IκBα and then to a nuclear exporting receptor.
The possible states of IκBα are: free in the cytoplasm, bound to a nuclear importing receptor, free in the nucleus and bound to a nuclear exporting receptor, so IκBα is much simpler than NF-κB. Both kinds of nuclear receptor have two states, dormant and active: active means that something is bound to the receptor, while dormant means that it is free and ready to bind with other kinds of molecule.
Figure 3.4: Possible states and transitions of an NF-κB molecule [5]
Section 3.4: Analysis for the NF-κB & MAP Kinase Signalling Pathway Combined Model
This model involves two pathways: the NF-κB signalling pathway and the MAP kinase signalling pathway. The NF-κB pathway has already been built in the second model, so the tasks are to build the MAP kinase pathway separately and then combine the two. As Figure 3.5 shows, the model can be simplified from Figure 2.6. The Ras, SOS and Grb2 molecules can be treated as a single kind, which plays the role of NF-κB in the last model, and Active-Ras plays the role of IκBα. However, neither of them can enter the nucleus. When they bind, they produce a molecule called MAPK, which stands in for Raf (MAP KKK), MEK1/2 (MAP KK) and ERK1/2 (MAP K): these three form a sequential kinase cascade, so they can be treated as one kind, MAPK. MAPK is the only molecule that enters the nucleus, where it switches on genes. As with NF-κB and IκBα, the interaction of Ras_SOS_Grb2, Active-Ras and MAPK should follow the concentration changes in Figure 3.3. The behaviour of NF-κB and IκBα should also be unaffected in this model, which provides a way to evaluate it. The cross-talk between these two pathways has not yet been fully discovered; what is known so far is that a molecule called NIK is an important part of it. The next chapter covers the design of each model in detail.
Figure 3.5: Simplification of the MAP kinase pathway (labels: Ras_SOS_Grb2, Active-Ras, Bound, MAPK grouping Raf (MAP KKK), MEK1/2 (MAP KK) and ERK1/2 (MAP K), Nuclear Membrane, AP-1)
Chapter 4: Design
Section 4.1: Associated Language with the Project
Three languages are associated with my project: XML, Matlab and C. It is necessary to become familiar with them before designing the models.
Section 4.1.1: XML

First, let's look at XML. XML (Extensible Markup Language) is similar to the familiar HTML (Hypertext Markup Language); both are derived from SGML (Standard Generalized Markup Language). XML is a simple but very flexible language, originally designed to meet the challenges of large-scale electronic publishing [24]. It is now also used to exchange data between the Web and other devices: for example, the RSS (Really Simple Syndication) feed service provides a common-format XML text file that lets users receive the most up-to-date information, such as news and weather. Compared with HTML, XML is very flexible, because the tags in HTML are predefined, whereas in XML you can define your own tags. With an XML parser or reader that understands those tags, they can achieve a goal very efficiently.
Section 4.1.2: Matlab

Secondly, we turn to Matlab. Matlab is an interactive mathematical environment and high-level technical computing language, originally based on the FORTRAN packages LINPACK and EISPACK, but now based on LAPACK and BLAS [20]. Matlab is a really useful tool for mathematical modelling; it also has a lot of features [21]:

- High-level language for technical computing
- Development environment for managing code, files, and data
- Interactive tools for iterative exploration, design, and problem solving
- Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
- 2-D and 3-D graphics functions for visualizing data
- Tools for building custom graphical user interfaces
- Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
With these features, Matlab is a powerful tool for computing and mathematical studies. However, compared with the XML architecture, it is less suitable for agent-based modelling when handling the agents and the message-based communication relation. Another reason is that the HPCx supercomputer does not support Matlab, so to run the models in parallel on the supercomputer, with different agents on different CPUs, the existing Matlab models must be converted into X-machine models.
Section 4.1.3: C
Lastly, we focus on the C programming language. The book The C Programming Language by Brian Kernighan and Dennis Ritchie gives an informal specification of C and some of its history. The C programming language is a standardized imperative computer programming language developed in the early 1970s by Ken Thompson and Dennis Ritchie for use on the UNIX operating system. It has since spread to many other operating systems, and is one of the most widely used programming languages. C is prized for its efficiency, and is the most popular programming language for writing system software, though it is also used for writing applications. It is also commonly used in computer science education, despite not being designed for novices. [22][23] C operates very close to the hardware and is closer to assembly language than other high-level languages are, which makes it easier for programmers to control exactly what the programme is doing and makes it more efficient than many other languages. C is also supported by a very wide range of compilers and libraries. That is why both the Xparser and the visualisation programme for X-machine models use C. Also, as mentioned above, the HPCx supercomputer runs C without any problem, so C is the best choice of language for the parsed X-machine models.
With all three languages introduced, we are well prepared for the design stage.
Section 4.2: Overall Design

Section 4.2.1: X-machine Framework Architecture
The X-machine framework is a specialised framework for modelling biological and other systems based on individual agents. The architecture currently uses an .xml text file to define the data of each individual agent. As the last section showed, XML is a transferable description language, so an .xml file is easy to build, either by writing it directly in a text editor (low level) or by constructing it with a GUI tool (high level).
There is also another important .xml file, which defines all the interaction rules, the sending and receiving of messages, movements, variables and so on. The Xparser parses this .xml file into a C code file, which a compiler then turns into an executable programme. The programme runs the X-machine agents and implements the global message list communication relation [8]. The model starts when this programme is given an initial .xml text file holding the states and other information of every agent. Each iteration of the programme generates an .xml text file holding all the changes of the states and other information such as location and speed. After a number of iterations (which can be set when the programme starts), there is a set of .xml text files. Note that one iteration is 0.5 seconds, so 2 iterations are 1 second. These files can then be used in several ways: a specialised visualisation tool can display the model, and a getdata tool can extract the needed information to generate a graph with specified x- and y-axes. The next chapter, implementation and testing, introduces this in detail.
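The relationship between iterations, iteration files and simulated time can be captured in a few helper functions. This is a minimal sketch: the 0.5 s per iteration comes from the text, but the assumption that iteration files are named 0.xml, 1.xml, 2.xml and so on (extending the 0.xml convention of the initial state file), and the function names themselves, are hypothetical.

```c
#include <stdio.h>

/* One iteration of the model corresponds to 0.5 s of simulated time. */
#define SECONDS_PER_ITERATION 0.5

static double iteration_to_seconds(int iteration) {
    return iteration * SECONDS_PER_ITERATION;
}

/* Number of iterations needed to cover a non-negative span of simulated
   time, rounded to the nearest whole iteration. */
static int seconds_to_iterations(double seconds) {
    return (int)(seconds / SECONDS_PER_ITERATION + 0.5);
}

/* Builds the (assumed) file name for an iteration, e.g. 120 -> "120.xml".
   Returns the snprintf result: the length the full name needs. */
static int iteration_filename(char *buf, size_t size, int iteration) {
    return snprintf(buf, size, "%d.xml", iteration);
}
```

For example, simulating one minute of cell time needs `seconds_to_iterations(60.0)`, i.e. 120 iterations, and the last file produced would be `120.xml` under this naming assumption.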
Section 4.2.2: Main XML File Structure

The main XML file is the core of the model. Even though it is not needed when we visualise the model or extract its data, the model will not work, or will not work properly, if this file is missing or incorrect. The main file's structure is not simple (see Figure 4.1): at the top level, the model main file consists of three parts, Defined States, X-machine and Messages. The Defined States part consists of comments describing all the states of each molecule, which help users understand the model. The Messages part shows each kind of message and its contents.
Figure 4.1: Structure of the main file (a): the model main file contains Defined States, X-machine and Messages
The most complicated part is X-machine, which itself consists of three sub-parts: Memory, States and Functions (Figure 4.2). The Memory part is for variables: users can define all the global variables here with special tags, which is quite simple. The States part normally contains three states, input, output and move, which are linked to the functions in the Functions part; this part normally does not need to be changed for most models. The Functions part is the core and most complicated sub-part, and it controls each agent's behaviour. The Outputdata function outputs messages containing location, state and bond information. The Inputdata function gets messages from other agents, processes them and reacts appropriately. The Movements function controls the movement and locations of agents, and it also draws the boundary of the model structure. Depending on the model, other functions may be needed in this sub-part.
Figure 4.2: Structure of the main file (b): X-machine contains Memory, States and Functions; Functions contains Outputdata, Inputdata and Movements
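The order in which the three states run matters: every agent should post its output before any agent reads its input, so that all agents react to the same snapshot of the model. The C sketch below illustrates that cycle with a single global message list. The agent fields, the 1.0-unit neighbour test and the fixed movement step are hypothetical placeholders, not the functions the Xparser actually generates.

```c
#include <stddef.h>

/* Hypothetical minimal agent and message types for the output -> input ->
   move cycle of the States part. */
typedef struct { int id; double x; int neighbours_seen; } Agent;
typedef struct { int sender_id; double x; } Message;

#define MAX_MESSAGES 256

typedef struct { Message items[MAX_MESSAGES]; size_t count; } MessageList;

/* Outputdata: post this agent's location on the global message list. */
static void outputdata(const Agent *a, MessageList *list) {
    if (list->count < MAX_MESSAGES)
        list->items[list->count++] = (Message){ a->id, a->x };
}

/* Inputdata: read the other agents' messages and react; here the
   "reaction" just counts senders within 1.0 unit. */
static void inputdata(Agent *a, const MessageList *list) {
    a->neighbours_seen = 0;
    for (size_t i = 0; i < list->count; i++) {
        double d = list->items[i].x - a->x;
        if (list->items[i].sender_id != a->id && d * d <= 1.0)
            a->neighbours_seen++;
    }
}

/* Movements: a fixed step standing in for the real movement rules. */
static void movements(Agent *a, double step) { a->x += step; }

/* One iteration: all agents output before any agent inputs, then all move.
   The message list is cleared at the start of every iteration. */
static void run_iteration(Agent *agents, size_t n, MessageList *list, double step) {
    list->count = 0;
    for (size_t i = 0; i < n; i++) outputdata(&agents[i], list);
    for (size_t i = 0; i < n; i++) inputdata(&agents[i], list);
    for (size_t i = 0; i < n; i++) movements(&agents[i], step);
}
```

Separating the three loops is the design choice that makes the result independent of the order in which agents are stored, which is also what allows the agents to be spread across several CPUs.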
Section 4.2.3: Iteration XML File Structure

Iteration files are also important for this framework; they must be simple and uniform so that they are easy to read and process. Most commonly, a file starts with an iteration number tag showing which iteration it is, followed by a part for each agent in the model, which includes all the agents at this iteration and the detailed data of each. Different models have different types of data for their agents.
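An iteration file of this shape could be written as follows. This is a minimal sketch: the tag names (`states`, `itno`, `xagent` and the per-agent fields) are assumptions for illustration rather than the exact schema expected by the Xparser-generated programme, and the output goes to a memory buffer so that the caller can decide where to save it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-agent record; real models carry different fields. */
typedef struct { int id; int state; double x, y, z; } AgentRecord;

/* Writes an iteration file body into buf: the iteration number tag first,
   then one <xagent> element per agent. Tag names are assumed.
   Returns the length written, or -1 if buf is too small. */
static int format_iteration(char *buf, size_t size, int iteration,
                            const AgentRecord *agents, size_t n) {
    size_t used;
    int w = snprintf(buf, size, "<states>\n<itno>%d</itno>\n", iteration);
    if (w < 0 || (size_t)w >= size) return -1;
    used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        const AgentRecord *a = &agents[i];
        w = snprintf(buf + used, size - used,
                     "<xagent><id>%d</id><state>%d</state>"
                     "<x>%g</x><y>%g</y><z>%g</z></xagent>\n",
                     a->id, a->state, a->x, a->y, a->z);
        if (w < 0 || (size_t)w >= size - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, size - used, "</states>\n");
    if (w < 0 || (size_t)w >= size - used) return -1;
    used += (size_t)w;
    return (int)used;
}
```

Keeping one agent per line with the same fields for every agent is what makes these files easy for a separate tool, such as the visualisation or getdata programme, to scan.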
Section 4.3: Design of Chemical Interaction Model

The chemical interaction model is the simplest model in my project. It follows $A + B \to C$, and only three kinds of molecule are involved, so the structure of the model and its .xml files is simple as well. A Matlab model already exists, and I can use its molecule information and interaction rules for the X-machine model, so this model is essentially a conversion. In Matlab it is possible to generate random molecule data when the programme starts, and Matlab is a good tool for plotting concentration against time; in the X-machine framework these functions need extra tools. The design of the main file and iteration file can therefore be quite simple. As discussed in Chapter 3, the first step of the conversion is to clarify the states of each kind of molecule. Molecule A has two states: 0 (free in the box) and 1 (bound to B, which actually appears as molecule C). Molecule B has only one state: 100 (free in the box). When molecule B binds to molecule A, it is treated as having disappeared, so B does not need a "bound to A" state. The model takes place inside a box, and according to the Matlab code two coordinate systems are needed: Cartesian and polar coordinates. Cartesian coordinates are used mostly for location, and polar coordinates for motion and movement. Using the Cartesian coordinates, it is possible to draw the boundary of the box and keep each molecule inside it by reversing the movement in polar coordinates when the molecule hits an edge. The Memory part must contain all the global variables that support these coordinate systems, as well as other necessary variables such as the state number and molecule radius. The States part is simple, with only three states to set: output, input and move, as described in Section 4.2.2.
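The box boundary rule can be sketched in C as a reflection applied independently on each axis. For simplicity, this sketch stores each molecule's step as Cartesian components (dx, dy, dz) instead of the polar (speed, direction) form used in the Matlab code; reversing the component that crosses a wall has the same effect as reflecting the direction angle. The box size, the field names and the choice to mirror any overshoot back inside are assumptions.

```c
/* Hypothetical molecule in the chemical interaction model's box. */
typedef struct { double x, y, z; double dx, dy, dz; } Molecule;

/* Keep one coordinate inside [lo, hi]: if the step would carry the molecule
   past a wall, mirror the overshoot back inside and reverse that component
   of the step. Assumes |step| < hi - lo, so one reflection is enough. */
static void reflect_axis(double *pos, double *step, double lo, double hi) {
    double next = *pos + *step;
    if (next > hi)      { next = 2.0 * hi - next; *step = -*step; }
    else if (next < lo) { next = 2.0 * lo - next; *step = -*step; }
    *pos = next;
}

/* Move a molecule by one iteration inside the box [0, side]^3. */
static void move_in_box(Molecule *m, double side) {
    reflect_axis(&m->x, &m->dx, 0.0, side);
    reflect_axis(&m->y, &m->dy, 0.0, side);
    reflect_axis(&m->z, &m->dz, 0.0, side);
}
```

Because the reversed step is stored back in the molecule, it keeps moving away from the wall in later iterations instead of hitting it again.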
The most challenging part is the Functions part. Four functions are essential: outputdata, inputdata, checkbondtries and move. Binding works best from the perspective of each A molecule looking for a bind: by processing the location messages from B molecules, an A molecule chooses the best one and binds to it, and bond messages are involved in this binding process as well. The move function makes sure all the molecules move freely around inside the box. The Messages part contains two kinds of message. The first is the location message, which contains each molecule's state, Cartesian coordinates and id number. The second is the bond message, which contains the sender's id and state, the receiver's state, a bond/unbond tag and a distance value. The iteration files have the same structure as described in Section 4.2.3. The detailed implementation of this model is presented in the next chapter, implementation and testing.
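The A-side binding search and the two message layouts can be sketched as follows. The field names, the use of a single binding range (for example the sum of the two radii) and the nearest-first meaning of "best" are assumptions for illustration. In the model, the final choice when several A molecules compete for the same B would be settled through the bond messages, which this sketch does not cover.

```c
#include <stdbool.h>
#include <stddef.h>

/* Message layouts following the design text; field names are assumed. */
typedef struct { int id; int state; double x, y, z; } LocationMsg;
typedef struct {
    int    sender_id, sender_state;
    int    receiver_state;
    bool   bond;        /* true = bond, false = unbond */
    double distance;
} BondMsg;

enum { STATE_A_FREE = 0, STATE_A_BOUND = 1, STATE_B_FREE = 100 };

/* From the perspective of a free A molecule at (x, y, z): pick the nearest
   free B whose centre lies within bind_range (e.g. the sum of the radii).
   Returns the index into msgs, or -1 if no B qualifies. */
static int choose_partner(const LocationMsg *msgs, size_t n,
                          double x, double y, double z, double bind_range) {
    int best = -1;
    double best_d2 = bind_range * bind_range;   /* squared distance limit */
    for (size_t i = 0; i < n; i++) {
        if (msgs[i].state != STATE_B_FREE) continue;   /* only free B */
        double dx = msgs[i].x - x, dy = msgs[i].y - y, dz = msgs[i].z - z;
        double d2 = dx * dx + dy * dy + dz * dz;
        if (d2 <= best_d2) { best_d2 = d2; best = (int)i; }
    }
    return best;
}
```

Filtering on the state number first means that messages from other A molecules, and from A molecules already bound, are skipped without any distance calculation.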
Section 4.4: Design of NF-κB Signalling Pathway Model

The NF-B signalling pathway model is a complicated model compare with last one, it involves four kinds of molecules and tens of different states. The molecules are NF-B, IB (IB and IB are ignored because of their concentration is low), nuclear importing receptors and nuclear exporting receptors. NF-B can bind with IB, nuclear importing receptors and nuclear exporting receptors. Also after it bound with IB, the NF-B& IB is possible to bound with nuclear importing and exporting receptors as well. So the design of state numbers for NF-B are: 0 - free in cytoplasm, 1 - bound to IB in cytoplasm, 2 - bound to nuclear importing receptors, 3 - bound to IB and then bound to nuclear importing receptors, 4 - bound to nuclear exporting receptors, 5 - bound to IB and then nuclear exporting receptors, 6 - free in nuclear and 7 - bound to IkBa in nuclear. Same with chemical interaction model, IB acts similar with the molecule B in that model. When IB bound with NF-B, it is not necessary to display it, so the solution is eliminate it. However, for the situation that IB bound with nuclear importing and exporting receptors is different. The state numbers have to be unique so the design of state numbers for IB are: 10 - free in cytoplasm, 11 - bound to nuclear importing receptors, 12 - bound to nuclear exporting receptors and 13 - free in nuclear. For nuclear importing and exporting receptors, the states are easy. They dont need to worry which kind is bound with them, because the state numbers for the above two kinds of molecules have indicated the type of bind. So the only thing they need to make themselves clear is if they are busy or not. The design of state numbers for
nuclear importing receptors is: 20 - dormant and 21 - active. Nuclear exporting receptors follow the same scheme with different numbers: 30 - dormant and 31 - active. With the state numbers designed, the next part is the memory part, which only contains variable definitions. In the last model the shape was a box, but this model has the shape of a cell, so its structure and boundaries are more complex. However, the coordinate systems are the same as in the last model, Cartesian and polar, and they serve the same purposes: Cartesian coordinates record locations and polar coordinates control the movement of molecules. The boundaries are drawn by the molecules themselves: with both coordinate systems working together, nuclear receptors lie on the nuclear membrane and the other molecules move within the region where they belong, e.g. inside the cytoplasm or the nucleus. If a molecule is about to cross a boundary, its movement is reversed to pull it back. The states part is exactly the same as in the last model, but the functions part is not: because this model involves nuclear receptors, some new sets of rules are needed there. Another point to note is that each bond with a nuclear receptor should last for a delay before the molecule is unbound and released into its new region. For example, after NF-κB has been bound to a nuclear importing receptor for a while, its state should change to NF-κB moving freely inside the nucleus. The last function, move, is responsible for drawing the boundary: it lets nuclear receptors move only on the nuclear membrane and keeps the other molecules moving in the right regions. The messages part is also the same as in the last model, containing the location message and the bond message.
As before, the location message contains each molecule's state, Cartesian coordinates and id number, and the bond message contains the sender's id and state, the receiver's state, a bind/unbind tag and a distance value. The iteration files have the same structure but now contain more kinds of molecules. In the next chapter, I follow this design and describe the implementation of the NF-κB signalling pathway model in detail.
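The state numbering above can be summarised as a C enum. This is a sketch for illustration only: the enumerator names and the helper functions kind_of_state and receptor_is_busy are hypothetical, but the numeric codes are exactly those listed in the design. Because each kind of molecule owns its own block of numbers, the kind can be recovered from the state alone, and a receptor's state only tells whether it is busy.

```c
/* State codes for the NF-kB model, as listed in the design (names hypothetical). */
enum {
    NFKB_FREE_CYTO = 0, NFKB_IKB_CYTO = 1, NFKB_IMPORT = 2, NFKB_IKB_IMPORT = 3,
    NFKB_EXPORT = 4, NFKB_IKB_EXPORT = 5, NFKB_FREE_NUC = 6, NFKB_IKB_NUC = 7,
    IKB_FREE_CYTO = 10, IKB_IMPORT = 11, IKB_EXPORT = 12, IKB_FREE_NUC = 13,
    IMPORT_DORMANT = 20, IMPORT_ACTIVE = 21,
    EXPORT_DORMANT = 30, EXPORT_ACTIVE = 31
};

enum molecule_kind { KIND_NFKB, KIND_IKB, KIND_IMPORTER, KIND_EXPORTER, KIND_UNKNOWN };

/* Hypothetical helper: each kind owns a block of codes, so the state
 * number alone identifies the kind of molecule. */
enum molecule_kind kind_of_state(int state)
{
    if (state >= 0 && state <= 7)   return KIND_NFKB;
    if (state >= 10 && state <= 13) return KIND_IKB;
    if (state == IMPORT_DORMANT || state == IMPORT_ACTIVE) return KIND_IMPORTER;
    if (state == EXPORT_DORMANT || state == EXPORT_ACTIVE) return KIND_EXPORTER;
    return KIND_UNKNOWN;
}

/* Hypothetical helper: a receptor is busy exactly when it is active. */
int receptor_is_busy(int state)
{
    return state == IMPORT_ACTIVE || state == EXPORT_ACTIVE;
}
```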
Section 4.5: Design of NF-κB & MAP Kinase Signalling Pathway Combined Model
The structural design of this model is almost the same as that of the NF-κB model, but a new pathway, the MAP kinase pathway, is added. Because there is no existing Matlab model for this one, the relationship and cross-talk between these two pathways is
important. However, according to the Academic Unit of Cell Biology, Division of Genomic Medicine at the University of Sheffield, the only known relationship between these two pathways is the one shown in Figure 4.3; the exact cross-talk between the NF-κB and MAP kinase pathways has not yet been established. So the design for this model actually adds a new pathway to the last model. Figure 4.3 shows that signals from outside the cell pass through the Toll receptor, after which some of the molecules go into the NF-κB pathway and others go into the MAP kinase pathway with a certain probability. However, the model only covers intracellular behaviour, so the initial molecules are assigned in the first iteration file.
Figure 4.3 NF-κB & MAP Kinase Signalling Pathway Relation [32]
Some more states need to be added. From Figure 3.4 in the last chapter, it is possible to treat Ras, SOS and Grb2 as a single molecule, Ras_SOS_Grb2, which acts like molecule A in the first model, while Active-Ras acts like molecule B in the first model.
After they bind together they produce MAPK. The degradation process is left out, and MAP KKK, MAP KK and MAP K are treated as one molecule, MAPK, which is similar to molecule C in the first model except that it can bind with nuclear importing receptors and enter the nucleus. So the possible states of Ras_SOS_Grb2 are: 40 - free in cytoplasm, 41 - bound with Active-Ras => MAPK, 42 - bound with Active-Ras & nuclear importing receptors => Ap-1 bound with nuclear importing receptors, and 43 - Ap-1 free in nucleus. Active-Ras has a single possible state: 50 - free in cytoplasm. The states of all the other molecules stay exactly the same as in the last model. Because of the new pathway, new sets of rules have to be added to the other parts as well; these are described in detail in the next chapter, Implementation and Testing.
Chapter 5: Implementation and Testing

This chapter describes in detail how the three models in my project were implemented. After the models were finished, it was important to test and evaluate them, so the testing methods are also covered in this chapter.
Section 5.1: Implementation of Three Models

This section describes the implementation of the models in my project. Following the design from the last chapter, the implementation process of each of the three models is explained. The first two models are converted from two Matlab models, but because of the differences between Matlab and the X-machine framework, the best way to convert them is to take the ideas and algorithms from the Matlab code and then write the model directly in XML, following the XML specification for the X-machine framework.
Section 5.1.1: Implementation of Chemical Interaction Model

In Matlab, the chemical interaction model works as follows: firstly, it defines the constants needed, such as the time step, box length and speed range; secondly, it generates the initial molecule positions and plots them immediately; thirdly, it creates the initial direction vectors; fourthly, it loops over the A molecules, each looking for a suitable B molecule to bind with; fifthly, it controls the movement of each molecule and keeps it inside the interaction box; lastly, it draws a graph of concentration against time. As mentioned in the last chapter, the X-machine framework has no built-in tools for generating initial values or drawing graphs, so external tools are needed for these, but they are not difficult to achieve. This model starts with the main .xml file. The states of each molecule were defined in the last chapter, so next the constants and variables are defined. The box length is defined as a constant, 3000 (in units of 10^-10 m), for later use. The variables are the id number and state number as integers, and as doubles: x, y and z for Cartesian coordinates, postheta, posphi and posr for polar coordinates, movetheta, movephi and mover for movements in polar coordinates, and iradius for the radius of molecules. Then comes the states part, where three states are defined: output, input and move. The output state is associated with the Outputdata function and points to the input state. The input state is associated with the Inputdata function and has the move state as its destination. The move state is linked with the Move function and has output as its next state. So the three states are linked together in a closed ring (see Figure 5.1):
[Diagram: Output (initial) → Input → Move → back to Output]
Figure 5.1 States and relations in X-machine
The next part is the functions part. As the design chapter noted, this is the most complicated part; all the code written between the XML tags is actually C code, mostly if/else statements, while loops and other simple C constructs. The first function is Outputdata. It sends out a location message using a method called add_location_message, carrying the molecule's id, state, x, y and z so that other molecules can decide whether to bind. This function is quite short. The second function is Inputdata. It first defines some local variables for processing location messages and then, in a while loop, reads the location messages and processes them one by one for each molecule A. For each message, it discards messages that come from the molecule itself and messages from other A molecules, and then checks whether the squared distance (computed without taking a square root) is less than the squared molecule radius (radius²). If all these requirements are met, a bond message is sent, containing the source molecule's id and state, the destination molecule's state, their distance and the integer 3 as the bind/unbind tag (3 means a bond try). The third function is called checkbondtries. It processes bond messages with 3 as the bind/unbind tag and decides whether the bond should be made. Once the ids of both molecules have been checked, a new bond message is sent with the same information but with 0 as the bind/unbind tag (0 means bind) and 0.0 as the distance, since the distance is no longer useful after binding. Molecule B is also freed from memory (it disappears) by returning 1 in an if/else statement. The fourth and last function in this model is move. It processes bond messages with 0 as the bind/unbind tag and makes the bond. That
means the molecule's state is changed here as well (from 0 to 1 in this model). The rest of this function controls the movement of molecules: using Cartesian coordinates for location and polar coordinates for movement, all the molecules are restricted to the box, and they move freely inside it with Brownian motion within the defined speed range. The last part is the messages part. As in the design chapter, two kinds of messages are defined: the location message and the bond message. That is all for the main .xml file. Next, an initial iteration .xml file and a visualisation programme are needed. The programmes for creating the initial iteration file and for visualisation are part of the X-machine framework; Simon Coakley had already written some examples, which only needed to be adapted to this model. The programme that creates the initial iteration is written in C. First, some variables are defined, namely the initial numbers of molecules and their moving speeds. The main part consists of for loops, one for each type of molecule, each generating the assigned number of molecules with random coordinates and speeds. The visualisation programme is also written in C and uses OpenGL libraries. It reads each of the iteration files and displays each type of molecule in a different colour; as the iteration file changes, the molecules can be displayed as moving objects. It also supports rotation, saving each iteration's display as an image, and so on. Some images of the model visualisation are shown in Figure 5.2:
Figure 5.2 Visualisation of Chemical Interaction Model
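The memory, state ring, Inputdata filter and move step described in this subsection can be sketched together in C. This assumes the box spans 0 to 3000 on each axis and that state 100 marks a free B molecule (as in the test output in Chapter 6); all names are hypothetical, not the xparser-generated code. One substitution is stated plainly: the sketch draws the random step direction by rejection sampling inside the unit ball instead of sampling movetheta/movephi/mover, which keeps it free of trigonometric calls, so the step length lies between 0 and max_step rather than within a [minimum, maximum] speed range.

```c
#include <stdlib.h>

#define BOX_LENGTH 3000.0 /* box side, in units of 1e-10 m */
#define STATE_B 100       /* assumed code of a free B molecule */

/* Hypothetical C view of one molecule's memory as declared in the XML. */
struct molecule {
    int id, state;
    double x, y, z;                   /* location (Cartesian) */
    double postheta, posphi, posr;    /* position (polar) */
    double movetheta, movephi, mover; /* movement (polar) */
    double iradius;                   /* molecule radius */
};

/* Location message as sent by Outputdata (hypothetical layout). */
struct location_message { int id, state; double x, y, z; };

/* The three X-machine states form a closed ring: output -> input -> move. */
enum xm_state { XM_OUTPUT, XM_INPUT, XM_MOVE };

enum xm_state next_state(enum xm_state s)
{
    if (s == XM_OUTPUT) return XM_INPUT; /* after Outputdata */
    if (s == XM_INPUT) return XM_MOVE;   /* after Inputdata */
    return XM_OUTPUT;                    /* after Move */
}

/* Inputdata filter: should molecule A send a bond try (tag 3) to the sender?
 * Squared distances are compared, so no square root is needed. */
int is_bond_candidate(const struct molecule *a, const struct location_message *msg)
{
    if (msg->id == a->id) return 0;      /* message from itself */
    if (msg->state != STATE_B) return 0; /* ignore other A and C molecules */
    double dx = msg->x - a->x, dy = msg->y - a->y, dz = msg->z - a->z;
    return dx * dx + dy * dy + dz * dz < a->iradius * a->iradius;
}

/* Uniform random number in [-1, 1]. */
static double sym_rand(void) { return 2.0 * rand() / (double)RAND_MAX - 1.0; }

/* Mirror a coordinate back into [0, len]; assumes a step shorter than len. */
static double reflect(double c, double len)
{
    if (c < 0.0) return -c;
    if (c > len) return 2.0 * len - c;
    return c;
}

/* Move: a Brownian-style step of at most max_step in a random direction,
 * drawn by rejection sampling in the unit ball, reflected at the box walls. */
void brownian_step(struct molecule *m, double max_step)
{
    double d[3], n2;
    do {
        d[0] = sym_rand(); d[1] = sym_rand(); d[2] = sym_rand();
        n2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
    } while (n2 > 1.0);
    m->x = reflect(m->x + max_step * d[0], BOX_LENGTH);
    m->y = reflect(m->y + max_step * d[1], BOX_LENGTH);
    m->z = reflect(m->z + max_step * d[2], BOX_LENGTH);
}
```

Reflecting at the walls is one simple way to keep molecules inside the box; reversing the step, as the NF-κB model does at its boundaries, would serve equally well.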

Section 5.1.2: Implementation of the NF-κB Signalling Pathway Model

The Matlab model of the NF-κB signalling pathway is similar to the chemical interaction model, but more complex, involving many more molecules and receptors. It works as follows: firstly, it defines the constants needed; secondly, it assigns initial positions in spherical polar coordinates and converts them into Cartesian coordinates; thirdly, it uses a large while loop to carry out the interactions according to defined sets of rules; lastly, the results are drawn on a graph of concentration against time. With the state numbers already defined, the constants and variables are needed next. The only constant defined here is the receptor delay, with a value of 10. The variables are exactly the same as in the chemical interaction model: id number, state number, x, y, z, postheta, posphi, posr, movetheta, movephi and iradius. The states part is also the same as in the last model: three states, output, input and move, are defined, and their relationship follows Figure 5.1. The functions part shows clearly how the NF-κB pathway model differs from the chemical interaction model. As in the last model, there are four functions: Outputdata, Inputdata, Checkbondtries and Move. The first function, Outputdata, is not as simple as before; it has two sub-parts. The first sub-part sends out the location message containing the molecule's id, state, x, y and z for the other molecules to use. The second sub-part is for nuclear receptors: if the molecule is an active nuclear receptor, it decreases its receptor delay counter. Once the counter reaches zero, it sends out a bond message to unbind the molecule bound to it. This bond message contains both molecules' states and 1 as the bind/unbind tag (1 means unbind).
The active nuclear receptor then releases the molecule bound to it, and its state changes to dormant. The second function is Inputdata. It starts by setting some local variables and then checks the bond messages one by one in a while loop. If a bond message has 1 as its bind/unbind tag, the associated molecule changes state into the appropriate region. There are six situations:
1. If NF-κB is bound to an importing nuclear receptor, make it free in the nucleus;
2. If NF-κB is bound to an exporting nuclear receptor, make it free in cytoplasm;
3. If NF-κB & IκBα is bound to an importing nuclear receptor, make it free in the nucleus;
4. If NF-κB & IκBα is bound to an exporting nuclear receptor, make it free in cytoplasm;
5. If IκBα is bound to an importing nuclear receptor, make it free in the nucleus;
6. If IκBα is bound to an exporting nuclear receptor, make it free in cytoplasm.
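The receptor delay countdown in Outputdata and the six release rules in Inputdata can be sketched as small C functions. The function names are hypothetical; the delay value of 10 and the state codes come from the text and the design chapter, and the mapping for the NF-κB & IκBα complex assumes that "free in the nucleus" means state 7 and "free in cytoplasm" means state 1.

```c
#define RECEPTOR_DELAY 10 /* receptor delay constant from the model */

/* Hypothetical helper for Outputdata: an active receptor counts its delay
 * down and returns 1 when the bound molecule should be released (tag 1). */
int receptor_tick(int *delay_counter)
{
    if (*delay_counter > 0)
        (*delay_counter)--;
    return *delay_counter == 0;
}

/* After releasing its molecule, an active receptor becomes dormant again. */
int receptor_after_release(int state)
{
    if (state == 21) return 20; /* importing receptor */
    if (state == 31) return 30; /* exporting receptor */
    return state;
}

/* Hypothetical helper for Inputdata: new state of a molecule released by a
 * receptor, following the six situations; -1 if it was not receptor-bound. */
int state_after_unbind(int state)
{
    switch (state) {
    case 2:  return 6;  /* NF-kB off importer -> free in nucleus */
    case 4:  return 0;  /* NF-kB off exporter -> free in cytoplasm */
    case 3:  return 7;  /* NF-kB & IkBa off importer -> in nucleus */
    case 5:  return 1;  /* NF-kB & IkBa off exporter -> in cytoplasm */
    case 11: return 13; /* IkBa off importer -> free in nucleus */
    case 12: return 10; /* IkBa off exporter -> free in cytoplasm */
    default: return -1;
    }
}
```

With the delay set to 10, the tenth call to receptor_tick reports that the molecule should be released.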
This function then reads the location messages for each molecule. First, it checks whether the location message was sent by the molecule itself; if not, it checks the distance between the molecule and the message sender. If the squared distance is less than radius², it checks whether their states match any of the four situations below (s: sender, r: receiver of the location message):
1. r: NF-κB free in cytoplasm and s: IκBα free in cytoplasm;
2. r: NF-κB free in nucleus and s: IκBα free in nucleus;
3. r: dormant nuclear importing receptor and s: NF-κB, IκBα or NF-κB & IκBα free in cytoplasm;
4. r: dormant nuclear exporting receptor and s: NF-κB, IκBα or NF-κB & IκBα free in nucleus.
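The four matching situations reduce to a predicate over the receiver's and sender's state numbers. A minimal sketch with the hypothetical name should_try_bond; the distance test is assumed to have been done already.

```c
/* Hypothetical helper: should receiver r send a bond try to sender s?
 * Implements the four situations using the design's state codes. */
int should_try_bond(int r, int s)
{
    if (r == 0 && s == 10) return 1; /* cytoplasmic NF-kB meets cytoplasmic IkBa */
    if (r == 6 && s == 13) return 1; /* nuclear NF-kB meets nuclear IkBa */
    if (r == 20)                     /* dormant importer: free in cytoplasm */
        return s == 0 || s == 10 || s == 1;
    if (r == 30)                     /* dormant exporter: free in nucleus */
        return s == 6 || s == 13 || s == 7;
    return 0;
}
```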
If any of the four situations matches, a bond message with 3 as its bind/unbind tag (a bond try) is sent. The third function is Checkbondtries. It only processes bond messages with 3 as the bind/unbind tag. When it receives such a message, it checks whether the sender is the closest one; if so, the two molecules bind. It first sends a bond message with 0 as the bind/unbind tag, meaning bind, and then changes the state accordingly:
If NF-κB is free in cytoplasm and bound by an importing nuclear receptor, make NF-κB bound to the importing nuclear receptor;
If NF-κB is free in the nucleus and bound by an exporting nuclear receptor, make NF-κB bound to the exporting nuclear receptor;
If NF-κB & IκBα is in cytoplasm and bound by an importing nuclear receptor, make NF-κB & IκBα bound to the importing nuclear receptor;
If NF-κB & IκBα is in the nucleus and bound by an exporting nuclear receptor, make NF-κB & IκBα bound to the exporting nuclear receptor;
If IκBα is free in cytoplasm and bound by an importing nuclear receptor, make IκBα bound to the importing nuclear receptor;
If IκBα is free in the nucleus and bound by an exporting nuclear receptor, make IκBα bound to the exporting nuclear receptor;
If IκBα is free in cytoplasm and bound by NF-κB, free IκBα from memory;
If IκBα is free in the nucleus and bound by NF-κB, free IκBα from memory.
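The eight Checkbondtries rules can likewise be written as a state-transition function. This sketch assumes that the partner's state carried in the bond message is the dormant receptor code (20 or 30) or the free NF-κB code (0 or 6); the name state_after_bind and the marker values -1 (no change) and -2 (free the molecule) are hypothetical.

```c
#define NO_CHANGE (-1)       /* hypothetical marker: rule does not apply */
#define REMOVE_MOLECULE (-2) /* hypothetical marker: free the agent (return 1 in the model) */

/* Hypothetical helper: new state of a molecule in state `self` bound by a
 * partner in state `partner`, following the eight rules above. */
int state_after_bind(int self, int partner)
{
    if (partner == 20) {        /* bound by a dormant importing receptor */
        switch (self) {
        case 0:  return 2;      /* NF-kB */
        case 1:  return 3;      /* NF-kB & IkBa */
        case 10: return 11;     /* IkBa */
        }
    } else if (partner == 30) { /* bound by a dormant exporting receptor */
        switch (self) {
        case 6:  return 4;      /* NF-kB */
        case 7:  return 5;      /* NF-kB & IkBa */
        case 13: return 12;     /* IkBa */
        }
    } else if ((self == 10 && partner == 0) || (self == 13 && partner == 6)) {
        return REMOVE_MOLECULE; /* IkBa absorbed into the NF-kB complex */
    }
    return NO_CHANGE;
}
```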
The last function is Move. It first defines the cell and nuclear radii, then the speed and speed range for all the molecules. It then processes bond messages with 0 as the bind/unbind tag and changes states accordingly as in the last function, but with some different rules added:
From the point of view of free cytoplasmic NF-κB, bond with cytoplasmic IκBα;
From the point of view of free nuclear NF-κB, bond with nuclear IκBα;
If the receptor is a dormant importing receptor, start the receptor delay and change its state to active importing receptor;
If the receptor is a dormant exporting receptor, start the receptor delay and change its state to active exporting receptor.
Two rules are also removed:

If IκBα is free in cytoplasm and bound by NF-κB, free IκBα from memory;
If IκBα is free in the nucleus and bound by NF-κB, free IκBα from memory.
The next part handles movement. If the molecule is a nuclear receptor, it moves on the nuclear membrane, at a fixed distance from the centre of the model and with a set angular speed. Other kinds of molecules are restricted to their proper region: once they move out of the boundary, the movement is reversed to pull them back in. The shape of the model is therefore formed by these molecules. The programmes for creating the initial iteration and for visualisation differ little from the last model; only small alterations are needed to create the new molecules and display them. An image of the NF-κB pathway model is shown below (yellow - NF-κB, blue - IκBα, green - NF-κB & IκBα, pink - importing receptor dormant, orange - exporting receptor dormant, red - exporting receptor active, brown - importing receptor active):
Figure 5.3 Visualisation of NF-κB signalling pathway model
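The two movement rules can be sketched in polar terms. This is an illustration under assumptions: a region is described only by radial limits (for the cytoplasm, from the nuclear radius to the cell radius), a step is assumed to be smaller than the region's thickness, and the names region, radial_step and membrane_angle_step are hypothetical.

```c
#define PI 3.14159265358979323846

/* Hypothetical radial limits of a region (e.g. cytoplasm: nucleus radius
 * to cell radius). */
struct region { double r_min, r_max; };

/* Ordinary molecule: try a radial step dr; if it would leave the region,
 * reverse it so the molecule is pulled back in. Returns the new radius. */
double radial_step(double r, double dr, struct region reg)
{
    double next = r + dr;
    if (next < reg.r_min || next > reg.r_max)
        next = r - dr; /* reverse the movement */
    return next;
}

/* Nuclear receptor: the radius stays on the membrane; only the angle
 * advances by a fixed angular speed, wrapped into [0, 2*pi). */
double membrane_angle_step(double theta, double angular_speed)
{
    theta += angular_speed;
    while (theta >= 2.0 * PI) theta -= 2.0 * PI;
    while (theta < 0.0) theta += 2.0 * PI;
    return theta;
}
```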
Section 5.1.3: Implementation of NF-κB & MAP Kinase Signalling Pathway Combined Model
The implementation of this combined model was not too difficult. New unique state numbers are defined for the new molecules, and all the parts stay the same except the functions part. In the functions part, each function only needs some new rules, and the outputdata function is not changed at all. In the inputdata function, one more rule is added for processing bond messages:
If MAPK is bound to an importing receptor, make it free in the nucleus (as Ap-1).
Two rules are added for processing location messages:

r: Ras_SOS_Grb2 free in cytoplasm and s: Active-Ras in cytoplasm;
r: nuclear importing receptor dormant and s: MAPK free in cytoplasm.
In the checkbondtries function, two rules are added for processing bond messages:
If MAPK is free in cytoplasm and bound by an importing nuclear receptor, make MAPK (Ap-1) bound to the importing nuclear receptor;
If Active-Ras is free in cytoplasm and bound by Ras_SOS_Grb2, free Active-Ras from memory.
One rule is also added to the move function, together with the associated movement for the new molecules:
From the point of view of free cytoplasmic Ras_SOS_Grb2, bond with cytoplasmic Active-Ras.
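The extra rules of the combined model follow the same pattern as the NF-κB ones. A sketch with hypothetical function names, assuming that free cytoplasmic MAPK is state 41 and that -2 marks an Active-Ras to be freed from memory; -1 means the rule does not apply.

```c
/* Hypothetical helper, Inputdata unbind (tag 1): MAPK released by an
 * importer becomes nuclear Ap-1. */
int combined_after_unbind(int state) { return state == 42 ? 43 : -1; }

/* Hypothetical helper, location rules: should receiver r try to bond with s? */
int combined_should_try(int r, int s)
{
    return (r == 40 && s == 50)  /* Ras_SOS_Grb2 meets Active-Ras */
        || (r == 20 && s == 41); /* dormant importer meets cytoplasmic MAPK */
}

/* Hypothetical helper, bind rules: MAPK taken up by an importer,
 * Ras_SOS_Grb2 becoming MAPK, and Active-Ras being freed (-2). */
int combined_after_bind(int self, int partner)
{
    if (self == 41 && partner == 20) return 42;
    if (self == 40 && partner == 50) return 41;
    if (self == 50 && partner == 40) return -2;
    return -1;
}
```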
The changes to the initial iteration creation and visualisation programmes are similar to those from the first model to the second (cyan - Ras_SOS_Grb2, grey - Active-Ras, rose - MAPK):
Figure 5.4 Visualisation of NF-κB & MAP kinase combined model

Section 5.2: Testing Method

Once a model is finished, Xparser is used to parse the main .xml file into C code. If it compiles without problems, the result is a runnable programme. Given the number of iterations and the initial iteration file, the programme runs the model, and when it finishes it leaves a set of iteration files. Examining these iteration files makes it possible to test whether the model works correctly.
Section 5.2.1: Unix Tool for Single Iteration Testing

There is a Unix tool pack for win32 systems; inside the pack, under the \usr\local\wbin directory, there is a useful programme called grep.exe. grep is a search command which, with appropriate regular expressions and options, is powerful for searching the .xml iteration text files. For this project, the most appropriate command line for getting the states of the molecules and their counts from an iteration file is:
grep "<state>" %1 | sort | uniq -c
This command takes the lines containing <state>, sorts them and counts how many times each state appears. For example, testing the initial file of the NF-κB pathway model:
grep "<state>" its\0.xml | sort | uniq -c
500 <state>0</state>
500 <state>10</state>
500 <state>20</state>
500 <state>30</state>
After 5 iterations:
grep "<state>" its\5.xml | sort | uniq -c
486 <state>0</state>
11 <state>1</state>
482 <state>10</state>
7 <state>11</state>
3 <state>2</state>
490 <state>20</state>
10 <state>21</state>
500 <state>30</state>
This makes it very easy and precise to check whether the model is correct. With thousands of iterations, however, other tools are needed to collect the information from the whole set of iteration files.
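For readers without the Unix tool pack, the same tally can be sketched in C. The function count_states is hypothetical and relies only on the <state>N</state> tags seen in the iteration files; in this sketch, state numbers of 256 or more are ignored.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STATE 256

/* Hypothetical helper: count how many times each <state>N</state> value
 * occurs in an iteration file's text, like grep "<state>" | sort | uniq -c.
 * Values outside [0, MAX_STATE) are ignored. Returns the number counted. */
int count_states(const char *text, int counts[MAX_STATE])
{
    const char *tag = "<state>";
    size_t tag_len = strlen(tag);
    int total = 0;
    memset(counts, 0, MAX_STATE * sizeof counts[0]);
    for (const char *p = strstr(text, tag); p != NULL; p = strstr(p, tag)) {
        int value;
        p += tag_len; /* skip past the opening tag */
        if (sscanf(p, "%d", &value) == 1 && value >= 0 && value < MAX_STATE) {
            counts[value]++;
            total++;
        }
    }
    return total;
}
```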
Section 5.2.2: Getdata Programme for Whole Iteration Files Testing

The Getdata programme was written by Simon Coakley and is part of the
framework. Getdata is a C programme. Like grep, it collects the information between <state></state> tags in each iteration file, but it does so automatically: it reads from the initial iteration file through to the last one and stores the information in a file called data.csv as a matrix. Each row corresponds to an iteration file; the first column is the iteration number and each following column holds the count for one state. The data.csv file can then be used to plot the results, either with Microsoft Excel or with GNUPlot; GNUPlot is the approach taken here. GNUPlot is a powerful script-based plotting tool; with an appropriate script it can generate very professional graphs with annotations and labels. Figure 5.5 is the plot of concentration against iterations for the NF-κB & MAP kinase combined model over 4000 iterations.
Figure 5.5 Concentration against Iterations (time steps) graph
Using the two methods above, it is possible to obtain results for evaluating the model. The detailed results, evaluation methods and discussion are given in the next chapter, Results and Discussion.
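The data.csv layout described above (iteration number, then one count per state) can be sketched as a row formatter. This is not Simon Coakley's Getdata code; the name format_csv_row and the assumption that counts are indexed by state number are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: format one data.csv row, i.e. the iteration number
 * followed by the count of each state in `states`, one column per state.
 * `counts` is indexed by state number. Returns the length snprintf reports. */
int format_csv_row(char *buf, size_t size, int iteration,
                   const int *states, int n_states, const int *counts)
{
    int len = snprintf(buf, size, "%d", iteration);
    for (int i = 0; i < n_states && len >= 0 && (size_t)len < size; i++)
        len += snprintf(buf + len, size - len, ",%d", counts[states[i]]);
    return len;
}
```

For the tenth iteration of the NF-κB model with columns for states 0, 1 and 10, this produces the row 10,483,11,481.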
Chapter 6: Results and Discussion

As introduced in the last chapter, there are two testing methods: the Unix tool method and the Getdata C programme. Using these two methods, it is possible to evaluate the behaviour of the three models and check whether they have been built correctly. Because each of the three models has its own specification, the following three sections cover each model in turn, with results and discussion.
Section 6.1: Results and Discussion of Chemical Interaction Model

Firstly, the test is carried out with initial values of 100 A molecules and 100 B molecules, for 300 iterations. With the Unix tool:
grep "<state>" its\0.xml | sort | uniq -c
100 <state>0</state>
100 <state>100</state>
grep "<state>" its\1.xml | sort | uniq -c
99 <state>0</state>
1 <state>1</state>
99 <state>100</state>
grep "<state>" its\50.xml | sort | uniq -c
86 <state>0</state>
14 <state>1</state>
86 <state>100</state>
grep "<state>" its\100.xml | sort | uniq -c
75 <state>0</state>
25 <state>1</state>
75 <state>100</state>
grep "<state>" its\300.xml | sort | uniq -c
53 <state>0</state>
47 <state>1</state>
53 <state>100</state>
In this output, state 0 is molecule A, state 100 is molecule B and state 1 is molecule C. The initial file has 100 A molecules and 100 B molecules. After 1 time step (iteration), 99 A molecules and 99 B molecules are left, and 1 new C molecule has been produced. According to the formula A + B → C, one A molecule reacts with one B molecule to give one C molecule, so the concentrations should change as shown in Figure 3.4 in the requirements chapter. The result after 1 time step is clearly correct. After 50 iterations, 86 A molecules and 86 B molecules are left and 14 C molecules have been produced: 100 - 86 = 14, so the model is correct. After 100 time steps: 100 - 75 = 25, correct. The last iteration in this test is after 300 time steps (iterations): 100 - 53 = 47, correct. These spot checks of the iterations are all correct, but it is also necessary to see
the graph of concentration against iterations (time steps). The Getdata programme produces Figure 6.1:
Figure 6.1 Chemical interaction agent model graph one
At first glance the graph seems to have a problem: three kinds of molecules should give three lines, but there are only two, and the red line (molecule A) seems to be missing. In fact, in this simple model the rates of change of molecule A and molecule B are exactly the same, so their lines coincide and only one is visible. To avoid this, molecules A and B can be given different initial concentrations. Even so, the concentration changes in the graph still show that the model is correct. This time the initial values are 500 A molecules and 300 B molecules. Some values from the Unix tool:
grep "<state>" its\0.xml | sort | uniq -c
500 <state>0</state>
300 <state>100</state>
grep "<state>" its\300.xml | sort | uniq -c
222 <state>0</state>
278 <state>1</state>
22 <state>100</state>
As we can see, the initial iteration file has 500 molecule A and 300 molecule B; after 300 time steps, 222 molecule A and 22 molecule B are left, and 278 molecule C have been produced. Some calculations: for molecule A, 500 - 222 = 278; for molecule B, 300 - 22 =
278; for molecule C, 278. The changes from the initial time step to time step 300 are the same for all three molecules, which means the model satisfies the requirement. The graph from the Getdata programme is different this time, see Figure 6.2:
Figure 6.2 Chemical interaction agent model graph two
This time it is easy to see the rate of change of all three molecules, which again shows that the model satisfies the requirement and is correct.
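The arithmetic used in these checks can be captured in one C function. A sketch, assuming no C molecules exist at the start; the name check_abc is hypothetical. With the 500 & 300 run, 500 - 222 = 278 and 300 - 22 = 278, matching the 278 C molecules.

```c
/* Hypothetical helper: conservation check for A + B -> C, assuming no C at
 * the start. Every C produced consumes exactly one A and one B, so the
 * losses of A and of B must both equal the count of C. */
int check_abc(int a0, int b0, int a, int b, int c)
{
    return (a0 - a == c) && (b0 - b == c);
}
```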
Figure 6.3 Visualisation for chemical interaction model
The visualisation of the model differs between the two sets of initial values.
With only 100 molecule A and 100 molecule B (left-hand image of Figure 6.3), the visualisation is not good enough: it cannot show the boundary of the box and may lead to misunderstanding of the model. However, the image from the 500 & 300 set of initial values (right-hand image of Figure 6.3) is good: it shows the box much more clearly and more bonds are visible, all of which gives users a direct and intuitive view of the model. All of the above shows the advantages of the X-machine framework. Compared with the visualisation and results of the Matlab model (Figure 2.3 & 2.3 in Chapter 2), the X-machine is more suitable and more precise for modelling this system.
Section 6.2: Result and Discussion of NF-κB Pathway model

For the NF-κB signalling pathway model, the test starts with 500 NF-κB molecules free in cytoplasm, 500 IκBα molecules free in cytoplasm, 500 nuclear importing receptors and 500 nuclear exporting receptors, and runs for 4000 iterations, which is enough to see the model's behaviour. Before testing the model, we need to know what each state number represents:
NF-κB: 0 - free in cytoplasm; 1 - bound to IκBα in cytoplasm; 2 - bound to nuclear importing receptors; 3 - bound to IκBα and then nuclear importing receptors; 4 - bound to nuclear exporting receptors; 5 - bound to IκBα and then nuclear exporting receptors; 6 - free in nucleus; 7 - bound to IκBα in nucleus
IκBα: 10 - free in cytoplasm; 11 - bound to nuclear importing receptors; 12 - bound to nuclear exporting receptors; 13 - free in nucleus
Nuclear importing receptors: 20 - dormant; 21 - active
Nuclear exporting receptors: 30 - dormant; 31 - active
Testing some randomly chosen iteration files:
grep "<state>" its\0.xml | sort | uniq -c
500 <state>0</state>
500 <state>10</state>
500 <state>20</state>
500 <state>30</state>
The initial iteration file has 500 NF-κB free in cytoplasm, 500 IκBα free in cytoplasm, 500 nuclear importing receptors and 500 nuclear exporting receptors.
grep "<state>" its\10.xml | sort | uniq -c
483 <state>0</state>
11 <state>1</state>
481 <state>10</state>
8 <state>11</state>
6 <state>2</state>
486 <state>20</state>
14 <state>21</state>
500 <state>30</state>
After 10 iterations, there are 483 NF-κB free in cytoplasm, 11 NF-κB bound with IκBα in cytoplasm, 481 IκBα free in cytoplasm, 8 IκBα bound with nuclear importing receptors, 6 NF-κB bound with nuclear importing receptors, 486 dormant nuclear importing receptors, 14 active nuclear importing receptors and 500 dormant nuclear exporting receptors. The evaluation method is as follows. First, get the total number of NF-κB (free in cytoplasm + bound with nuclear importing & exporting receptors + free in nucleus), then the total number of IκBα (counted the same way), and then the total number of NF-κB & IκBα complexes (counted the same way). The initial number of NF-κB minus the total number of remaining NF-κB should equal the total number of NF-κB & IκBα complexes, and so should the initial number of IκBα minus the total number of remaining IκBα. The evaluation is not finished yet: the behaviour of the nuclear importing and exporting receptors must also be tested. If the total number of molecules bound with nuclear importing (or exporting) receptors equals the number of active nuclear importing (or exporting) receptors, the nuclear receptors work correctly. For the tenth iteration, the total number of NF-κB is 483 + 6 = 489, the total number of NF-κB & IκBα is 11, and the total number of IκBα is 481 + 8 = 489.
The change in NF-κB is 500 - 489 = 11 and the change in IκBα is 500 - 489 = 11, both equal to the total number of NF-κB & IκBα (11), so the reaction is correct. For the nuclear receptors, the molecules bound with importing receptors number 8 + 6 = 14, which equals the number of active nuclear importing receptors (14). There is no molecule bound with
nuclear exporting receptors, and there are 500 dormant nuclear exporting receptors, so the nuclear receptors are working correctly. At least at this stage, the model works correctly.
grep "<state>" its\2000.xml | sort | uniq -c
156 <state>0</state>
279 <state>1</state>
150 <state>10</state>
3 <state>11</state>
1 <state>12</state>
17 <state>13</state>
1 <state>2</state>
492 <state>20</state>
8 <state>21</state>
4 <state>3</state>
495 <state>30</state>
5 <state>31</state>
1 <state>4</state>
3 <state>5</state>
13 <state>6</state>
43 <state>7</state>
The result above is from the file after 2000 iterations. Evaluating it: the total number of NF-κB is 156 + 1 + 1 + 13 = 171, the total number of NF-κB & IκBα is 279 + 4 + 3 + 43 = 329, and the total number of IκBα is 150 + 3 + 1 + 17 = 171. The change in NF-κB is 500 - 171 = 329 and the change in IκBα is 500 - 171 = 329, both equal to the total number of NF-κB & IκBα (329). For the nuclear receptors, the molecules bound with importing receptors number 1 + 4 + 3 = 8, which equals the number of active nuclear importing receptors (8), and the molecules bound with exporting receptors number 1 + 3 + 1 = 5, which equals the number of active nuclear exporting receptors (5). So the model is also correct at this stage.
grep "<state>" its\4000.xml | sort | uniq -c
100 <state>0</state>
336 <state>1</state>
95 <state>10</state>
1 <state>11</state>
15 <state>13</state>
497 <state>20</state>
3 <state>21</state>
2 <state>3</state>
495 <state>30</state>
5 <state>31</state>
5 <state>5</state>
11 <state>6</state>
46 <state>7</state>
After 4000 time steps: the total number of NF-κB is 100 + 11 = 111, the total number of NF-κB & IκBα is 336 + 2 + 5 + 46 = 389, and the total number of IκBα is 95 + 1 + 15 = 111. The change in NF-κB is 500 - 111 = 389 and the change in IκBα is 500 - 111 = 389, both equal to the total number of NF-κB & IκBα (389). For the nuclear receptors, the molecules bound with importing receptors number 2 + 1 = 3, which equals the number of active nuclear importing receptors (3), and the molecules bound with exporting receptors number 5, which equals the number of active nuclear exporting receptors (5). Now it is possible to say that the model is correct. From the Getdata programme:
Figure 6.4 NF-κB pathway agent model result (a)
Figure 6.4 shows the model's behaviour. However, some of the lines, namely those associated with the nuclear receptors, do not need to be shown in the result graph; with GNUPlot it is possible to write a script that selects only some of the columns of the data.csv result file. In Figure 6.5 the six remaining lines can be seen clearly, and their rates of change show that the model is correct:
Figure 6.5 NF-κB pathway agent model result (b)
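The evaluation method above can be automated. This sketch (hypothetical name check_nfkb) takes the state counts indexed by state number, for example as tallied from an iteration file, and applies both the conservation check and the receptor check.

```c
/* Hypothetical helper: counts n[] are indexed by state number (at least 32
 * entries). Returns 1 when the NF-kB model's bookkeeping is consistent. */
int check_nfkb(const int *n, int nfkb0, int ikb0)
{
    int free_nfkb = n[0] + n[2] + n[4] + n[6];     /* NF-kB not in a complex */
    int complex   = n[1] + n[3] + n[5] + n[7];     /* NF-kB & IkBa */
    int free_ikb  = n[10] + n[11] + n[12] + n[13]; /* IkBa not in a complex */
    int on_import = n[2] + n[3] + n[11];           /* held by importers */
    int on_export = n[4] + n[5] + n[12];           /* held by exporters */
    return nfkb0 - free_nfkb == complex
        && ikb0 - free_ikb == complex
        && on_import == n[21]  /* active importing receptors */
        && on_export == n[31]; /* active exporting receptors */
}
```

Fed the counts from the 2000th iteration above, it reproduces the hand calculation: 171 free NF-κB, 329 complexes, 8 importer-bound and 5 exporter-bound molecules.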
Section 6.3: Result and Discussion of NF-κB & MAP kinase pathways combined model
This model involves two new molecules, Ras_SOS_Grb2 and Active-Ras, with the following states:
Ras_SOS_Grb2
    40 - free in cytoplasm
    41 - bound with Active-Ras => MAPK
    42 - MAPK (Ap-1) bound with importing receptor
    43 - Ap-1 free in nucleus
Active-Ras
    50 - free in cytoplasm
Some results from the Unix tools:
grep "<state>" its\0.xml | sort | uniq -c
    500 <state>0</state>
    500 <state>10</state>
    500 <state>20</state>
    500 <state>30</state>
    500 <state>40</state>
    400 <state>50</state>
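The grep "<state>" ... | sort | uniq -c pipeline simply tallies how many agents are in each state. A minimal C sketch of the same tally over the text of an iteration file follows (reading the file into memory is left out). It assumes every state is written as <state>N</state> with a non-negative integer N below a hypothetical bound MAX_STATE; the enum names are illustrative labels for the state codes listed above, not identifiers taken from the model's source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STATE 64  /* hypothetical upper bound on state codes */

/* Illustrative labels for the state codes of the two new molecules. */
enum mapk_state {
    RAS_SOS_GRB2_FREE = 40, /* Ras_SOS_Grb2 free in cytoplasm */
    MAPK_CYTOPLASM    = 41, /* bound with Active-Ras => MAPK */
    MAPK_ON_IMPORTER  = 42, /* MAPK (Ap-1) bound with importing receptor */
    AP1_NUCLEUS       = 43, /* Ap-1 free in nucleus */
    ACTIVE_RAS_FREE   = 50  /* Active-Ras free in cytoplasm */
};

/* Tally every <state>N</state> element of `xml` into counts[N], like
 * grep "<state>" | sort | uniq -c. Values that are not a non-negative
 * integer below MAX_STATE are skipped. Returns the number counted. */
static int count_states(const char *xml, int counts[MAX_STATE])
{
    static const char open_tag[] = "<state>";
    static const char close_tag[] = "</state>";
    int found = 0;

    assert(xml != NULL && counts != NULL);
    memset(counts, 0, MAX_STATE * sizeof counts[0]);
    for (const char *p = strstr(xml, open_tag); p != NULL; p = strstr(p, open_tag)) {
        char *end;
        p += sizeof open_tag - 1; /* skip past "<state>" */
        long value = strtol(p, &end, 10);
        if (end != p && value >= 0 && value < MAX_STATE
            && strncmp(end, close_tag, sizeof close_tag - 1) == 0) {
            counts[value]++;
            found++;
        }
    }
    return found;
}
```

Applied to the initial file its\0.xml, count_states should leave counts[RAS_SOS_GRB2_FREE] at 500 and counts[ACTIVE_RAS_FREE] at 400, matching the uniq -c output above.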
The output for its\0.xml shows the initial values of the model; for the NF-κB pathway and the nuclear receptors they are the same as in the previous model's testing. There are 500 Ras_SOS_Grb2 molecules and 400 Active-Ras molecules in the initial file. The reason I assigned different numbers of Ras_SOS_Grb2 and Active-Ras is the same as in the chemical interaction model. Neither molecule goes inside the nucleus; only MAPK, the molecule they form when bound together, can go inside. Since each binding consumes one Ras_SOS_Grb2 and one Active-Ras, equal starting numbers would make their two curves identical; with different starting numbers, the two lines showing Ras_SOS_Grb2 and Active-Ras in the result graph will not overlap. The evaluation of this model is similar to that of the last model, with some extra calculation for the new pathway molecules: get the total number of Ras_SOS_Grb2 (in cytoplasm only), then the total number of Active-Ras (in cytoplasm only), then the total number of MAPK or Ap-1 (in cytoplasm + bound with nuclear importing receptors + free in nucleus). The initial number of Ras_SOS_Grb2 minus the number of remaining Ras_SOS_Grb2 should equal the total number of MAPK or Ap-1, and the same holds for Active-Ras. We check the 4000th iteration file to see whether the model works correctly:
grep "<state>" its\4000.xml | sort | uniq -c
     97 <state>0</state>
    344 <state>1</state>
    101 <state>10</state>
      1 <state>11</state>
      7 <state>13</state>
    497 <state>20</state>
      3 <state>21</state>
      2 <state>3</state>
    498 <state>30</state>
      2 <state>31</state>
    166 <state>40</state>
    186 <state>41</state>
    148 <state>43</state>
      2 <state>5</state>
     66 <state>50</state>
     12 <state>6</state>
     43 <state>7</state>
For the NF-κB pathway, the total number of NF-κB is 97 + 12 = 109, the total number of NF-κB & IκB is 344 + 2 + 2 + 43 = 391, and the total number of IκB is 101 + 1 + 7 = 109. The change in NF-κB is 500 - 109 = 391 and the change in IκB is 500 - 109 = 391; both equal the total number of NF-κB & IκB (391). For the MAP kinase pathway, the total number of Ras_SOS_Grb2 is 166, the total number of MAPK or Ap-1 is 186 + 148 = 334, and the total number of Active-Ras is 66.
The change in Ras_SOS_Grb2 is 500 - 166 = 334 and the change in Active-Ras is 400 - 66 = 334; both equal the total number of MAPK or Ap-1 (334).
For the nuclear receptors, the molecules bound with importing receptors number 2 + 1 = 3, which equals the number of active nuclear importing receptors (3); the molecules bound with exporting receptors number 2, which equals the number of active nuclear exporting receptors (2). All the results above show that the model is correct. The Getdata programme then produces the following graph:
Figure 6.6 Result of the combined model
As the figure shows, the concentrations change at the rates defined by the rules. This means the model works as intended, and it also shows that adding a new pathway to an existing X-machine framework model is fast and easy. Compared with the Matlab model, the X-machine framework is also more user-friendly and more adaptable.
Chapter 7: Conclusions
Section 7.1: Summary of the Dissertation and Project
This seventh chapter is the last chapter of the dissertation. The dissertation is mostly about the three models of the project, and each chapter is broken down by the three models. The first chapter is the introduction. It contains the background of the project and some introductory information about agent-based modelling, the X-machine framework and the project. The second chapter is the literature review. It presents the literature associated with all three models, as well as the Matlab models for the first two. After the models, it describes in detail three agent-based modelling approaches, Swarm, MASON and the X-machine framework, and explains why this project chose the X-machine framework. The third chapter covers requirements and analysis. It starts with the requirements of this project and then describes in detail the requirements of each model and how each model is evaluated. The fourth chapter is design. It introduces the three languages associated with the project, XML, Matlab and C, followed by the architecture of the X-machine framework and its XML files. Lastly, the design of each model is set out for implementation. The fifth chapter covers implementation and testing. It gives the detailed implementation of the three models and two testing methods for them. The sixth chapter, results and discussion, is the most important chapter of the dissertation. Using the testing methods from Chapter 5, it presents many results from the three models and discusses them to evaluate the models. It has been a great pleasure to do this project. Because it involves both computation and biology, it belongs to a rising field that promises many discoveries and contributions to humanity in future. This project has also demonstrated the usefulness of the X-machine framework for biological modelling, showing its adaptability and realism.
In addition, agent-based modelling is more precise and easier to investigate than reaction-kinetics differential equations. The trial of the NF-κB & MAP Kinase Signalling Pathway Combined Model shows that the X-machine framework is capable of modelling more than one pathway and that new pathways are easy to add.
Section 7.2: Future Work of this Project

Some future work could also be carried out. For the NF-κB & MAP Kinase Signalling Pathway Combined Model, the cross-talk between the two pathways is still not clear. However, I believe that once the biology group has obtained the results and the details of the cross-talk, the model can be updated with those details quickly. Also, because the framework is compatible and extensible, it would be possible to build a programme with a GUI (graphical user interface) for the X-machine framework. Users could add and remove any kind of molecule and set of rules as they wish. This should be welcomed by biologists carrying out experiments and gathering results for their research.
References
[1] Lengauer, T., (2001). Computational Biology at the Beginning of the Post-genomic Era. (http://domino.mpi-sb.mpg.de/internet/news.nsf/0/76c2efbf2adc399ac1256ae60044b5ac?OpenDocument)
[2] Stenger, J. E., (2005). Computational Biologist. (http://www.bookrags.com/sciences/genetics/computational-biologist-gen-01.html)
[3] Wikipedia contributors, (2006). Computer simulation. Wikipedia, The Free Encyclopedia. Retrieved 00:14, April 7, 2006 from http://en.wikipedia.org/w/index.php?title=Computer_simulation&oldid=46951134.
[4] Gregory, R., (2002). "An Individual Based Model for Simulating Bacterial Evolution", accepted contribution to Evolvability and Individuality Workshop, University of Hertfordshire, 18-20 September 2002.
[5] Pogson, M., Holcombe, M., Smallwood, R., Qwarnstrom, E., (2005a). Agent-Based Modelling of the NF-κB Signalling Pathway: A System Biology Approach.
[6] Pogson, M., Holcombe, M., Smallwood, R., (2005b). Agent-Based Modelling of Intracellular Chemical Interactions.
[7] Balanescu, T., Holcombe, M., Cowling, A. J., Gheorgescu, H., Georghe, M., Vertan, C., (1999). Journal of Universal Computer Science 5, 494.
[8] Coakley, S., (2005). X-machine Agents: A Software Architecture for Agent-Based Modelling.
[9] Hutchison, C. A., Peterson, S. N., Gill, S. R., Cline, R. T., White, O., Fraser, C. M., Smith, H. O., Venter, J. C., (1999). Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286(5447), 2165-2169.
[10] Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P., (2002). Molecular Biology of the Cell, 4th ed. Garland Science, New York.
[11] Cho, K. H., Wolkenhauer, O., (2003). Analysis and Modelling of Signal Transduction Pathways in Systems Biology. Biochem. Soc. Trans.
[12] Burrage, K., Burrage, P., Jeffrey, S., Pickett, T., Sidje, R., Tian, T., (2003). A Grid Implementation of Chemical Kinetic Simulation Methods in Genetic Regulation. Proceedings of the APAC03 Conference on Advanced Computing, Grid Applications and eResearch.
[13] Jackson, D., Holcombe, M., Ratnieks, F., (2004a). Coupled computational simulation and empirical research into the foraging system of Pharaoh's ant (Monomorium pharaonis). Biosystems 76, 101-112.
[14] Jackson, D., Holcombe, M., Ratnieks, F., (2004b). Trail geometry gives polarity to ant foraging networks. Nature 432, 907-909.
[15] Walker, D. C., Southgate, J., Hill, G., Holcombe, M., Hose, D. R., Wood, S. M., MacNeil, S., Smallwood, R. H., (2004). The Epitheliome: modelling the social behaviour of cells. Biosystems 76, 89-100.
[16] Holcombe, M., Ipate, F., (1998). Correct Systems: Building A Business Process Solution. Springer-Verlag.
[17] Baldwin, A. S. Jr., (1996). The NF-kappa B and I kappa B proteins: new discoveries and insights. Annu. Rev. Immunol. 14, 649-683.
[18] Blackwell, T. S., Lancaster, L. H., Christman, J. W., (1998). Nuclear factor κB: a pivotal role in the systemic inflammatory response syndrome and new target for therapy. Intensive Care Med. 24(11), 1131-1138.
[19] Yamamoto, Y., Gaynor, R. B., (2001). Therapeutic potential of inhibition of the NF-kappaB pathway in the treatment of inflammation and cancer. J. Clin. Invest. 107, 135-142.
[20] Bourdon, J., (2005). Using MATLAB. (https://www.dcs.shef.ac.uk/wiki/bin/view/Guide/UsingMatlab)
[21] From the product description for Matlab 7.1 of The MathWorks. (http://www.mathworks.com/products/matlab/description1.html)
[22] Kernighan, B., Ritchie, D., (1978). The C Programming Language, 1st ed. Prentice Hall.
[23] Wikipedia contributors, (2006). C programming language. Wikipedia, The Free Encyclopedia. Retrieved 00:16, April 7, 2006 from http://en.wikipedia.org/w/index.php?title=C_programming_language&oldid=47238048.
[24] From the W3C website: Extensible Markup Language (XML). (http://www.w3.org/XML/)
[25] RIKEN BioResource Center, DNA Bank, Summary of MAP kinase pathway, image. (http://www.brc.riken.jp/lab/dna/en/GENESETBANK/mapk_ras.html)
[26] Qwarnstrom, E., (2005). Image of Transmembrane Signalling: Biomechanical and Soluble Mediators.
[27] Wikipedia contributors, (2006). Mitogen-activated protein kinase. Wikipedia, The Free Encyclopedia. Retrieved 00:17, April 7, 2006 from http://en.wikipedia.org/w/index.php?title=Mitogen-activated_protein_kinase&oldid=34699089.
[28] Pogson, M., Smallwood, R., Qwarnstrom, E., Holcombe, M., (2006). Formal Agent-Based Modelling of Intracellular Chemical Interactions.
[29] Carlotti, F., Chapman, R., Dower, S. K., Qwarnstrom, E. E., (1999). Activation of Nuclear Factor κB in single living cells. J. Biol. Chem. 274, 37941-37949.
[30] Carlotti, F., Dower, S. K., Qwarnstrom, E. E., (2001). Dynamic Shuttling of Nuclear Factor κB between the nucleus and cytoplasm as a consequence of inhibitor dissociation. J. Biol. Chem. 275, 41028-41034.
[31] Yang, L., Ross, K., Qwarnstrom, E. E., (2003). RelA control of IκB phosphorylation: a positive feedback loop for high affinity NF-κB complexes. J. Biol. Chem. 278, 30881-30888.
[32] Qwarnstrom, E., (2006). Image of NF-κB & MAP Kinase Signalling Pathway.
[33] Minar, N., Burkhart, R., Langton, C., Askenazi, M., (1996). The Swarm simulation system: a toolkit for building multi-agent simulations. Working Paper 96-06-042, Santa Fe Institute, Santa Fe.
[34] MASON. Retrieved 12 April 2006 from http://cs.gmu.edu/~eclab/projects/mason/
Appendices
* The XML spec (Simon Coakley)
Each agent must have a name tag, as multiple agents can be used.
For the time being, each agent must have a position in the model space, defined in either 2D or 3D Cartesian space. The memory variables holding these positions must be named as follows:
    x: x, px, posx
    y: y, py, posy
    z: z, pz, posz (if used)
This is because the underlying message communication system implementation must know the positions of agents.
For each variable in an agent's memory, for example 'state', the following functions are provided:
    set_state(new_value)
    get_state()
Direct access to the variables is also available (for the moment):
    xmemory->state
Messages are added to the message board by:
    add_messagename_message(variable1, variable2, ...);
Messages are read by cycling through the message board:
    messagename_message = get_first_messagename_message();
    while(messagename_message)
    {
        /* process message code here */
        messagename_message = get_next_messagename_message(messagename_message);
    }
Each message type has its own message board. Message boards are cleared at the end of each iteration.
Agents can be added to a simulation with:
    add_xmachinename_agent(variable1, variable2, ...);
Be careful, as newly created agents will immediately start to be processed from the current function.
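The message board loop above can be illustrated with a small stand-alone example. The following is a minimal C sketch of a single message board held as a linked list, for a hypothetical "location" message carrying an agent id and x, y coordinates; it only mimics the add/get_first/get_next naming pattern described above and is not the code that xparser generates. The storage scheme, the clearing function and count_neighbours are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "location" message: one per agent, posted each iteration. */
typedef struct location_message {
    int id;
    double x, y;
    struct location_message *next;
} location_message_t;

static location_message_t *board_head = NULL; /* first message on the board */
static location_message_t *board_tail = NULL; /* last message, for O(1) append */

/* Post a message; returns 0 on success, -1 if memory runs out. */
static int add_location_message(int id, double x, double y)
{
    location_message_t *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    m->id = id;
    m->x = x;
    m->y = y;
    m->next = NULL;
    if (board_tail != NULL)
        board_tail->next = m;
    else
        board_head = m;
    board_tail = m;
    return 0;
}

static location_message_t *get_first_location_message(void)
{
    return board_head;
}

static location_message_t *get_next_location_message(const location_message_t *current)
{
    return current != NULL ? current->next : NULL;
}

/* Free all messages: the end-of-iteration clearing described above. */
static void clear_location_messages(void)
{
    location_message_t *m = board_head;
    while (m != NULL) {
        location_message_t *next = m->next;
        free(m);
        m = next;
    }
    board_head = board_tail = NULL;
}

/* Example reader in the style of the loop above: count messages posted
 * within distance r of (px, py). */
static int count_neighbours(double px, double py, double r)
{
    int n = 0;
    assert(r >= 0.0);
    location_message_t *location_message = get_first_location_message();
    while (location_message) {
        double dx = location_message->x - px;
        double dy = location_message->y - py;
        if (dx * dx + dy * dy <= r * r)
            n++;
        location_message = get_next_location_message(location_message);
    }
    return n;
}
```

After posting messages at (0, 0), (3, 4) and (10, 0), count_neighbours(0.0, 0.0, 5.0) returns 2; clear_location_messages plays the role of the end-of-iteration clearing.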
An agent can destroy itself immediately with:
    return RELEASE;
This stops the current function and tells the program to delete the agent from the simulation.
Instead of defining function code inside the XML with:
    <code><![CDATA[ my code here ]]></code>
code can be read in from a file using the file tag:
    <code> <file>myfunction.c</file> </code>
The iteration number is available via the integer variable iteration_loop.
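The effect of return RELEASE; can be sketched in the same spirit. In this minimal C sketch the value of RELEASE, the agent structure, the driver run_function and the example function degrade (with its "degraded" state 99) are all hypothetical; the framework defines its own. The driver calls one agent function on every agent and unlinks and frees any agent whose call returns RELEASE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RELEASE 1 /* hypothetical value; the real framework defines its own */

typedef struct agent {
    int state;
    struct agent *next;
} agent_t;

typedef int (*agent_function)(agent_t *self);

/* Call fn once on every agent; unlink and free each agent whose call
 * returns RELEASE, keeping the rest in their original order. */
static void run_function(agent_t **list, agent_function fn)
{
    assert(list != NULL && fn != NULL);
    agent_t **link = list;
    while (*link != NULL) {
        agent_t *a = *link;
        if (fn(a) == RELEASE) {
            *link = a->next; /* bypass the released agent */
            free(a);
        } else {
            link = &a->next;
        }
    }
}

/* Hypothetical agent function: an agent in state 99 removes itself. */
static int degrade(agent_t *self)
{
    return self->state == 99 ? RELEASE : 0;
}

/* Prepend a new agent; returns 0 on success, -1 if malloc fails. */
static int push_agent(agent_t **list, int state)
{
    agent_t *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->state = state;
    a->next = *list;
    *list = a;
    return 0;
}

static int list_length(const agent_t *list)
{
    int n = 0;
    for (; list != NULL; list = list->next)
        n++;
    return n;
}
```

For a list holding agents in states 99, 2, 99 and 1, run_function(&list, degrade) leaves only the agents in states 2 and 1, in their original order.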