You are on page 1of 41

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction to Biochemical Network Modelling


Darren Wilkinson1,2 1 School of Mathematics & Statistics 2 Centre for Integrated Systems Biology of Ageing and Nutrition Newcastle University, UK

SAMSI Undergraduate Workshop, 2nd3rd March, 2007

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Overview

Biological network modelling Model calibration Application projects modelling and inference (Bayesian inference) Round-up

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Computational Systems Biology (CSB)


Much of CSB is concerned with building models of complex biological pathways, then validating and analysing those models using a variety of methods, including time-course simulation Most CSB researchers work with continuous deterministic models (coupled ODE and DAE systems) There is increasing evidence that much intra-cellular behaviour (including gene expression) is intrinsically stochastic, and that systems cannot be properly understood unless stochastic eects are incorporated into the models Stochastic models are harder to build, estimate, validate, analyse and simulate than deterministic models...
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Modelling

Start with some kind of picture or diagram for a mechanism Turn it into a set of (pseudo-)biochemical reactions Specify the rate laws and rate parameters of the reactions Run some stochastic or deterministic computer simulator of the system dynamics Study the dynamics in a variety of ways to gain insight into the system Rene the model structure and/or parameters after comparing simulated dynamics with experimental observations

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Example genetic auto-regulation

P P2 r

RNAP

DNA

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Biochemical reactions
Simplied view: Reactions g + P2 g P2 g g + r r r + P 2P P2 r P Repression Transcription Translation Dimerisation mRNA degradation Protein degradation

But these arent as nice to look at as the picture...

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Petri net representation

Simple bipartite digraph representation of the reaction network useful both for visualisation and computational analysis
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Matrix representation of the Petri net

Species Repression Reverse repression Transcription Translation Dimerisation Dissociation mRNA degradation Protein degradation

Reactants (Pre ) g P2 g r P P2 1 1 1 1 1 2 1 1 1

Products (Post ) g P2 g r P P2 1 1 1 1 1 1 1 1 2

But still need rate laws and reaction rates...


Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Mass-action stochastic kinetics


Stochastic molecular approach: Statistical mechanics arguments lead to a Markov jump process in continuous time whose instantaneous reaction rates are directly proportional to the number of molecules of each reacting species Such dynamics can be simulated (exactly) on a computer using standard discrete-event simulation techniques Standard implementation of this strategy is known as the Gillespie algorithm (just discrete event simulation), but there are several exact and approximate variants of this basic approach

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Lotka-Volterra system

Reactions X 2X X + Y 2Y Y X Prey, Y Predator We can re-write this using matrix notation for the corresponding Petri net (prey reproduction) (prey-predator interaction) (predator death)

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Forming the matrix representation


The L-V system in tabular form Rate Law h(, c ) c1 x c2 xy c3 y LHS X Y 1 0 1 1 0 1 RHS X Y 2 0 0 2 0 0 Net-eect X Y 1 0 -1 1 0 -1

R1 R2 R3

Call the 3 2 net-eect (or reaction) matrix A. The matrix S = A is the stoichiometry matrix of the system. Typically both are sparse. The SVD of S (or A) is of interest for structural analysis of the system dynamics...

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Petri net invariants


A P -invariant is a non-zero solution to Ay = 0 (ie. y is in the null-space of A)
P -invariants correspond to conservation laws in the network, and lead to rank-degeneracy of A

A T -invariant is a non-zero, non-negative (integer-valued) solution to Sx = 0 (ie. x is in the null-space of S )


T invariants correspond to sequences of reaction events that return the system to its original state

The SVD of S (or A) characterises the null-space of S and A The Lotka-Volterra model is of full rank (so no P -invariants), and has one T -invariant, x = (1, 1, 1)

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

The Gillespie algorithm


1

Initialise the system at t = 0 with rate constants c1 , c2 , . . . , cv and initial numbers of molecules for each species, x1 , x2 , . . . , xu . For each i = 1, 2, . . . , v , calculate hi (x , ci ) based on the current state, x . Calculate h0 (x , c )
v i =1

3 4

hi (x , ci ), the combined reaction hazard.

Simulate time to next event, t , as an Exp (h0 (x , c )) random quantity, and put t := t + t . Simulate the reaction index, j , as a discrete random quantity with probabilities hi (x , ci ) / h0 (x , c ), i = 1, 2, . . . , v . Update x according to reaction j . That is, put x := x + S (j ) , where S (j ) denotes the j th column of the stoichiometry matrix S . Output x and t . If t < Tmax , return to step 2.
Biochemical Network Modelling

7 8

Darren Wilkinson SAMSI Undergraduate Workshop

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

The continuous deterministic approximation


If the discreteness and stochasticity are ignored, then by considering the reaction uxes it is straightforward to deduce the mass-action ordinary dierential equation (ODE) system: ODE Model dXt = Sh(Xt , c ) dt Analytic solutions are rarely available, but good numerical solvers can generate time course behaviour Slight complications due to rank-degeneracy of S Also spatial versions reaction-diusion kinetics PDE models computationally intensive (slow)
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

The Lotka-Volterra model


[Y1] [Y2] 15 25 [Y] 10 [Y2] 0 20 40 Time 60 80 100 0 0 5 0 10 5 15 20

4 [Y1]

Y1 Y2

400

300

200

Y2 100 0 5 10 15 Time 20 25 100 200

300

400

50

100

150

200 Y1

250

300

350

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Key dierences
Deterministic solution is exactly periodic with perfectly repeating oscillations, carrying on indenitely Stochastic solution oscillates, but in a random, unpredictable way (wandering from orbit to orbit in phase space) Stochastic solution will end in disaster! Either prey or predator numbers will hit zero... Either way, predators will end up extinct, so expected number of predators will tend to zero qualitatively dierent to the deterministic solution So, in general the deterministic solution does not provide reliable information about either the stochastic process or its average behaviour
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Modelling Stochastic kinetics

Simulated realisation of the auto-regulatory network

Rna P P2

200

400

600 0 10

30

50

0.0

0.5

1.0

1.5

2.0

1000

2000

3000

4000

5000

Time

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Model calibration

In its most basic form, model calibration is concerned with tuning the parameters of a computer model in order to make the output obtained by running it consistent with experimental observations In practice, this is only one aspect of the problem, as there will typically be a range of parameter values consistent with observations, and so the calibration exercise is part of a broader analysis, also concerning model validity and parameter identiability and confounding

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Simple example: linear birth-death process


Birth-death reactions X 2X X Deterministic solution: Xt = X0 exp{( )t } This is a function of ( ) only! Stochastic solution is more interesting, and depends on both and ...
X X

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Birth-death realisations

0 0

10

20

30

40

50

60

2 t

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Issues with the birth-death process

Stochastic variation: random distribution at each time point, correlations between time points, random time to extinction, etc. Parameter identication: if a deterministic model is tted, one can only ever identify ( ) never and separately Information about both and in the data... Need both and for reliable stochastic simulation Cant t parameters using a deterministic model, then run a stochastic simulation...

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Birth-death realisations
lambda=0, mu=1
60 60

lambda=3, mu=4

40

20

X 0 0 1 2 t 3 4 5 0 0 20

40

2 t

lambda=7, mu=8
60 60

lambda=10, mu=11

40

20

X 0 0 1 2 t 3 4 5 0 0 20

40

2 t

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Fully Bayesian inference


In principle it is possible to carry out rigorous statistical inference for the parameters of the stochastic process model Fairly detailed experimental data are required eg. quantitative single-cell time-course data derived from live-cell imaging The standard procedure uses GFP labelling of key reporter proteins together with time-lapse confocal microscopy, but other approaches are also possible The statistical theory underlying the inference algorithms is fairly technical the techniques are developed and illustrated in a sequence of papers. The main ndings are summarised in: Golightly, A. & Wilkinson, D. J. (2006) Bayesian sequential inference for stochastic kinetic biochemical network models, Journal of Computational Biology, 13(3):838851.
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Introduction Likelihood-based fully Bayesian inference Likelihood-free Bayesian inference

Likelihood-free MCMC for Bayesian inference


It is possible to develop a generic framework for Bayesian inference for model parameters applicable to both deterministic and stochastic models using the ideas of likelihood-free MCMC, which sacrices some computational eciency for considerable reduction in implementation complexity It exploits forward simulation from the computer model Such an approach requires a very large number of simulation runs, and is therefore most easily applied to fast simulators (simple models) For slow simulators (complex models), HPC facilities can be exploited in order to build a fast emulator of the slow simulator
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Mechanisms of ageing
Ageing is caused by the gradual accumulation of unrepaired molecular damage, leading to an increasing fraction of damaged cells and eventually to functional impairment of tissues and organs One major cause of molecular damage is highly reactive oxygen species (ROS) Molecular damage may trigger cellular response programmes, so that the ageing process may also be seen to be governed by genetically determined pathways Many of the (random) damage and (imperfect) repair mechanisms important for understanding cellular ageing are intrinsically stochastic
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Network theory of ageing

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Modelling large biological systems


BBSRC/MRC/DTI Grant (+ Unilever) BASIS Biology of Ageing e-Science Integration and Simulation (4/023/06) Kirkwood, Wilkinson, Boys, Gillespie, Proctor, Shanley Modelling large complex systems with many interacting components SBML model database (SBML encoded for discrete stochastic simulation) Discrete stochastic simulation service (and results database) Distributed computing infrastructure for routine use (web portal and web-service interface for GRID computing)
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

SBML The Systems Biology Markup Language


SBML is an XML-based language for encoding and exchanging quantitative biochemical network models Encodes species, initial amounts, reactions, rate laws, etc. Original specication (Level 1) aimed mainly at continuous deterministic models Current specication (Level 2) perfectly capable of encoding discrete stochastic models in an unambiguous way Many tools for working with SBML models (model builders, deterministic and stochastic simulators, etc.) Issues with testing correctness of stochastic simulators, and correctly encoding discrete stochastic models using o-the-shelf model-building tools
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Computer model technology

BASIS features service-oriented architecture (SOA)


Controls access to models, data and computational resources Represents and encodes complex models using XML technology (SBML in this case) Simulation engine that can handle a broad class of models without recompilation Databases for models and simulation output Web interface for human-interaction SOAP web-services API for programmatical access

Do we need a standard API for biological simulation services?

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

BASIS Software
Web client SOAP client (WSSecurity)

www.basis.ncl.ac.uk

UK e-Science GRID Pilot Project


SOAP client (SSL)

Apache web server

Python Spyce/PSP and CGI scripts

Tomcat

Axis

Java WSSecurity WSs

Python SOAP Web Services interface (SSLbased) Python Main BASIS API

Postgres Database

Condor Job sched

libSBML SBML library

R Data analysis

GraphViz Network visualise

C Simulation code

GSL Scientific library

Debian GNU/Linux (sarge)

Software architecture used to implement the BASIS system

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Example: Chaperones and their role in ageing


C. J. Proctor, C. Soti, R. J. Boys, C. S. Gillespie, D. P. Shanley, D. J. Wilkinson, T. B. L. Kirkwood (2005) Modelling the actions of chaperones and their role in ageing, Mechanisms of Ageing and Development, 126(1):119-131. Several versions of this model in the BASIS public model repository, each with a unique ID each can be copied, modied and simulated eg. urn:basis.ncl:model:518

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Outline CaliBayes architecture

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Distinctive features
Although the outline architecture appears similar to that for many iterative parameter tting algorithms, there are some fundamental dierences This is not a hill-climbing algorithm, and is not searching for the best t The calibration engine is using the information it receives from the statistical comparison service in order to randomly explore the posterior distribution for the parameters (the set of parameters consistent with the data and prior knowledge, weighted according to their probability) This posterior distribution can be used for a range of analyses, including calibration
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

An example posterior distribution

0.25

100

80

0.20

Density

60

Density 0.000 0.010 0.020 0.030 vd 0 0.05 5 10

vdr

0.15

0.10

0.00 0.01 0.02 0.03 0.04 vd

20

40

15

20

25

0.15 vd

0.25

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Ageing Complex modelling Bayesian calibration

Extensions
Bayesian inference naturally integrates data from multiple sources, and may be assimilated simultaneously or sequentially depending on the context The architecture requires slight modication for complex models, as then the simulator is replaced by an emulator, built o-line using HPC facilities The framework can also be adapted to tackle experimental design questions such as: Given a limited budget, and our current state of knowledge, what are the best new experiments to carry out in order to learn most about the model parameters of greatest interest? It is also possible to extend the framework to compare evidence for competing models for the same process
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

MCMC Future directions Emulators

MCMC-based fully Bayesian inference for fast computer models


Before worrying about the issues associated with slow simulators, it is worth thinking about the issues involved in calibrating fast deterministic and stochastic simulators, based only on the ability to forward-simulate from the model In this case it is often possible to construct MCMC algorithms for fully Bayesian inference using the ideas of likelihood-free MCMC (Marjoram et al 2003) Here an MCMC scheme is developed exploiting forward simulation from the model, and this causes problematic likelihood terms to drop out of the M-H acceptance probabilities
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

MCMC Future directions Emulators

Future directions
In the presence of measurement error, the sequential likelihood-free scheme is eective, and is much simpler than a more ecient MCMC approach The likelihood-free approach is easier to tailor to non-standard models and data The essential problem is that of calibration of complex stochastic computer models Worth connecting with the literature on deterministic computer models For slow stochastic models, there is considerable interest in developing fast emulators and embedding these into MCMC algorithms
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

MCMC Future directions Emulators

Building emulators for slow simulators


Use Gaussian process regression to build an emulator of a slow deterministic simulator Obtain runs on a carefully constructed set of design points (eg. a Latin hypercube) easy to exploit parallel computing hardware here For a stochastic simulator, many approaches are possible
(Mixtures of) Dirichlet processes (and related constructs) are potentially quite exible Can also model output parametrically (say, Gaussian), with parameters modelled by (independent) Gaussian processes Will typically want more than one run per design point, in order to be able to estimate distribution

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Biological computer models Problems Reference

Why are Systems Biology models interesting examples of computer models?


Models
Diverse class of models: fast/slow, spatial/non-spatial, deterministic/stochastic, discrete/continuous time/states even modelling the same biological process! Many parameters Structural uncertainty Genuine interest in the (posterior distribution of the) parameters not just in prediction

Data
High-dimensional Diverse: high-resolution time-course data, coarse population averaged data, endpoint data, distributional data, individual specic parameters/data, covariates Multiple distinct sources of data for a given model
Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Biological computer models Problems Reference

Interesting methodological problems


Calibration of fast and slow stochastic simulators, using individual, averaged and distributional data Dealing with heterogeneity cellcell, tissuetissue, or organismorganism Emulation of slow stochastic simulators good models and tting procedures Experimental design for stochastic computer models trade os between repetition and space-lling, etc. Utilising fast stochastic or deterministic approximate simulators for a slow stochastic simulator

Darren Wilkinson SAMSI Undergraduate Workshop

Biochemical Network Modelling

Biological modelling Model calibration Application projects Bayesian inference Summary and conclusions

Biological computer models Problems Reference

Further information
Stochastic Modelling for Systems Biology
An accessible introduction to stochastic modelling of complex genetic and biochemical networks. Covers: biological modelling, biochemical reactions, Petri nets, SBML, stochastic processes, simulation algorithms (including Gillespie), case studies, MCMC, and Bayesian inference for network dynamics. ISBN: 1-58488-540-8

Contact details... email: d.j.wilkinson@ncl.ac.uk www: http://www.staff.ncl.ac.uk/d.j.wilkinson/


Darren Wilkinson SAMSI Undergraduate Workshop Biochemical Network Modelling

You might also like