
Bioinformatics: Applications

ZOO 4903, Fall 2006, MW 10:30-11:45
Sutton Hall, Room 312
Jonathan Wren

Systems Biology

Lecture overview
What we've talked about so far
Pathways & network motifs
Simulating evolution in silico
Cellular simulations

Overview
The ultimate goal of biology & bioinformatics is to tie it all together and understand the system.
In the meantime, forced to live in the real world, we focus on tying a few things together.

Systems Biology backers & attackers


Though coined 40 years ago, a lot of people still ask, "What's that?" when the term systems biology comes up. "It is used in so many different contexts, nobody is really clear what you mean by it," says John Yates III, a professor at the Scripps Research Institute in La Jolla, Calif. He's not the only one stumped by the term's meaning. David Placek, president of Sausalito, Calif.-based Lexicon Branding, a company that cooks up names for pharmaceutical products such as Velcade and Meridia, says he's not so hot on the moniker. "Systems biology is just so general that it could apply to many things. When you're naming a category, the underlying principle is that if you make a statement like, 'I'm doing systems biology,' do people know what you're talking about?"

The Scientist, Volume 17, Issue 19, p. 27, Oct. 6, 2003

What is Systems Biology?


Is this just another name for physiology?

The study of the mechanisms underlying complex biological processes as integrated systems of many interacting components. Systems biology involves:
(1) collection of large sets of experimental data,
(2) proposal of mathematical models that might account for at least some significant aspects of this data set,
(3) accurate computer solution of the mathematical equations to obtain numerical predictions, and
(4) assessment of the quality of the model by comparing numerical simulations with the experimental data.

-(Leroy Hood, 1999)

Institute for Systems Biology

http://www.systemsbiology.org/

Why Systems Biology?


On the technology side (PUSH): Capabilities for high-throughput data gathering that have made us aware that biological networks have many more components than we previously surmised.
On the biology side (PULL): The realization that, to the extent that we don't characterize biological systems quantitatively in their full complexity, the scope and accuracy of our understanding of those systems will be compromised. (In classical experimental terms, the uncontrolled variables in the system will undermine our confidence in the conclusions we draw from our experiments and observations.)

Systems Biology vs. traditional cell and molecular biology


Experimental techniques in systems biology are high throughput.
Intensive computation is involved from the start in systems biology, in order to organize the data into usable computable databases.
Exploration in traditional biology proceeds by successive cycles of hypothesis formation and testing; data accumulates during these cycles.
Systems biology initially gathers data without prior hypothesis formation; hypothesis formation and testing comes during post-experiment data analysis and modeling.

Genomics, Proteomics & Systems Biology


[Figure: timeline from 1990 to 2020 locating genomics, proteomics, and systems biology]

Modelling Tools
[Figure: modelling tools by period of introduction, in five-year bins from 1965-69 to 1995-99]

BIOSSIM (1968)
ESSYN (1976)
SCAMP (1983)
SCOP (1986)
METAMOD (1986)
SIMFIT (1990)
METAMODEL (1991)
METASIM (1992)
KINSIM (1993)
GEPASI (1994)
METALGEN (1994 ?)
MIST (1995)
METABOLIKA (1997 ?)
METAFLUX (1997)
SIMFLUX (1997)
MNA (1998)
CELLMOD (1998)
FLUXMAP (1999)
METATOOL (1999)
VCELL (1999)

From Klaus Mauch, University of Stuttgart

Systems Biology is an integration of data & approaches

Technologies to study systems at different levels


Genomics (HT-DNA sequencing)
Mutation detection (SNP methods)
Transcriptomics (gene/transcript measurement, SAGE, gene chips, microarrays)
Proteomics (MS, 2D-PAGE, protein chips, yeast two-hybrid, X-ray, NMR)
Metabolomics (NMR, X-ray, capillary electrophoresis)

Each system has methods for modeling

Pi Calculus
Petri Nets
Flux Balance Analysis
Differential Eqs
Boolean Networks
Electrical Circuit Model
Cellular Automata

So how can we meaningfully integrate the data?

System heterogeneity in size & timescale

Scale: size; data; timescale; modeling approach
Atomic scale: 0.1 - 1.0 nm; coordinate data, dynamic data; 0.1 - 10 ns; molecular dynamics
Molecular scale: 1.0 - 10 nm; interaction data (Kon, Koff, Kd); 10 ns - 10 ms; interactions
Cellular scale: 10 - 100 nm; concentrations, diffusion rates; 10 ms - 1000 s; fluid dynamics
Tissue scale: 0.01 m - 1.0 m; metabolic input, metabolic output; 1 s - 1 hr; process flow
Organism scale: 0.01 m - 4.0 m; behaviors, habitats; 1 hr - 100 yrs; mechanics
Ecosystem scale: 1 km - 1000 km; environmental impact, nutrient flow; 1 yr - 1000 yrs; network dynamics

Each of the scales does not fit together seamlessly


If one scale (e.g., protein-protein interactions) behaves deterministically and with isolated components, then we can use plug-n-play approaches.
If it behaves chaotically or stochastically, then we cannot.
Most biological systems lie between this deterministic order and chaos: complex systems.

Man-made Complex Devices


Intel Pentium 4: 42 million transistors
Intel Itanium 2: 410 million transistors; number of gates > 100 million
By 2007 both Intel and AMD are predicting dies with 1 billion transistors.
In terms of parts and interconnections, man-made devices will likely have complexity comparable to bacterial cells, if not greater, by around 2010.

System Models
Building computational models of systems seems more and more like a viable project.
Such a project would bring a much clearer understanding of how systems are controlled and ultimately it should bring unprecedented predictive power.

Are Biologists Ready?


Xo <-> S1 <-> S2 <-> S3 <-> S4 <-> S5 <-> S6 <-> X1

Xo and X1 fixed, all reactions reversible, assume stable steady state.

[Figure: the same pathway, with one reaction step marked "50 %"]

What happens to the steady state? (Xo and X1 fixed, all reactions reversible, assume stable steady state.)

Typical replies:
1. Nothing happens.
2. Nothing happens unless it is the rate-limiting step.
3. The rate v goes down, but that's all.
4. S3 goes up.
5. S4 goes down.
6. Species downstream of v go down.
7. Steady-state flow changes but species levels don't.
8. Xo and X1 change.
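The question can be checked numerically. The following is a minimal sketch, not the lecturer's model: it assumes reversible mass-action kinetics for every step, made-up rate constants (forward 2, reverse 1), boundary values Xo = 10 and X1 = 1, and that the step marked 50 % is S3 <-> S4 (suggested by replies 4 and 5, but not stated on the slide). Halving an enzyme is modeled as halving both the forward and reverse rate constants of that step. The function names are hypothetical.

```python
def steady_state(kf, kr, x0=10.0, x1=1.0):
    """Steady state of X0 <-> S1 <-> ... <-> Sn <-> X1.

    Step i has rate v_i = kf[i] * upstream - kr[i] * downstream.
    At steady state every step carries the same flux J, so each
    species can be written as a_i - c_i * J and J follows from X1.
    """
    a, c = [x0], [0.0]
    for f, r in zip(kf, kr):
        a.append(f * a[-1] / r)
        c.append((f * c[-1] + 1.0) / r)
    flux = (a[-1] - x1) / c[-1]
    species = [ai - ci * flux for ai, ci in zip(a[1:-1], c[1:-1])]
    return species, flux


def halve_step(step=3, n_steps=7):
    """Compare steady states before and after halving one step.

    step=3 is the S3 <-> S4 reaction (steps are numbered from 0).
    All rate constants are illustrative assumptions.
    """
    kf = [2.0] * n_steps
    kr = [1.0] * n_steps
    before = steady_state(kf, kr)
    kf2, kr2 = list(kf), list(kr)
    kf2[step] *= 0.5   # less enzyme slows both directions equally
    kr2[step] *= 0.5
    after = steady_state(kf2, kr2)
    return before, after


if __name__ == "__main__":
    (s_before, j_before), (s_after, j_after) = halve_step()
    print(f"flux: {j_before:.3f} -> {j_after:.3f}")
    for i, (b, a) in enumerate(zip(s_before, s_after), start=1):
        print(f"S{i}: {b:.3f} -> {a:.3f}")
```

Under these assumptions the flux falls, but by much less than 50 %, the species upstream of the perturbed step rise, and the species downstream fall. No single reply in the list captures all three effects, which is the point of the exercise.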

If we can't understand this system, how can we hope to understand:

Functional Motif Identification


Computer simulation of EGF signal transduction in PC12 cells.
Frances Brightman, Simon Thomas and David Fell

http://bms-mudshark.brookes.ac.uk/frances/fabweb5.htm

29 species

27 components

As we begin to connect systems we can engage in inference


We move up the chain from data to knowledge by questioning, observing and then hypothesizing
These X genes are upregulated together, but are they interacting? PPI network data suggest that Y of them are.
Are these Y part of a complex? If they are always expressed together, that suggests maybe yes.

As more data is integrated and systems linked together, this becomes easier

Example of inference
(a) An interaction network of Snz-Sno proteins of S. cerevisiae. The nodes represent proteins and the lines represent yeast two-hybrid (Y2H) interactions. The red nodes represent proteins that correspond to genes in one transcriptome cluster, whereas the green nodes represent proteins that correspond to genes belonging to a different cluster. The existence of two stable complexes can be hypothesized based on the integrated data.
(b) The genes NTH1 and YLR270W have similar expression profiles (upper panel). Red indicates upregulation and green indicates downregulation. mRNA expression of both genes is upregulated during heat shock and other forms of stress. Deletions of NTH1 and YLR270W each confer similar heat-shock sensitive phenotypes (lower panel).
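The reasoning in panel (a) can be sketched in code: keep only the interactions whose two partners fall in the same expression cluster, then read off the connected groups as candidate complexes. This is an illustrative sketch only. The protein names P1 to P6, the interaction list, and the cluster labels are hypothetical toy data, not the Snz-Sno data set, and the function name is made up.

```python
from collections import defaultdict


def hypothesized_complexes(interactions, cluster_of):
    """Group proteins that interact AND share an expression cluster."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        # Drop interactions between proteins from different clusters.
        if a in cluster_of and cluster_of.get(a) == cluster_of.get(b):
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, complexes = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first search collects one connected group.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        complexes.append(sorted(members))
    return complexes


# Hypothetical toy data: six proteins, two expression clusters.
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
                ("P3", "P4"), ("P4", "P5"), ("P5", "P6")]
cluster_of = {"P1": "red", "P2": "red", "P3": "red",
              "P4": "green", "P5": "green", "P6": "green"}

print(hypothesized_complexes(interactions, cluster_of))
# -> [['P1', 'P2', 'P3'], ['P4', 'P5', 'P6']]
```

The P3-P4 interaction links the two clusters and is discarded, so the toy network splits into two candidate complexes. Real data would need a less brittle rule, since both Y2H and expression clustering are noisy.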

Integrating heterogeneous but related observations

How are the data related?

What kind of model?


What kind of inferencing?
Is the data validated?
Can we take a best guess on how it might work by drawing upon other motifs or systems with similar properties?

Problems? Yes:
How is static data interpreted, since it's a dynamic system?
How do we deal with low-resolution quality?
How do we treat missing data?
How do we deal with heterogeneous data types?
How can we identify and evaluate competing hypotheses inferred by any system?

SB is springing out of existing efforts anyway


E-cell (Keio University, Japan)
BioSpice Project (Arkin, Berkeley)
Metabolic Engineering Working Group (Palsson & Church, UCSD, Harvard)
Silicon Cell Project (Netherlands)
Virtual Cell Project (UConn)
Gene Network Sciences Inc. (Cornell)
Project CyberCell (Edmonton/Calgary)

So where do we start?
Quantitative analysis of components and dynamics of complex biological systems
Static (Tier 1)

Deterministic (Tier 2)

Stochastic (Tier 3)

Features of complex systems

Nonlinearity: global properties are not a simple sum of the parts
Feedback loops
Open systems (dissipation of energy), e.g., flagella use energy
Can have memory (response is history dependent): a new protein may remain in the cell after the initial response, shifting the rate of reaction the next time the cell is exposed to a chemical
[Figure: response versus chemical concentration]
Nested (modules have complexity)
There are no precise boundaries

So where do we start?
Quantitatively account for these properties
Different levels of modeling

Three tiers
Static interactions (Tier 1)
Deterministic (Tier 2)
Stochastic (Tier 3)

Principles which transcend tiers

Principle 1: Modularity
Module
Interacting nodes w/ common function
Constrained pleiotropy
Feedback loops, oscillators, amplifiers

Principle 2: Recurring circuit elements


Network motifs
Common methods to achieve an effect
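As one concrete case of a recurring circuit element, the feed-forward loop (X regulates Y, Y regulates Z, and X also regulates Z directly) is a commonly cited network motif. The slide does not say which motifs it has in mind, so this choice is an assumption. The sketch below counts feed-forward loops in a small directed graph by brute force; the node names and edges are hypothetical toy data.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return every (x, y, z) with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )


# Hypothetical regulatory network: A->B->C with shortcut A->C,
# plus a feedback path C->D->A that is not a feed-forward loop.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "A")]

print(feed_forward_loops(edges))
# -> [('A', 'B', 'C')]
```

The brute-force scan over all ordered triples is fine for toy graphs but scales as the cube of the number of nodes. Calling a pattern a motif also requires comparing its count against randomized networks, which this sketch does not do.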

Principle 3: Robustness
Robustness
Insensitivity to parameter variation

Severe constraints on design


Robustness not present in most designs

Aims of systems biology


Tier 1: Interactome
Which molecules talk to each other in networks?

Tier 2: Deterministic
What is the average case behavior?

Tier 3: Stochastic
What is the variance of the system?

Tier 1
Get parts list

Tier 2 & 3
Enumerate biochemistry
Define network/mathematical relationships
Compute numerical solutions

Deterministic: the behavior of the system with respect to time is predicted with certainty given the initial conditions
Stochastic: the dynamics cannot be predicted with certainty given the initial conditions

Deterministic
Ordinary differential equations (ODEs): concentration as a function of time only
Partial differential equations (PDEs): concentration as a function of space and time

Stochastic
Stochastic update equations: molecule numbers as random variables that are functions of time
Y = # molecules at time t
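The difference between the two descriptions can be seen on the simplest possible system: a molecule produced at constant rate k and degraded at rate g per molecule. The sketch below is an illustration under assumed parameter values (k = 10, g = 0.1), not a model from the lecture. The deterministic version integrates the ODE dy/dt = k - g*y with forward Euler; the stochastic version uses Gillespie's direct method, a standard stochastic simulation algorithm, to follow the integer count Y.

```python
import random


def deterministic(k=10.0, g=0.1, t_end=50.0, dt=0.01):
    """Forward-Euler solution of dy/dt = k - g*y, starting from y = 0."""
    y = 0.0
    for _ in range(int(round(t_end / dt))):
        y += dt * (k - g * y)
    return y


def stochastic(k=10.0, g=0.1, t_end=50.0, seed=0):
    """Gillespie simulation of the same process; returns Y at t_end."""
    rng = random.Random(seed)
    t, y = 0.0, 0
    while True:
        birth, death = k, g * y
        total = birth + death
        t += rng.expovariate(total)      # waiting time to the next event
        if t > t_end:
            return y
        if rng.random() * total < birth:
            y += 1                       # production event
        else:
            y -= 1                       # degradation event


if __name__ == "__main__":
    print("deterministic:", deterministic())
    print("stochastic runs:", [stochastic(seed=s) for s in range(5)])
```

The deterministic function returns the same value every time it is called with the same initial conditions. The stochastic function returns a different integer for each seed, scattered around the deterministic value, which is the distinction the Tier 2 and Tier 3 definitions draw.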

Tier 1: Static interactome analysis


Protein-protein: signal transduction, cell cycle
Protein-DNA: gene regulation
Metabolic pathways: respiration, cAMP

Goals
Determine network topology
Network statistics
Analyze modular structure

Limitations
Time, space, population average
Crude interactions (strength, types)

Typical interactome

Global features
Starting point for Tier 2 & 3
First time-varying yeast interactome (Bork 2005)

Analysis methods
Functional genomics: expression analysis, network integration
Graph theory: scale free, small world
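The graph-theory terms map onto three statistics that are easy to compute: the degree distribution (heavy-tailed in scale-free networks), the clustering coefficient, and the mean shortest path length (high clustering with short paths is the small-world signature). The sketch below computes all three for an undirected graph using only the standard library. The five-node hub graph is hypothetical toy data, far too small to say anything about scaling.

```python
from collections import Counter, defaultdict, deque


def build_graph(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_distribution(adjacency):
    """Map degree -> number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adjacency.values())


def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = sorted(adjacency[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(nbrs)
                for v in nbrs[i + 1:] if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


def mean_path_length(adjacency):
    """Average shortest-path length over all connected ordered pairs."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:                     # breadth-first search
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy interactome: hub H with four partners, plus A-B.
toy = build_graph([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")])
print(degree_distribution(toy), clustering_coefficient(toy, "A"),
      mean_path_length(toy))
```

For real interactomes, packages such as those listed under Software later in the lecture would normally be used instead of hand-written graph code.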

Tier 2: Deterministic Models


Goal
Model mesoscale system; average case behavior

Three levels
ODE system: lumped cell
ODE compartment system: cell compartments
PDE: continuous time & space (MinCDE oscillation)
Data limited

Tier 2: Deterministic Modeling


Results
Robust chemotaxis (Barkai 1997)
MinCDE oscillation (Howard 2003)
Feedback in signal transduction (Brandman 2005)

Output
Time series plots (ODE)
Condition on parameter values (Brandman 2005)

Example: robustness in bacterial chemotaxis
Bacterial chemotaxis is robust to parameter fluctuations!
Chemotaxis: bacterial migration towards/away from chemicals
Parameters: concentrations, binding affinities

Bacterial chemotaxis
Model as random walk

Exact adaptation
A change in the concentration of a chemical stimulant causes a rapid change in bacterial tumbling frequency, which then adapts back precisely to its prestimulus value!
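The random-walk picture can be made concrete with a one-dimensional run-and-tumble sketch. This is a deliberately crude illustration, not the model used in the studies cited here: the cell moves one unit per step, tumbles with some probability, and picks a new direction at random after a tumble. The only chemotactic ingredient, assumed for this sketch, is that the tumbling probability is lower while the cell is moving up the gradient. All probabilities are made-up values.

```python
import random


def run_and_tumble(p_tumble_up, p_tumble_down, n_steps=500, seed=0):
    """Final position of one cell on a line; +1 is up the gradient."""
    rng = random.Random(seed)
    position = 0
    direction = rng.choice((-1, 1))
    for _ in range(n_steps):
        position += direction                      # run
        p = p_tumble_up if direction > 0 else p_tumble_down
        if rng.random() < p:                       # tumble
            direction = rng.choice((-1, 1))
    return position


def mean_displacement(p_tumble_up, p_tumble_down, n_cells=200):
    """Average final position over a population of independent cells."""
    finals = [run_and_tumble(p_tumble_up, p_tumble_down, seed=s)
              for s in range(n_cells)]
    return sum(finals) / n_cells


if __name__ == "__main__":
    print("no gradient sensing:", mean_displacement(0.2, 0.2))
    print("fewer tumbles going up:", mean_displacement(0.1, 0.3))
```

With equal tumbling probabilities the population spreads out but does not drift. Suppressing tumbles on up-gradient runs produces a net drift toward the attractant, even though no individual cell ever steers.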

Experimental Design
Is exact adaptation robust to substantial variations in biochemical parameters?
Systematically varied concentrations of chemotaxis-network proteins and measured resulting behavior
Distinguish between robust-adaptation and fine-tuned models of chemotaxis

E. coli cheR -/- population (pUA4, IPTG inducer)
Express CheR over a 100-fold range
1 mM L-aspartate
Measured: tumbling frequency, adaptation time, adaptation precision
Adaptation precision = ratio of steady-state tumbling frequency of unstimulated to stimulated cells

Summary of results
Tumbling frequency: 0.3 0.06 (20-fold)
Adaptation time: 3 1 (3-fold)
Adaptation precision: 1.04 0.07

[Figure: tumbling frequency as a function of time for wild-type cells]

Conclusions from study


Exact adaptation is maintained despite substantial variations in network-protein concentrations
Exact adaptation is a robust property, but adaptation time and steady-state behavior are fine-tuned
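One way to see how a property can be robust while others are fine-tuned is a toy integral-feedback model. This is an assumption-laden sketch, not the published chemotaxis model: activity A stands in for tumbling frequency, A = m / (1 + L) where L is the ligand level and m a slowly adjusted internal state, and m obeys dm/dt = k_r - k_b * A. The parameter k_r loosely plays the role of the varied protein level; all names and values are hypothetical.

```python
def adaptation_experiment(k_r, k_b=1.0, ligand=10.0, t_end=200.0, dt=0.01):
    """Step the ligand from 0 to `ligand` and let the toy model adapt.

    Returns (activity before the step, activity long after the step).
    """
    m = k_r / k_b                 # steady state with no ligand
    activity_before = m           # A = m / (1 + 0)
    for _ in range(int(round(t_end / dt))):
        activity = m / (1.0 + ligand)
        m += dt * (k_r - k_b * activity)   # forward Euler
    activity_after = m / (1.0 + ligand)
    return activity_before, activity_after


def adaptation_precision(k_r):
    """Ratio of unstimulated to stimulated steady-state activity."""
    before, after = adaptation_experiment(k_r)
    return before / after


if __name__ == "__main__":
    for k_r in (0.1, 1.0, 10.0):
        before, _ = adaptation_experiment(k_r)
        print(f"k_r={k_r}: steady activity={before}, "
              f"precision={adaptation_precision(k_r):.4f}")
```

In this toy model the steady-state activity equals k_r / k_b, so it changes 100-fold as k_r is varied 100-fold, while the precision stays at 1 because the steady state of dm/dt = 0 does not depend on the ligand level. That mirrors the qualitative conclusion above, with the caveat that the real network and its parameters are far richer.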

Tier 3: Stochastic analysis


Fluctuations in abundance of expressed molecules at the single-cell level
Leads to non-genetic individuality of isogenic population

Tier 3: Stochastic Analysis

When stochasticity is negligible, use deterministic modeling

Molecular noise is low:
System is large (molar quantities)
Fast kinetics (reaction time negligible)
Large cell volume (infinite boundary conditions)

Molecular noise is high:
System is small (finite molecule count matters)
Slow kinetics (relative to movement time)
Large cell volume (relative to molecule size)

Need explicit stochastic modeling!

Tier 3: Ensemble Noise


Transcriptional bursting
Leaky transcription
Slow transitions between chromatin states

Translational bursting
Low mRNA copy number

Tier 3: Temporal Noise

Canonical way of modeling molecular stochasticity

Tier 3: Spatial Noise


Finite number effect: translocation of molecules from the nucleus to the cytoplasm has a large effect on nuclear concentration
[Figure: cytoplasm and nucleus]

N = average molecular abundance
η (coefficient of variation) = σ/N
A decrease in abundance results in a 1/√N scaling of the noise (η = 1/√N)
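The 1/√N scaling follows if copy numbers are Poisson distributed, since the standard deviation of a Poisson variable is the square root of its mean. That distributional assumption is mine; the slide states only the scaling. The sketch below draws Poisson copy numbers for a population of cells at several mean abundances and compares the measured coefficient of variation with 1/√N. The sampler counts arrivals of a unit-time Poisson process, which avoids any dependency outside the standard library.

```python
import math
import random
import statistics


def poisson_sample(mean, rng):
    """Number of events of a rate-`mean` Poisson process in unit time."""
    t, count = rng.expovariate(mean), 0
    while t <= 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def coefficient_of_variation(mean, n_cells=2000, seed=0):
    """Standard deviation divided by mean across simulated cells."""
    rng = random.Random(seed)
    counts = [poisson_sample(mean, rng) for _ in range(n_cells)]
    return statistics.pstdev(counts) / statistics.mean(counts)


if __name__ == "__main__":
    for n in (4, 25, 100):
        cv = coefficient_of_variation(n)
        print(f"N={n}: measured CV={cv:.3f}, 1/sqrt(N)={1 / math.sqrt(n):.3f}")
```

Going from 100 molecules to 4 raises the relative noise about five-fold, which is why low-copy species such as scarce mRNAs dominate cell-to-cell variability.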

Recap
Three tiers
Interactomes: static (Tier 1)
Deterministic (Tier 2)
Stochastic (Tier 3)

Principles which cross tiers
Modularity
Reuse
Robustness

Major challenges and limitations


Measurement of chemical kinetics parameters and molecular concentrations in vivo
Differences between in vitro and in vivo data
Compartmental specific reactions

Data is the limit!!!
Functional genomic data (interactomes)
E. coli chemotaxis (Leibler, deterministic/robustness)

Important
Parameter estimation
Feedback-based estimation methods (Sachs 2005)

Software
Tier 1: Interactomes
Graphviz, Bioconductor, Cytoscape

Tier 2: Deterministic
Matlab (SBtoolbox), Mathematica (PathwayLab)

Tier 3: Stochastic
R, Stochsim

High-performance algorithms to solve systems of PDEs
Virtual Cell

Automated parsing of networks into stochastic and deterministic regimes
H-GENESIS
STOCK

Summary
Systems Biology can be done by breaking down each system into modules.
Many problems remain unsolved in exactly how to do this, but independent efforts are being developed in most areas that may one day merge together.

For next time


Read supplemental material S9
Homework #10 due