
Appendix 1: Flux balance analysis primer

Introduction
The computational analysis of genome sequence information is beginning to reveal the complete set of
molecular components involved in cellular activities. However, cellular functions are intricate, and
the integrated function of biological systems involves many complex interactions among the molecular
components within the cell. To understand the complexity inherent in cellular networks, approaches
that focus on the systemic properties of the network are required [1]. Such research departs from the
classical reductionist approach in favor of an integrated approach [2] to understanding the
interrelatedness of gene function and the role of each gene in the context of multi-genetic cellular
functions or genetic circuits [3,4].

The engineering approach to analysis and design is to have a mathematical or computer model (e.g. a
dynamic simulator) of a cellular process that is based on fundamental physicochemical laws and
principles. Mathematical modeling of metabolic systems has a long history, dating back to the
1960s [5-7]. However, the available enzyme kinetic information was fragmented, and attention turned
to developing methods that could shed light on the relative importance of various metabolic events.
Methods for sensitivity analysis of metabolic regulation began in the 1960s [8] and continued into
the 1970s [9,10], leading to biochemical systems theory (BST) and metabolic control analysis (MCA).

Although the ultimate goal is the development of dynamic models for the complete simulation of
cellular systems [11], the success of such approaches has been severely hampered by the current lack of
kinetic information on the dynamics and regulation of metabolic reactions. However, in the absence of
kinetic information it is still possible to accurately assess the theoretical capabilities and operative
modes of metabolic systems using metabolic flux balance analysis (FBA) [4,12-16]. FBA is based on the
fundamental physicochemical constraints on metabolic networks. FBA requires only information
regarding the stoichiometry of metabolic pathways and the metabolic demands; furthermore, FBA can
incorporate additional information when it is available. FBA is particularly applicable for
post-genomic analysis, because the stoichiometric parameters can be defined from the annotated genome
sequence [14]. In this appendix, we will describe the basic concepts of FBA and how it relates to
genomics.

System and methods


Metabolic network reconstruction
Functional assignments

The complete genome for organisms with a genome size of approximately a few million base pairs can
be rapidly sequenced, and currently many are available online (The Institute for Genomic Research).
The annotated whole genome sequence of an organism can be used to reconstruct the metabolic
network, and this process involves several challenges [17-19].

453594170.doc 1
The first step toward reconstructing the metabolic network is to identify the coding regions or open
reading frames (ORFs) within a genomic sequence. Subsequently, each ORF is searched against
databases with the goal of identifying homologous genes. Homology often provides the first clues
regarding the functionality of a newly sequenced gene. Through such analysis of the genome
sequence, a large fraction of the genes can be assigned a putative function. It is to be expected that
over the coming years, the ability to identify functionally related genes will improve.

Metabolic reaction database

We have constructed a database of known metabolic reactions from the extensive literature regarding
the metabolism of E. coli [20] and several online databases [21-23]. The reaction database contains the
following information: the substrates, products, and stoichiometry of each metabolic reaction; the
name of the enzyme catalyzing the reaction; the genes that code for the respective enzymes; and the EC
number of each metabolic reaction. The list of reactions (Supplementary Table 1) is available online.
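
As a rough illustration of one record in such a database, the sketch below stores the listed fields for glycerol kinase, using the glpK row of Supplementary Table 1. The `Reaction` class, its field names, and the sign convention (negative coefficients for substrates) are assumptions made for this example; they are not the schema of the database described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One entry of a hypothetical reaction database (illustrative schema)."""
    enzyme: str          # name of the catalyzing enzyme
    genes: tuple         # gene(s) coding for the enzyme
    stoichiometry: dict  # metabolite -> coefficient; negative = substrate
    ec_number: str       # EC number of the reaction
    reversible: bool = False  # assumption: taken from the arrow in the table

    def substrates(self):
        return sorted(m for m, c in self.stoichiometry.items() if c < 0)

    def products(self):
        return sorted(m for m, c in self.stoichiometry.items() if c > 0)


# glpK row of Supplementary Table 1: GL + ATP -> GL3P + ADP (EC 2.7.1.30)
glycerol_kinase = Reaction(
    enzyme="Glycerol kinase",
    genes=("glpK",),
    stoichiometry={"GL": -1, "ATP": -1, "GL3P": 1, "ADP": 1},
    ec_number="2.7.1.30",
)

print(glycerol_kinase.substrates(), "->", glycerol_kinase.products())
```

Keeping the stoichiometry as a signed mapping means the same record can later be written directly into a column of the stoichiometric matrix.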

Defining the in silico representation of the metabolic network

All of the metabolic genes in the cell compose a subset of the full genotype. This subset will be
referred to as the metabolic genotype of a particular organism, and the in silico representation of the
metabolic genotype will be referred to as the in silico metabolic genotype. The gene products derived
from the genes in the metabolic genotype carry out all of the enzymatic reactions and transport
processes that occur within the cell. For example, the E. coli in silico metabolic genotype included the
genes involved in central metabolism, amino acid metabolism, nucleotide metabolism, fatty acid and
lipid metabolism, carbohydrate assimilation, vitamin and cofactor biosynthesis, energy and redox
generation, and macromolecule production (i.e. peptidoglycan, glycogen, RNA, and DNA). A
hypothetical metabolic genotype is shown in Figure 2a. This hypothetical metabolic genotype is used
to reconstruct the hypothetical metabolic network (Figure 2b) and to define the stoichiometric matrix
(Figure 4).

The basic methodology used to construct the E. coli metabolic genotype is described below. First, the
annotated E. coli K-12 genome sequence [24] was searched against our database. This process
selected all metabolic reactions in our database whose genes were identified in E. coli. However, there
still remained metabolic genes identified in E. coli that had no entry in our database. Therefore, genes
that were annotated with a metabolic function but not identified in our database were flagged for
further investigation. Subsequently, each of the flagged genes was researched (in the literature and
the online databases) to determine whether the gene/reaction should be included. In this way, our
database was constructed and updated to be a complete database of the metabolic reactions in E. coli.
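
The matching and flagging step can be pictured as a simple set comparison. The sketch below is only an illustration: the gene list is a small subset taken from Supplementary Table 1, `hypX` is a made-up placeholder for an annotated gene with no database entry, and the dictionary stands in for the reaction database.

```python
# Genes annotated with a metabolic function in the genome (illustrative subset;
# "hypX" is a hypothetical placeholder, the others appear in Supplementary Table 1).
annotated_metabolic_genes = {"glpK", "gpsA", "deoB", "deoC", "hypX"}

# Gene -> EC number lookup standing in for the reaction database.
reaction_database = {
    "glpK": "2.7.1.30",
    "gpsA": "1.1.1.94",
    "deoB": "5.4.2.7",
    "deoC": "4.1.2.4",
}

# Genes found in the database contribute their reactions to the metabolic genotype.
matched = {
    gene: reaction_database[gene]
    for gene in sorted(annotated_metabolic_genes)
    if gene in reaction_database
}

# Annotated genes without a database entry are flagged for manual review.
flagged = sorted(annotated_metabolic_genes - set(reaction_database))

print("matched:", matched)
print("flagged for further investigation:", flagged)
```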

At this point, a few E. coli metabolic genes/reactions were still not included in the metabolic
genotype. One reason for this may be uncharacterized genes that perform a known biochemical
conversion. Therefore, upon careful review of the existing biochemical literature we added the
necessary genes/reactions to the metabolic genotype. A complete list of the reactions is available
(Table 1). See Covert et al. [19] for a more detailed description of this process.

Converting the genomic data into a stoichiometric matrix


All of the information in the metabolic genotype regarding the stoichiometry of the metabolic reactions
can be used to give an in silico representation of the metabolic network, or the in silico metabolic
genotype. Given the myriad of details required to model cellular behavior, modeling cellular functions
has proved a difficult task. However, given a complete list of the molecular components in a cellular
system, we can constrain cellular behavior and define the systemic capabilities/constraints of the
metabolic network. The capabilities of the metabolic network can then be analyzed, and the optimal
characteristics within the capabilities can be identified. Below, we will discuss the methodology we
used to convert the metabolic genotype into an in silico representation of the metabolic
capabilities/constraints.
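
To make the conversion concrete, the following sketch builds a small stoichiometric matrix from two reactions listed in Supplementary Table 1 (glpK and gpsA). The dictionary layout, the alphabetical ordering of metabolites, and the use of NumPy are choices made for this illustration only.

```python
import numpy as np

# Two reactions taken from Supplementary Table 1 (negative = consumed).
reactions = {
    "glpK": {"GL": -1, "ATP": -1, "GL3P": 1, "ADP": 1},
    "gpsA": {"GL3P": -1, "NADP": -1, "T3P2": 1, "NADPH": 1},
}

# One row per metabolite, one column per reaction.
metabolites = sorted({m for stoich in reactions.values() for m in stoich})
row = {m: i for i, m in enumerate(metabolites)}

S = np.zeros((len(metabolites), len(reactions)))
for j, stoich in enumerate(reactions.values()):
    for met, coeff in stoich.items():
        S[row[met], j] = coeff

for met in metabolites:
    print(f"{met:6s}", S[row[met]])
```

GL3P is produced by the first reaction and consumed by the second, so its row reads `[1, -1]`; this coupling between columns is what the mass balances exploit.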

Flux balance analysis

The fundamentals of flux balance analysis (FBA) have been reviewed [12,13,15]. Below we describe the
procedure we used to construct the in silico representation of the capabilities/constraints of the E. coli
metabolic network and discuss the fundamentals of FBA.

A flux balance was written for each metabolite (Xi) within the metabolic network to yield the dynamic
mass balance equation for each metabolite in the network. Figure 1 depicts an example system of
fluxes (Vsyn, Vdeg, Vtrans, Vuse) affecting a particular metabolite (Xi). The rate of accumulation of Xi was
equated to its net rate of production yielding the dynamic mass balance for Xi:

$$\frac{dX_i}{dt} = V_{syn} - V_{deg} - V_{use} \pm V_{trans} \qquad \text{(Equation 1)}$$

where the subscripts ‘syn’ and ‘deg’ refer to the synthesis and degradation reactions of metabolite Xi.
The metabolic fluxes Vtrans correspond to exchange fluxes that bring metabolites into or out of the
system boundary, and Vuse refers to the growth and maintenance requirements. Not all metabolites in
the network were acted upon by all four reaction types; for example, Vtrans reactions acted only on
external metabolites, and Vuse reactions operated only on internal metabolites. Equation 1 can be
rewritten as:

$$\frac{dX_i}{dt} = V_{syn} - V_{deg} - V_{use} + b_i \qquad \text{(Equation 2)}$$

where bi is the net transport of Xi (Xi is an external metabolite – see Figure 2b) into the defined
metabolic system. The growth and maintenance requirements were represented as fluxes in the
metabolic network (see Figures 2b & 4) and are equivalent to a degradation reaction. The
stoichiometry and magnitude of the growth and maintenance fluxes were estimated from the literature
[25-29]. For the E. coli metabolic network, all the transient material balances were represented by a
single matrix equation,

$$\frac{d\mathbf{X}}{dt} = \mathbf{S} \cdot \mathbf{v} + \mathbf{b} \qquad \text{(Equation 3)}$$

where X is an m-dimensional vector defining the quantity of the metabolites within a cell (or cell
population), v is the vector of n metabolic fluxes, S is the m × n stoichiometric matrix, and b is the
vector of metabolic exchange fluxes.

Metabolic transients are typically very rapid compared to the time constants of cell growth and
process dynamics (e.g., ref. [30]); therefore, the transient mass balances were simplified to
consider only the steady-state behavior. Eliminating the time derivative in Equation 3 (assuming a
steady state) and rearranging the equation yielded:

$$\mathbf{S} \cdot \mathbf{v} + \mathbf{I} \cdot \mathbf{b} = \mathbf{0} \qquad \text{(Equation 4)}$$

where I is the identity matrix. This equation states that on time scales longer than the doubling time,
all the formation, degradation, utilization, and transport fluxes are balanced. Otherwise, significant
amounts of the metabolite would accumulate inside the cell.
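
A small numerical sketch may help to separate the transient balance from the steady-state one. It assumes the transient balance has the form dX/dt = S·v + b, with b counted positive for transport into the system; the two-metabolite network and all flux values are hypothetical.

```python
import numpy as np

# Hypothetical two-metabolite system: one reaction converts A into B.
S = np.array([[-1.0],   # A is consumed
              [ 1.0]])  # B is produced
v = np.array([2.0])     # internal flux


def accumulation(S, v, b):
    """Rate of change of the metabolite amounts, dX/dt = S.v + b."""
    return S @ v + b


# Uptake of A exceeds its conversion: A accumulates (transient state).
b_transient = np.array([3.0, -2.0])
# Uptake of A and removal of B both match the conversion flux: steady state.
b_steady = np.array([2.0, -2.0])

print(accumulation(S, v, b_transient))
print(accumulation(S, v, b_steady))
```

In the first case the amount of A grows without limit, which is why only flux vectors that make the right-hand side vanish are considered on longer time scales.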

Not all of the metabolites are capable of transport into or out of the cell; therefore, the I·b term was
simplified by removing the rows in the b vector that correspond to metabolites that are not transported
across the cell membrane, forming a vector br. Additionally, the corresponding columns in I were
eliminated, forming a matrix U. The stoichiometric matrix was partitioned such that Sreactions defined
the metabolic reactions within the system boundary (this included transport processes, e.g. the PTS
system), Suse defined the biomass and maintenance requirement fluxes, and the U matrix allowed certain
metabolites to be transported into (and out of) the system (defined by the constraints placed on the br
vector, discussed below).

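$$\mathbf{S}_{reactions} \cdot \mathbf{v}_{reactions} + \mathbf{S}_{use} \cdot \mathbf{v}_{use} + \mathbf{U} \cdot \mathbf{b}_r = \mathbf{0} \qquad \text{(Equation 5)}$$

let

$$\mathbf{S}' = \begin{bmatrix} \mathbf{S}_{reactions} & \mathbf{S}_{use} & \mathbf{U} \end{bmatrix}, \qquad \mathbf{v}' = \begin{bmatrix} \mathbf{v}_{reactions} \\ \mathbf{v}_{use} \\ \mathbf{b}_r \end{bmatrix}$$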

Therefore, we generated the following equation (note that in the literature this equation is written
S·v = 0 for simplicity):

$$\mathbf{S}' \cdot \mathbf{v}' = \mathbf{0} \qquad \text{(Equation 6)}$$

where S’ is the m × n’ stoichiometric matrix and n’ is the total number of fluxes (this includes
fictitious fluxes that only transport material across the system boundary). Every metabolite inside the
system boundary corresponded to a row in the stoichiometric matrix; however, some of these
metabolites were intracellular and some were extracellular (Figures 2b & 4 show how the metabolic
network is converted into the stoichiometric matrix while considering extracellular metabolites). The
stoichiometric matrix was arranged such that the mi internal metabolites were entered first, and then me
external metabolites were entered second (m = mi + me). When the stoichiometric matrix was arranged
in this manner, the matrix U and the Suse matrix took on the following form:

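$$\mathbf{U} = \begin{bmatrix} \mathbf{0} \\ \mathbf{I} \end{bmatrix}, \qquad \mathbf{S}_{use} = \begin{bmatrix} \mathbf{S}^{u}_{use} \\ \mathbf{0} \end{bmatrix} \qquad \text{(Equation 7)}$$

where U is an m × me matrix, I is the me × me identity matrix, and Suuse is a matrix with mi rows.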
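
The block structure can be checked on a toy system. The sketch below assumes that S’ is formed by placing Sreactions, Suse, and U side by side, with the internal metabolites listed first; the three-metabolite network (A and B internal, A_ext external) and the variable names are hypothetical.

```python
import numpy as np

m_i, m_e = 2, 1  # internal metabolites (A, B) first, then external (A_ext)

# Columns: transport A_ext -> A, conversion A -> B. Rows: A, B, A_ext.
S_reactions = np.array([[ 1.0, -1.0],
                        [ 0.0,  1.0],
                        [-1.0,  0.0]])

# The use flux drains B; it acts only on internal metabolites, so the
# external rows are zero.
S_use_internal = np.array([[0.0], [-1.0]])
S_use = np.vstack([S_use_internal, np.zeros((m_e, 1))])

# U carries an identity block on the external rows and zeros on internal rows.
U = np.vstack([np.zeros((m_i, m_e)), np.eye(m_e)])

S_prime = np.hstack([S_reactions, S_use, U])

# A balanced flux vector: every flux carries one unit.
v_prime = np.ones(S_prime.shape[1])
print(S_prime)
print(S_prime @ v_prime)
```

The last column is the fictitious exchange flux: it supplies A_ext from outside the system boundary so that the external metabolite is balanced as well.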

Equation 6 defined the mass, energy, and redox potential constraints on the metabolic network; thus
effectively defining the capabilities and constraints of the metabolic genotype. All vectors, v’, that
satisfied Equation 6 (nullspace of S’) were steady-state metabolic flux distributions that did not violate
the mass, energy, or redox balance constraints. However, many vectors within the nullspace were not
physiologically feasible, and additional constraints were placed on the metabolic network.
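
The distinction between a balanced and a physiologically feasible flux vector can be illustrated numerically. This sketch uses SciPy's `null_space` on a hypothetical 3 × 4 matrix and assumes, for the example only, that the first three fluxes are irreversible.

```python
import numpy as np
from scipy.linalg import null_space

# Toy S' with rows A, B, A_ext and columns
# [transport, conversion, use, exchange] (hypothetical network).
S_prime = np.array([[ 1.0, -1.0,  0.0, 0.0],
                    [ 0.0,  1.0, -1.0, 0.0],
                    [-1.0,  0.0,  0.0, 1.0]])

N = null_space(S_prime)          # columns span all solutions of S'.v' = 0
print(N.shape)

v_forward = N[:, 0] / N[0, 0]    # scale so the transport flux equals +1
v_backward = -v_forward          # also lies in the nullspace

irreversible = [0, 1, 2]         # assumption: first three fluxes are irreversible
for v in (v_forward, v_backward):
    balanced = np.allclose(S_prime @ v, 0.0)
    feasible = bool(np.all(v[irreversible] >= 0.0))
    print(balanced, feasible)
```

Both vectors satisfy the balance, but the reversed one would run irreversible reactions backwards, which is the kind of solution the additional constraints are meant to exclude.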

Additional constraints

Equation 6 defined the mass, energy, and redox balance constraints on the metabolic system.
Additional constraints were also placed on the metabolic network, and in the limiting case where all
the constraints on the metabolic network are known (as well as the initial conditions), the intersection
between the nullspace and the region defined by all other constraints may be reduced to a point.
Herein, we have considered the stoichiometric constraints (mass, energy, and redox balance
constraints), capacity constraints on the exchange fluxes, and a limited set of the physicochemical
constraints that includes basic thermodynamics (reversibility and irreversibility of the metabolic
reactions).

The capacity and thermodynamic constraints were realized by constraining the value of the flux
through the metabolic reactions by using linear inequalities (αj ≤ vj ≤ βj; j = 1…n). Capacity
constraints were placed on the fictitious exchange fluxes (Figure 3 illustrates how the constraints are
written for the hypothetical network shown in Figure 2b). The exchange fluxes were defined as
reversible fluxes with the positive direction being inward. The exchange fluxes were simply fluxes
that “transported” external metabolites into and out of the system. They were often equivalent to the
transport fluxes because they were often collinear, and in a steady state they assume the same flux
value. However, the inclusion of the exchange fluxes allowed for several programming conveniences.
Furthermore, the exchange fluxes allowed us to consider the presence of multiple transporters (e.g.
glucose uptake by the PTS system or by the galactose transporter).

The exchange fluxes for inorganic phosphate, ammonia, carbon dioxide, sulfate, potassium, and
sodium were unrestrained (αi = −∞ and βi = ∞). The exchange fluxes for the carbon source and
oxygen were constrained to a defined value, and the exchange fluxes for all metabolites not available
in the medium were constrained to zero. The exchange fluxes for metabolites capable of leaving the
metabolic network (i.e. acetate, ethanol, lactate, succinate, formate, pyruvate) were always
unconstrained in the outward direction (see Figures 2b & 3). Irreversibility was enforced by setting
αj = 0 if vj was irreversible; otherwise, αj = −∞ for reversible fluxes. All internal metabolic
reactions were unconstrained in the forward direction (βj = ∞).
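
These rules translate naturally into a pair of lower and upper bounds per flux. The helper names, the category labels, and the uptake value of 10 in the sketch below are hypothetical; the zero upper bound on secreted by-products is an assumption that follows from treating them as absent from the medium.

```python
import math

INF = math.inf


def exchange_bounds(kind, value=None):
    """Return (lower, upper) bounds for an exchange flux; positive means uptake."""
    if kind == "unrestrained":        # e.g. phosphate, ammonia, carbon dioxide
        return (-INF, INF)
    if kind == "fixed":               # carbon source or oxygen uptake
        return (value, value)
    if kind == "secreted":            # by-products free to leave the system
        return (-INF, 0.0)            # assumption: no uptake of by-products
    if kind == "absent":              # not available in the medium
        return (0.0, 0.0)
    raise ValueError(f"unknown exchange kind: {kind}")


def internal_bounds(reversible):
    """Return (lower, upper) bounds for an internal reaction."""
    return (-INF, INF) if reversible else (0.0, INF)


bounds = {
    "glucose_exchange": exchange_bounds("fixed", 10.0),  # hypothetical value
    "ammonia_exchange": exchange_bounds("unrestrained"),
    "acetate_exchange": exchange_bounds("secreted"),
    "glpK": internal_bounds(reversible=False),
    "gpsA": internal_bounds(reversible=True),
}

for name, (lower, upper) in bounds.items():
    print(f"{name:18s} {lower:>6} <= v <= {upper}")
```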

Optimization method

The formalism described above constrained the operation of the metabolic network. With this
formalism, we have defined the capabilities of the metabolic network, therefore defining what it can
and cannot do. The results produced a feasible region in multidimensional space within which the
steady-state flux vector, v’, must lie. Adding additional constraints can further reduce the size of the
space, and if all constraints are considered (including initial conditions), the feasible region may be
reduced to a point. Herein, we considered the stoichiometric, capacity, and thermodynamic
constraints. These constraints, enforced simultaneously, led to the definition of the feasible region that
contains all feasible steady-state flux vectors that satisfy the imposed constraints. Within this set, we
can find a particular steady-state metabolic flux vector that maximizes/minimizes an objective
function.

Herein, we utilized an objective function and linear programming to find a feasible steady-state flux
vector that maximizes an objective function. The solution to Equation 6, subject to the inequality
constraints, was formulated as a linear programming (LP) problem. Mathematically, the LP problem
was stated as:

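$$\text{Maximize } Z = \sum_i c_i v_i = \mathbf{c} \cdot \mathbf{v}' \quad \text{subject to} \quad \mathbf{S}' \cdot \mathbf{v}' = \mathbf{0}, \quad \alpha_j \le v_j \le \beta_j \qquad \text{(Equation 8)}$$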

where Z is the objective function that was represented as a linear combination of metabolic fluxes vi.
For our analysis, the vector c was defined as the unit vector in the direction of the growth flux, vgrowth.

The growth flux was defined in terms of the biosynthetic requirements based on the biomass
composition defined in the literature [25,27,28]. Thus, biomass generation was defined as a reaction flux
draining the intermediate metabolites in the appropriate ratios (Figures 2b & 4), and this flux was
defined as the objective function [25,29]. A commercially available package was used to solve the LP
problem (LINDO, Lindo Systems Inc.).
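
For readers without access to LINDO, the same kind of LP can be sketched with SciPy's `linprog`, which is used here purely as a stand-in solver. The four-metabolite network, its stoichiometry, and the uptake value are hypothetical; the structure (steady-state equality constraints, flux bounds, growth flux as objective) follows the formulation above.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network. Rows: A (carbon), B (precursor), C (by-product),
# E (energy). Columns:
#   0 b_A     exchange of A (positive = uptake)
#   1 synth   A -> B
#   2 ferm    A -> C + 3 E
#   3 b_C     exchange of C (negative = secretion)
#   4 diss    E ->            (energy dissipation)
#   5 growth  B + 2 E ->      (biomass drain, the objective)
S = np.array([
    [1, -1, -1, 0,  0,  0],   # A
    [0,  1,  0, 0,  0, -1],   # B
    [0,  0,  1, 1,  0,  0],   # C
    [0,  0,  3, 0, -1, -2],   # E
], dtype=float)

bounds = [
    (10, 10),      # carbon uptake fixed at a defined value
    (0, None),     # irreversible internal reactions
    (0, None),
    (None, 0),     # the by-product may only leave the system
    (0, None),
    (0, None),
]

c = np.zeros(S.shape[1])
c[5] = 1.0         # unit vector in the direction of the growth flux

# linprog minimizes, so Z = c.v is maximized by minimizing -c.v.
res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)

print(res.status, np.round(res.x, 6))
```

In this toy case the optimum diverts part of the carbon to the energy-yielding by-product route, because the growth drain needs both precursor and energy.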

The methodology for formulating genomically derived in silico metabolic genotypes described above
provided a computational method for the analysis of the metabolic physiology and the systemic
metabolic constraints.

Metabolic pathway reconstruction

The assignment of gene functions based solely on sequence similarity often provides an incomplete set
of metabolic pathways. However, a metabolic pathway reconstruction takes into consideration the
comprehensiveness of the entire metabolic network or metabolic pathway [17,18,31,32]. Within the
framework of metabolic pathway reconstructions, the accuracy of each functional assignment is
examined in the context of the holistic function of the metabolic network. For example, an amino acid
biosynthesis pathway may be complete except for a single aminotransferase; in this case, it is likely
that the respective biochemical activity is assumed by a nonspecific aminotransferase [17]. Furthermore,
functional assignments for which only a single reaction in a pathway is identified should be further
investigated for accuracy.

The metabolic pathway reconstruction that we have used was examined by comparing the known
behavior and nutritional requirements of E. coli to the in silico analysis. We have utilized FBA to
assist in our metabolic pathway reconstruction. The capability of the metabolic network to synthesize
each metabolite in the biomass requirements on various carbon sources was examined. Inconsistencies
between the in silico analysis and the experimental observations were further investigated. The
complete list of reactions that were used in the analysis is available on the web.
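
One way to picture this check is to add a drain for a single biomass component and ask whether any steady-state flux can pass through it. The sketch below does this with SciPy's `linprog` on a hypothetical three-metabolite network in which nothing produces metabolite D; the function name, the uptake limit, and the tolerance are assumptions of the example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network. Rows: A, B, D. Columns: uptake of A, A -> B.
# No reaction produces D, mimicking a gap in the reconstruction.
metabolites = ["A", "B", "D"]
S = np.array([
    [1.0, -1.0],   # A
    [0.0,  1.0],   # B
    [0.0,  0.0],   # D
])
bounds = [(0, 10), (0, None)]   # uptake of at most 10 units (made-up value)


def max_production(S, bounds, row):
    """Maximum steady-state flux through a drain added for one metabolite."""
    drain = np.zeros((S.shape[0], 1))
    drain[row, 0] = -1.0
    S_aug = np.hstack([S, drain])
    c = np.zeros(S_aug.shape[1])
    c[-1] = -1.0                      # maximize the drain flux
    res = linprog(c, A_eq=S_aug, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds + [(0, None)])
    return res.x[-1] if res.success else 0.0


biomass_components = ["B", "D"]
producible = {
    met: bool(max_production(S, bounds, metabolites.index(met)) > 1e-6)
    for met in biomass_components
}
print(producible)
```

A component that cannot be drained at any positive rate points to a missing reaction or an incorrect functional assignment that deserves further investigation.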

Phenotype phase plane analysis


All feasible E. coli in silico metabolic flux distributions are mathematically confined to the feasible set,
which is a region in flux space (ℝ^n), where each solution in this space corresponds to a particular
internal metabolic flux distribution (or a particular metabolic phenotype) [15]. Optimal metabolic
behavior under specified growth conditions can be determined from this set of all possible phenotypes
using LP.

Phenotype phase planes (PhPPs): PhPPs are essentially two- (or three-) dimensional representations of
the feasible set, and the formalism for constructing the PhPP is briefly discussed next. Two parameters
that describe the growth conditions (such as substrate and oxygen uptake rates) were defined as the two
axes of the two-dimensional space. The optimal flux distribution was calculated (using LP) for all
points in this plane by repeatedly solving the LP problem while adjusting the exchange fluxes defining
the two-dimensional space. A finite number of qualitatively different metabolic pathway utilization
patterns were identified in such a plane [33], and lines were drawn to demarcate these regions. Each
phase is denoted by Pnx,y, where “n” denotes the number of the demarcated phase (see Figure 5 for an
example), and “x,y” denotes the two uptake rates on the axes of the PhPP. The PhPP can also be
generated for a mutant genotype, represented as Pgenenx,y.
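
The repeated LP solves can be sketched as a scan over a grid of the two uptake rates. The five-metabolite network below (carbon source A, oxygen O, energy E, precursor B, by-product C), its stoichiometry, and the grid values are all hypothetical, and SciPy's `linprog` stands in for the LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network. Rows: A (carbon), O (oxygen), E (energy),
# B (precursor), C (by-product). Columns:
#   0 b_A (uptake of A)   1 b_O (uptake of O)   2 resp: A + O -> 4 E
#   3 ferm: A -> C + E    4 synth: A -> B       5 growth: B + 2 E ->
#   6 diss: E ->          7 b_C (negative = secretion of C)
S = np.array([
    [1, 0, -1, -1, -1,  0,  0, 0],   # A
    [0, 1, -1,  0,  0,  0,  0, 0],   # O
    [0, 0,  4,  1,  0, -2, -1, 0],   # E
    [0, 0,  0,  0,  1, -1,  0, 0],   # B
    [0, 0,  0,  1,  0,  0,  0, 1],   # C
], dtype=float)
GROWTH = 5


def optimal_growth(carbon_uptake, oxygen_uptake):
    """Maximal growth flux at one point of the plane (NaN if infeasible)."""
    bounds = [(carbon_uptake, carbon_uptake), (oxygen_uptake, oxygen_uptake)]
    bounds += [(0, None)] * 5 + [(None, 0)]
    c = np.zeros(S.shape[1])
    c[GROWTH] = -1.0                  # linprog minimizes
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return res.x[GROWTH] if res.success else float("nan")


carbon_axis = [3.0, 6.0, 9.0]
oxygen_axis = [0.0, 1.0, 3.0, 6.0]
plane = np.array([[optimal_growth(x, y) for x in carbon_axis]
                  for y in oxygen_axis])
print(np.round(plane, 3))
```

In this toy network growth is limited by energy at low oxygen uptake and by carbon at high oxygen uptake, and points where oxygen uptake exceeds carbon uptake admit no steady state at all; regions of this kind are what the demarcation lines separate.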

One demarcation line in the PhPP is defined as the line of optimality (LO). This line represents the
optimal relation between respective metabolic fluxes. The LO is identified by varying the x-axis flux
and calculating the optimal y-axis flux with the objective function defined as the growth flux [33].
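
A minimal sketch of this calculation: fix the x-axis flux (carbon uptake), leave the y-axis flux (oxygen uptake) free, maximize growth, and read off the oxygen uptake at the optimum. The five-metabolite network and its stoichiometry are hypothetical, and SciPy's `linprog` is used as a stand-in LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network. Rows: A (carbon), O (oxygen), E (energy),
# B (precursor), C (by-product). Columns:
#   0 b_A (uptake of A)   1 b_O (uptake of O)   2 resp: A + O -> 4 E
#   3 ferm: A -> C + E    4 synth: A -> B       5 growth: B + 2 E ->
#   6 diss: E ->          7 b_C (negative = secretion of C)
S = np.array([
    [1, 0, -1, -1, -1,  0,  0, 0],   # A
    [0, 1, -1,  0,  0,  0,  0, 0],   # O
    [0, 0,  4,  1,  0, -2, -1, 0],   # E
    [0, 0,  0,  0,  1, -1,  0, 0],   # B
    [0, 0,  0,  1,  0,  0,  0, 1],   # C
], dtype=float)
B_O, GROWTH = 1, 5


def optimal_oxygen_uptake(carbon_uptake):
    """Return (oxygen uptake, growth) at the growth optimum for fixed carbon uptake."""
    bounds = [(carbon_uptake, carbon_uptake), (0, None)]   # oxygen left free
    bounds += [(0, None)] * 5 + [(None, 0)]
    c = np.zeros(S.shape[1])
    c[GROWTH] = -1.0                  # linprog minimizes
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return float(res.x[B_O]), float(res.x[GROWTH])


for x in (3.0, 6.0, 9.0):
    y, growth = optimal_oxygen_uptake(x)
    print(x, round(y, 3), round(growth, 3))
```

For this toy network the optimal oxygen uptake is proportional to the carbon uptake, so the points fall on a straight line through the origin.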

Conclusions
In summary, the ability to reconstruct genome-scale metabolic maps calls for the development of
methods to analyze their integrative behavior. Flux balance analysis has shown utility for such
analysis. Phenotype phase planes were developed to obtain a broad view of the metabolic genotype-
phenotype relation based on FBA.

References

1. Weng, G., Bhalla, U. S. & Iyengar, R. Complexity in biological signaling systems. Science 284,
92-6 (1999).

2. Kanehisa, M. Databases of biological information. Trends Guide to Bioinformatics, 24-26 (1998).

3. Palsson, B. O. What lies beyond bioinformatics? Nature Biotechnology 15, 3-4 (1997).

4. Edwards, J. S. & Palsson, B. O. How will bioinformatics influence metabolic engineering?
Biotechnology and Bioengineering 58, 162-169 (1998).

5. Hess, B. & Boiteux, A. Oscillatory organization in cells, a dynamic theory of cellular control
processes. Hoppe-Seylers Zeitschrift fur Physiologische Chemie 349, 1567-1574 (1968).

6. Tyson, J. J. & Othmer, H. G. The dynamics of feedback control circuits in biochemical
pathways. Progress in Theoretical Biology 5, 1-62 (1978).

7. Goodwin, B. C. Temporal organization in cells: a dynamic theory of cellular control
processes. Academic Press, New York (1963).

8. Savageau, M. A. Biochemical systems analysis. I. Some mathematical properties of the rate law
for the component enzymatic reactions. J Theor Biol 25, 365-9 (1969).

9. Heinrich, R., Rapaport, S. M. & Rapaport, T. A. Metabolic regulation and mathematical
models. Progress in Biophysics and Molecular Biology 32, 1-82 (1977).

10. Kacser, H. & Burns, J. A. The control of flux. Symposium for the Society of Experimental
Biology 27, 65-104 (1973).

11. Tomita, M. et al. E-CELL: software environment for whole-cell simulation. Bioinformatics 15,
72-84 (1999).

12. Bonarius, H. P. J., Schmid, G. & Tramper, J. Flux analysis of underdetermined metabolic
networks: The quest for the missing constraints. Trends in Biotechnology 15, 308-314 (1997).

13. Edwards, J. S., Ramakrishna, R., Schilling, C. H. & Palsson, B. O. in Metabolic Engineering
(eds. Lee, S. Y. & Papoutsakis, E. T.) 13-57 (Marcel Dekker, 1999).

14. Edwards, J. S. & Palsson, B. O. Systems Properties of the Haemophilus influenzae Rd
Metabolic Genotype. Journal of Biological Chemistry 274, 17410-17416 (1999).

15. Varma, A. & Palsson, B. O. Metabolic Flux Balancing: Basic concepts, Scientific and Practical
Use. Bio/Technology 12, 994-998 (1994).

16. Sauer, U., Cameron, D. C. & Bailey, J. E. Metabolic capacity of Bacillus subtilis for the
production of purine nucleosides, riboflavin, and folic acid. Biotechnology and Bioengineering
59, 227-238 (1998).

17. Bono, H., Ogata, H., Goto, S. & Kanehisa, M. Reconstruction of amino acid biosynthesis
pathways from the complete genome sequence. Genome Research 8, 203-10 (1998).

18. Selkov, E., Maltsev, N., Olsen, G. J., Overbeek, R. & Whitman, W. B. A reconstruction of the
metabolism of Methanococcus jannaschii from sequence data. Gene 197, GC11-26 (1997).

19. Covert, M. W. et al. Metabolic Modeling of Microbial Strains in silico. Trends in Biochemical
Sciences In Press (2001).

20. Neidhardt, F. C. (ed.) Escherichia coli and Salmonella: cellular and molecular biology (ASM
Press, Washington, D.C., 1996).

21. Karp, P. D., Riley, M., Paley, S. M., Pellegrini-Toole, A. & Krummenacker, M. EcoCyc:
Encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Research 26, 50-3
(1998).

22. Selkov, E., Jr., Grechkin, Y., Mikhailova, N. & Selkov, E. MPW: the Metabolic Pathways
Database. Nucleic Acids Research 26, 43-5 (1998).

23. Kanehisa, M. A database for post-genome analysis. Trends in Genetics 13, 375-6 (1997).

24. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277,
1453-74 (1997).

25. Pramanik, J. & Keasling, J. D. Stoichiometric model of Escherichia coli metabolism:
Incorporation of growth-rate dependent biomass composition and mechanistic energy
requirements. Biotechnology and Bioengineering 56, 398-421 (1997).

26. Varma, A. & Palsson, B. O. Stoichiometric flux balance models quantitatively predict growth
and metabolic by-product secretion in wild-type Escherichia coli W3110. Applied and
Environmental Microbiology 60, 3724-3731 (1994).

27. Neidhardt, F. C., Ingraham, J. L. & Schaechter, M. Physiology of the bacterial cell (Sinauer
Associates, Inc., Sunderland, MA, 1990).

28. Ingraham, J. L., Maaløe, O. & Neidhardt, F. C. Growth of the bacterial cell. Sinauer Associates
Inc., Sunderland, Massachusetts (1983).

29. Varma, A. & Palsson, B. O. Metabolic capabilities of Escherichia coli: II. Optimal growth
patterns. Journal of Theoretical Biology 165, 503-522 (1993).

30. Vallino, J. & Stephanopoulos, G. Metabolic Flux Distributions in Corynebacterium glutamicum
During Growth and Lysine Overproduction. Biotechnology and Bioengineering 41, 633-646
(1993).

31. Overbeek, R., Larsen, N., Smith, W., Maltsev, N. & Selkov, E. Representation of function: the
next step. Gene 191, GC1-GC9 (1997).

32. Selkov, E. et al. The metabolic pathway collection from EMP: the enzymes and metabolic
pathways database. Nucleic Acids Res 24, 26-8 (1996).

33. Edwards, J. S., Ramakrishna, R. & Palsson, B. O. Characterizing phenotypic plasticity: A phase
plane analysis. Submitted (In review).

Supplementary Table 1: The list of reactions in E. coli in silico

EMP Pathway

Tagatose-6-phosphate kinase agaZ TAG6P + ATP  TAG16P + ADP 2.7.1.-

Tagatose-bisphosphate aldolase 2 gatY TAG16P ↔ T3P2 + T3P1 4.1.2.-

Tagatose-bisphosphate aldolase 1 agaY TAG16P  T3P2 + T3P1 4.1.2.-

Glycerol
Glycerol kinase glpK GL + ATP → GL3P + ADP 2.7.1.30
Glycerol-3-phosphate-dehydrogenase-[NAD(P)+] gpsA GL3P + NADP ↔ T3P2 + NADPH 1.1.1.94

Nucleosides and Deoxynucleosides

Phosphopentomutase deoB DR1P  DR5P 5.4.2.7

Phosphopentomutase deoB R1P  R5P 5.4.2.7

Deoxyribose-phosphate aldolase deoC DR5P  ACAL + T3P1 4.1.2.4

Aspartate & Asparagine Biosynthesis

Aspartate transaminase aspC OA + GLU  ASP + AKG 2.6.1.1

Asparagine synthetase (Glutamate dependent) asnB ASP + ATP + GLN  GLU + ASN + AMP + PPI 6.3.5.4

Aspartate-ammonia ligase asnA ASP + ATP + NH3  ASN + AMP + PPI 6.3.1.1

Etc.......
