
ARTICLE IN PRESS

Metabolic Engineering 9 (2007) 258–267 www.elsevier.com/locate/ymben

Global transcription machinery engineering: A new approach for improving cellular phenotype
Hal Alper, Gregory Stephanopoulos
Department of Chemical Engineering, Massachusetts Institute of Technology, Room 56-469, Cambridge, MA 02139, USA
Received 23 October 2006; received in revised form 15 December 2006; available online 8 January 2007

Abstract

It is now generally accepted that most cellular phenotypes are affected by many genes. As a result, engineering a desired phenotype would be facilitated enormously by simultaneous multiple gene modification, yet the capacity to introduce such modifications is very limited. Here, we demonstrate that the components of global cellular transcription machinery (specifically, σ70) can be engineered to allow for global perturbations of the transcriptome, which can help unlock complex phenotypes. Results from three distinct phenotypes (ethanol tolerance, metabolite overproduction, and multiple phenotypes) are provided as proof-of-concept. In each case, the tool of global transcription machinery engineering (gTME) outperformed traditional approaches by quickly and more effectively optimizing phenotypes. © 2007 Elsevier Inc. All rights reserved.
Keywords: Sigma factor engineering; Ethanol tolerance; Metabolic engineering; Directed evolution; Lycopene production

1. Introduction

Multiple genetic modifications are often required to unlock cellular phenotypes of interest. However, most current cellular and metabolic engineering approaches rely almost exclusively on the deletion or over-expression of single genes due to experimental limitations in vector construction, transformation efficiencies, and screening capacity. These limitations preclude the simultaneous exploration of multiple gene modifications and confine gene modification searches to restricted, sequential approaches that often have difficulties reaching a global phenotype optimum due to the complexity of metabolic landscapes (Alper et al., 2005a, b). Even more so, subsets of genes that are beneficial only when perturbed simultaneously are inaccessible through the application of greedy search algorithms (those in which gene targets are sequentially identified to continuously improve phenotype). As a result, it would be advantageous to explore the genomic space through multiple, simultaneous perturbations.
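The limitation of greedy searches can be illustrated with a toy landscape. The sketch below is a hypothetical example and is not taken from the study: the gene labels `A`, `B`, and `C` and the scores returned by `phenotype` are invented so that `A` and `B` improve the phenotype only when perturbed together. Under these assumptions, a sequential search stops at a local optimum, whereas an exhaustive search over simultaneous perturbations finds the better combination.

```python
from itertools import combinations

GENES = ("A", "B", "C")  # hypothetical gene targets


def phenotype(perturbed):
    """Invented toy score: A and B help only as a pair, C helps slightly alone."""
    score = 1.0
    if "C" in perturbed:
        score += 0.1
    has_a, has_b = "A" in perturbed, "B" in perturbed
    if has_a and has_b:
        score += 0.5      # beneficial only when perturbed simultaneously
    elif has_a or has_b:
        score -= 0.2      # detrimental on their own
    return score


def greedy_search(genes, fitness):
    """Add one gene at a time, keeping a step only if it improves the phenotype."""
    chosen = frozenset()
    best = fitness(chosen)
    while True:
        candidates = [chosen | {g} for g in genes if g not in chosen]
        if not candidates:
            break
        top = max(candidates, key=fitness)
        if fitness(top) <= best:
            break
        chosen, best = top, fitness(top)
    return chosen, best


def exhaustive_search(genes, fitness):
    """Score every subset of genes, i.e. all simultaneous perturbations."""
    subsets = (frozenset(c) for r in range(len(genes) + 1)
               for c in combinations(genes, r))
    top = max(subsets, key=fitness)
    return top, fitness(top)


greedy_set, greedy_score = greedy_search(GENES, phenotype)
best_set, best_score = exhaustive_search(GENES, phenotype)
print(sorted(greedy_set), round(greedy_score, 2))
print(sorted(best_set), round(best_score, 2))
```

In this toy case the greedy search never accepts `A` or `B`, because each lowers the score when added alone. Exhaustive enumeration is feasible here only because there are three genes; the number of subsets doubles with every additional gene.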
Corresponding author. Fax: +1 617 253 3122.

E-mail address: gregstep@mit.edu (G. Stephanopoulos).
1096-7176/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.ymben.2006.12.002

To address these limitations, alternative methods have been investigated; however, these approaches can be limited in the breadth of accessible genes due to the reliance on specific transcription factors or DNA-binding motifs (Gerber et al., 1994; Kim et al., 1997; Park et al., 2003). Cellular systems have nevertheless optimized the capacity to self-regulate their thousands of genes through fine-tuning components of global transcription machinery. We have previously shown the capacity to induce multigenic perturbations to eukaryotic cellular systems and elicit an improved tolerance to high ethanol and glucose through the mutagenesis and selection of TATA-binding proteins (Alper et al., 2006). In bacterial systems, sigma factors focus the promoter preferences of the RNA polymerase (Burgess and Anthony, 2001), and prior work using promoter-β-galactosidase fusions suggests that mutations in key residues can alter this preference (Gardella et al., 1989; Siegele et al., 1989; Malhotra et al., 1996; Owens et al., 1998). In this work, the rpoD gene, which encodes the main sigma factor, σ70, was subjected to random mutagenesis and introduced into Escherichia coli to search for varying cellular phenotypes. This sigma factor was chosen on the premise that mutations will alter the promoter


preferences of RNA polymerase, affecting transcription levels and thus modulating the transcriptome at a global level (Gardella et al., 1989; Siegele et al., 1989; Malhotra et al., 1996; Owens et al., 1998). The rpoD gene and upstream, intergenic promoter region were subjected to error-prone PCR and cloned into a low-copy expression vector (see Supplementary Information 1). A library of nearly 10⁶ viable mutants was initially constructed and transformed into strains. This screening and selection was performed in the background of a wild-type E. coli DH5α strain containing the endogenous, unmutated chromosomal copy of rpoD. As such, this genetic screen places the mutant protein in competition with the wild-type version and thus can allow for the identification of unique mutants which only function in the presence of an unmutated version. By taking the approach of library complementation and phenotype selection, we aim to engineer these factors by identifying mutants that impart a desired cellular phenotype, such as tolerance to a chemical or a specific metabolic capacity. By focusing the mutagenesis and selection on the single σ70 protein, it is possible to explore a larger mutant protein space. While such a mutant protein could in principle be found through genomic library complementation and general mutagenesis, limitations in vector construction, transformation efficiencies, and screening capacity preclude effective probing of the σ70 protein space, and these approaches have been ineffective in finding such mutants. In undertaking such an engineering approach, the identified mutant may contain simple mutations or more unique deletions, each of which may be the subject of further studies to understand how these engineered factors impart the phenotype. However, the major goal of such an engineering exercise is the elicitation of the phenotype.
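A rough count helps put a library of about 10⁶ mutants in perspective against the protein sequence space. The sketch below is a back-of-the-envelope illustration under stated assumptions: a σ70 length of 613 residues (an assumption not given in the text above), 19 alternative amino acids per position, and every variant equally likely, which error-prone PCR does not guarantee. The names `SIGMA70_LENGTH` and `variants_with_k_substitutions` are hypothetical.

```python
from math import comb

SIGMA70_LENGTH = 613   # assumed number of residues in sigma-70
ALTERNATIVES = 19      # other amino acids available at each position
LIBRARY_SIZE = 10**6   # approximate library size reported in the text


def variants_with_k_substitutions(length, k, alternatives=ALTERNATIVES):
    """Number of distinct proteins carrying exactly k amino acid substitutions."""
    return comb(length, k) * alternatives**k


for k in (1, 2, 3):
    n = variants_with_k_substitutions(SIGMA70_LENGTH, k)
    # Upper bound on coverage: assumes every library member is a distinct k-fold mutant.
    coverage = min(1.0, LIBRARY_SIZE / n)
    print(f"{k} substitution(s): {n:,} variants, at most {coverage:.2%} covered")
```

Under these assumptions the library can in principle span all single substitutions, but only a small fraction of double substitutions, which is consistent with concentrating the mutational budget on one protein rather than spreading it over the genome.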
To this end, three distinct and diverse phenotypes, (a) tolerance to ethanol, (b) metabolite overproduction, and (c) multiple, simultaneous phenotypes, were investigated as proof-of-concept. Each of these phenotypes has been studied by traditional methods of randomized cellular mutagenesis, gene complementation, knockout searches, and microarray analysis, with limited success to date (Hemmi et al., 1998; Zaldivar et al., 2001; Gill et al., 2002; Gonzalez et al., 2003).

2. Materials and methods

2.1. Strains and media

E. coli DH5α (Invitrogen) was used for routine transformations as described in the protocol as well as for the ethanol tolerance and ethanol/sodium dodecyl sulfate (SDS) phenotypes as described below. The lycopene production phenotype was investigated in the following backgrounds, as described below and in prior work: E. coli K12 PT5-dxs, PT5-idi, PT5-ispFD; E. coli K12 Δhnr PT5-dxs, PT5-idi, PT5-ispFD; E. coli ΔgdhA ΔaceE ΔfdhF K12 PT5-dxs, PT5-idi, PT5-ispFD; and E. coli K12 ΔgdhA ΔaceE ΔPyjiD PT5-dxs, PT5-idi, PT5-ispFD (Alper et al., 2005a, b; Yuan et al., 2006). Strains were grown at 37 °C with 225 RPM orbital shaking in either LB-Miller medium or M9-minimal medium containing 5 g/L D-glucose and supplemented with 1 mM thiamine (Maniatis et al., 1982). Medium was supplemented with 34 μg/ml of chloramphenicol for low-copy plasmid propagation (pHACM plasmids) and with 68 μg/ml of chloramphenicol (pACYC184 plasmids and pUC19-Cm plasmids), 20 μg/ml kanamycin (pACLYC-Kan plasmids), or 100 μg/ml ampicillin (pUC19 plasmids) for higher-copy plasmid maintenance as necessary. Specifically, the pHACM plasmid (a low-copy plasmid described below), which harbors all sigma factor mutants and controls for this experiment, was always cultured, selected, and isolated using 34 μg/ml of chloramphenicol. Cell density was monitored spectrophotometrically at 600 nm. M9 minimal salts were purchased from US Biological, X-gal was purchased from American Bioanalytical, and all remaining chemicals were from Sigma-Aldrich. Primers were purchased from Invitrogen.

2.2. Library construction

A low-copy (5 copies/cell) host plasmid (pHACM) was constructed using pUC19 (Yanisch-Perron et al., 1985) as the backbone and replacing ampicillin resistance with chloramphenicol resistance using the CAT gene in pACYC184 (Chang and Cohen, 1978) and the SC101 origin of replication from pSC101 (Bernardi and Bernardi, 1984). The chloramphenicol resistance gene from pACYC184 was amplified with AatII and AhdI restriction site overhangs using primers CM_sense_AhdI: GTTGCCTGACTCCCCGTCGCCAGGCGTTTAAGGGCACCAATAAC and CM_anti_AatII: CAGAAGCCACTGGAGCACCTCAAAACTGCAGT. This fragment was digested along with the pUC19 backbone and ligated together to form pUC19-Cm. The SC101 origin fragment from pSC101 was amplified with AflIII and NotI restriction site overhangs using primers pSC_sense_AflIII: CCCACATGTCCTAGACCTAGCTGCAGGTCGAGGA and pSC_anti_NotI: AAGGAAAAAAGCGGCCGCACGGGTAAGCCTGTTGATGATACCGCTGCCTTACT.
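When primers carry restriction site overhangs, it can be useful to confirm that each primer string actually contains the intended recognition sequence. The sketch below is an illustrative check and not part of the published protocol. It assumes the commonly cited recognition sequences GACNNNNNGTC (AhdI), ACRYGT (AflIII), and GCGGCCGC (NotI), written with IUPAC ambiguity codes, and applies them to three of the primers listed above. The names `site_regex` and `find_site` are hypothetical helpers.

```python
import re

# IUPAC nucleotide codes needed for the sites below, expressed as regex classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Assumed recognition sequences (5' to 3'); all three are palindromic.
SITES = {"AhdI": "GACNNNNNGTC", "AflIII": "ACRYGT", "NotI": "GCGGCCGC"}

PRIMERS = {
    "CM_sense_AhdI": "GTTGCCTGACTCCCCGTCGCCAGGCGTTTAAGGGCACCAATAAC",
    "pSC_sense_AflIII": "CCCACATGTCCTAGACCTAGCTGCAGGTCGAGGA",
    "pSC_anti_NotI": "AAGGAAAAAAGCGGCCGCACGGGTAAGCCTGTTGATGATACCGCTGCCTTACT",
}


def site_regex(site):
    """Translate an IUPAC recognition sequence into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in site))


def find_site(primer, enzyme):
    """Return the 0-based start of the first site match, or None if absent."""
    match = site_regex(SITES[enzyme]).search(primer.upper())
    return match.start() if match else None


for name, sequence in PRIMERS.items():
    enzyme = name.rsplit("_", 1)[1]   # enzyme name is the last part of the primer name
    print(name, enzyme, find_site(sequence, enzyme))
```

Only the strand as written is searched, which is sufficient here because the assumed sites are palindromic. A non-palindromic site would also need a search of the reverse complement.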
This fragment was digested along with the pUC19-Cm construct and ligated together to form pHACM, which was propagated in 34 μg/ml of chloramphenicol. The rpoD gene and native promoter region were amplified from E. coli genomic DNA using HindIII and SacI restriction overhangs to target the lacZ gene in pHACM, allowing for blue/white screening, using primers rpoD_sense_SacI: AACCTAGGAGCTCTGATTTAACGGCTTAAGTGCCGAAGAGC and rpoD_anti_HindIII: TGGAAGCTTTAACGCCTGATCCGGCCTACCGATTA;AT. These primers provide for the amplification of the intergenic region between the dnaG gene and rpoD gene as well as the entire rpoD gene. Fragment mutagenesis was performed using the GeneMorph II Random Mutagenesis kit (Stratagene) using various concentrations of initial


template to obtain low (0–4.5 mutations/kb), medium (4.5–9 mutations/kb), and high (9–16 mutations/kb) mutation rates as described in the product protocol. Following PCR, these fragments were purified using a Qiagen PCR cleanup kit, digested by HindIII and SacI overnight, ligated overnight into a digested pHACM backbone, and transformed into E. coli DH5α competent cells. Cells were plated on LB-agar plates containing 34 μg/ml of chloramphenicol and scraped off to create a liquid library. The total library size of white colonies was approximately 10⁶. The same procedure (and library size) was used for the three rounds of mutagenesis used in the ethanol-tolerance experiment.

2.3. Phenotype selection

Samples from the liquid library were placed into challenging environments to select for surviving mutants. A detailed protocol for phenotype selection and evaluation is provided in the Supplementary Methods section in Supplementary Information 6. All mutant strains were compared to a control strain, which harbored the unmutated version of the sigma factor under native promoter control cloned into the pHACM plasmid described above and cultured in the presence of 34 μg/ml of chloramphenicol. As a result, the influence of the plasmid and interference between both plasmid and chromosomal copies of rpoD are neutralized through the use of the control. All mutant sigma factors were retransformed into a fresh strain background before final analysis to separate out the effects of the sigma factor from any chromosomal mutations that may have been acquired during the selection phase. All data presented in figures and tables correspond to the strains retransformed with these mutant factors.

2.4. Sequence analysis

Mutant sigma factors were sequenced using the following set of primers: S1: CCATATGCGGTGTGAAATACCGC, S2: CACAGCTGAAACTTCTTGTCACCC, S3: TTGTTGACCCGAACGCAGAAGA, S4: AGAAACCGGCCTGACCATCG, A1: GCTTCGATCTGACGGATACGTTCG, A2: CAGGTTGCGTAGGTGGAGAACTTG, A3: GTGACTGCGACCTTTCGCTTTG, A4: CATCAGATCATCGGCATCCG, A5: GCTTCGGCAGCATCTTCGT, and A6: CGGAAGCGATCACCTATCTGC. Sequences were aligned and compared using Clustal W version 1.82.

2.5. Transcriptional profile

Ethanol strains were grown to an OD of approximately 0.4–0.5 and RNA was extracted using the Qiagen RNeasy Mini Kit. Microarray services were provided by Ambion, Inc. using the Affymetrix E. coli 2.0 arrays. Arrays were run in triplicate with biological replicates to allow for

statistical confidence in differential gene expression. From Affymetrix CEL files, Bioconductor software (Gentleman et al., 2004) was used to conduct the data analysis with RMA background correction, qspline normalization (Workman et al., 2002), and Li and Wong (2001) expression value summaries. For the downstream data summaries, only those probes with a corresponding E. coli b-number were used. Microarray data were deposited in the GEO database under accession number GSE3665. A completed MIAME checklist is provided in Supplementary Information 3.

3. Results

3.1. Eliciting a tolerance phenotype

Mutants of the sigma factor library were first selected on the basis of growth in the presence of high ethanol concentrations in complex medium (Yomano et al., 1998), a phenotype which, at present, is limiting prospects of industrial bioethanol production (Zaldivar et al., 2001). For this selection, strains were serially subcultured twice at 50 g/L of ethanol overnight, and then plated to select a total of 20 colonies, which were subsequently assayed for growth in varying ethanol concentrations. After confirming that the improved phenotype was indeed conferred by the mutant factor (through retransformation into a new host strain), the best mutant sigma factor was subjected to two additional rounds of mutation and selection. In each round of mutagenesis/selection, the single, isolated mutant sigma factor imparting the most improved growth phenotype to the cell (as measured by growth rates at all concentrations of ethanol tested) was selected for full characterization, including retransformation, growth rate assays, and sequencing. This isolated, characterized mutant factor was termed the best mutant from a given round of mutagenesis. In the two subsequent rounds, the selection concentration was increased to 60 and 70 g/L of ethanol, respectively.
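Growth rates of the kind used to rank mutants are commonly estimated from OD600 time courses. The sketch below shows one standard approach, a least-squares fit of ln(OD600) against time during exponential growth, with the doubling time given by ln 2 divided by the fitted slope. It is an illustration only: the text above does not state how growth rates were computed, the readings are synthetic, and the function names are hypothetical.

```python
import math


def specific_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time, in 1/h."""
    logs = [math.log(value) for value in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def doubling_time(mu_per_h):
    """Doubling time in hours for a given specific growth rate."""
    return math.log(2) / mu_per_h


# Synthetic readings for a culture that doubles every 3.5 h (invented data).
times = [0, 2, 4, 6, 8]
readings = [0.05 * 2 ** (t / 3.5) for t in times]

mu = specific_growth_rate(times, readings)
doubling_time_h = doubling_time(mu)
print(f"mu = {mu:.3f} 1/h, doubling time = {doubling_time_h:.2f} h")
```

With real data, only time points from the exponential phase should enter the fit; lag-phase and stationary-phase readings bias the slope downward.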
In these enrichment experiments, mutants were isolated after both 4 and 8 h of incubation due to the strong selection pressure used, and decreased cellular viability after 20 h. These mutants were compared in growth performance to that of a control strain expressing an unmutated version of the rpoD gene on the same plasmid used for the study. Initial studies of growth characteristics of strains in the presence of 50 g/L of ethanol indicated that both this control strain and an alternative control containing only a blank plasmid (no rpoD gene) had similar growth rates. As a result, we chose to use the strain containing the unmutated rpoD gene as the control strain for all further studies in this work as the influence of the plasmid and interference between both plasmid and chromosomal copies of rpoD are neutralized through the use of the control. The best isolated mutant from each round shows improved overall growth at all ethanol concentrations tested when compared to both the control strain as well


[Fig. 1. (A) Growth curves, log cells (OD600) versus time (h), at 20, 40, 50, 60, and 70 g/L ethanol. (B) Mutation maps over regions 1.1, 1.2, 2, 3, and 4 of native σ70 and of the best mutant from rounds 1, 2, and 3. Mutation labels: F221L, I287V, P60S, T95I, I249F, O204G, I511V, R603C, A542V, E538V, H518L, E555V, D566N, E605G, L607Q.]

Fig. 1. Isolation of ethanol-tolerant sigma factor mutants. Strains were isolated containing mutant sigma factors, which increased the tolerance to ethanol. (A) Growth curves are presented for the strain harboring the best σ70 mutant isolated from the third round of mutagenesis and selection (red) and control strains harboring the wild-type σ70 (blue). This round 3 mutant has significantly improved growth rates at all tested ethanol concentrations. Error bars represent the standard deviation of independent growth analyses conducted using biological replicates of independent rpoD mutant transformants. (B) The location of mutations in the best mutant from each round of mutagenesis is shown on the σ70 protein in relation to previously identified critical functional regions (Gruber and Gross, 2003). The second round of mutagenesis resulted in the identification of a truncated factor containing only one of the two prior mutations in that region.

as the best, isolated strain(s) from the previous round(s) of selection (Fig. 1A and Supplementary Fig. 1). Additionally, these mutants have higher growth rates and higher minimum inhibitory concentrations (MICs), as measured by the maximum sustainable concentration (Supplementary Table 1), than reported ethanol-tolerant strains obtained through traditional methods of strain improvement. As a point of reference, previously identified


ethanol-tolerant strains obtained through serial subculturing of an E. coli B background exhibited doubling times of 4–6 h in 50 g/L of ethanol (Yomano et al., 1998) compared to the 3.5 h (in a DH5α background) presented in this work. Furthermore, the strain harboring the best mutant factor isolated from the third round in this work had a 6 h doubling time in 60 g/L of ethanol, whereas growth was not seen in the identified E. coli B mutant. While differences in the genotype, basal growth, and ethanol tolerance of these two strains make direct comparison difficult, the improvements obtained through gTME are significant when compared with improvements obtained through the random evolution/selection procedure used by Yomano et al. (1998). It should also be noted that although all isolated strains harboring the mutant sigma factors exhibited improved ethanol growth characteristics, the growth phenotype of the mutant strains in the absence of ethanol was not impacted (Supplementary Table 1).

3.2. Characterization of ethanol-tolerance mutants

Fig. 1B (and Supplementary Information 2A) identifies the sequences of the best mutant isolated from each round of mutagenesis. Interestingly, the second round of mutation led to the formation of a truncated factor, which is apparently instrumental in increasing overall ethanol tolerance. This truncation, arising from an artifact in the restriction enzyme digestion and sequence similarities, includes part of region 3 and the complete region 4 of the protein. Region 4 is known to be responsible for binding to the promoter region and to anti-sigma factors, and a truncated form has been previously shown to have an increased binding affinity to anti-sigma factors relative to that of the full protein (Sharma et al., 1999). It is therefore possible that this truncated mutant may impact transcription through a novel interaction with anti-sigma factors and various activation complexes.
However, interactions with anti-sigma factors would be expected to cause only overexpression and do not account for the downregulated genes found in these mutants. The downregulation indicates the presence of potential competitive binding mechanisms that prevent the wild-type, endogenous sigma factor from transcribing some genes; the exact mechanism of these truncated sigma factors is presently unknown. Furthermore, mutations in the R603 site, occurring in rounds 1 and 2, have been implicated in a reduction of transcriptional capacity at most promoters tested as well as in interactions with the alpha subunit of RNAP that engage UP regions in the DNA (Lonetto et al., 1998; Ross et al., 2003). Nevertheless, these three rounds of mutagenesis and resulting sequences suggest an important distinction compared with protein-directed evolution. In the latter case, mutations which increase protein function are typically additive in nature (Wells, 1990; Zhang et al., 1995). However, this is certainly not the case when altering transcription machinery, as these factors act as conduits to the transcriptome. In this regard, many local maxima may

occur in the sequence space due to the various subsets of gene alterations, which may lead to an improved phenotype. These sequences and identified mutations are not meant to imply uniqueness, as several other mutants have been isolated with improved tolerance, although the mutants presented here showed the greatest improvement among the tested isolated strains from each round. From these experiments, it is not possible to attribute the phenotype to any one specific mutation. Such analysis would require either the generation of sufficient sequence-level diversity to filter out spurious mutations or experiments testing the effect of systematically designed mutations (Jensen et al., 2006).

3.3. Transcriptional analysis of ethanol-tolerance mutants

In an effort to quantify the transcriptional changes in cells harboring the mutant sigma factors, the transcriptome of these strains was assayed using DNA microarrays under various conditions (Supplementary Data 1 and 2). First, all strains (including the plasmid-based, wild-type rpoD control) were assayed in a complex LB medium in the absence of ethanol to assess the impact of the mutant sigma factors on transcription. In general, the transcriptional results validated the capacity of mutant sigma factors to elicit simultaneous global transcription-level alterations. Specifically, a total of 72 genes were differentially expressed in cells harboring the third round mutant compared to the control at a p-value threshold of 0.001 (44 upregulated and 28 downregulated). Similarly, 125 and 82 genes were altered in the first and second round mutants, respectively. At a threshold of p-value < 0.005 across all three rounds, a total of 40 genes are altered in a similar fashion in each of these three rounds of mutagenesis. Supplementary Data 3 provides a summarizing list of these genes with altered expression.
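The gene counts above come from two operations: thresholding each comparison by p-value, and intersecting the rounds for genes that change in the same direction. The sketch below illustrates that bookkeeping on invented data. The gene labels, p-values, and log2 fold changes are placeholders and not results from the study, and `altered_genes` and `shared_alterations` are hypothetical helper names.

```python
def altered_genes(results, alpha):
    """Map each gene with p < alpha to 'up' or 'down' by the sign of its log2 fold change."""
    return {gene: ("up" if log2_fc > 0 else "down")
            for gene, (p_value, log2_fc) in results.items()
            if p_value < alpha}


def shared_alterations(rounds, alpha):
    """Genes significant in every round and altered in the same direction in all of them."""
    calls = [altered_genes(results, alpha) for results in rounds]
    common = set(calls[0]).intersection(*calls[1:])
    return {gene: calls[0][gene] for gene in common
            if all(call[gene] == calls[0][gene] for call in calls)}


# Invented (p-value, log2 fold change) pairs versus the control for each round.
ROUND1 = {"gene1": (0.0004, 1.8), "gene2": (0.0030, -1.2),
          "gene3": (0.0200, 0.4), "gene4": (0.0008, -2.1)}
ROUND2 = {"gene1": (0.0009, 1.1), "gene2": (0.0010, -0.9),
          "gene3": (0.0002, 0.7), "gene4": (0.0040, 1.5)}
ROUND3 = {"gene1": (0.0001, 2.3), "gene2": (0.0045, -1.0),
          "gene3": (0.3000, 0.1), "gene4": (0.0007, -1.7)}

round3_calls = altered_genes(ROUND3, alpha=0.001)
shared = shared_alterations([ROUND1, ROUND2, ROUND3], alpha=0.005)
print(round3_calls)
print(shared)
```

In this toy data set, `gene4` is significant in all three rounds but changes direction in round 2, so it is excluded from the shared set.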
These results suggest that mutant sigma factors through each round are converging on a subset of important genes, despite the gross sequence-level differences seen between these various mutant factors (both amino acid changes and the truncation of the protein). Furthermore, these results echo prior observations suggesting that ethanol tolerance is a phenotype controlled by many genes (Gonzalez et al., 2003). Finally, since the mRNA transcript of the truncated mutant is shorter than the whole-length rpoD, it is possible to deduce the increased transcript level of rpoD in the round 1 mutant compared with that of a round 2 or 3 mutant when comparing the microarray probe corresponding to this region. Because the hybridization on the array occurs in a region further 5′ than the truncation, the difference between the signal from the rpoD probe in a strain containing the full-length mutant and that of a strain containing the truncated mutant can help estimate the level of expression of the mutant sigma factor. Such an analysis suggests that the ratio of expression for the mutant rpoD to endogenous rpoD in the first round mutant is approximately 3.6:1, compared with the expected ratio of nearly 5:1 based on plasmid copy


number, which suggests either native regulation or other cellular factors limiting maximal output of this native promoter. Finally, the transcriptomes of the best mutant from round 3 and the control strain were further assayed at varying levels of ethanol (20 and 40 g/L for round 3 and 20 g/L for the control, see Supplementary Data 2). These results illustrate the complex, pleiotropic impact of ethanol in the control strain. As an example, the presence of ethanol in the control strain initiated a generic stress response consisting of 354 differentially expressed genes (at a p-value threshold of 0.001), many of which are typically associated with cellular stress responses, such as groEL, htpG, dnaK, and marA (Supplementary Data 4 and Supplementary Information 4). The mutant sigma factor from any round (in the absence of ethanol) alters significantly fewer genes, though still a substantial number, some of which overlap with the generic stress response (Supplementary Information 4), including ompW, sodA, and ftnA. However, this strain is able to grow in the presence of elevated ethanol, and its response to ethanol differs from that of the control. In this new response, many genes previously related to the ethanol stress response in the wild-type have now been pre-programmed as a result of the introduction of the mutant sigma factor, while an additional set of genes is altered by ethanol, representing a new mode of response (Supplementary Information 4). This analysis illustrates that ethanol tolerance is highly complex and regulated by a multitude of genes. Furthermore, it should be noted that an analysis of promoter binding sites and consensus sequences for the most highly altered genes failed to unveil any substantial leads into a particular characteristic responsible for the observed genetic reprogramming. The putative targets extracted from this analysis can provide invaluable leads to key genes responsible for ethanol tolerance.
However, these transcriptional data must be analyzed in light of gene expression altered indirectly through regulatory networks, which may mask the direct effects of these mutant sigma factors.

3.4. Eliciting a metabolic phenotype

Beyond tolerance phenotypes, the method of global transcription machinery engineering (gTME) was found to be effective for improving the phenotype of metabolite overproduction. Lycopene production in E. coli was chosen as a representative metabolic phenotype. Lycopene may be recombinantly produced in E. coli through the glycolytic intermediates of pyruvate and glyceraldehyde-3-phosphate to form precursor monomers, which subsequently undergo polymerization to form the ultimate 40-carbon lycopene product. In prior work, a computational method based on global cellular stoichiometry was employed to identify single- and multiple-gene knockout targets for lycopene production in E. coli. These approaches and targets were complemented with combinatorial searches to identify unknown and regulatory targets. When combined, these

searches led to further increases of lycopene production and allowed for the visualization of the resulting metabolic landscape. The visualization of the metabolic landscape uncovered the presence of two globally optimum strains with respect to lycopene accumulation: one a purely stoichiometric-based strain (ΔgdhA ΔaceE ΔfdhF), and the other the result of combining the systematic and combinatorial targets (ΔgdhA ΔaceE ΔPyjiD) (Alper et al., 2005a, b). In this study, we sought to use the technique of gTME to enhance lycopene production and compare the impact to traditional metabolic engineering approaches. Utilizing the pre-engineered parental strain (K12 PT5-dxs, PT5-idi, PT5-ispFD harboring pAC-LYC), Δhnr (one of the combinatorial gene targets identified through transposon mutagenesis), and the two identified global maximum strains, ΔgdhA ΔaceE ΔfdhF and ΔgdhA ΔaceE ΔPyjiD, it was possible to search for and identify mutant sigma factors (based on a colorimetric screen) yielding increased lycopene production, independently in each of the above genetic backgrounds. Several mutants were chosen based on increased lycopene content. Each of the best producing mutants from these selected strains harbored different mutated versions of a truncated rpoD, although several suboptimal, whole-length mutants were also recovered. Sequences for these mutants are provided in Supplementary Information 2B. Fig. 2A illustrates the lycopene content after 15 h for several strains of interest. The single round of gTME in both the parental strain and the hnr knockout was able to achieve similar increases in lycopene accumulation as strains previously engineered through the introduction of three distinct gene knockouts. Furthermore, in the backgrounds of these knockout mutants, lycopene levels were further increased through the introduction of an additional, yet distinct mutant sigma factor.
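The size of these improvements can be made concrete with the volumetric production values reported in the Fig. 2 legend (4.2 to 6.3 mg/L for the parental strain, 0.4 to 3.4 mg/L for the hnr knockout, and 6.1 to 6.8 and 7.2 to 7.7 mg/L for the two traditionally engineered strains). The sketch below simply computes the fold increase for each pair. The dictionary keys are shorthand labels introduced here, and "engineered strain 1" and "engineered strain 2" follow the order in which the legend lists the values rather than named genotypes.

```python
# (before gTME, after gTME) volumetric lycopene production in mg/L at 15 h,
# as quoted in the Fig. 2 legend; keys are shorthand labels, not strain names.
VOLUMETRIC_MG_PER_L = {
    "parental": (4.2, 6.3),
    "hnr knockout": (0.4, 3.4),
    "engineered strain 1": (6.1, 6.8),
    "engineered strain 2": (7.2, 7.7),
}


def fold_increase(before, after):
    """Ratio of production with the mutant sigma factor to production without it."""
    return after / before


fold = {label: fold_increase(before, after)
        for label, (before, after) in VOLUMETRIC_MG_PER_L.items()}

for label, value in fold.items():
    print(f"{label}: {value:.2f}-fold")
```

The fold increase is largest where the starting production is lowest, so these ratios describe relative gains and should be read alongside the absolute values.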
These results suggest that (i) gTME is able to elicit phenotypes of metabolite overproduction and, more importantly, (ii) a single round of selection using gTME is more effective than rounds of single-gene knockout or overexpression modifications linked with a search strategy. Moreover, comparing the results of Fig. 2A, it is clear that a single gTME perturbation outperforms that of a single genetic perturbation. Furthermore, the identified sigma factor mutant is specific to the genotype, such that the mutant sigma factor isolated to improve lycopene production in one genetic background would not improve the lycopene yield in another genetic background, indicating that the required mode of transcriptional reprogramming is genotype-specific (Fig. 2B).

3.5. Eliciting a multiple phenotype

We finally investigated the ability of gTME as a method to impart multiple, simultaneous phenotypes to a cell. Studying various strategies for eliciting multiple phenotypes is important for strain improvement programs, as it is often desirable to combine two or more cellular traits. Initial investigations using gTME produced the most


[Fig. 2. (A) Lycopene content (ppm) reached through gTME and through single gene knockouts (gdhA, aceE, fdhF, PyjiD, hnr). (B) Strain backgrounds WS140, Δhnr, ΔgdhA ΔaceE ΔfdhF, and ΔgdhA ΔaceE ΔPyjiD.]

Fig. 2. Application of gTME to a metabolite production phenotype. (A) The tool of gTME was compared with traditional methods of strain improvement whereby rational and combinatorial methods are applied to the identification of gene knockout targets aiming to enhance lycopene accumulation in an engineered strain of E. coli. The mutant sigma factor library was introduced into four pre-engineered lycopene over-producing strains to identify factors that further increase production. Lycopene content, in μg/g dry cell weight (ppm), is presented after 15-h cultivations. The center of the black dots represents the production level of lycopene in ppm for a given strain, with the wild-type strain labeled at the bottom of the graph. The arrowheads of curved arrows not terminating at a black dot (e.g. the gdhA knockout curve) represent the lycopene production of this strain. Only four main strains were identified through black dots, as these were the four backgrounds that were used in the search for sigma factor mutants. Volumetric production (mg/L) for the parental strain was 4.2 mg/L and improved to 6.3 mg/L using gTME, compared with volumetric productivities of 6.1 and 7.2 mg/L for the strains engineered by traditional methods. These engineered strains were improved to 6.8 and 7.7 mg/L, respectively, using gTME. The volumetric productivity of the hnr mutant strain at 15 h is 0.4 mg/L due to a growth defect reported in previous work (Alper et al., 2005a, b), but was increased to 3.4 mg/L using gTME. (B) A dot plot for each of the 16 strains is shown which depicts the maximum fold increase in lycopene production achieved over the control during the fermentation. The size of the circle is proportional to the fold increase. The landscape is clearly diagonally dominant, with mutant factors predominantly working in the strain background in which they were identified. These results suggest that different transcriptome reprogramming (as evinced by the different identified mutant sigma factors) is required for lycopene production in different genotypes.


drastic individual, single-tolerance phenotypes for ethanol and for SDS. Therefore, for this study, the multiple tolerance phenotype of ethanol and SDS was chosen for an investigation of the capacity to elicit a multiple tolerance phenotype, simply on the basis of the ability to obtain individual tolerances to these two chemicals. As a multiple phenotype may be elicited through several different trajectories, sigma factor mutants were isolated by following four distinct search strategies (Fig. 3): (1) isolate first an ethanol-tolerant mutant, then create a new mutant library and screen for ethanol and SDS tolerance; (2) isolate first an SDS mutant, then create a new mutant library and screen for ethanol and SDS tolerance; (3) select for an ethanol/SDS tolerant mutant simultaneously on the single library; or (4) independently select for an ethanol and an SDS mutant and then co-express these two proteins. The best mutant strains obtained through each approach were assayed using a metric based on their growth rate under all possible combinations of 0%, 0.5%, 1% w/w SDS and 0, 25, 50 g/L ethanol. The cumulative phenotype metric (see the Fig. 3 legend for the definition of the phenotype metrics used) is a measure of the extent to which the mutant is able to outperform the control under all nine possible conditions. On the other hand, the ethanol component phenotype metric and the SDS component phenotype metric represent the growth enhancement when only one of the two toxic compounds is varying while the other is kept at the control level of 0 g/L.

Fig. 3 summarizes the results of the four possible search strategies. In both the sequential searches and the simultaneous search (strategies 1–3), there exists a tradeoff between the cumulative phenotype metric and pure component metrics (either SDS or ethanol or both). Of these three routes, the sequential path of selecting for ethanol first, followed by a new mutagenesis step and selection in ethanol/SDS, is superior. However, the co-expression of the full-length ethanol mutant and the truncated SDS mutant (strategy 4) imparted the most significant phenotype (highest overall growth rate improvements) without a sacrifice of either pure component growth advantage (either ethanol or SDS), which was present in all the remaining search strategies. In a way, co-expression effectively allowed the additive expression of the two independently identified phenotypes. The strain with co-expressed mutants had a similar individual component phenotype metric compared with the single-phenotype mutant (0.87 vs. 0.89 for ethanol and 0.15 vs. 0.18 for SDS) along with a greatly improved cumulative phenotype metric. In particular, no single mutant factor was identified which could impart a growth and tolerance advantage comparable to that of co-expression. As such, the expression of a full-length and a truncated mutant could be a potent method for directing overexpression and knockout modifications simultaneously in the cell. These results suggest a powerful method for creating desirable phenotypes.

4. Discussion

The modication and engineering of the global transcription machinery presented here provides the means to


[Fig. 3 schematic: search strategies and the resulting mutants. Each entry lists the cumulative phenotype metric (experimental error), the ethanol metric, and the SDS metric.]
Select for ethanol tolerance (whole-length mutant): 2.94 (0.22), 0.89, 0.08
Select for SDS tolerance (truncated mutant): -2.49 (0.72), 0.15, 0.18
Co-express mutants (whole-length plus truncated mutant): 4.88 (0.29), 0.87, 0.15
Ethanol selection, then mutagenesis and ethanol/SDS selection: 3.58 (0.36), 0.52, 0.08
SDS selection, then mutagenesis and SDS/ethanol selection: 2.67 (0.38), 0.48, 0.02
Ethanol/SDS selection: 3.17 (0.28), 0.39, -0.02
Phenotype metric matrix: rows are SDS at 0.0, 0.5, and 1.0% w/w; columns are ethanol at 0, 25, and 50 g/L. Ethanol metric = row sum; SDS metric = column sum; cumulative metric = sum of all entries.

Fig. 3. Eliciting multiple, simultaneous phenotypes using gTME. The tool of gTME was applied to the problem of imparting the multiple, simultaneous phenotype in E. coli of tolerance to both ethanol and SDS. Four distinct, alternative strategies (described in the text) were chosen to search for the best sigma factor mutant. A phenotype assay was conducted in which the best mutant from each strategy was assayed for improved growth rate over the control in each of the nine conditions spanning combinations of 0%, 0.5%, and 1% w/w SDS and 0, 25, and 50 g/L ethanol. The cumulative phenotype metric represents the cumulative sum of components in a matrix of fraction increase in growth rate over the control for these nine conditions, with the experimental error provided in parentheses; values may be negative if the mutant has reduced growth compared to the control. Component phenotype metrics (either ethanol or SDS) represent the summation of only those conditions in which one of the components is varied while the other is absent. While each search strategy led to an improved mutant, the co-expression of the two independently identified mutant sigma factors (shown at the top of the figure) imparted an additive tolerance phenotype in the cell. Sequences for these mutants are provided in Supplementary Information 2C.
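The phenotype metrics defined in the Fig. 3 legend can be illustrated with a short calculation. The Python sketch below is not the authors' analysis code; it rests on several assumptions: that "fraction increase in growth rate over control" means (mutant - control) / control, that rows index SDS and columns index ethanol as in the figure's matrix, and that each component metric sums the entire zero-SDS row or zero-ethanol column, including the condition with neither compound. The function name `phenotype_metrics` and the growth rates in the example are hypothetical and are not data from this study.

```python
import numpy as np

# Assay grid taken from the Fig. 3 legend: rows = SDS, columns = ethanol.
SDS_LEVELS = (0.0, 0.5, 1.0)   # % w/w
ETHANOL_LEVELS = (0, 25, 50)   # g/L


def phenotype_metrics(mutant_rates, control_rates):
    """Return cumulative, ethanol, and SDS phenotype metrics.

    Both arguments are 3x3 arrays of growth rates, indexed
    [SDS level, ethanol level]. The fractional-increase formula
    below is an assumption about how the legend's matrix is built.
    """
    mutant = np.asarray(mutant_rates, dtype=float)
    control = np.asarray(control_rates, dtype=float)
    expected = (len(SDS_LEVELS), len(ETHANOL_LEVELS))
    if mutant.shape != expected or control.shape != expected:
        raise ValueError(f"growth-rate arrays must have shape {expected}")
    if np.any(control <= 0):
        raise ValueError("control growth rates must be positive")

    # Fraction increase in growth rate over the control, per condition.
    # Entries are negative where the mutant grows more slowly.
    frac = (mutant - control) / control

    return {
        "cumulative": float(frac.sum()),   # all nine conditions
        "ethanol": float(frac[0, :].sum()),  # SDS absent, ethanol varied
        "sds": float(frac[:, 0].sum()),      # ethanol absent, SDS varied
    }


if __name__ == "__main__":
    # Hypothetical growth rates (1/h), for illustration only.
    control = [[0.60, 0.40, 0.20],
               [0.50, 0.30, 0.10],
               [0.40, 0.20, 0.05]]
    mutant = [[0.60, 0.50, 0.30],
              [0.55, 0.40, 0.15],
              [0.45, 0.30, 0.10]]
    print(phenotype_metrics(mutant, control))
```

Because every entry is normalized by the control growth rate under the same condition, conditions in which the control barely grows can dominate the cumulative sum, which is one reason to report the component metrics alongside it.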

make higher-level modifications, which can traverse transcriptional control schemes and diverse pathways. As such, the introduction of modified transcription machinery units offers the unique opportunity to elicit a simultaneous global transcription-level alteration that has the potential to impact cellular properties in a very profound way. While experiments in adaptive evolution have identified mutations in sigma factors, illustrating their possible importance in natural evolution (Zinser and Kolter, 1999; Notley-McRobb et al., 2002; King et al., 2004), this approach is more direct by specifically focusing on the global regulatory functions of the bacterial σ70 sigma factor to introduce multiple simultaneous gene expression changes. Thus, the approach of gTME facilitates whole-cell engineering through the selection and identification of a specific mutant protein responsible for a variety of improved cellular phenotypes.

Collectively, and by virtue of their diversity and the magnitude of the achieved phenotype improvement, these three examples illustrate the potential of gTME to mediate global transcriptome changes that allow organisms to access novel cellular phenotypes. In each of these cases, the improvements were significant compared to alternative methods. Furthermore, these three phenotypes (one of tolerance, one of metabolite production, and one of multiple phenotypes) provide examples for three of the major attributes desirable for strain improvement and metabolic engineering. These results (i) illustrate that gTME provides a novel perturbation that can improve phenotypes, (ii) highlight the ability of sequential rounds of refinement of this protein, in a directed evolution fashion, to further improve the cellular phenotype, and (iii) provide a means for making the multiple, simultaneous alterations of gene expression needed to obtain phenotypes of interest.

The approach of gTME has recently been extended to other tolerance phenotypes in E. coli beyond ethanol, including obtaining strains with increased resistance to chemicals such as acetate, hexane, and p-hydroxybenzoic acid, among others (unpublished data). These results are similar in nature to those for ethanol tolerance in that the mutant factors are able to outperform the control strains in growth rate at elevated levels of these stressors. In each of these cases, the improvement in phenotype provided by gTME is substantial compared with other methods and is obtained in a far more efficient manner. Furthermore, the unique identification of the truncated mutant sigma factor arose from the specific genetic screen, which allowed the mutant factor to be in competition with the native, endogenous factor. Nevertheless, in each of these examples, it is shown that the global changes brought about by random mutations in the components of the transcriptional regulatory machinery improve cellular phenotypes beyond the levels attainable through rational engineering or traditional strain improvement by random mutagenesis.

In similar fashion to results with the engineering of yeast transcription machinery (Alper et al., 2006), we have demonstrated the application of gTME to alter cellular phenotype. As such, the gTME paradigm allows cellular and metabolic engineering to be reduced to a problem of protein evolution. As a result, it is now possible to unlock complex phenotypes regulated by multiple genes, which are essentially unreachable by relatively inefficient, iterative search strategies.
It is worth noting that the described method can also be applied in reverse to uncover complicated genotype-phenotype interactions, as illustrated by the results of the ethanol-tolerance study. In such applications, one would employ a number of high-throughput cellular and molecular assays to assess the altered cellular state and ultimately deduce systematic mechanisms of action underlying the observed phenotype in these mutants. We also envision that this tool can be used to uncover mechanisms responsible for imparted, complex phenotypes such as disease states. Hence, gTME, as described here, is a promising tool to complement current techniques of cellular engineering.

Acknowledgments

We acknowledge financial support of this work by the DuPont-MIT Alliance and the Singapore-MIT Alliance (SMA1). We also acknowledge Joel Moxley for assistance with microarray analysis.

Appendix A. Supplementary materials

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.ymben.2006.12.002.

References

Alper, H., Jin, Y.-S., Moxley, J.F., Stephanopoulos, G., 2005a. Identifying gene targets for the metabolic engineering of lycopene biosynthesis in Escherichia coli. Metab. Eng. 7, 155-164.
Alper, H., Miyaoku, K., Stephanopoulos, G., 2005b. Construction of lycopene-overproducing E. coli strains by combining systematic and combinatorial gene knockout targets. Nat. Biotechnol. 23, 612-616.
Alper, H., Moxley, J., Nevoigt, E., Fink, G.R., Stephanopoulos, G., 2006. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science 314, 1565-1568.
Bernardi, A., Bernardi, F., 1984. Complete sequence of pSC101. Nucleic Acids Res. 12, 9415-9426.
Burgess, R.R., Anthony, L., 2001. How sigma docks to RNA polymerase and what sigma does. Curr. Opin. Microbiol. 4, 126-131.
Chang, A.C., Cohen, S.N., 1978. Construction and characterization of amplifiable multicopy DNA cloning vehicles derived from the P15A cryptic miniplasmid. J. Bacteriol. 134, 1141-1156.
Gardella, T., Moyle, H., Susskind, M.M., 1989. A mutant E. coli σ70 subunit of RNA polymerase with altered promoter specificity. J. Mol. Biol. 206, 579-590.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y., Zhang, J., 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80.
Gerber, H.P., Seipel, K., Georgiev, O., Hofferer, M., Hug, M., Rusconi, S., Schaffner, W., 1994. Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263, 808-811.
Gill, R.T., Wildt, S., Yang, Y.T., Ziesman, S., Stephanopoulos, G., 2002. Genome-wide screening for trait conferring genes using DNA microarrays. Proc. Natl. Acad. Sci. USA 99, 7033-7038.
Gonzalez, R., Tao, H., Purvis, J.E., York, S.W., Shanmugam, K.T., Ingram, L.O., 2003. Gene array-based identification of changes that contribute to ethanol tolerance in ethanologenic E. coli: comparison of KO11 (parent) to LY01 (resistant mutant). Biotechnol. Prog. 19, 612-623.
Gruber, T.M., Gross, C.A., 2003. Multiple sigma subunits and the partitioning of bacterial transcription space. Annu. Rev. Microbiol. 57, 441-466.
Hemmi, H., Ohnuma, S., Nagaoka, K., Nishino, T., 1998. Identification of genes affecting lycopene formation in E. coli transformed with carotenoid biosynthetic genes: candidates for early genes in isoprenoid biosynthesis. J. Biochem. (Tokyo) 123, 1088-1096.
Jensen, K., Alper, H., Fischer, C., Stephanopoulos, G., 2006. Identifying functionally important mutations from phenotypically diverse sequence data. Appl. Environ. Microbiol. 72, 3696-3701.
Kim, J.S., Kim, J., Cepek, K.L., Sharp, P.A., Pabo, C.O., 1997. Design of TATA box-binding protein/zinc finger fusions for targeted regulation of gene expression. Proc. Natl. Acad. Sci. USA 94, 3616-3620.
King, T., Ishihama, A., Kori, A., Ferenci, T., 2004. A regulatory trade-off as a source of strain variation in the species E. coli. J. Bacteriol. 186, 5614-5620.
Li, C., Wong, W.H., 2001. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA 98, 31-36.
Lonetto, M.A., Rhodius, V., Lamberg, K., Kiley, P., Busby, S., Gross, C., 1998. Identification of a contact site for different transcription activators in region 4 of the E. coli RNA polymerase sigma70 subunit. J. Mol. Biol. 284, 1353-1365.
Malhotra, A., Severinova, E., Darst, S.A., 1996. Crystal structure of a sigma 70 subunit fragment from E. coli RNA polymerase. Cell 87, 127-136.
Maniatis, T., Fritsch, E.F., Sambrook, J., 1982. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Notley-McRobb, L., King, T., Ferenci, T., 2002. rpoS mutations and loss of general stress resistance in E. coli populations as a consequence of conflict between competing stress responses. J. Bacteriol. 184, 806-811.
Owens, J.T., Miyake, R., Murakami, K., Chmura, A.J., Fujita, N., Ishihama, A., Meares, C.F., 1998. Mapping the sigma 70 subunit contact sites on E. coli RNA polymerase with a sigma 70-conjugated chemical protease. Proc. Natl. Acad. Sci. USA 95, 6021-6026.
Park, K.S., Lee, D.K., Lee, H., Lee, Y., Jang, Y.S., Kim, Y.H., Yang, H.Y., Lee, S.I., Seol, W., Kim, J.S., 2003. Phenotypic alteration of eukaryotic cells using randomized libraries of artificial transcription factors. Nat. Biotechnol. 21, 1208-1214.
Ross, W., Schneider, D.A., Paul, B.J., Mertens, A., Gourse, R.L., 2003. An intersubunit contact stimulating transcription initiation by E. coli RNA polymerase: interaction of the alpha C-terminal domain and sigma region 4. Genes Dev. 17, 1293-1307.
Sharma, U.K., Ravishankar, S., Shandil, R.K., Praveen, P.V.K., Balganesh, T.S., 1999. Study of the interaction between bacteriophage T4 asiA and E. coli sigma 70, using the yeast two-hybrid system: neutralization of asiA toxicity to E. coli cells by coexpression of a truncated sigma 70 fragment. J. Bacteriol. 181, 5855-5859.
Siegele, D.A., Hu, J.C., Walter, W.A., Gross, C.A., 1989. Altered promoter recognition by mutant forms of the sigma 70 subunit of E. coli RNA polymerase. J. Mol. Biol. 206, 591-603.
Wells, J.A., 1990. Additivity of mutational effects in proteins. Biochemistry 29, 8509-8517.
Workman, C., Jensen, L.J., Jarmer, H., Berka, R., Gautier, L., Nielser, H.B., Saxild, H.H., Nielsen, C., Brunak, S., Knudsen, S., 2002. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 3, research0048.
Yanisch-Perron, C., Vieira, J., Messing, J., 1985. Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33, 103-119.
Yomano, L.P., York, S.W., Ingram, L.O., 1998. Isolation and characterization of ethanol-tolerant mutants of E. coli KO11 for fuel ethanol production. J. Ind. Microbiol. Biotechnol. 20, 132-138.
Yuan, L.Z., Rouviere, P.E., LaRossa, R.A., Suh, W., 2006. Chromosomal promoter replacement of the isoprenoid pathway for enhancing carotenoid production in E. coli. Metab. Eng. 8, 79-90.
Zaldivar, J., Nielsen, J., Olsson, L., 2001. Fuel ethanol production from lignocellulose: a challenge for metabolic engineering and process integration. Appl. Microbiol. Biotechnol. 56, 17-34.
Zhang, X., Baase, W., Shoichet, B., Wilson, K., Matthews, B., 1995. Enhancement of protein stability by the combination of point mutations in T4 lysozyme is additive. Protein Eng. 8, 1017-1022.
Zinser, E.R., Kolter, R., 1999. Mutations enhancing amino acid catabolism confer a growth advantage in stationary phase. J. Bacteriol. 181, 5800-5807.
