Genome Dynamics and Environmental Adaptation in Bacteria

Eric Alm
Depts. of Biological Eng. and Civil and Environmental Eng., MIT
Broad Institute of MIT and Harvard

The “Modern” View of Bacterial Genome Dynamics
• Horizontal gene transfer (HGT) is rampant
• Closely related strains harbor lots of newly acquired DNA
• HGT is a key mechanism for niche adaptation
• “Native” genes are insulated from dynamics at the periphery of networks

Uptake of Foreign DNA
• Transformation
• Phage
• Conjugation
• Genomic islands as reservoirs of new DNA

How Common Is It?
• Marine isolates of co-existing microdiversity

Thompson et al., Science 2005

• Large variation in genome size among closely related strains
Coleman et al., Science 2006

Genome Dynamics (HGT) at the Periphery

Pal, Papp & Lercher, Nature Genetics, 2005.
* Horizontal gene transfer into the E. coli lineage since its split from the Vibrio lineage.

From: Lerat et al. (2005) PLoS Biol 3(5): e130

Responses to Natural Selection
[Figure labels: Environment; HGT; novel genes retained in genome; native genes; gene evolutionary rate variation]

Comparing Rates Across Species
• Microarray analogy: genomes as natural experiments on genes
[Figure: matrix of Genes (rows) by Genomes/Experiments (columns)]

The System: Gammaproteobacteria
• Intracellular parasites
• Enterobacteria
• Human pathogens
• Marine heterotrophs
• Soil bacteria
• Plant associated

Motoo Kimura: Some Principles of Molecular Evolution (1974)
1. The rate of protein evolution follows a molecular clock
2. Less important proteins evolve faster
3. Conservative substitutions occur more frequently than disruptive ones
4. Gene duplication allows emergence of new functions
5. Positive Darwinian selection is less common than drift or purifying selection

Evolutionary distance = rate × time
(substitutions / site) = (substitutions / site / billion years) × (billions of years)

Overview of the Method
• Principle #1: Molecular clock. Rate of change depends on mutation rate, population size, etc. → β(genome)
• Principles #2 & 3: More important proteins evolve slowly (e.g. ribosome) → ρ(gene family)
• Principle #5: Positive or negative selection? → ν(gene, genome)

Evolutionary Distance = r⋅t = ρ(gene family) ⋅ β(genome) ⋅ ν(gene, genome) ⋅ t

• Seed possible orthologs: single-copy ubiquitous COGs
• Align and build trees: ~1000 gene families
• Compare to species phylogeny (KH-test) and reject outliers: 744 gene families
• Read out terminal branch “lengths”
• Normalize against family rate and molecular clock

Protein Family and Molecular Clock Explain Most Distance Variation
[Figure: observed branch length, log2(r⋅t), versus predicted branch length, log2(ρ⋅β⋅t); both axes run from -10 to 5]
• Residual variation is an estimate of ν

What Can We Learn From Residual Variation?
• Noise?
• Environment-specific selective pressures
  – Positive selection
  – Negative selection
  – Relaxed negative selection

Negative Selection
[Figure: tree with outgroup; panel labels: FAST, Lost?]
• Fast: ν > 4.0; Slow: ν < 0.25
• Fisher's exact test: Odds Ratio = 3.1, P = 2.4e-7
• Similar patterns in similar genes? Odds Ratio = 0.55, P = 0.01
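The rate decomposition on these slides (distance = ρ ⋅ β ⋅ ν ⋅ t, with ν read off as the residual) can be illustrated with a small sketch. The slides do not say how the normalization was actually carried out, so the fitting step below is an assumption: it takes a complete matrix of terminal branch lengths and fits an additive row-plus-column model in log2 space, which is the least-squares solution for a complete table. The function names `estimate_nu` and `classify` are hypothetical; only the fast/slow thresholds (ν > 4.0, ν < 0.25) come from the slides.

```python
import math

# Thresholds quoted on the slide: fast if nu > 4.0, slow if nu < 0.25.
FAST, SLOW = 4.0, 0.25

def estimate_nu(dist):
    """Estimate nu[g][s] from observed terminal branch lengths dist[g][s].

    Assumption (not from the slides): every gene family g is present in
    every genome s, and log2(dist) is fitted as family effect + genome
    effect by least squares. The residual is taken as log2(nu).
    """
    n_fam, n_gen = len(dist), len(dist[0])
    logd = [[math.log2(x) for x in row] for row in dist]
    grand = sum(sum(row) for row in logd) / (n_fam * n_gen)
    # Row means stand in for log2 rho(gene family).
    fam_mean = [sum(row) / n_gen for row in logd]
    # Column means stand in for log2(beta(genome) * t).
    gen_mean = [sum(logd[g][s] for g in range(n_fam)) / n_fam
                for s in range(n_gen)]
    nu = []
    for g in range(n_fam):
        predicted = [fam_mean[g] + gen_mean[s] - grand for s in range(n_gen)]
        nu.append([2 ** (logd[g][s] - predicted[s]) for s in range(n_gen)])
    return nu

def classify(nu_value):
    """Label a single nu value using the slide's thresholds."""
    if nu_value > FAST:
        return "fast"
    if nu_value < SLOW:
        return "slow"
    return "neutral"

# Toy data: purely multiplicative distances, so every residual nu is 1.
rho = [1.0, 2.0, 4.0]          # hypothetical family rates
beta_t = [0.5, 1.0, 2.0]       # hypothetical genome clock * time
toy = [[r * b for b in beta_t] for r in rho]
toy_nu = estimate_nu(toy)
```

With real data the matrix has missing entries (lost genes), so an iterative or regression-based fit would be needed in place of simple row and column means.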

Hypergeometric test for enrichment of COG functions in fast/slow (top 10% of genes)

Species               COG Function                  Enrichment
E. coli               Motility & Secretion          fast
Photorhabdus          Motility & Secretion          fast
V. parahaemolyticus   Amino acid metabolism         slow
B. aphid. APS         Ion Transport & metabolism    slow
Wigglesworthia        Coenzyme transport            slow
H. ducreyi            Cell Division                 fast
V. vulnificus         Nucleic acid metabolism       slow
Yersinia pestis       Motility & Secretion          fast
Idiomarina loihi.     Carbohydrate metabolism       fast
Xylella fastidiosa    Energy production             slow
Idiomarina loihi.     Amino acid metabolism         fast
Photo. profundum      Cell Division                 fast

Bonferroni P-value: between <0.05 and <0.001 in every row

Metabolism of Idiomarina
• Lost: sugar transporters, G6PD, transaldolase, …
[Figure labels: eno, tpi, pfk, pgi]
Hou, Shaobin et al. (2004) Proc. Natl. Acad. Sci. USA 101, 18036-18041

Selective Sweeps
• Positive selection? P=0.05
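The hypergeometric enrichment test named in the table title can be sketched as follows. This is a minimal illustration of the standard one-sided test with a Bonferroni correction, not the authors' code. The function names and all counts in the example are hypothetical; the only detail taken from the slides is that the "fast" or "slow" set is the top 10% of genes.

```python
from math import comb

def hypergeom_enrichment_p(n_genes, n_in_category, n_selected, n_overlap):
    """One-sided hypergeometric P-value, P(X >= n_overlap).

    n_genes       : genes in the genome (population size)
    n_in_category : genes annotated with the COG function of interest
    n_selected    : genes in the fast (or slow) set, e.g. the top 10%
    n_overlap     : selected genes that carry the annotation
    """
    upper = min(n_in_category, n_selected)
    total = comb(n_genes, n_selected)
    hits = sum(comb(n_in_category, i) * comb(n_genes - n_in_category, n_selected - i)
               for i in range(n_overlap, upper + 1))
    return hits / total

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical genome: 700 genes, 40 motility genes, 70 fast genes (top 10%),
# 15 of which are motility genes; 20 COG categories tested.
p_raw = hypergeom_enrichment_p(700, 40, 70, 15)
p_adj = bonferroni(p_raw, 20)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for genome-sized counts; the final division is the only floating-point step.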

Analysis of Patterns of Selection
[Figure: matrix of ν with Genes as rows and Genomes/Experiments as columns]
• Do correlations in ν between rows (genes) indicate similar functional roles?

Selection Acts Coherently Across Pathways/Functions

COG    Name   Description
1377   FlhB   Flagellar biosynthesis pathway
1684   FliR   Flagellar biosynthesis pathway
3418   FlgN   Flagellar biosynthesis/type III secretory pathway chaperone
4787   FlgF   Flagellar basal body rod protein
1261   FlgA   Flagellar basal body P-ring biosynthesis protein
3190   FliO   Flagellar biosynthesis protein
4967   PilV   Tfp pilus assembly protein
1815   FlgB   Flagellar basal body protein
1345   FliD   Flagellar capping protein
1516   FliS   Flagellin-specific chaperone
4786   FlgG   Flagellar basal body rod protein
1706   FlgI   Flagellar basal body P-ring protein
1677   FliE   Flagellar hook basal body protein
2805   PilT   Tfp pilus assembly protein, pilus retraction ATPase
4969   PilA   Tfp pilus assembly protein, major pilin
1989   PulO   Type II secretory pathway, prepilin signal peptidase PulO and related peptidases

[Columns E. coli, Photor., and Yersinia mark individual genes with an asterisk]

Analysis of Patterns of Selection
• Do correlations in ν between columns (genomes) indicate similar ecology?

Evolution of Evolutionary Rates
• Correlation of ν across all genes (orthologs) for each pair of genomes
• No correlation with phylogeny over shorter timespans
• Deep-branching clades show significant correlation in genome-wide selective patterns

Responses to Natural Selection
[Figure labels: Environment; HGT; novel genes retained in genome; native genes; gene evolutionary rate variation]

‘A critique of the adaptationist programme’ (Gould & Lewontin, 1979)
“Front legs a puzzle: how Tyrannosaurus used its tiny front legs is a scientific puzzle; they were too short even to reach the mouth. They may have been used to help the animal rise from a lying position.”
Museum of Science, Boston, c. 1979

Direct vs. Indirect Selection
[Figure labels: Environment; HGT; novel genes retained in genome; direct selection; indirect selection]

Gene Content Influences Selection on Genes?

Test                 P-value
v X g-c              <2.2e-16
v X dist             2.4e-10
v X time             4.8e-5
(v X g-c | dist)     <0.0001
(v X g-c | time)     <0.0001
(v X dist | g-c)     ns
(v X time | g-c)     0.02

(Partial) Spearman corr. values on the slide, row order not recoverable: 0.44, 0.41, 0.34, 0.30, 0.19, 0.09, 0.11

Summary of Rate Variation
• Variation in evolutionary rates can inform studies of natural selection
• Co-evolution of lineage-specific rates may imply similar function
• “What is the environment of a gene?”
  – Direct vs. indirect selection

Inferring Genome Dynamics
• Reconciliation: detail evolutionary ‘events’ that explain discrepancies between gene and species phylogenies
• Possible events:
  – Horizontal gene transfer
  – Gene loss
  – Gene duplication
• Explanatory information
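The tests written as (v X g-c | dist) on the "Gene Content Influences Selection on Genes?" slide are partial Spearman correlations. A minimal sketch of that statistic follows, using the standard first-order partial correlation formula applied to rank correlations. How the slides define each variable is an assumption here: x is taken to be pairwise similarity in ν, y pairwise similarity in gene content (g-c), and z phylogenetic distance or time. All function names and the toy numbers are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Spearman correlation of x and y, controlling for z.

    Undefined (division by zero) if z is perfectly rank-correlated
    with x or with y.
    """
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy values for five genome pairs (hypothetical).
nu_similarity = [1, 2, 3, 4, 5]
gene_content = [2, 1, 4, 3, 5]
distance = [1, 3, 2, 5, 4]
r_partial = partial_spearman(nu_similarity, gene_content, distance)
```

A P-value for the partial correlation would need an additional step (a t-approximation or a permutation test), which is left out of this sketch.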

Background: The “DownPass” Algorithm in Phylogenetic Inference

A ‘Downpass’ Algorithm for Reconciliation
• What information is passed from leaves to parents?
  – sequence and score
• Reconciliation proceeds by labeling each node in gene tree as HGT, Dup, or Speciation (loss is implied)
• Pass LCA (and score) of each subtree from leaves to root

The Algorithm
[Figure: species tree with leaves 1 to 4 and a gene tree; downpass over the species tree, then calculate the optimal scenario resulting in each possible LCA]

The Algorithm
For all LCAs at parent:
  For all LCAs at left child:
    For all LCAs at right child:
O(n_g ⋅ n_s^3), for n_g gene-tree nodes and n_s species-tree nodes

Real Data
• COG100: 30S ribosomal subunit protein S11
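The downpass idea (pass the LCA of each subtree from leaves to root and label each gene-tree node) can be illustrated with the classical LCA-mapping reconciliation. This is a deliberately simpler technique than the one on the slides: it labels nodes only as duplication or speciation, keeps a single LCA per node instead of a scored table of all possible LCAs, and does not model HGT. Trees are written as nested tuples with gene-tree leaves named by their species; that representation and the function names are assumptions for illustration.

```python
def clades(tree):
    """All clades of a nested-tuple tree as frozensets of leaf names.

    The last element of the returned list is the clade of `tree` itself.
    """
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        leaves |= sub[-1]
    found.append(leaves)
    return found

def lca(species_clades, leafset):
    """Smallest species-tree clade containing every species in leafset."""
    return min((c for c in species_clades if leafset <= c), key=len)

def reconcile(gene_tree, species_clades, events):
    """Post-order (leaves to root) pass over a binary gene tree.

    Returns the species clade the node maps to and appends one label per
    internal node to `events`, children before parents.
    """
    if not isinstance(gene_tree, tuple):
        return frozenset([gene_tree])
    left, right = gene_tree
    m_left = reconcile(left, species_clades, events)
    m_right = reconcile(right, species_clades, events)
    mapped = lca(species_clades, m_left | m_right)
    # A node that maps to the same clade as one of its children is a duplication.
    events.append("duplication" if mapped in (m_left, m_right) else "speciation")
    return mapped

species = clades(((1, 2), (3, 4)))

congruent_events = []
reconcile(((1, 2), (3, 4)), species, congruent_events)

conflicting_events = []
reconcile(((1, 3), (2, 4)), species, conflicting_events)
```

In the second example the gene tree disagrees with the species tree, and without an HGT event type the conflict can only be explained by a duplication at the root (with implied losses). Allowing transfers is what motivates scoring every candidate LCA, as in the triple loop on the slide.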

[Figure: species tree and gene tree reconciliation, 32 transfers!!]

Uncertainty in Gene Trees
• Even with a good species phylogeny, gene trees may have significant uncertainty
• Bootstrap trees are a convenient but very limited sample of different topologies
• Consensus trees discard information

Love the Bootstrap
• Don’t fear the bootstrap: embrace it!
• Reconcile ALL bootstraps
• For each subtree reconciliation, check other bootstraps for more efficient reconciliation

The Idea: Reconciliation meets construction
• Reconciliation as a tool for tree construction
• Incorporation of bootstrap subtrees explores a very large region of plausible “tree space”
• Constructed tree is most parsimonious, plausible gene tree

The Algorithm
• Each internal node of each bootstrap has three potential parents
• For each node, three tables of potential LCAs must be maintained

The Algorithm
[Figure: bootstrap trees]
1. Reconcile children
2. Reconcile same node in bootstrap trees
3. Return best answer and merge tables

The Algorithm
[Figure: bootstrap trees]
1. Reconcile children
2. Reconcile same node in bootstrap trees
3. Return best answer and merge tables
4. Return table to parent
• …different entries in the same table can have different subtree topologies!

It Gets Messy…
• Link subtrees across bootstraps
• Find path through all bootstrap trees optimizing reconciliation
• After all subtrees reconciled, select best reconciliation to represent linked subtrees

Rooting trees is easy!
• Iterate through all branches
• Root at branch with best reconciliation

Real Data Revisited!
• COG100: 30S ribosomal subunit protein S11
[Figure: reconciliation against the species tree, 7 transfers; legend: reconciliation events]
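The rooting rule from the "Rooting trees is easy!" slide (iterate through all branches, root at the branch with the best reconciliation) can be sketched as below. The scoring function is a stand-in: it counts duplications under simple LCA mapping, whereas the slides score reconciliations that also include transfers. The unrooted gene tree is given as an edge list with integer leaves (species names) and string labels for internal nodes; that encoding and all function names are assumptions for illustration.

```python
from collections import defaultdict

def clades(tree):
    """All clades of a nested-tuple tree; the last entry is the whole tree."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        leaves |= sub[-1]
    found.append(leaves)
    return found

def count_duplications(gene_tree, species_clades):
    """Return (mapped species clade, duplication count) via LCA mapping."""
    if not isinstance(gene_tree, tuple):
        return frozenset([gene_tree]), 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, species_clades)
    m_right, d_right = count_duplications(right, species_clades)
    mapped = min((c for c in species_clades if (m_left | m_right) <= c), key=len)
    is_dup = 1 if mapped in (m_left, m_right) else 0
    return mapped, d_left + d_right + is_dup

def rooted_at(adj, node, parent):
    """Nested-tuple subtree hanging below `node` when entered from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node
    return tuple(rooted_at(adj, c, node) for c in children)

def best_rooting(edges, species_tree):
    """Try a root on every branch and keep the cheapest reconciliation."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    species_clades = clades(species_tree)
    rootings = [(rooted_at(adj, u, v), rooted_at(adj, v, u)) for u, v in edges]
    return min(rootings, key=lambda t: count_duplications(t, species_clades)[1])

# Unrooted quartet 1,2 | 3,4 with internal nodes "a" and "b" (hypothetical data).
quartet = [("a", 1), ("a", 2), ("a", "b"), ("b", 3), ("b", 4)]
best = best_rooting(quartet, ((1, 2), (3, 4)))
```

For this quartet only the internal branch gives a rooting with no duplications, so that rooting is returned. The sketch recomputes the reconciliation from scratch for each of the candidate roots, which is simple but not the most efficient approach.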

Summary
• Possible to reconcile gene and species trees efficiently
• Uncertainty in gene trees can hamper reconciliation
• Use bootstraps to sample reasonable subsets of tree space
• Are there 7 transfers for COG100?
  – Wrong species phylogeny
  – Need more bootstraps
  – Gold-standard?
• Next steps?
  – All metabolic genes
  – Co-evolution among genes with similar function?

Acknowledgements
• Jesse Shapiro (Evolutionary rates)
• Lawrence David (Reconciliation)
• Sonia Timberlake (Evolution of regulation)
• Sean Clarke (HGT in the laboratory)
• Arne Materna (Experimental evolution)