Sequence‐based Bacterial Typing: Concepts and Approaches
OHSI 2022
7/25/2022
Mostafa Ghanem, University of Maryland
mghanem@umd.edu
Outline
• Different Sequence Based Typing Approaches.
• Concept of Each Approach
• Advantages And Disadvantages
Molecular typing (Genotyping)
a) DNA banding pattern based methods (RAPD, PFGE, RFLP)
b) DNA hybridization‐based methods (Microarrays)
c) DNA sequencing based methods
Sequence based typing
Seq1 ACTGTTGGCACGATTT
Seq2 AGTGTTGCCACGATTT
Seq3 ACTGTTGGCAGGATTT
Seq4 ACTGTAGGCACGATTT
Seq5 ACTGTAGGCACGATTA
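As a minimal sketch of what comparing these sequences means in practice, the snippet below counts pairwise differences between the five aligned sequences on the slide. The helper name `hamming` is illustrative only, and the approach assumes the sequences are already aligned and of equal length.

```python
from itertools import combinations

# Aligned sequences copied from the slide above.
seqs = {
    "Seq1": "ACTGTTGGCACGATTT",
    "Seq2": "AGTGTTGCCACGATTT",
    "Seq3": "ACTGTTGGCAGGATTT",
    "Seq4": "ACTGTAGGCACGATTT",
    "Seq5": "ACTGTAGGCACGATTA",
}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Print the number of differences for every pair of sequences.
for (name_a, a), (name_b, b) in combinations(seqs.items(), 2):
    print(f"{name_a} vs {name_b}: {hamming(a, b)} difference(s)")
```

Pairs with fewer differences (for example Seq4 and Seq5) would be grouped more closely than pairs with more differences.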
Sanger sequencing method, 1977
The first molecular sequence based phylogenetic classification of living organisms into three main domains
What is a SNP?
Single Nucleotide Polymorphism (SNP)
ATGTTCCTC sequence
ATGTTGCTC reference
*phylogenetically informative differences
Insertion or Deletion (Indel)
ATGTTCCCTC sequence
ATGTTC-CTC reference
*differences not used in hqSNP analysis
Large recombination event that introduces a large prophage
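The distinction between a SNP and an indel can be illustrated with a small sketch that walks along two aligned sequences. It assumes, as on the slide, that a gap is written as "-"; the function name `classify_differences` is hypothetical.

```python
def classify_differences(sample: str, reference: str):
    """Return (snp_positions, indel_positions), 1-based, for two aligned strings."""
    if len(sample) != len(reference):
        raise ValueError("inputs must be aligned to the same length")
    snps, indels = [], []
    for pos, (s, r) in enumerate(zip(sample, reference), start=1):
        if s == r:
            continue
        if s == "-" or r == "-":
            indels.append(pos)  # a gap on either side marks an insertion or deletion
        else:
            snps.append(pos)    # two different bases at the same position
    return snps, indels


# The two examples from the slide.
print(classify_differences("ATGTTCCTC", "ATGTTGCTC"))    # SNP example
print(classify_differences("ATGTTCCCTC", "ATGTTC-CTC"))  # indel example
```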
Use of sequence data to assess relatedness of organisms
• Differences in sequences can be used to assess the relatedness of organisms and the likelihood of a recent common ancestor
• The definition of "recent" becomes important: recent in years or in generation times
• Salmonella in a dry processing plant may stay dormant and rarely, if ever, multiply (or imagine anthrax spores in soil)
• Salmonella in a chicken flock may multiply every 30 min (>7,500 times a year)
• Assessing relationships of microbial isolates typically requires more information than just sequence data
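The generation-time point can be checked with simple arithmetic. The sketch below assumes uninterrupted division at a fixed doubling time for a whole non-leap year, which is an idealized upper bound rather than a description of a real flock; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def generations_per_year(doubling_time_min: float) -> float:
    """Generations in one year if division never pauses (idealized upper bound)."""
    return MINUTES_PER_YEAR / doubling_time_min


print(generations_per_year(30))  # 17520.0
```

Under this assumption a 30 min doubling time gives 17,520 generations a year, so the slide's ">7,500" holds even if the population is actively dividing for less than half of the time.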
Level of discrimination
Low – few or multiple stable genes – look at long term evolutionary trends
High – more genes, possibly variable gene(s) ‐ outbreak investigation / local surveillance
Typing approaches (figure axis: information)
• Serotyping (protein)
• PFGE, Pulsed Field Gel Electrophoresis (DNA): total gDNA fragments
• 16S rRNA, Ribosomal RNA Sequencing (sequencing): 1 gene
• MLST, Multi Locus Sequence Typing (sequencing): 7 genes
• wgMLST, Whole Genome Multi Locus Sequence Typing (WGS): thousands of reference genes plus pan genome
• wgSNP or hqSNP, Whole Genome Single Nucleotide Polymorphism Typing (WGS): total gDNA
Sequence Based Approaches
Single locus based methods
Multilocus Sequence Typing (MLST)
K‐mer–based typing approaches
High quality SNP (hqSNP) typing approaches
Allele based typing approaches (rMLST, cgMLST, wgMLST)
WGS for phenotypic typing (AMR typing, virulence typing)
Typing Approach evaluation criteria
• Typeability: capacity to produce clearly interpretable results with most strains of the bacterial species
• Reproducibility: capacity to repeatedly obtain the same typing profile result with the same bacterial strain
• Discriminatory power: ability to produce results that clearly allow differentiation between unrelated strains of the same bacterial species
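Discriminatory power is often quantified with Simpson's index of diversity in the form proposed by Hunter and Gaston, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates assigned to type j and N is the total number of isolates. The slide does not name this index, so the sketch below is offered only as one common way to put a number on the criterion; the type labels are made up.

```python
from collections import Counter


def simpsons_index_of_diversity(types):
    """Probability that two isolates drawn without replacement have different types."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical typing results for four unrelated isolates.
print(simpsons_index_of_diversity(["ST1", "ST1", "ST2", "ST3"]))
```

A value close to 1 means the method separates almost every pair of unrelated isolates; a value of 0 means it assigns them all to the same type.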
Single locus based methods
• 16S rRNA region sequence
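• Ribosomal intergenic spacer analysis
• 16S-23S intergenic spacer region (IGSR) of Mycoplasma gallisepticum (MG)
• Surface variable and polymorphic genes
• vlhA typing of Mycoplasma synoviae (MS)
• spa typing for Staphylococcus
Disadvantages:
• Insufficient discriminatory power
• Limited reliability for inferring evolutionary relationships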
Traditional MLST (Allele based typing)
• The gold standard before WGS
• Based on housekeeping (HK) genes; reflects population structure
http://beta.mlst.net/Instructions/default.html
Sequence type (ST) vs Clonal Complex (CC)
• An ST is identified based on the allelic profile of the 7 genes
• A CC is a group of STs that share 5 or more alleles with a central (ancestral) sequence type.
• Relative to another ST, an ST can be a
• Single locus variant (SLV): differs at one locus
• Double locus variant (DLV): differs at two loci
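The ST and CC definitions above can be sketched in a few lines. The allelic profiles below are hypothetical (they are not taken from any MLST database), and the function names are illustrative; the threshold of 5 shared alleles is the one given on the slide.

```python
def locus_differences(profile_a, profile_b) -> int:
    """Number of loci at which two allelic profiles carry different allele numbers."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def variant_label(profile, founder) -> str:
    """Label a profile relative to a founder ST."""
    diff = locus_differences(profile, founder)
    return {0: "same ST", 1: "SLV", 2: "DLV"}.get(diff, f"{diff} loci differ")


def in_clonal_complex(profile, founder, min_shared: int = 5) -> bool:
    """True if the profile shares at least min_shared alleles with the founder."""
    return len(profile) - locus_differences(profile, founder) >= min_shared


# Hypothetical 7-locus allelic profiles.
founder = (1, 1, 1, 1, 1, 1, 1)
for profile in [(1, 1, 2, 1, 1, 1, 1), (1, 3, 2, 1, 1, 1, 1), (4, 3, 2, 5, 1, 1, 1)]:
    print(profile, variant_label(profile, founder), in_clonal_complex(profile, founder))
```

With 7 loci and a threshold of 5 shared alleles, SLVs and DLVs of the founder fall inside its clonal complex, while profiles differing at three or more loci do not.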
Sequence type (ST) VS Clonal Complex (CC)
Fig. A minimum spanning tree for the 101 samples and eight clonal complexes typed by seven-locus MLST.
Ghanem and El‐Gazzar, 2019
Traditional MLST
Advantages:
• High typeability & reproducibility
• Sequence based: high accuracy and relatively good discrimination.
• Central database: easier exchange of data and comparison of strains globally (https://pubmlst.org/).
• Expandable nomenclature
• Can now be performed using WGS
Disadvantages:
• Not useful for organisms with conserved housekeeping genes (HKGs).
• Targets are selected to represent population structure, so it is not as useful for outbreak detection
Whole genome sequence based approaches
Whole Genome Sequencing
[Figure: many short sequence reads (e.g. ATGCGTGATCTAGTAGTCTAGGAGCTGACCGATTA) produced by whole genome sequencing, leading to results interpretation and action]
K‐mer–based typing approaches
Compare genomes piece by piece, using short subsequences of length k (k-mers), to find the pieces that differ
• Assembly and alignment free approach
• Avoids the high computational demands of assembly and alignment
• Faster and more suitable for real time epidemiological typing
Disadvantages
• Does not consider the genomic context and sequence quality at each locus
• No true phylogenetic relationship
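A minimal sketch of the k-mer idea: break each sequence into overlapping k-mers and compare the two sets without any alignment. The Jaccard distance used here is one common choice and is an assumption, not something the slide specifies; k=5 is chosen only so that the toy sequences yield several k-mers.

```python
def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a: str, seq_b: str, k: int = 5) -> float:
    """1 - |shared k-mers| / |all k-mers|; 0 means identical k-mer content."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        raise ValueError("sequences are shorter than k")
    return 1 - len(kmers_a & kmers_b) / len(union)


# Toy sequences; real genomes are millions of bases long.
print(jaccard_distance("ACTGTTGGCACGATTT", "ACTGTTGGCACGATTT"))  # identical input
print(jaccard_distance("ACTGTTGGCACGATTT", "AGTGTTGCCACGATTT"))  # two substitutions
```

Note that a single substitution changes up to k overlapping k-mers, which is one reason the distance reflects overall similarity rather than a position-by-position evolutionary model.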
High Quality Single Nucleotide Polymorphism (hqSNP) Typing Approaches
Workflow steps (from figure): reference mapping, SNP detection, SNP evaluation
Reference genome
• Serves as a template against which all reads from the other genomes are mapped.
• Has to be closely related to the mapped samples.
• Selection of the reference genome varies according to:
• Genomic diversity of the organism
• Aim of the study
• Context of the investigation
What makes a SNP high quality (hq)?
Apply a quality filter that removes nucleotides in the sequence reads from the comparison, based on sequence coverage and quality
[Figure: sequence reads pass through the quality filter to give quality-filtered sequence reads ready for analysis]
The alphabet soup of analysis: Coverage
Any single location on the genome can have zero to hundreds of sequence reads that cover that one region
What to call a SNP
SNPs are called based on:
• Quality
• Coverage
• Base frequency
[Figure: a pileup of reads aligned to the reference ATGTTGCTC; most reads read ATGTTCCTC, with a few reading ATGTTACTC, ATGTTTCTC or ATGTTGCTC. Is it a SNP?]
The differences between the reference and the compared genome are extracted and used to determine relatedness
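The three criteria (quality, coverage, base frequency) can be sketched as a filter applied to the bases that cover one reference position. The thresholds below (`min_qual`, `min_depth`, `min_freq`) are illustrative assumptions, not values from the slide or from any particular pipeline, and the function name is hypothetical.

```python
from collections import Counter


def call_snp(ref_base, bases, quals, min_qual=20, min_depth=10, min_freq=0.9):
    """Return the alternate base if the position passes all filters, else None."""
    # Quality: ignore bases whose quality score is below the threshold.
    kept = [b for b, q in zip(bases, quals) if q >= min_qual]
    # Coverage: require enough good-quality reads at this position.
    if len(kept) < min_depth:
        return None
    # Base frequency: the most common base must dominate and differ from the reference.
    base, count = Counter(kept).most_common(1)[0]
    if base == ref_base or count / len(kept) < min_freq:
        return None
    return base


# Hypothetical pileup column: 18 good-quality C reads plus two low-quality reads.
print(call_snp("G", "C" * 18 + "GA", [35] * 18 + [10, 10]))
```

A position with too few reads, or with a mixture of bases, returns None and would not be counted as a high quality SNP.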
High Quality Single Nucleotide Polymorphism (hqSNP) Typing Approaches
Detected SNPs are screened for:
• Functional implications (genic or intergenic, synonymous or non-synonymous).
• Distribution across the reference genome (e.g. highly variable regions).
SNPs related to recombination are excluded.
Where to call a SNP?
Not all SNP pipelines are equal: where you call SNPs will affect the total SNP count
SNPs relevant for phylogenetic analysis are vertically transmitted, not horizontally, so horizontal genetic elements like phages can be masked
[Figure: raw reads mapped across genes and mobile elements. Two options are shown: mask mobile elements (do not consider SNPs in those locations), or only call SNPs in genes]
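Masking can be pictured as dropping every SNP whose position falls inside a listed region. The sketch below assumes 1-based, inclusive coordinates; the positions and the prophage interval are made up for illustration.

```python
def mask_snps(snp_positions, masked_regions):
    """Keep only SNP positions that lie outside every masked (start, end) region."""
    return [
        pos
        for pos in snp_positions
        if not any(start <= pos <= end for start, end in masked_regions)
    ]


# Hypothetical SNP positions and one hypothetical prophage region.
snps = [120, 15400, 15950, 88000]
prophage_regions = [(15000, 16000)]
print(mask_snps(snps, prophage_regions))  # the two SNPs inside the prophage are dropped
```

The same isolates can therefore differ by 4 SNPs in one pipeline and 2 in another, depending only on what is masked.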
High Quality Single Nucleotide Polymorphism (hqSNP) Typing Approaches
• The remaining SNPs represent parsimony-informative loci within the core bacterial genome
• The hqSNP typing approach is the most widely used whole genome strain typing method.
• Works best with clonal organisms like Salmonella
How to report SNP data – keep it simple
New Cluster: 2016039
Hi folks:
Two isolates are 0 SNPs from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)
Limitations of Whole‐genome SNP typing approaches
• Difficult to apply to long term or global scale multi-outbreak analysis
• Computational resources and experienced bioinformaticians are necessary
• Challenging in bacteria with high genomic diversity and/or extra-chromosomal or mobile genetic elements.
• Creating a standard method for WGS SNP typing is very difficult and impractical.
hqSNP analyses
Advantages
• Phylogenetically informative (builds a tree consistent with the evolution of the strains)
• SNP position can be identified on the genome (the gene affected can be identified)
Disadvantages
• Requires a closely related reference genome; hqSNP analysis is problematic if the reference genome is not closely related
• Takes a while and requires a lot of computer power
• Interpretation of data depends on the genomes added; it is not stable and does not lead to a nomenclature
When to Use
• Good for situations where a wgMLST database has not been developed and validated.
• May provide the highest amount of resolution for strain comparison
Allele based typing
(rMLST‐ cgMLST‐wgMLST)
Expanding the concept of MLST from 7 genes to a genome-wide, gene-by-gene typing approach.
[Figure: SNP typing and MLST compared for reproducibility, portability and discriminatory power, from a few genes (5-7 gene MLST) to 100s-1000s of genes (cgMLST)]
Allele based typing
(rMLST‐ cgMLST‐wgMLST)
The database is built from gene content representing a diverse selection of the genus/species of the organism being compared
Each unique gene is referred to as a "locus"
Any change (SNP, insertion, deletion) equals a new allele call for a locus
New alleles are named sequentially when encountered, not based on sequence
Locus 1
allele 1: ACTAGAGGGAAA
allele 2: ACTAGAGGCTAA (2 SNPs relative to allele 1)
allele 3: ACT-GAGGGAAA (1 indel relative to allele 1)
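Sequential allele naming can be sketched as a lookup table per locus that hands out the next free number whenever an unseen sequence appears. The class name `LocusAlleles` is hypothetical and this is not how any specific database is implemented; allele 3 is stored without the alignment gap, which is an assumption about how a deletion would appear in the raw sequence.

```python
class LocusAlleles:
    """Assigns allele numbers to sequences of one locus in order of first encounter."""

    def __init__(self):
        self._numbers = {}

    def call(self, sequence: str) -> int:
        """Return the allele number for sequence, registering it if it is new."""
        if sequence not in self._numbers:
            self._numbers[sequence] = len(self._numbers) + 1
        return self._numbers[sequence]


locus_1 = LocusAlleles()
print(locus_1.call("ACTAGAGGGAAA"))  # first sequence seen
print(locus_1.call("ACTAGAGGCTAA"))  # 2 SNPs: a new allele
print(locus_1.call("ACTGAGGGAAA"))   # 1 deleted base: a new allele
print(locus_1.call("ACTAGAGGGAAA"))  # already known: same number as before
```

Because numbers reflect only the order of discovery, allele 2 and allele 3 are each simply "different from allele 1"; the number says nothing about how many bases changed.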
Allele based typing
(rMLST‐ cgMLST‐wgMLST)
Allows for simpler analysis and clear naming of subtypes
Performs comparison on a gene by gene level
                    Isolate A   Isolate B   Isolate C
Locus 1 (20 nt)     1           1           1
Locus 2 (100 nt)    8           8           12
Locus 3 (5000 nt)   5           5           2
Etc.
Locus 2,005 (5 nt)  4           4           4
wgMLST type         A           A           B
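The gene-by-gene comparison in the table can be sketched as counting loci whose allele numbers differ. The profiles below are copied from the table (only the four loci shown); skipping loci with a missing call (None) is an assumption made for this sketch.

```python
profiles = {
    "Isolate A": {"Locus 1": 1, "Locus 2": 8, "Locus 3": 5, "Locus 2,005": 4},
    "Isolate B": {"Locus 1": 1, "Locus 2": 8, "Locus 3": 5, "Locus 2,005": 4},
    "Isolate C": {"Locus 1": 1, "Locus 2": 12, "Locus 3": 2, "Locus 2,005": 4},
}


def allele_differences(a: dict, b: dict) -> int:
    """Count loci called in both isolates that carry different allele numbers."""
    return sum(
        1
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


print(allele_differences(profiles["Isolate A"], profiles["Isolate B"]))
print(allele_differences(profiles["Isolate A"], profiles["Isolate C"]))
```

Each locus counts as one difference regardless of its length or of how many bases changed within it, which is why Locus 2 (100 nt) and Locus 3 (5000 nt) contribute equally.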
http://www.ridom.de/seqsphere/cgmlst/
Fig. Standardized hierarchical microbial WGS typing approach, shown from bottom to top with increasing discriminatory power.
MLST vs cgMLST
[Figure: discriminatory power of cgMLST compared with 7 genes MLST and 5 genes MLST]
Ghanem and El-Gazzar, 2018
Allele based typing
(rMLST‐ cgMLST‐wgMLST)
The allele calls at each locus are compared between isolates and differences are used to determine relatedness
Fig. A) Seven-locus MLST dendrogram displaying 101 samples, including Sanger sequenced clinical samples, with the sequence type (ST) and the clonal complex (CC). B) cgMLST dendrogram displaying 81 clinical and reference MG samples.
Ghanem and El‐Gazzar, 2019
The new way to use in silico MLST in the NGS era.
Kimura et al., 2017
Fig. Schematic representation of the Bacterial Isolate Genome Sequence Database Platform and the gene-by-gene approach to nucleotide sequence analysis.
Cody et al., 2014
cgMLST
• Became very popular
• Public databases (cgMLST.org, BIGSdb)
Allele based typing approach
(rMLST‐ cgMLST‐wgMLST)
• Unique and expandable nomenclature
• Can be standardized
• No need for reference genome for mapping
• Applied to related and non related genomes (multiple outbreaks)
• Computationally less intensive
• Lineage specific SNP/allele approaches can be used to gain more discriminatory power.
Allele based typing approach
(rMLST‐ cgMLST‐wgMLST)
Faster than analyzing SNP differences
For WGS data, allele calls can be performed on short reads ("assembly free") and on assembled genomes ("assembly-based")
If there is a conflict between the two allele calls, then no allele call is made
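The conflict rule can be sketched as a small function that combines the two calls for one locus. The slide only states what happens on a conflict; accepting a single call when the other method produced none is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Optional


def consensus_allele(
    assembly_free: Optional[int], assembly_based: Optional[int]
) -> Optional[int]:
    """Combine two allele calls for one locus; None means no call."""
    # Assumption: if only one method produced a call, that call is accepted.
    if assembly_free is None:
        return assembly_based
    if assembly_based is None:
        return assembly_free
    # From the slide: conflicting calls result in no allele call.
    return assembly_free if assembly_free == assembly_based else None


print(consensus_allele(8, 8))   # both methods agree
print(consensus_allele(8, 12))  # conflict: no call
```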
Limitations of cgMLST
• Dependence on variation within a set of predefined loci.
• Information within noncoding sequences or non-predefined loci will not be included in the analysis.
Advantages and Caveats of wgMLST analysis
hqSNP versus cgMLST Analysis
Both analyses are conducted from the same raw data (typically short read sequencing data)
For public health purposes, both correlate well, i.e. the outermost branches of the phylogenetic trees are almost identical
The two are not mutually exclusive
For some use cases cgMLST works better; for others SNP analysis works better
Limitations of WGS based typing approaches
Defining the gold standard???
• When to judge two isolates as indistinguishable, closely related,
possibly related, different
• Relating results to clinical and epidemiological data
• Using results to answer different questions
• How the results compare to traditional typing methods (PFGE)
Cost and difficulty of analysis
Reference Characterization by WGS
“One Shot” Characterization of STEC
Summary of Potential WGS Applications
Outbreak investigation
Sporadic vs outbreak
Not just cluster but phylogenetic relationships
Microbial Source Tracking (MST)
Microbial Surveillance
Food
Environment
Animals, soil, food prep areas, hospitals, etc
Antibiotic resistance monitoring
Genotype predicts phenotype
Mobile vs integrated
Virulence gene monitoring
What else???
Summary
Different Sequence Based Typing Approaches.
Concept of Each Approach
Advantages and Disadvantages
Resources and References
New York Integrated Food Safety Center of Excellence, Molecular Epidemiology and Sequencing Approaches in Public Health - Webinars
Schürch, A. C., et al. "Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches." Clinical Microbiology and Infection 24.4 (2018): 350-354.
Pérez-Losada, Marcos, et al. "Pathogen typing in the genomics era: MLST and the future of molecular epidemiology." Infection, Genetics and Evolution 16 (2013): 38-53.
Pérez-Losada, Marcos, Miguel Arenas, and Eduardo Castro-Nallar. "Microbial sequence typing in the genomic era." Infection, Genetics and Evolution (2017).
Cody, Alison J., Julia S. Bennett, and Martin CJ Maiden. "Multi-locus sequence typing and the gene-by-gene approach to bacterial classification and analysis of population variation." Methods in Microbiology. Vol. 41. Academic Press, 2014. 201-219.
Chui, Linda, and Vincent Li. "Technical and Software Advances in Bacterial Pathogen Typing." Methods in Microbiology. Vol. 42. Academic Press, 2015. 289-327.
Questions?
Thanks
Mostafa Ghanem
Department of Veterinary Medicine, University of Maryland
301.314.1191/ mghanem@umd.edu
WGS for phenotypic predictions
The Center for Genomic Epidemiology provides web-based services:
• AMR typing using ResFinder
• Virulence typing using VirulenceFinder
These services use BLAST to identify acquired AMR and virulence genes in whole-genome data (pre-assembled, partial or complete genomes).
Proof of concept for AMR phenotypic prediction
The predicted resistance phenotype was compared with the original phenotypic resistance profile
Almost complete agreement between predicted resistance phenotype and phenotypic testing
Zankari, Ea, et al. "Identification of acquired antimicrobial resistance genes." Journal of Antimicrobial Chemotherapy 67.11 (2012): 2640-2644.
Pipelines for detection of antimicrobial resistance genes
AMRFinderPlus
https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/
ARG-ANNOT: Antibiotic Resistance Gene-ANNOTation
http://en.mediterranee-infection.com/article.php?laref=283%26titre=arg-annot
ARDB: Antibiotic Resistance Gene Database (not maintained anymore)
https://ardb.cbcb.umd.edu/
CARD: The Comprehensive Antibiotic Resistance Database
https://card.mcmaster.ca/
Resfams
http://www.dantaslab.org/resfams/
SSTAR: Sequence Search Tool for Antimicrobial Resistance
https://github.com/tomdeman-bio/Sequence-Search-Tool-for-Antimicrobial-Resistance-SSTAR-
Limitations of WGS predictions
Can only identify known resistance genes/mutations
• Novel genes or variants may not be detected if they have low homology to known ones
Fragmented genomes
• Complicates identification of resistance elements
• Assembly methods may improve, raw data always available