Sequence Typing Approaches, OHSI 2022, Dr. Ghanem

7/25/2022
Sequence‐based Bacterial Typing: Concepts and Approaches
OHSI 2022
Mostafa Ghanem, University of Maryland
mghanem@umd.edu
Outline
• Different Sequence Based Typing Approaches.
• Concept of Each Approach
• Advantages And Disadvantages
Molecular typing (Genotyping)
a) DNA banding pattern based methods (RAPD, PFGE, RFLP)
b) DNA hybridization‐based methods (Microarrays)
c) DNA sequencing based methods
Sequence based typing
Seq1 ACTGTTGGCACGATTT
Seq2 AGTGTTGCCACGATTT
Seq3 ACTGTTGGCAGGATTT
Seq4 ACTGTAGGCACGATTT
Seq5 ACTGTAGGCACGATTA
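A minimal sketch of how the five aligned sequences above could be compared, assuming they are already aligned to the same length. The function name `hamming` and the dictionary layout are illustrative choices, not something taken from the slides.

```python
from itertools import combinations

# The five aligned example sequences from the slide.
seqs = {
    "Seq1": "ACTGTTGGCACGATTT",
    "Seq2": "AGTGTTGCCACGATTT",
    "Seq3": "ACTGTTGGCAGGATTT",
    "Seq4": "ACTGTAGGCACGATTT",
    "Seq5": "ACTGTAGGCACGATTA",
}


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Pairwise difference counts for every pair of sequences.
pairwise = {
    (m, n): hamming(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)
}

if __name__ == "__main__":
    for (m, n), d in pairwise.items():
        print(f"{m} vs {n}: {d} difference(s)")
```

Fewer differences suggest a closer relationship, but the counts alone do not say how recent the common ancestor is.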
Sanger method, 1977
The first molecular sequence-based phylogenetic classification of living organisms into three main domains
What is a SNP?
Single Nucleotide Polymorphism (SNP)
ATGTTCCTC sequence
ATGTTGCTC reference
*phylogenetically informative differences
Insertion or Deletion (Indel)
ATGTTCCCTC sequence
ATGTTC-CTC reference
*differences not used in hqSNP analysis
Large recombination event that introduces a large prophage
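The SNP and indel examples above can be told apart programmatically. This is a sketch only, assuming the sample and reference are already aligned and that gaps are written as `-`; the function name `classify_differences` is hypothetical.

```python
from typing import List, Tuple


def classify_differences(
    sample: str, reference: str
) -> Tuple[List[Tuple[int, str, str]], List[int]]:
    """Split differences between two aligned sequences into SNPs and indels.

    Returns (snps, indels): snps holds (position, reference_base, sample_base),
    indels holds the 1-based alignment positions where either side has a gap.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    snps, indels = [], []
    for pos, (s, r) in enumerate(zip(sample, reference), start=1):
        if s == r:
            continue
        if s == "-" or r == "-":
            indels.append(pos)  # gap on one side: insertion or deletion
        else:
            snps.append((pos, r, s))  # substitution: candidate SNP
    return snps, indels


if __name__ == "__main__":
    print(classify_differences("ATGTTCCTC", "ATGTTGCTC"))
    print(classify_differences("ATGTTCCCTC", "ATGTTC-CTC"))
```

In an hqSNP analysis only the first list would be carried forward; the indel positions are reported here just to show they are a different class of difference.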
Use of sequence data to assess relatedness of organisms
• Differences in sequences can be used to assess the relatedness of organisms and the likelihood of a recent common ancestor
• The definition of "recent" becomes important: recent in years or in generation times
  • Salmonella in a dry processing plant may stay dormant and rarely, if ever, multiply (or imagine anthrax spores in soil)
  • Salmonella in a chicken flock may multiply every 30 min (>7,500 times a year)
• Assessing relationships of microbial isolates typically requires more information than just sequence data
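A rough check of the generation-time arithmetic, assuming uninterrupted growth all year (which real populations will not achieve). The helper name `generations_per_year` is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def generations_per_year(doubling_time_minutes: float) -> float:
    """Upper bound on generations per year for a constant doubling time."""
    if doubling_time_minutes <= 0:
        raise ValueError("doubling time must be positive")
    return MINUTES_PER_YEAR / doubling_time_minutes


if __name__ == "__main__":
    print(generations_per_year(30))
```

With a 30 minute doubling time the ceiling is 17,520 generations per year, so the ">7,500" on the slide sits comfortably below that upper bound, while a dormant population may pass through almost none.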
Medini et al. (2008).
Level of discrimination
Low – few or multiple stable genes – look at long term evolutionary trends
High – more genes, possibly variable gene(s) ‐ outbreak investigation / local surveillance
Typing approaches (information increases from top to bottom)
Protein
  Serotyping
DNA
  PFGE (Pulsed Field Gel Electrophoresis): total gDNA fragments
Sequencing
  16S rRNA (Ribosomal RNA Sequencing): 1 gene
  MLST (Multi Locus Sequence Typing): 7 genes
WGS
  wgMLST (Whole Genome Multi Locus Sequence Typing): thousands of reference genes plus pan genome
  wgSNP or hqSNP (Whole Genome Single Nucleotide Polymorphism Typing): total gDNA
Sequence Based Approaches
• Single locus based methods
• Multilocus Sequence Typing (MLST)
• K-mer-based typing approaches
• High quality SNP (hqSNP) typing approaches
• Allele based typing approaches (rMLST, cgMLST, wgMLST)
• WGS for phenotypic typing (AMR typing, virulence typing)
Fig: The number of publications related to bacterial typing methods as a function of time (Pérez-Losada et al., 2013)
Typing Approach evaluation criteria
• Typeability: capacity to produce clearly interpretable results with most strains of the bacterial species
• Reproducibility: capacity to repeatedly obtain the same typing profile result with the same bacterial strain
• Discriminatory power: ability to produce results that clearly allow differentiation between unrelated strains of the same bacterial species
• Practicality (ease of performance and interpretation): method should be versatile, relatively rapid, inexpensive, technically simple and provide readily interpretable results
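Discriminatory power is often put into numbers with Simpson's index of diversity in the Hunter-Gaston form, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where N is the number of unrelated strains tested and n_j the number of strains assigned to type j. The slides do not name a specific index, so treating this as the measure is an assumption; the function name is illustrative.

```python
from collections import Counter
from typing import Hashable, Iterable


def discriminatory_index(types: Iterable[Hashable]) -> float:
    """Hunter-Gaston form of Simpson's index of diversity.

    `types` holds one type label per unrelated strain. The result is the
    probability that two strains drawn at random fall into different types.
    """
    labels = list(types)
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two strains")
    counts = Counter(labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical typing results for four unrelated strains.
    print(discriminatory_index(["ST1", "ST1", "ST2", "ST3"]))
```

A value near 1 means the method separates almost every pair of unrelated strains; a value near 0 means most strains collapse into one type.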
Single locus based methods
• 16S rRNA region sequence
• Ribosomal intergenic spacer analysis
  • 16S-23S intergenic spacer region (IGSR) of Mycoplasma gallisepticum (MG)
• Surface variable and polymorphic genes
  • vlhA typing of Mycoplasma synoviae (MS)
  • spa typing for Staphylococcus aureus
Disadvantages:
• Insufficient discriminatory power
• Reliability and evolutionary relationships are hard to establish from a single locus
Traditional MLST (Allele based typing)
• The gold standard before WGS
• Based on housekeeping (HK) genes; reflects population structure
http://beta.mlst.net/Instructions/default.html
Sequence type (ST) vs Clonal Complex (CC)
• An ST is identified based on the allelic profile of the 7 genes
• A CC is a group of STs that are similar in 5 or more alleles to a central (ancestral) sequence type
• An ST can be a:
  • Single locus variant (SLV)
  • Double locus variant (DLV)
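The ST, SLV, DLV and CC definitions above can be sketched with allelic profiles stored as tuples of allele numbers. The profiles below are hypothetical, and the function names are illustrative; real ST numbers come from a curated database such as PubMLST.

```python
from typing import Sequence


def locus_differences(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of loci at which two allelic profiles carry different alleles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q))


def variant_label(p: Sequence[int], q: Sequence[int]) -> str:
    """Describe how profile q relates to profile p."""
    d = locus_differences(p, q)
    if d == 0:
        return "same ST"
    if d == 1:
        return "SLV"
    if d == 2:
        return "DLV"
    return f"{d} loci differ"


def in_clonal_complex(
    profile: Sequence[int], central: Sequence[int], min_shared: int = 5
) -> bool:
    """True if the profile shares at least `min_shared` alleles with the central ST."""
    shared = len(central) - locus_differences(profile, central)
    return shared >= min_shared


# Hypothetical 7-locus profiles (allele numbers are made up for illustration).
central = (1, 1, 1, 1, 1, 1, 1)
slv = (1, 1, 2, 1, 1, 1, 1)
dlv = (1, 3, 2, 1, 1, 1, 1)
distant = (4, 3, 2, 5, 1, 1, 1)

if __name__ == "__main__":
    for name, prof in [("slv", slv), ("dlv", dlv), ("distant", distant)]:
        print(name, variant_label(central, prof), in_clonal_complex(prof, central))
```

With seven loci, the "5 or more alleles" rule means SLVs and DLVs of the central ST fall inside the clonal complex, while a profile differing at four loci does not.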
Sequence type (ST) vs Clonal Complex (CC)
Fig. A minimum spanning tree for the 101 samples and eight clonal complexes typed by seven-locus MLST (Ghanem and El-Gazzar, 2019).
Traditional MLST
Advantages:
• High typeability and reproducibility
• Sequence-based: high accuracy and relatively good discrimination
• Central database: easier exchange of data and comparison of strains globally (https://pubmlst.org/)
• Expandable nomenclature
• Can now be performed using WGS data
Disadvantages:
• Not useful for organisms with conserved housekeeping genes (HKGs)
• Targets are selected to represent population structure, so they are not as useful for outbreak detection
Whole genome sequence based approaches
Medini et al. (2008).

Whole Genome Sequencing
ATGCGTGATCTAGTAGTCTAGGAGCTGACCGATTA
Results interpretation and action
K-mer-based typing approaches
• Compare genomes small piece by small piece (k-mer by k-mer) to find the pieces that differ
Advantages:
• Assembly-free and alignment-free approach
• Avoids high computational needs
• Faster and more suitable for real-time epidemiological typing
Disadvantages:
• Does not consider the genomic context or the sequence quality at each locus
• Does not give a true phylogenetic relationship
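The piece-by-piece comparison can be sketched as a Jaccard distance between k-mer sets. This is one common way to do it, not necessarily what any particular tool implements; the value of `k` and the function names are assumptions for illustration.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 4) -> float:
    """1 minus the fraction of k-mers shared between two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("sequences are shorter than k")
    return 1 - len(ka & kb) / len(union)


if __name__ == "__main__":
    # Two short illustrative sequences that differ at one base.
    print(jaccard_distance("ACTGTTGGCACGATTT", "ACTGTAGGCACGATTT"))
```

No alignment or assembly is needed, which is where the speed comes from. A single base change alters up to k overlapping k-mers, so the distance reflects shared content rather than a count of mutations.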
High Quality Single Nucleotide Polymorphism (hqSNP) Typing Approaches
Workflow: reference mapping → SNP detection → SNP evaluation
Reference genome
• Serves as a template against which the reads from all other genomes are mapped
• Has to be closely related to the mapped samples
• Selection of the reference genome varies according to:
  • Genomic diversity of the organism
  • Aim of the study
  • Context of the investigation
What makes a SNP high quality (hq)?
Sequence reads → quality filter → quality-filtered sequence reads ready for analysis
• Apply a quality filter that filters out nucleotides in sequence reads for comparison, based on sequence coverage and quality
The alphabet soup of analysis – Coverage
[Figure: coverage at 40x versus coverage at 5x. Image source: http://missusrousselee.deviantart.com/art/Alphabet-Soup-134724659]
• Any single location on the genome can have zero to hundreds of sequence reads that cover that one region
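Coverage can be illustrated in two ways: as an average over the genome and as a depth at each position. The sketch below assumes fixed-length reads with known 0-based start positions; the read count and genome size in the example are made-up round numbers.

```python
from typing import List, Sequence


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length / genome_size


def depth_per_position(
    read_starts: Sequence[int], read_length: int, genome_size: int
) -> List[int]:
    """Number of reads covering each position of a linear genome."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth


if __name__ == "__main__":
    # Hypothetical run: 1.6 million 125-base reads on a 5 Mb genome.
    print(mean_coverage(1_600_000, 125, 5_000_000))
    # Toy genome of 6 bases with three 3-base reads.
    print(depth_per_position([0, 2, 2], 3, 6))
```

An average of 40x still leaves some positions with far fewer reads, which is why SNP filters look at the depth at each site rather than the genome-wide mean.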
What to call a SNP
• SNPs are called based on:
  • Quality
  • Coverage
  • Base frequency
[Figure: reads piled up on the reference ATGTTGCTC. Most reads show ATGTTCCTC; a few reads show ATGTTACTC, ATGTTTCTC, or ATGTTGCTC. Is it a SNP?]
• The differences between the reference and the compared genome are extracted and used to determine relatedness
Detected SNPs are screened for:
• Functional implications (genic or intergenic, synonymous or non-synonymous)
• Distribution across the reference genome (e.g. highly variable regions)
SNPs related to recombination are excluded.
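A sketch of how quality, coverage and base frequency could be combined at one reference position. The thresholds (`min_qual`, `min_depth`, `min_freq`) are placeholder values chosen for illustration, not the settings of any specific pipeline, and the example column is only loosely modeled on the pileup in the figure.

```python
from collections import Counter
from typing import Optional, Sequence


def call_base(
    bases: str,
    quals: Sequence[int],
    min_qual: int = 20,
    min_depth: int = 10,
    min_freq: float = 0.75,
) -> Optional[str]:
    """Return the consensus base at one position, or None if no confident call."""
    # Quality: drop bases whose quality score is below the cutoff.
    kept = [b for b, q in zip(bases, quals) if q >= min_qual]
    # Coverage: require enough reads after filtering.
    if len(kept) < min_depth:
        return None
    # Base frequency: the most common base must dominate the column.
    base, count = Counter(kept).most_common(1)[0]
    return base if count / len(kept) >= min_freq else None


def is_snp(bases: str, quals: Sequence[int], ref_base: str, **thresholds) -> bool:
    """True if a confident base call differs from the reference base."""
    called = call_base(bases, quals, **thresholds)
    return called is not None and called != ref_base


# Column of read bases at one position: mostly C, reference is G.
column = "A" + "C" * 11 + "T" + "G"
quals = [30] * len(column)

if __name__ == "__main__":
    print(call_base(column, quals), is_snp(column, quals, "G"))
```

With these placeholder thresholds the column is called as C and reported as a SNP against the reference G; raising `min_freq` to 0.9 would leave the position uncalled, which shows how much the answer depends on pipeline settings.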
Where to call a SNP?
• Not all SNP pipelines are equal: where you call SNPs will affect the total SNP count
• SNPs relevant for phylogenetic analysis are vertically transmitted, not horizontally, so horizontally acquired genetic elements like phages can be masked
[Figure: raw reads mapped along a genome containing genes and mobile elements. Two options are shown: mask mobile elements (do not consider SNPs in these locations), or only call SNPs in genes.]
High Quality Single Nucleotide Polymorphisms
• The remaining SNPs represent parsimony-informative loci within the core bacterial genome
• hqSNP typing is the most widely used whole-genome strain typing method
• Works best with clonal organisms like Salmonella
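Masking mobile elements before counting SNPs can be sketched as a simple interval filter. The coordinates below are invented, and the function name `mask_snps` is hypothetical; real pipelines typically take masked regions from an annotation or a phage-prediction step.

```python
from typing import List, Sequence, Tuple


def mask_snps(
    snp_positions: Sequence[int], masked_regions: Sequence[Tuple[int, int]]
) -> List[int]:
    """Keep only SNP positions that fall outside every masked interval.

    Intervals are (start, end) pairs with both ends inclusive.
    """
    return [
        pos
        for pos in snp_positions
        if not any(start <= pos <= end for start, end in masked_regions)
    ]


if __name__ == "__main__":
    # Hypothetical SNP positions and two masked prophage regions.
    snps = [10, 150, 400, 999]
    prophages = [(100, 200), (900, 1000)]
    print(mask_snps(snps, prophages))
```

Two pipelines given the same reads can report different SNP counts simply because they mask different regions, which is the point made on the "Where to call a SNP?" slide.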
How to report SNP data – keep it simple
New Cluster: 2016039
Hi folks:
Two isolates are 0 SNPs from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)
Two isolates are 2 SNPs from each other:
E2017002910 (SE1B1)
I2017003132 (SE1B1)
Limitations of Whole-genome SNP typing approaches
• Difficult to apply to long-term or global-scale multi-outbreak analysis
• Computational resources and experienced bioinformaticians are necessary
• Challenging in bacteria with high genomic diversity and/or extra-chromosomal or mobile genetic elements
• Creating a standard method for WGS SNP typing is very difficult and impractical
hqSNP analyses
Advantages:
• Phylogenetically informative (builds a tree consistent with the evolution of the strains)
• SNP position can be identified on the genome (the gene affected can be identified)
Disadvantages:
• Requires a closely related reference genome; hqSNP analysis is problematic if the reference genome is not closely related
• Takes a while and requires a lot of computer power
• Interpretation of data depends on the genomes added; it is not stable and does not lead to a nomenclature
When to use:
• Good for situations where a wgMLST database has not been developed and validated
• May provide the highest amount of resolution for strain comparison
Allele based typing (rMLST, cgMLST, wgMLST)
Expanding the concept of MLST from 7 genes to a genome-wide, gene-by-gene typing approach.
[Figure: SNP typing, cgMLST (100s-1000s of genes), and MLST (5-7 genes) compared by discriminatory power and by reproducibility + portability.]
Allele based typing
• The database is built from gene content representing a diverse selection of the genus/species of the organism being compared
• Each unique gene is referred to as a “locus”
• Any change (SNP, insertion, deletion) equals a new allele call for a locus
• New alleles are named sequentially when encountered, not based on sequence
Locus 1:
  ACTAGAGGGAAA  allele 1
  ACTAGAGGCTAA  allele 2 (2 SNPs)
  ACT-GAGGGAAA  allele 3 (1 indel)
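Sequential allele naming can be sketched as a lookup table per locus that hands out the next number whenever an unseen sequence arrives. The class name `LocusAlleles` is hypothetical; a production database would also check sequence quality and completeness before accepting a new allele.

```python
class LocusAlleles:
    """Allele registry for one locus: exact sequence match, sequential numbering."""

    def __init__(self) -> None:
        self._numbers = {}  # sequence -> allele number

    def call(self, sequence: str) -> int:
        """Return the allele number for a sequence, registering it if new."""
        if sequence not in self._numbers:
            self._numbers[sequence] = len(self._numbers) + 1
        return self._numbers[sequence]


if __name__ == "__main__":
    locus1 = LocusAlleles()
    print(locus1.call("ACTAGAGGGAAA"))  # first sequence seen
    print(locus1.call("ACTAGAGGCTAA"))  # two SNPs relative to the first
    print(locus1.call("ACTGAGGGAAA"))   # one-base deletion relative to the first
    print(locus1.call("ACTAGAGGGAAA"))  # already registered
```

The two-SNP variant and the one-base deletion each receive exactly one new allele number, which is why allele numbers say that sequences differ but not by how much.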
Allele based typing
• Allows for simpler analysis and clear naming of subtypes
• Performs comparison on a gene-by-gene level
                     Isolate A   Isolate B   Isolate C
Locus 1 (20 nt)      1           1           1
Locus 2 (100 nt)     8           8           12
Locus 3 (5000 nt)    5           5           2
Etc.
Locus 2,005 (5 nt)   4           4           4
wgMLST type          A           A           B
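The gene-by-gene comparison in the table reduces to counting loci with different allele numbers. The sketch below uses only the four loci shown and treats a missing call (`None`) as uninformative, which is an assumption; schemes differ in how they handle missing loci.

```python
from typing import Optional, Sequence


def allele_distance(
    p: Sequence[Optional[int]], q: Sequence[Optional[int]]
) -> int:
    """Count loci where both isolates have a call and the calls differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1 for a, b in zip(p, q) if a is not None and b is not None and a != b
    )


# Allele calls for the four loci shown in the table (Locus 1, 2, 3 and 2,005).
profiles = {
    "Isolate A": [1, 8, 5, 4],
    "Isolate B": [1, 8, 5, 4],
    "Isolate C": [1, 12, 2, 4],
}

if __name__ == "__main__":
    print(allele_distance(profiles["Isolate A"], profiles["Isolate B"]))
    print(allele_distance(profiles["Isolate A"], profiles["Isolate C"]))
```

Isolates A and B are 0 alleles apart and share a wgMLST type, while isolate C differs at two loci regardless of how many nucleotides changed inside the 100 nt and 5000 nt genes.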
http://www.ridom.de/seqsphere/cgmlst/
Fig. Standardized hierarchical microbial WGS typing approach, from bottom to top with increasing discriminatory power.
MLST vs cgMLST
[Figure: discriminatory power of cgMLST compared with 7-gene MLST and 5-gene MLST (Ghanem and El-Gazzar, 2018).]
Allele based typing
The allele calls at each locus are compared between isolates and differences are used to determine relatedness
Fig. A) Seven-locus MLST dendrogram displaying 101 samples, including Sanger-sequenced clinical samples, with the sequence type (ST) and the clonal complex (CC). B) cgMLST dendrogram displaying 81 clinical and reference MG samples (Ghanem and El-Gazzar, 2019).
The new way to use in silico MLST in the NGS era (Kimura et al., 2017).
Fig. Schematic representation of the Bacterial Isolate Genome Sequence Database platform and the gene-by-gene approach to nucleotide sequence analysis (Cody et al., 2014).
cgMLST
• Became very popular
• Public databases (cgMLST.org, BIGSdb)
How to report wgMLST data – keep it simple
Hi folks:
Two isolates are 0 alleles from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)
Two isolates are 2 alleles from each other:
E2017002910 (SE1B1)
I2017003132 (SE1B1)
Allele based typing approach
• Unique and expandable nomenclature
• Can be standardized
• No need for a reference genome for mapping
• Can be applied to related and unrelated genomes (multiple outbreaks)
• Computationally less intensive
• Lineage-specific SNP/allele approaches can be used to gain more discriminatory power
Allele based typing approach
• Faster than analyzing SNP differences
• For WGS data, allele calls can be performed on short reads (“assembly free”) and on assembled genomes (“assembly-based”)
• If there is a conflict between the allele calls, then no allele call is made
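The conflict rule can be sketched as a small function that merges the two calls for one locus. What happens when only one method produces a call is not stated on the slide; accepting the single available call is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import Optional


def consensus_call(
    assembly_free: Optional[int], assembly_based: Optional[int]
) -> Optional[int]:
    """Merge two allele calls for one locus; None means no call.

    Conflicting calls produce no call. If only one method produced a call,
    that call is accepted (an assumption, not stated in the slides).
    """
    if assembly_free is None:
        return assembly_based
    if assembly_based is None:
        return assembly_free
    return assembly_free if assembly_free == assembly_based else None


if __name__ == "__main__":
    print(consensus_call(8, 8))     # agreement
    print(consensus_call(8, 12))    # conflict
    print(consensus_call(None, 5))  # only one method called the locus
```

Dropping conflicting loci trades a little resolution for confidence that every retained allele call is supported by both routes.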
Limitations of cgMLST
• Dependence on variation within a set of predefined loci
• Information within noncoding sequences or non-predefined loci will not be included in the analysis
Advantages and Caveats of wgMLST analysis
Advantages:
• Phylogenetically informative
• All virulence, serotyping, and antibiotic resistance genes can be pulled out as part of the analysis
• Neutralizes the effects of horizontal gene transfer
• Allele calling is stable: data are standardizable and directly comparable between laboratories; reproducibility is not dependent on the choice of reference strain; amenable to automated bioinformatics
Disadvantages:
• Initial assignment of alleles is computationally costly
• Compares character data (allele numbers) rather than genetic data
• SNPs and indels are treated equally
• Requires curation for allele calls
When to use:
• Surveillance, especially for a distributed testing network
• Reference characterization
• Accurate cluster detection
• Need to communicate with partners using stable nomenclature
hqSNP versus cgMLST Analysis
• Both analyses are conducted from the same raw data (typically short-read sequencing data)
• For public health purposes, both correlate well
  • i.e. the outermost branches of the phylogenetic trees are almost identical
• The two are not mutually exclusive
  • For some use cases cgMLST works better; for others SNP analysis works better
Limitations of WGS based typing approaches
 Defining the gold standard???
• When to judge two isolates as indistinguishable, closely related,
possibly related, different
• Relating results to clinical and epidemiological data
• Using results to answer different questions
• How the results compare to traditional typing methods (PFGE)
 Cost and difficulty of analysis
Reference Characterization by WGS
“One Shot” Characterization of STEC
Analysis          Result
ANI               GENUS/SPECIES: Escherichia coli
SerotypeFinder    SEROTYPE: O104:H4
                  PATHOTYPE: Shiga toxin producing and Enteroaggregative E. coli (STEC & EAEC)
VirulenceFinder   VIRULENCE PROFILE: stx2a, aggR, aggA, sigA, sepA, pic, aatA, aaiC, aap
7-gene MLST       SEQUENCE TYPE: ST678
ResFinder         ANTIMICROBIAL RESISTANCE GENES: blaTEM-1, blaCTX-M-15, strAB, sul2, tet(A)A, dfrA7
Phylogenetic ID   wgMLST CODE: 102:45.26.35.3
Summary of Potential WGS Applications
 Outbreak investigation
 Sporadic vs outbreak
 Not just cluster but phylogenetic relationships
 Microbial Source Tracking (MST)
 Microbial Surveillance
 Food
 Environment
 Animals, soil, food prep areas, hospitals, etc
 Antibiotic resistance monitoring
 Genotype predicts phenotype
 Mobile vs integrated
 Virulence gene monitoring
 What else???
Summary
 Different Sequence Based Typing Approaches.
 Concept of Each Approach
 Advantages and Disadvantages
Resources and References
New York Integrated Food Safety Center of Excellence, Molecular Epidemiology and Sequencing Approaches in Public Health - Webinars
Schürch, A. C., et al. "Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches." Clinical Microbiology and Infection 24.4 (2018): 350-354.
Pérez-Losada, Marcos, et al. "Pathogen typing in the genomics era: MLST and the future of molecular epidemiology." Infection, Genetics and Evolution 16 (2013): 38-53.
Pérez-Losada, Marcos, Miguel Arenas, and Eduardo Castro-Nallar. "Microbial sequence typing in the genomic era." Infection, Genetics and Evolution (2017).
Cody, Alison J., Julia S. Bennett, and Martin CJ Maiden. "Multi-locus sequence typing and the gene-by-gene approach to bacterial classification and analysis of population variation." Methods in Microbiology. Vol. 41. Academic Press, 2014. 201-219.
Chui, Linda, and Vincent Li. "Technical and Software Advances in Bacterial Pathogen Typing." Methods in Microbiology. Vol. 42. Academic Press, 2015. 289-327.
Questions?
Thanks
Mostafa Ghanem
Department of Veterinary Medicine, University of Maryland
301.314.1191/ mghanem@umd.edu
WGS for phenotypic predictions
• The Center for Genomic Epidemiology provides web-based services
  • AMR typing using ResFinder
  • Virulence typing using VirulenceFinder
  • These use BLAST to identify acquired AMR and virulence genes in whole-genome data (pre-assembled, partial or complete genomes)
Proof of concept for AMR phenotypic prediction
• The ResFinder tool was used to identify resistance genes in 23 bacterial isolates of 5 different species
• A predicted resistance phenotype was determined based on the resistance genes identified using BLAST with 100% identity
• The predicted resistance phenotype was compared with the original phenotypic resistance profile
• Almost complete agreement between the predicted resistance phenotype and phenotypic testing
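The comparison step can be sketched as the fraction of isolate-drug pairs on which prediction and phenotype agree. The isolates, drugs and results below are invented purely to show the calculation and are not data from the cited study; the function name `agreement` is illustrative.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (isolate, antimicrobial)


def agreement(predicted: Dict[Key, bool], observed: Dict[Key, bool]) -> float:
    """Fraction of shared isolate-drug pairs where both results match.

    True means resistant, False means susceptible.
    """
    shared = predicted.keys() & observed.keys()
    if not shared:
        raise ValueError("no isolate-drug pairs in common")
    matches = sum(predicted[k] == observed[k] for k in shared)
    return matches / len(shared)


# Hypothetical results: genotype-based prediction versus phenotypic testing.
predicted = {
    ("iso1", "ampicillin"): True,
    ("iso1", "tetracycline"): False,
    ("iso2", "ampicillin"): True,
    ("iso2", "tetracycline"): True,
}
observed = {
    ("iso1", "ampicillin"): True,
    ("iso1", "tetracycline"): False,
    ("iso2", "ampicillin"): True,
    ("iso2", "tetracycline"): False,
}

if __name__ == "__main__":
    print(agreement(predicted, observed))
```

Disagreements are worth inspecting individually, since a resistance gene that is present but not expressed and a novel mechanism missing from the database point to different problems.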
Zankari, Ea, et al. "Identification of acquired antimicrobial resistance genes." Journal of Antimicrobial Chemotherapy 67.11 (2012): 2640-2644.
Proof of concept for genotypic monitoring

1. Zankari E., et al. Genotyping using whole‐genome sequencing is a realistic alternative to surveillance
based on phenotypic antimicrobial susceptibility testing. J Antimicrob Chemother. 2013. 68(4):771‐7.
2. Gordon NC., et al. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome
sequencing. J Clin Microbiol. 2014. 52(4):1182‐91.
3. Stoesser N, et al. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella
pneumoniae isolates using whole genomic sequence data. J Antimicrob Chemother. 2013.
68(10):2234‐44.
4. Chakravorty S., et al. Genotypic susceptibility testing of Mycobacterium tuberculosis for amikacin and
kanamycin resistance using a rapid Sloppy Molecular Beacon based assay identifies more cases of low level
drug resistance than phenotypic Lowenstein‐Jensen testing. J Clin Microbiol. 2015 53:43‐51.
5. Tyson, G.H. et al. Whole‐genome sequencing accurately predicts antimicrobial resistance in
Escherichia coli. J Antimicrob Chemother. 2015 Oct;70(10):2763‐9.
6. Zhao, S., et al. Whole genome sequencing analysis accurately predicts antimicrobial resistance phenotypes
in Campylobacter. Appl Environ Microbiol. 2015 Oct 30;82(2)
7. Tyson G.H., et al. Using genotypic methods to determine streptomycin resistance breakpoints for Salmonella
and Escherichia coli. FEMS Microbiol Lett. 2016 Feb;363(4).
And more to come
Pipelines for detection of antimicrobial resistance genes
• AMRFinderPlus
  https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/
• ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation)
  http://en.mediterranee-infection.com/article.php?laref=283%26titre=arg-annot
• ARDB (Antibiotic Resistance Gene Database; not maintained anymore)
  https://ardb.cbcb.umd.edu/
• CARD (The Comprehensive Antibiotic Resistance Database)
  https://card.mcmaster.ca/
• Resfams
  http://www.dantaslab.org/resfams/
• SSTAR (Sequence Search Tool for Antimicrobial Resistance)
  https://github.com/tomdeman-bio/Sequence-Search-Tool-for-Antimicrobial-Resistance-SSTAR-
Limitations of WGS predictions
• Can only identify known resistance genes/mutations
  • Novel genes or variants may not be detected if they have low homology to known ones
• Need a comprehensive, accurate, highly curated and updated resistance gene database
• Expertise needed to analyze data
  • Automation is making it easier
• Fragmented genomes
  • Complicate identification of resistance elements
  • Assembly methods may improve; raw data are always available